Question:

Discuss the Statistics and Introduction to Programming in MATLAB.

Answer:

# Summary of Chapter 4: Statistics and Introduction to Programming in MATLAB

## Arrays and Bar Charts

MATLAB is well suited to problems in statistics and probability. Statistics summarizes a set of data with quantities such as the mean, median, mode, variance, and standard deviation, and MATLAB provides simple commands for computing each of them. This chapter describes how to use MATLAB to do basic statistics on data, calculate probabilities, and present results. It also introduces MATLAB’s programming facilities through examples: each problem is first solved with hand-written code, and the result is then compared with the solution given by built-in MATLAB functions. A sample problem is as follows.

A class has 36 students, whose exam scores are as follows.

One student scored 100, two students scored 96, four students scored 90, two students scored 88, three students scored 85, one student scored 84, two students scored 82, seven students scored 78, four students scored 75, six students scored 70, one student scored 69, two students scored 63, and one student scored 55.

In MATLAB, these numbers can be entered as follows.

x = [55, 63, 69, 70, 75, 78, 82, 84, 85, 88, 90, 96, 100]

y = [1, 2, 1, 6, 4, 7, 2, 1, 3, 2, 4, 2, 1]

Here x is the array of scores and y is the number of students who obtained each score.

A bar chart of these values can be created with the bar command: bar(x,y) draws one bar at each score in x, with a height given by the corresponding entry of y.

To make the distribution of grades easier to read, the scores can instead be grouped into ranges. First, the ranges are defined manually and the students in each range are counted.

One student scored 50–59, three students scored 60–69, seventeen students scored 70–79, eight students scored 80–89, and seven students scored 90–100.

Accordingly, two arrays are created in MATLAB: one holding the midpoint of each range and one holding the number of students in that range.

a = [54.5, 64.5, 74.5, 84.5, 94.5]

b = [1, 3, 17, 8, 7]

Here a holds the midpoint of each score range and b holds the number of students whose score falls in that range.

The bar chart is then generated with the following command:

>> bar(a,b), xlabel('Score'), ylabel('Number of Students'), title('Algebra Midterm Exam')
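The counts in b were tallied by hand above. As a minimal sketch, assuming the range limits listed earlier (the names rangeLow and rangeHigh are illustrative and do not come from the chapter), the same counts can be derived from x and y with a loop and logical indexing:

```matlab
% Scores and the number of students who obtained each score
x = [55, 63, 69, 70, 75, 78, 82, 84, 85, 88, 90, 96, 100];
y = [1, 2, 1, 6, 4, 7, 2, 1, 3, 2, 4, 2, 1];

% Lower and upper limit of each score range (illustrative names)
rangeLow  = [50, 60, 70, 80, 90];
rangeHigh = [59, 69, 79, 89, 100];

b = zeros(1, numel(rangeLow));
for k = 1:numel(rangeLow)
    % Logical mask selecting the scores that fall inside range k
    inRange = (x >= rangeLow(k)) & (x <= rangeHigh(k));
    % Add up the students who obtained those scores
    b(k) = sum(y(inRange));
end

disp(b)   % 1 3 17 8 7
```

Deriving b this way avoids counting mistakes when the score list changes.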

MATLAB has a built-in histogram function called *hist*, but it produces useful charts only some of the time. The more reliable option is to generate the bar chart manually, as was done above. Several variations on the bar chart are also available for presenting data. For example, the barh command produces a horizontal bar chart:

>> barh(a,b), xlabel('Number of Students'), ylabel('Exam Score')

## Writing Functions Using MATLAB

MATLAB can also be used to write functions of one's own. The example in this lesson is a function that computes the average of a set of weighted data: each data value is multiplied by the number of times it occurs, the products are summed, and the sum is divided by the total number of data points.

The first step in creating a function is to create a file with the extension .m, as follows.

- Click the File pull-down menu
- Select New → M-file

This opens the file editor, in which the function is typed. Line numbers are shown on the left side of the window. Line 1 contains the keyword function, followed by the name of the variable used to return the data, the function name, and any arguments. The function here is called *myaverage*. It takes two arguments:

- An array x of data values
- An array N that contains the number of occurrences N(x) of each data value

The first line of this code will be as follows.

function ave = myaverage(x,N)

To compute the average correctly, x and N must contain the same number of elements. The size command can be used to determine how many elements each array contains. The results are stored in two variables called sizex and sizeN.

sizex = size(x);

sizeN = size(N);

The variables sizex and sizeN are row vectors with two elements: the number of rows and the number of columns. For example, if x is a row vector with four data points, then:

sizex =

1 4

So to test whether the arrays have the same number of elements, one compares sizex(2) and sizeN(2). One way to do this with an “if” statement is to ask whether sizex(2) is greater than sizeN(2) OR sizeN(2) is greater than sizex(2). In MATLAB, the OR operator is written with the “pipe” character, i.e. |. So this would be a valid way to check the condition:

if (sizex(2) > sizeN(2)) | (sizex(2) < sizeN(2))

Another way is simply to ask whether the two values are not equal. The not-equal operator is written by preceding the equal sign with a tilde, so the test that sizex(2) is NOT EQUAL to sizeN(2) is written:

if sizex(2) ~= sizeN(2)
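To make the comparison concrete, the following sketch applies both tests to two small arrays chosen only for illustration. The variable names mismatch1 and mismatch2 are not from the chapter.

```matlab
x = [10, 20, 30, 40];      % row vector with four data points
N = [2, 1, 3];             % row vector with only three counts

sizex = size(x);           % [1 4]: 1 row, 4 columns
sizeN = size(N);           % [1 3]: 1 row, 3 columns

% Test 1: OR of two strict inequalities
mismatch1 = (sizex(2) > sizeN(2)) | (sizex(2) < sizeN(2));

% Test 2: the not-equal operator
mismatch2 = sizex(2) ~= sizeN(2);

% Both tests are true here because the column counts differ
if mismatch1 && mismatch2
    disp('Both tests report a size mismatch')
end
```

The two tests are logically equivalent, so the shorter not-equal form is usually the easier one to read.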

If the two sizes are not equal, the function should stop at that point. If they are equal, the function proceeds and the average is calculated. This can be implemented using an if-else statement. When the user has passed two arrays of different sizes, the disp command is used to print an error message to the screen. The first part of the if-else statement looks like this:

if sizex(2) ~= sizeN(2)

disp('Error: Arrays must be same dimensions')

The completed function will look like the following:

function ave = myaverage(x,N)

sizex = size(x);

sizeN = size(N);

if sizex(2) ~= sizeN(2)

disp('Error: Arrays must be same dimensions')

else

total = sum(N);

s = x.*N;

ave = sum(s)/total;

end
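As a usage sketch, the exam data from the first section can be averaged with the same arithmetic that myaverage performs. The block below repeats the function body inline so that it runs on its own; with the function saved as myaverage.m (an assumption about the file name), the call myaverage(x,y) should return the same value.

```matlab
% Exam scores and the number of students who obtained each score
x = [55, 63, 69, 70, 75, 78, 82, 84, 85, 88, 90, 96, 100];
y = [1, 2, 1, 6, 4, 7, 2, 1, 3, 2, 4, 2, 1];

total = sum(y);          % number of students: 36
s = x .* y;              % each score weighted by how often it occurs
ave = sum(s) / total;    % 2847/36, about 79.08

fprintf('Average score: %.4f\n', ave);
```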

## Programming Using For Loops

A for loop instructs MATLAB to repeat a block of statements a fixed number of times, once for each value of an index variable. The syntax of the for loop is as follows.

for index = start:increment:finish

statements

end
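A short sketch of this syntax, with values chosen only for illustration: the first loop gives an explicit increment of 2, and the second omits the increment.

```matlab
% Explicit increment: k takes the values 1, 3, 5, 7, 9
oddTotal = 0;
for k = 1:2:9
    oddTotal = oddTotal + k;
end
disp(oddTotal)   % 25

% Increment omitted: k takes the values 1, 2, 3, 4, 5
count = 0;
for k = 1:5
    count = count + 1;
end
disp(count)      % 5
```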

If the increment is left out of the for statement, MATLAB assumes an increment of one. The use of the for loop can be illustrated by writing a simple function that sums the elements of a row or column vector. The first step in the function is to declare the function name and get the size of the array passed to the function:

function sumx = mysum(x)

%get number of elements

num = size(x);
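The summary stops after this first step. One way the function could be completed is sketched below. The loop body is an assumption based on the stated purpose of the function, and numel is used here in place of size so that row and column vectors are handled alike.

```matlab
function sumx = mysum(x)
% MYSUM  Sum the elements of a row or column vector with a for loop.

% get number of elements (numel counts them regardless of orientation)
num = numel(x);

% running total, starting from zero
sumx = 0;

% increment omitted, so the index steps by 1
for i = 1:num
    sumx = sumx + x(i);
end
end
```

For example, mysum([1, 2, 3, 4]) returns 10, which matches the built-in sum([1, 2, 3, 4]).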

## Calculation of Median and Standard Deviation

These calculations are explained with another example.

An office has employees whose ages are as follows:

Two employees aged 17, one employee aged 18, three employees aged 21, one employee aged 24, one employee aged 26, four employees aged 28, two employees aged 31, one employee aged 33, two employees aged 34, three employees aged 37, one employee aged 39, two employees aged 40, and three employees aged 43.

The first step in working with this data is to create an array of absolute frequency data. This is the array N(j) used in the previous sections. This time there is an entry for every age from 17 to 43, so a 0 means that no employee has that age. The array is called f_abs, for absolute frequency:

f_abs = [2, 1, 0, 0, 3, 0, 0, 1, 0, 1, 0, 4, 0, 0, 2, 0, 1, 2, 0, 0, 3, 0, 1, 2, 0, 0, 3];

The bin width is set to 1 year with the following command:

binwidth = 1;

Now an array is created that represents the ages from 17 to 43 with a bin width of one year:

bins = [17:binwidth:43];

The raw data, with one entry per employee, is then reconstructed with a for loop:

raw = [];

for i = 1:length(f_abs)

if f_abs(i) > 0

new = bins(i)*ones(1,f_abs(i));

else

new = [];

end

raw =[raw,new];

end

The resulting array is as follows.

raw =

Columns 1 through 18

17 17 18 21 21 21 24 26 28 28 28 28 31 31 33 34 34 37

Columns 19 through 26

37 37 39 40 40 43 43 43

The standard deviation measures the spread of the data. If it is small, most of the data is near the mean value; if it is large, the data is more scattered. Since the bin size is 1 year in this case, a standard deviation of about 8.4 years indicates that the latter situation applies to this data. A scaled frequency bar chart shows the shape of the data. The first step is to calculate the “area” of the data:

area = binwidth*sum(f_abs);

The frequencies are then scaled by this area:

scaled_data = f_abs/area;

And the plot is generated:

bar(bins,scaled_data), xlabel('Age'), ylabel('Scaled Frequency')
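Dividing by the area normalizes the chart so that the bars enclose a total area of 1. A short check, restating the arrays from above so that the block runs on its own (total_area is an illustrative name):

```matlab
% Absolute frequencies for ages 17 through 43
f_abs = [2, 1, 0, 0, 3, 0, 0, 1, 0, 1, 0, 4, 0, 0, 2, 0, 1, 2, ...
         0, 0, 3, 0, 1, 2, 0, 0, 3];
binwidth = 1;

% Area of the unscaled chart: bin width times number of employees
area = binwidth * sum(f_abs);

% Scale so that the bars enclose unit area
scaled_data = f_abs / area;

% Each bar has width binwidth and height scaled_data(k)
total_area = binwidth * sum(scaled_data);
fprintf('Area under the scaled chart: %.4f\n', total_area);
```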

The basic statistical data for this set of employees is:

>> mu = mean(raw)

mu =

30.7308

>> med = median(raw)

med =

31

>> sigma = std(raw)

sigma =

8.3836
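The mean and standard deviation can also be computed directly from the frequency table, without expanding it into raw data. The sketch below assumes the sample form of the standard deviation (division by n − 1), which is what std uses by default; the variable names n, mu_f, and sigma_f are illustrative.

```matlab
% Ages and the number of employees at each age
bins  = 17:43;
f_abs = [2, 1, 0, 0, 3, 0, 0, 1, 0, 1, 0, 4, 0, 0, 2, 0, 1, 2, ...
         0, 0, 3, 0, 1, 2, 0, 0, 3];

n = sum(f_abs);                        % number of employees: 26

% Weighted mean: each age counted as often as it occurs
mu_f = sum(bins .* f_abs) / n;         % 799/26, about 30.73

% Sample standard deviation from weighted squared deviations
sigma_f = sqrt(sum(f_abs .* (bins - mu_f).^2) / (n - 1));   % about 8.38

fprintf('mean = %.4f, std = %.4f\n', mu_f, sigma_f);
```

Working from the frequency table is convenient when the raw data is large or was never recorded individually.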

## The While Statement

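A while loop repeats a block of statements for as long as a condition remains true. The following program uses a while loop to add up the first n terms of the series 1/i. First, the user is asked for the number of terms:

n = input('Enter number of terms in sum: ')

Then some variables are initialized: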

i = 1;

total = 0;

Now here is the while loop:

while i <= n

total = total + 1/i;

i = i + 1;

end

The answer can be reported to the user this way:

disp('Total:')

Total:

disp(total)
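The pieces above can be collected into one runnable sketch. To keep it non-interactive, n is fixed at 4 here instead of being read with input, which is an assumption made only for this example.

```matlab
n = 4;            % number of terms (fixed here rather than read from the user)
i = 1;            % loop counter
total = 0;        % running sum of 1/i

% Repeat as long as the counter has not passed n
while i <= n
    total = total + 1/i;
    i = i + 1;
end

% 1 + 1/2 + 1/3 + 1/4 = 25/12
fprintf('Total: %.4f\n', total);   % prints "Total: 2.0833"
```

Unlike a for loop, the while loop needs the counter to be updated by hand; leaving out the line i = i + 1 would make the loop run forever.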

## Switch Statements

The syntax of the switch statement is given below.

switch expression

case 1

do these statements

case 2

do these statements

case n

do these statements

end
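As a minimal sketch of this syntax, the example below assigns a letter grade to an exam score. The letter scale is hypothetical and is not taken from the chapter; the variables score, decade, and grade are likewise illustrative. The otherwise branch catches any value that matches none of the cases.

```matlab
score = 84;                    % example exam score
decade = floor(score / 10);    % 8 for any score from 80 to 89

switch decade
    case {9, 10}               % a cell array matches any listed value
        grade = 'A';
    case 8
        grade = 'B';
    case 7
        grade = 'C';
    case 6
        grade = 'D';
    otherwise                  % everything below 60
        grade = 'F';
end

fprintf('Score %d receives grade %s\n', score, grade);
```

Only the first matching case is executed, so no break statement is needed at the end of each branch.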

# Reference

McMahon, D. (2007). *MATLAB demystified*. New York: McGraw-Hill.