Statistics

Question 1
Does all data have a mean, median, or mode? Explain why or why not. When is the mean the best measure of central tendency? When is the median the best measure of central tendency? What is the difference between the standard deviation and the variance? In other words, do they provide different information or serve different purposes?

Measures of central tendency and measures of dispersion are useful indices for interpreting collected data. The mean, median and mode are measures of central tendency that help determine how well a single value represents a population (Dawson & Trapp, 2004). The mean is the sum of all values divided by the number of values, while the mode is the value that occurs most often in a set of data. The median is the value at the center of the distribution of a set of data, dividing the data into an upper half and a lower half (Dawson & Trapp, 2004). These definitions show that not every measure of central tendency can be computed for every kind of data. For instance, nominal data, which exist only as named, unordered categories such as race and gender, cannot be added up or ranked, so they can be described only by the mode (Dawson & Trapp, 2004).
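
As a minimal sketch of these definitions, the snippet below uses Python's standard-library statistics module. The ages and blood-type lists are made-up illustrative data, not taken from Dawson & Trapp; blood type stands in for a nominal variable such as race or gender.

```python
from statistics import mean, median, mode

# Hypothetical numeric data (ages in years), chosen only for illustration.
ages = [21, 25, 25, 28, 31, 34, 39]

print(mean(ages))    # sum of the values / number of values
print(median(ages))  # middle value of the sorted data
print(mode(ages))    # most frequently occurring value

# Hypothetical nominal data: unordered labels, so only the mode makes sense.
blood_types = ["A", "O", "B", "O", "AB", "O", "A"]
print(mode(blood_types))
# mean(blood_types) would fail with a TypeError: labels cannot be summed.
```

For the ages the three measures differ (29, 28 and 25), which is why the choice of measure matters; for the labels only the mode, "O", is defined.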

The median is the best measure for data that contain extreme values (outliers) unrepresentative of most of the population because, unlike the mean, it is not pulled toward those values (Dawson & Trapp, 2004). Examples of such cases are skewed distributions and small data sets. The mean best represents continuous numerical data with many values that are roughly symmetrically distributed, such as data on the height of individuals (Dawson & Trapp, 2004).
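
A small sketch of why the median resists outliers, assuming a hypothetical list of incomes (in thousands) in which a single extreme value skews the distribution:

```python
from statistics import mean, median

# Hypothetical incomes in thousands; the last value is an extreme outlier.
incomes = [30, 32, 35, 38, 40, 42, 504]

print(mean(incomes))    # dragged upward by the single outlier
print(median(incomes))  # stays among the typical values
```

Here the mean is 103 although six of the seven incomes are 42 or less, while the median of 38 still describes a typical member of the group.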

Measures of dispersion, such as the standard deviation and the variance, describe how far values typically lie from the mean (Dawson & Trapp, 2004). The variance is the arithmetic mean of the squared differences from the mean for a set of values, and the standard deviation is the square root of the variance. The two measures are therefore closely related, but they serve somewhat different purposes. The variance expresses dispersion in squared units, which differ from those of the data, because the differences from the mean are squared during computation for mathematical convenience (Dawson & Trapp, 2004). Squaring also emphasizes values that lie far from the mean, since large differences become much larger when squared (Dawson & Trapp, 2004). The standard deviation, on the other hand, is the square root of the variance and so is expressed in the same units as the data, which makes it easier to interpret alongside the data and the mean.
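
The definitions above can be sketched in Python with hypothetical height data in centimetres. The helper population_variance is written here for illustration and divides by n, matching the definition in the text; Python's statistics.variance instead computes the sample variance, which divides by n - 1.

```python
import math
import statistics

# Hypothetical heights in centimetres (illustrative data only).
heights_cm = [160, 165, 170, 175, 180]

def population_variance(values):
    """Arithmetic mean of the squared differences from the mean (units: cm squared)."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

var = population_variance(heights_cm)  # squared units: cm^2
sd = math.sqrt(var)                    # square root returns to cm

print(var, sd)
print(statistics.pvariance(heights_cm))  # library population variance, same value
print(statistics.variance(heights_cm))   # sample variance, divides by n - 1
```

The variance of 50 is in square centimetres, which is hard to picture; the standard deviation of about 7.07 cm can be read directly against the mean height of 170 cm.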

Question 2
What is a real-world example of a sample? What is a real-world example of a population? Are there any true random samples? Explain why or why not. Describe a standard normal curve. What is the connection between a normal curve and a Z score?

A population is the whole set of individuals or observations that is of interest to the investigator, while a sample is a representative group selected from the population whose indices, after statistical analysis, can be ascribed to the population as a whole (Dawson & Trapp, 2004). For instance, when researching anxiety disorders, all the people with these disorders in a locality form the population, while patients undergoing treatment at institution-based clinics in the locality could serve as the sample.

To have a truly random sample, the process of selecting the sample must be completely randomized. To achieve this, the investigator needs access to the entire population and a sampling frame listing every member of it (Dawson & Trapp, 2004). From this list, the sample is drawn by a random method, such as computer-generated random numbers, so that each member has an equal chance of being selected and the selection of one member does not influence the selection of another. In practice it is usually difficult to access the whole population without bias (Dawson & Trapp, 2004), and equally difficult to construct a complete sampling frame. A truly random sample therefore exists mostly in theory, as an assumption that makes statistical analysis possible: without randomization it would be difficult to infer population indices, such as the population mean, from the corresponding sample indices, such as the sample mean.
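
The drawing step can be sketched with Python's random module, assuming the hard part, a complete sampling frame, already exists. The frame of 500 patient IDs below is hypothetical, and the fixed seed is there only so the sketch is reproducible.

```python
import random

# Hypothetical sampling frame: a complete list of every member of the population.
sampling_frame = [f"patient_{i:03d}" for i in range(1, 501)]

rng = random.Random(42)  # seeded only to make the example reproducible

# random.sample draws k distinct members so that every possible sample of
# size k is equally likely, giving each member the same chance of selection.
sample = rng.sample(sampling_frame, k=50)

print(len(sample), sample[:3])
```

The code itself is trivial; as the paragraph above argues, the difficulty in real research is that a list like sampling_frame is rarely complete or unbiased.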

A standard normal curve is a theoretical mathematical model with a bell shape: the values are distributed symmetrically about the mean, the mean, median and mode all have the same value, and about 95% of the values fall within ±2 standard deviations of the mean. A normal curve is characterized by its mean and standard deviation (Dawson & Trapp, 2004); the standard normal curve is the particular case with a mean of 0 and a standard deviation of 1. The area under the curve is directly proportional to the relative frequency of the observed data. The curve is described as asymptotic because its tails approach, but never touch, the horizontal (x) axis.
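
The "area equals relative frequency" idea can be checked numerically with statistics.NormalDist from the Python standard library (3.8 or later). The helper area_within is introduced here for illustration only.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def area_within(k):
    """Area under the standard normal curve between -k and +k SDs,
    i.e. the proportion of values within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {area_within(k):.4f}")

# Symmetry: exactly half of the area lies below the mean.
print(std_normal.cdf(0.0))

# The z value that leaves 2.5% in each tail, i.e. encloses 95% of the area.
print(std_normal.inv_cdf(0.975))
```

This prints roughly 0.6827, 0.9545 and 0.9973, then 0.5, then about 1.96, which is why "95% within ±2 SD" is a rounded rule of thumb.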

Z scores are a way of standardizing a value against a set of data with a given mean and standard deviation. A Z score locates a particular value in terms of how many standard deviations it lies from the mean (Dawson & Trapp, 2004). Given these two measures, the mean and the standard deviation, this makes it possible to find the proportion of values lying within a distance from the mean that is not a whole number of standard deviations. As noted above, on the normal curve the percentage of values lying within a certain number of standard deviations is known; for instance, about 68% of values lie within ±1 SD of the mean. To determine accurately where a value lies in relation to the set of data it belongs to, the value is converted into a Z score, calculated as the value minus the mean, divided by the standard deviation, z = (x - mean) / SD. This effectively converts the value to a new scale expressed in standard deviations (Dawson & Trapp, 2004). In a normal distribution, where the relative frequency of values is directly proportional to the area under the curve, the number and proportion of values lying within a given distance of the mean can be established by converting the values into standardized scores (Dawson & Trapp, 2004). Since the Z score is just the value expressed in standard deviations, any value can be located by converting it into a Z score and looking it up in a table derived from the standard normal distribution (Dawson & Trapp, 2004). This works because in a normal distribution the total area under the curve represents all the values (100%).
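
A sketch of the Z score procedure, assuming hypothetical scores with a mean of 100 and a standard deviation of 15. Instead of a printed Z table, it uses NormalDist().cdf from Python's statistics module, which returns the same cumulative area such a table lists.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# Hypothetical data set: mean 100, standard deviation 15.
z = z_score(115, mean=100, sd=15)
print(z)  # 1.0

# Cumulative area to the left of z: the proportion of values below 115.
below = NormalDist().cdf(z)
print(round(below, 4))

# Proportion between two values that are not whole SDs from the mean.
z_low, z_high = z_score(90, 100, 15), z_score(120, 100, 15)
between = NormalDist().cdf(z_high) - NormalDist().cdf(z_low)
print(round(between, 3))
```

The second case shows the point made above: 90 and 120 lie about 0.67 and 1.33 SD from the mean, so the proportion between them (roughly 0.656) cannot be read off the whole-SD percentages alone.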

Question 3
A histogram shows that the mean of rain in your area has increased over the last five years. What is the probability that the mean will be higher this year than in the previous five years? You are a contestant in a game show. There is a car behind one of three curtains before you. You choose curtain 1 as the curtain with the car behind it. Susie (the hostess) decides to show you what is behind one of the curtains you did not choose; she opens curtain 2 to reveal empty space. Now, before showing you the location of the car, she asks you if you want to keep your first choice (curtain 1) or switch to curtain 3. What should you do? Why?

Probability represents the likelihood of an event occurring. The probability of an event, P(E), is calculated by dividing (E) by (S), where (E) is the number of outcomes that make up the event being investigated and (S) is the number of possible outcomes (). Applied to the question above, the event of interest (E) is the mean rainfall this year being higher than in the previous five years, while (S) contains three outcomes: the mean is higher than in the previous five years, lower than in the previous five years, or higher than some of those years and lower than others. Assuming these three outcomes are equally likely, which the classical formula requires, the probability of a higher mean rainfall is 1/3.
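
The classical formula can be sketched in a few lines of Python. The helper classical_probability is hypothetical, and the labels "higher", "lower" and "mixed" are shorthand for the three outcomes named above; the result is only as good as the assumption that those outcomes are equally likely.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(E) = |E| / |S|; valid only when every outcome in S is equally likely."""
    favourable = [o for o in sample_space if o in event]
    return Fraction(len(favourable), len(sample_space))

# The three rainfall outcomes from the text, ASSUMED equally likely.
outcomes = ["higher", "lower", "mixed"]
print(classical_probability({"higher"}, outcomes))  # 1/3

# A case where the assumption clearly holds: an even number on a fair die.
print(classical_probability({2, 4, 6}, range(1, 7)))  # 1/2
```

Using Fraction keeps the answer exact, so it prints as 1/3 rather than a rounded decimal.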

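I would not change my option, on the reasoning that the probability of getting the car is the same for both remaining curtains. Before the hostess opened the second curtain, using the classical probability formula, the event of interest (E) is the car and the sample space (S) is {car, no car, no car}, since only one of the three curtains hides the car. The probability of having chosen the car was therefore 1/3, which is closer to 0 (impossible) than to 1 (certain) on the probability scale. Opening the second curtain reduces (S) by one outcome while (E) remains one, so P(E) = 1/2, a fifty percent chance of winning with either option. This reasoning, however, holds only if Susie picked which of the two unchosen curtains to open at random and it happened to be empty. In the usual version of the game, where the host knows where the car is and always opens an empty curtain, her choice carries information: curtain 1 keeps its original probability of 1/3, the remaining 2/3 passes to curtain 3, and switching doubles the chance of winning.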
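
A Monte Carlo sketch makes the difference between the two host behaviours concrete. The function names play and win_rate are hypothetical, and both host models are assumptions about Susie's behaviour, which the question does not fully specify.

```python
import random

def play(rng, switch, host_knows):
    """Play one round; return True if the player wins, or None if the round
    does not match the scenario (a random host reveals the car)."""
    car = rng.randrange(3)
    choice = 0  # the player always picks curtain 1 (index 0)
    others = [c for c in range(3) if c != choice]
    if host_knows:
        # Host deliberately opens an unchosen curtain that hides no car.
        opened = rng.choice([c for c in others if c != car])
    else:
        # Host opens an unchosen curtain at random; discard rounds where the
        # car is revealed, since in the question the opened curtain was empty.
        opened = rng.choice(others)
        if opened == car:
            return None
    if switch:
        choice = next(c for c in range(3) if c not in (choice, opened))
    return choice == car

def win_rate(switch, host_knows, trials=50_000, seed=1):
    rng = random.Random(seed)
    results = [play(rng, switch, host_knows) for _ in range(trials)]
    kept = [r for r in results if r is not None]
    return sum(kept) / len(kept)

for host_knows in (True, False):
    stay = win_rate(switch=False, host_knows=host_knows)
    swap = win_rate(switch=True, host_knows=host_knows)
    print(f"host_knows={host_knows}: stay={stay:.3f}, switch={swap:.3f}")
```

With a knowing host the estimates come out near 1/3 for staying and 2/3 for switching; with a random host, once rounds that expose the car are discarded, both are near 1/2, which is the case the fifty percent argument describes.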
