To determine which function to use and to insert the correct variables in the correct places, you need to know some key statistical terms. This section describes these terms.
The science of statistics makes a fundamental distinction between two types of data sets: population data and sample data. A population is the set of all elements of interest, while a sample is a subset of that population, drawn to make inferences about the characteristics of the population. For example, if you want to describe the average number of televisions in American households, it isn’t practical to collect data for the entire population (all American households). Instead, you draw a sample from the population and estimate the value for the whole population based on that sample. Unless otherwise stated, the Excel functions described here make a critical assumption about how the sample was selected: they assume that it was drawn at random, so in this case, every household would have the same likelihood (probability) of being selected.
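The relationship between a population and a random sample can be sketched in a few lines of Python. This is only an illustration, not an Excel feature: the population below is made-up data (10,000 hypothetical households with 0 to 5 televisions each), and the sample size of 200 is an arbitrary choice.

```python
import random
from statistics import mean

# Hypothetical population: number of televisions in each of 10,000 households.
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.randint(0, 5) for _ in range(10_000)]

# Simple random sample: every household has the same chance of being selected.
sample = rng.sample(population, k=200)

population_mean = mean(population)  # usually unknown in practice
sample_mean = mean(sample)          # the estimate you can actually compute

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

In a real study only the sample mean would be available; the population mean is computed here just to show how close the estimate tends to be.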
Each member of a data set is called an element. So if you’re describing customers, each customer is an element. The characteristics of interest in the elements are called variables. So if you’re looking at annual income, age, and sales, these would be your variables. In an experiment, the experimenter manipulates the independent variable and then measures the dependent variable to see whether the manipulation had any effect on it. A random variable describes the outcome of an experiment numerically. It can take on different values or ranges of values with certain probabilities. The set of measurements obtained for a single element is called an observation.
The term probability refers to the likelihood that an event will happen. Probabilities range from 0 (impossible) to 1 (inevitable). A probability distribution describes how probabilities are distributed over the discrete values or ranges of a random variable, and it is often shown as a graph. Probability distributions can take on several shapes. For example, a uniform probability distribution is rectangular; it occurs when there’s an equal probability for every value of the random variable. Another common probability distribution is the normal distribution, or bell curve. This occurs when there’s a relatively high probability of a random variable taking a certain value or range and a symmetrically diminishing probability as you move away from this value.
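As a rough illustration of the two shapes described above, the following Python sketch builds a uniform distribution for a fair six-sided die and a normal distribution with an assumed mean of 100 and standard deviation of 15. Both sets of numbers are hypothetical, and `prob_between` is a helper name introduced only for this example.

```python
from fractions import Fraction
from statistics import NormalDist

# Uniform distribution: a fair six-sided die gives each face the same probability.
die = {face: Fraction(1, 6) for face in range(1, 7)}
total = sum(die.values())  # probabilities over all outcomes add up to 1

# Normal distribution: hypothetical mean 100 and standard deviation 15.
normal = NormalDist(mu=100, sigma=15)

def prob_between(dist, low, high):
    """Probability that the variable falls in the range [low, high]."""
    return dist.cdf(high) - dist.cdf(low)

near = prob_between(normal, 95, 105)    # range centered on the mean
above = prob_between(normal, 125, 135)  # same width, 30 above the mean
below = prob_between(normal, 65, 75)    # same width, 30 below the mean

print(f"total probability for the die: {total}")
print(f"near the mean: {near:.4f}")
print(f"far above:     {above:.4f}")
print(f"far below:     {below:.4f}")
```

The range near the mean carries the most probability, and the two ranges at equal distances on either side carry matching, smaller probabilities, which is the symmetry that gives the bell curve its shape.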
A discrete variable is one that can take on only distinct, separate values that you can count. For example, the number of children in a family is a discrete number, in this case a non-negative integer. A continuous variable, on the other hand, can take on any value within a range, to any number of decimal places. For example, you can theoretically calculate the time it takes a person to run a mile down to the smallest fraction of a second. Because a continuous random variable has infinitely many possible values, the probability of it taking any one particular value is zero, so probabilities are assigned to ranges of values instead. Note that statistics calculated from discrete variables can take values that the variable itself cannot. So you can say that the average number of children in a family is, for example, 2.3, although no family could have 2.3 children. An event is a collection of outcomes that share a condition. For example, you could treat all outcomes in which a project goes over budget, or all outcomes in which a lot of goods is rejected, as an event.
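A small Python sketch can make the last two ideas concrete. The list of family sizes is invented for illustration, and the event chosen here ("more than two children") is just one hypothetical condition that a group of outcomes could share.

```python
from fractions import Fraction
from statistics import mean

# Hypothetical data: number of children in ten families (a discrete variable).
children = [0, 1, 2, 2, 3, 1, 4, 2, 5, 3]

# A statistic computed from whole numbers need not be a whole number itself.
average = mean(children)  # 23 / 10

# Event: "the family has more than two children",
# a collection of outcomes that share one condition.
event = [c for c in children if c > 2]

# Relative frequency of the event in this data set.
event_probability = Fraction(len(event), len(children))

print(f"average number of children: {average}")
print(f"share of families in the event: {event_probability}")
```

Every value in the list is an integer, yet the average is 2.3, which matches the point that no single family could have that number of children.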