In the first post of this three-part series on statistical analysis, I discussed how data can inform decisions within countless industries. My research in human capital, and specifically education, offers many uses for data in decision making. In particular, available data can be used to predict when, where, and which students may have trouble, and to study the determinants of parental satisfaction, admission decisions (at both secondary and collegiate levels), and student achievement in coursework, on standardized tests, and in future wages.
This post is intended to shed light on the science of data analysis, specifically conditional expectations. The mathematical approach to conditional expectation is based on heavy statistical concepts, some of which are not appropriate for this venue. That being said, my intention is to provide an accessible explanation of the techniques used in statistical analysis. For a mathematical approach, I recommend the seminal text Econometric Analysis, by William Greene, 2002; or Econometric Analysis of Cross Section and Panel Data, by Jeffrey Wooldridge.
Econometrics, and essentially any data analysis, is based on determining a prediction. In statistical terms, this is called an expectation. A conditional expectation is a prediction based on available information.
For instance, consider guessing the height of a random human being. The average human height is 5 foot 6 inches, so this would be a logical starting point for our prediction. But if we know more about this random person, we can improve our expectation. Specifically, if we knew the person was male, we would want to change our expectation conditional on that fact. The average height of an adult man is 5 foot 9.5 inches, so that would be our new prediction given this one piece of information.
If we also knew that the person weighed 240 pounds, we may want to increase our expected height. Note here that there is no causal assumption, just a change in our expectation, given some piece of information. This is correlation: the taller a person is the more, on average, we expect that person to weigh. We may predict 6 foot 1 inch for our random male weighing 240 pounds. Data analysis can help us with our predictions, given some imperfect information.
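The same idea can be sketched in a few lines of Python. The sample below is invented purely for illustration (it is not real data), and the helper name expected_height and the 230-pound cutoff are my own assumptions. The point is only that a conditional expectation is an average taken over the people who match what we know.

```python
from statistics import mean

# Hypothetical toy sample, invented for illustration only:
# each entry is (sex, weight in pounds, height in inches).
people = [
    ("M", 150, 67), ("M", 180, 70), ("M", 240, 73), ("M", 250, 74),
    ("F", 120, 62), ("F", 140, 64), ("F", 160, 65), ("F", 130, 63),
]

def expected_height(sample, condition=lambda person: True):
    """Average height over the people in `sample` who satisfy `condition`."""
    heights = [person[2] for person in sample if condition(person)]
    return mean(heights)

# No information: average over everyone.
unconditional = expected_height(people)

# Conditional on the person being male.
given_male = expected_height(people, lambda p: p[0] == "M")

# Conditional on being male and weighing at least 230 pounds (assumed cutoff).
given_heavy_male = expected_height(people, lambda p: p[0] == "M" and p[1] >= 230)

print(unconditional, given_male, given_heavy_male)
```

Each added condition narrows the group we average over, and in this toy sample the prediction moves upward each time, just as in the example above.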
Next, consider that we also knew this random person’s SAT score was 1200. We probably would not consider changing our expectation of height. If we had data on a sample of people with the variables height, sex, weight, and SAT score, some variables may be good predictors of height and be statistically significant while others, SAT score in this example, would not be statistically significant and would not persuade us to change our expectation.
Multiple regression analysis essentially takes a sample of data and measures how well each variable predicts the outcome. Using the output from a statistical software package (even Excel can do this reasonably well), we can arrive at a predictive equation for the most logical expectation conditional on the information we have at our disposal.
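Written out for the height example, such an equation might look like the following. The variable names and their ordering (sex first, then weight, then SAT score) are assumptions made for illustration, and the final term stands for whatever the variables fail to explain.

```latex
\text{Height} = B_0 + B_1\,\text{Male} + B_2\,\text{Weight} + B_3\,\text{SAT} + \varepsilon
```

Here Male is 1 for a man and 0 otherwise, so B_1 is the expected difference in height between men and women of the same weight and SAT score.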
The software uses the data to estimate the coefficients (the “B’s”) and also tests whether each “B” is a statistically significant predictor. In this case, I suspect B3, the coefficient on SAT score, would not be statistically significant.
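For readers who want to see the mechanics, here is a minimal sketch of ordinary least squares in plain Python. Everything in it is an assumption for illustration: the data are simulated, the "true" relationship (60 inches plus 4 for men plus 0.04 per pound) is made up, and SAT score is deliberately given no effect on height. A real analysis would use a statistical package rather than hand-written matrix code.

```python
import random

def solve_and_invert(a, b):
    """Gauss-Jordan elimination with partial pivoting.

    Returns (x, inverse) where a x = b and inverse is the inverse of a.
    """
    n = len(a)
    # Augment each row of a with the matching entry of b and a row of the identity.
    m = [row[:] + [b[i]] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)], [m[i][n + 1:] for i in range(n)]

def ols(rows, y):
    """Ordinary least squares: coefficients, standard errors, t-statistics."""
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta, xtx_inv = solve_and_invert(xtx, xty)
    residuals = [yi - sum(b * v for b, v in zip(beta, r))
                 for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in residuals) / (n - k)  # residual variance
    se = [(sigma2 * xtx_inv[i][i]) ** 0.5 for i in range(k)]
    t = [b / s for b, s in zip(beta, se)]
    return beta, se, t

# Simulated (not real) sample of 500 people.
rng = random.Random(0)
rows, heights = [], []
for _ in range(500):
    male = rng.randint(0, 1)
    weight = rng.gauss(170, 30)   # pounds
    sat = rng.gauss(1000, 200)    # SAT score, unrelated to height by design
    height = 60 + 4 * male + 0.04 * weight + rng.gauss(0, 2)  # inches
    rows.append([1.0, male, weight, sat])  # leading 1.0 is the intercept term
    heights.append(height)

beta, se, t = ols(rows, heights)
labels = ["B0 (intercept)", "B1 (male)", "B2 (weight)", "B3 (SAT)"]
for name, b, tv in zip(labels, beta, t):
    print(f"{name}: estimate={b:.4f}, t={tv:.2f}")
```

The t-statistic is the estimate divided by its standard error. As a rough rule of thumb, an absolute value above about 2 suggests a statistically significant predictor, so in a simulation built this way we would expect large t-statistics for sex and weight and a small one for SAT score.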
In part three, I will discuss an example of student achievement data from a nationwide sample and how conditional expectations can be used to inform decisions in many fields.