The following is the abstract of a recent paper I wrote analyzing student performance data. Here is a link to the entire paper.
Measurement of student achievement is at the heart of educational policy, and standardized testing has been both supported and contested as a genuine representation of student achievement. This study shows that the interpretation of standardized test results can have starkly contrasting meanings for different cohorts of students. Using quantile regression to examine the conditional distribution of PSAT scores, educational factors such as gender and tracking are shown to affect students in varying ways. For the students in this study, gender was a significant indicator of performance on the PSAT, with being male accounting for more than a five-point “bump.” The conclusion from the quantile regressions is that students with extreme PSAT scores are outliers because of ability or inability, not because of their gender. Meanwhile, students who participated in the “honors” tracking system saw a larger increase in their predicted score the higher their conditional PSAT score.
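The core idea behind the quantile approach can be sketched with invented numbers (not the paper's data): instead of comparing two groups only at the mean, compare them at several points of the score distribution to see whether a gap is uniform or concentrated in part of the distribution.

```python
import numpy as np

# Toy illustration (simulated, not the study's data): two groups of
# PSAT-like scores where one group's distribution is shifted upward.
rng = np.random.default_rng(0)
group_a = rng.normal(150, 25, 5000)   # e.g., one cohort of students
group_b = rng.normal(145, 25, 5000)   # e.g., another cohort

# A mean comparison gives a single "bump" ...
mean_gap = group_a.mean() - group_b.mean()
print(f"gap at the mean: {mean_gap:.1f}")

# ... but comparing quantiles shows whether that gap holds at the
# bottom, middle, and top of the score distribution.
gaps = {q: np.percentile(group_a, q) - np.percentile(group_b, q)
        for q in (10, 50, 90)}
for q, gap in gaps.items():
    print(f"gap at the {q}th percentile: {gap:.1f}")
```

In this simulation the gap is similar at every quantile; the paper's finding is the more interesting case, where the gaps differ across the distribution.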
In the first post (found here) of my three-part series on statistical analysis, I discussed how data can inform decisions within countless industries. My research in human capital, and specifically education, suggests various uses of data in decision making. Specifically, available data can be used to predict when, where, and which students may have trouble; to identify determinants of parental satisfaction; to inform admission decisions (at both the secondary and collegiate levels); and to study student achievement academically, on standardized tests, and in future wages.
This post is intended to shed light on the science of data analysis, specifically conditional expectations. The mathematical treatment of conditional expectation rests on statistical concepts that are too heavy for this venue. That being said, my intention is to provide an accessible explanation of the techniques used in statistical analysis. For a mathematical approach, I recommend the seminal text Econometric Analysis by William Greene (2002), or Econometric Analysis of Cross Section and Panel Data by Jeffrey Wooldridge.
Econometrics, and essentially any data analysis, is built around making predictions. In statistical terms, a prediction is called an expectation. A conditional expectation is a prediction based on available information.
For instance, consider guessing the height of a random human being. The average human height is 5 feet 6 inches, so this would be a logical starting point for our prediction. But if we know more about this random person, we can improve our expectation. Specifically, if we knew the person was male, we would want to change our expectation conditional on that fact. The average height of an adult man is 5 feet 9.5 inches, so that would be our new prediction given some information.
If we also knew that the person weighed 240 pounds, we might want to increase our expected height. Note that there is no causal assumption here, just a change in our expectation given a piece of information. This is correlation: the taller a person is, the more, on average, we expect that person to weigh. We might predict 6 feet 1 inch for our random male weighing 240 pounds. Data analysis can help us with our predictions, given some imperfect information.
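The height example can be made concrete with a tiny, made-up sample (the numbers below are invented for illustration): each time we condition on more information, we average over a smaller, more relevant slice of the data.

```python
# A tiny illustration of conditional expectation: our best guess
# improves as we condition on more information. Heights in inches,
# invented sample (not real survey data).
people = [
    {"height": 66, "male": False, "weight": 140},
    {"height": 64, "male": False, "weight": 130},
    {"height": 70, "male": True,  "weight": 180},
    {"height": 69, "male": True,  "weight": 175},
    {"height": 73, "male": True,  "weight": 240},
]

def mean(xs):
    return sum(xs) / len(xs)

# Unconditional expectation: the overall average height.
e_height = mean([p["height"] for p in people])

# Conditional expectation: average height given the person is male.
e_height_male = mean([p["height"] for p in people if p["male"]])

# Condition on more: male AND heavy (over 200 pounds).
e_height_heavy_male = mean(
    [p["height"] for p in people if p["male"] and p["weight"] > 200]
)

print(e_height, e_height_male, e_height_heavy_male)
```

Each added condition moves the prediction: the overall average, the average among men, and the average among heavy men are successively better guesses for our 240-pound male.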
Next, suppose we also knew this random person’s SAT score was 1200. We probably would not consider changing our expectation of height. If we had data on a sample of people with the variables height, sex, weight, and SAT score, some variables would be good predictors of height and be statistically significant, while others (SAT score in this example) would not be statistically significant and would not persuade us to change our expectation.
Multiple regression analysis essentially considers a sample of data and determines the predictive success of each variable. Using the output from a statistical program (even Excel can do this reasonably well), we can arrive at a predictive equation for the most logical expectation conditional on the information we have at our disposal; for the example above, something like height = B0 + B1(male) + B2(weight) + B3(SAT score).
The regression will estimate the coefficients, the “B’s”, and also determine the likelihood that each “B” is a significant predictor. In this case, I suspect B3, the coefficient on SAT score, would not be statistically significant.
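Here is a sketch of that regression on simulated data (all coefficients below are invented for illustration): height truly depends on sex and weight but not SAT, and the regression recovers a near-zero B3.

```python
import numpy as np

# Simulated sample (invented numbers): height depends on sex and
# weight, and SAT has no true effect (B3 = 0 in the simulation).
rng = np.random.default_rng(42)
n = 1000
male = rng.integers(0, 2, n)                  # 0/1 indicator
weight = rng.normal(170, 30, n)               # pounds
sat = rng.normal(1100, 150, n)                # SAT score
height = 55 + 3.5 * male + 0.05 * weight + rng.normal(0, 2, n)

# Ordinary least squares: height = B0 + B1*male + B2*weight + B3*SAT
X = np.column_stack([np.ones(n), male, weight, sat])
b0, b1, b2, b3 = np.linalg.lstsq(X, height, rcond=None)[0]

print(f"B1 (male):   {b1:.3f}")
print(f"B2 (weight): {b2:.3f}")
print(f"B3 (SAT):    {b3:.4f}")  # near zero: SAT does not predict height
```

The estimates land close to the simulated truth, and the SAT coefficient hovers near zero, which is exactly what "not statistically significant" looks like in practice.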
In part three, I will discuss an example of student achievement data from a nationwide sample and how conditional expectations can be used to inform decisions in many fields.
This is the first of a three-part series of posts discussing the importance of data analysis and introducing readers to the value of statistical and econometric analysis.
Last Sunday, Steve Lohr wrote a great piece in the New York Times explaining the importance of “big data” in today’s society. He explained that increasingly “businesses make sense of an explosion of data - Web traffic and social network comments, as well as software and sensors that monitor shipments, suppliers and customers - to guide decisions, trim costs and lift sales.”
The explosion of data Lohr refers to is accessible to any field, not just profit-maximizing businesses, and when used properly, that data can enhance and enrich any institution. Businesses clearly utilize data to inform their decisions, but increasingly, political campaigns, public health officials, and advertising agencies are innovating on their traditional practices by developing methods and metrics based on data analysis.
The most glorified example is illustrated in the book (and recent film) “Moneyball” by Michael Lewis, which describes the revolution in baseball led by Billy Beane and the Oakland Athletics. The short story is that the team began to analyze players using complex statistical analyses instead of traditional benchmarks. Billy Beane is not the only front-office executive to develop and exploit new statistical methods; Daryl Morey, the general manager of the Houston Rockets, wrote a piece for Grantland.com regarding the “stats movement in sports” and how the success of Moneyball has transcended sports and impacted countless industries.
Morey briefly describes how statistical analyses have entered the realm of education: the Gates Foundation is gathering data to evaluate teachers. But Morey and the Gates Foundation are only scratching the surface. Education at all levels is ripe for a takeover by objective data analysis. The statistics currently used within schools to evaluate programs or students rely on static data. Static data consists of the most basic statistics we remember from high school: averages and percentages. New data, big data, is about how information, records, and numbers move over time, and how a fact or figure can be broken down to find the relationships and meaning behind the numbers.
Consider a static piece of data such as: in a specific district, 28% of parents are unhappy with their child’s school. This does not tell an administrator much, probably only something she already knows. But a deeper look into the data could reveal important information, such as: of the 28% of parents who are unhappy, 70% have a student who plays a varsity sport. This has more value; specifically, it identifies a trend among the unhappy parents.
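That kind of breakdown is just conditional counting. A toy version with invented survey records (the counts below are made up to roughly match the percentages in the example):

```python
# Hypothetical parent-survey records (all numbers invented): each
# record marks whether the parent is unhappy and whether their
# student plays a varsity sport.
surveys = (
    [{"unhappy": True,  "varsity": True}] * 70
    + [{"unhappy": True,  "varsity": False}] * 30
    + [{"unhappy": False, "varsity": True}] * 50
    + [{"unhappy": False, "varsity": False}] * 210
)

# The static statistic: what share of all parents are unhappy?
unhappy = [s for s in surveys if s["unhappy"]]
share_unhappy = len(unhappy) / len(surveys)

# The deeper look: among unhappy parents, what share have a
# varsity athlete?
share_varsity_among_unhappy = (
    sum(s["varsity"] for s in unhappy) / len(unhappy)
)

print(f"{share_unhappy:.0%} of parents are unhappy")
print(f"{share_varsity_among_unhappy:.0%} of unhappy parents have a varsity athlete")
```

The first number is the static figure the administrator already knows; the second, conditional figure is the one that points at something actionable.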
The next two posts will dig deeper into the “why” and “how” of using data to improve decision making. At the most basic level, data can help shape expectations, specifically conditional expectations given some observed trend in the data. I will explain the concept of conditional expectation in part two.
With rumors swirling about a possible second wild card, it seems that the future of baseball’s postseason will include more than eight teams. If each league adopts a second wild card, then one-third of Major League Baseball teams would make the playoffs.
With the TBS and FOX television deals, the league clearly makes plenty of money off the postseason and distributes it among the teams involved. I am sure that data is easily available. But what I wonder is: what is the value of a postseason appearance beyond the “winner’s purse”?
This past summer, I posed the question, what is the (economic) value of the Most Valuable Player? I want to ask a similar question here. I know there is an explicit monetary award for postseason appearances, but just as there is more than a trophy for the MVP and ROY, what other benefits, both explicit and implicit, accrue to these teams?
Specifically, free agents are more likely to sign with a contender, so a team like the Arizona Diamondbacks will benefit greatly from the playoff exposure. Certainly next season’s ticket sales will be higher in places like Milwaukee and Detroit based on their performances. But how much additional benefit does actually making the playoffs confer?
Consider a team such as the Cleveland Indians, who garnered more respect around baseball and more support in Cleveland. Certainly their season ticket sales will be higher in 2012, just as they will be for the Brewers and Tigers mentioned above. There are econometric methods to tease that information out of the data. I will file this idea away until next season’s attendance data is available; when it is, this could make a fascinating study.
This spring I began researching a paper analyzing the effectiveness of the NBA draft. The paper is still in progress, but I want to share some of my findings and hopefully garner some responses and comments.
My thesis is that the NBA’s amateur draft is intended to allocate new talent to the teams that need it most. If it is effective, then there should be some fluctuation, with teams transforming from bad to good. Essentially, a cycle would turn bad teams into good teams by allowing them first crack at young talent. My first data analysis consisted of determining whether there is a lag to each team’s success. I reasoned that if the draft were effective, teams that were bad in the past would be better five or so years later, once their prospects had developed. That is, the question I asked was: were the teams that are winning now losers five years ago? Here is a table of winning percentages for the top four teams in each conference:
For the 2009-2010 season, there was a significant negative relationship between wins in 2006 and wins in 2010 (the 2010-2011 season had not finished when I last worked with the data). But there was no correlation between wins in 2009 and wins five or so years in the past. Using a time-series analysis of teams’ wins over the past 25 NBA seasons, I found that the only significant predictor of a team’s current number of wins was the previous year’s number of wins. That is, teams that are winning now were not statistically significantly worse five (or so) years in the past. My conclusion was that the NBA’s amateur draft does not turn losing teams into winning teams.
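The shape of that lag analysis can be sketched on simulated data (invented win totals, not actual NBA records): if only last season’s wins matter, a regression on both a one-season and a five-season lag should put all the weight on the one-season lag.

```python
import numpy as np

# Simulate 30 teams over 25 seasons where wins follow an AR(1)
# process: current wins depend only on the previous season's wins
# (all parameters invented for illustration).
rng = np.random.default_rng(7)
n_teams, n_seasons = 30, 25
wins = np.zeros((n_teams, n_seasons))
wins[:, 0] = rng.normal(41, 10, n_teams)
for t in range(1, n_seasons):
    wins[:, t] = 41 + 0.6 * (wins[:, t - 1] - 41) + rng.normal(0, 8, n_teams)

# Regress current wins on wins 1 season back and 5 seasons back.
current = wins[:, 5:].ravel()
lag1 = wins[:, 4:-1].ravel()
lag5 = wins[:, :-5].ravel()
X = np.column_stack([np.ones(current.size), lag1, lag5])
b0, b_lag1, b_lag5 = np.linalg.lstsq(X, current, rcond=None)[0]

print(f"coefficient on last season's wins:     {b_lag1:.2f}")
print(f"coefficient on wins 5 seasons earlier: {b_lag5:.2f}")  # near zero
```

This mirrors the finding described above: once last season’s wins are accounted for, wins from five seasons earlier add essentially nothing, which is what the data showed for the actual NBA seasons.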
One additional conclusion I came to was that certain draft classes do matter. Specifically, I found that the draft classes of 1987, 1998, and 2003 turned losing teams into winners seven years later. Each of those draft classes turned some losing teams into significant winners. As anecdotal evidence, LeBron James turned the Cleveland Cavaliers from a 17-game winner in 2003 into a 61-game winner in 2010, and from the draft class of 1998, both Dirk Nowitzki and Paul Pierce turned struggling teams into winners in the early 2000s.
My next thought is to account for draft pick number rather than losses, since the lottery does not ensure a reverse order. There are some interesting studies on the NBA draft, including “Losing to Win” by Beck Taylor, which analyzed whether teams intentionally lose games in a “race to the bottom” in order to secure the top pick.
Most analyses assume that the NBA draft works, but my research points to an ineffective process for allocating rookie talent to the teams that need it most.