Bringing Data Together

As mentioned in a previous post, I am in the process of updating the mathematical model used by my school to determine when students are ready to take college-level courses. This model is important to us because we send over a third of our juniors and half of our seniors to college each year and we don’t want to mistakenly send students to college before they are ready. Using this model, my team has gotten pretty good at determining readiness; last year our students passed 97% of the college courses they attempted.

Before the model can be applied, the data must first be brought together into a single database or spreadsheet. Depending on your systems, this can be a quick or time-consuming endeavor. For me, bringing together all of the data we have on students took a little over six hours. Here’s what I did:

Google Sheets

Because it is shareable and applies edits in real-time, I do all of my modeling in a single Google Sheet. For anyone who is an Excel devotee, this may sound crazy. It is. But, for me, the benefits outweigh the costs.

For this year’s update, I created a new Google Sheet called “Master Data File” where I pasted an export from our Student Information System (SIS) containing each student’s name, ID, DOB, sex, graduation year, and cumulative GPA. Because our SIS contains the most up-to-date information regarding student enrollments, I always start there and then use that data as a reference for gathering the rest. No need to gather data on a student who is no longer enrolled.

Microsoft Excel

So far, there is only one function I need that is not easily done in Google Sheets: consolidating data. At one time, I would spend hours manually inputting data from one system’s export file to another. Excel can consolidate data from two spreadsheets in minutes.

The Consolidate function is in the "Data" ribbon on Microsoft Excel.

For example, data downloaded from the College Board website looks different from data taken from our SIS. The College Board data includes some students who have left my school, is missing data for students who are newly enrolled, and may have other formatting differences that make a simple copy/paste impossible.

As long as I have a single column that uniquely identifies each student (student ID, “Last Name, First Name” combinations, etc.), Excel can consolidate the data from both sources into a single row to be included in the master file.
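For readers who prefer scripting this step, the same consolidation can be sketched in Python with pandas. Every column name and value below is hypothetical, not our actual export format:

```python
import pandas as pd

# Hypothetical SIS export: the authoritative roster of currently enrolled students.
sis = pd.DataFrame({
    "student_id": [101, 102, 103],
    "name": ["Avery", "Blake", "Casey"],
    "gpa": [3.4, 2.9, 3.8],
})

# Hypothetical College Board export: includes a withdrawn student (999)
# and has no record yet for newly enrolled student 103.
college_board = pd.DataFrame({
    "student_id": [101, 102, 999],
    "sat_total": [1180, 1050, 990],
})

# Left-join on the unique key so the SIS roster drives the result:
# departed students drop out, and new enrollees get blank (NaN) scores.
master = sis.merge(college_board, on="student_id", how="left")
print(master)
```

Starting the join from the SIS frame mirrors the approach above: there is no need to carry rows for students who are no longer enrolled.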

Data Brought Together

Here’s the data I consolidated into the single Google Sheet for each student, organized by source:

Student Information System

  • Demographic information: used for sorting and aggregated data analysis
  • High School Grade Point Average: used as a primary indicator of future college success. This topic will be expanded upon further in a later post.

College Board

  • PSAT 8/9, 10, and 11: We give the PSAT to all students every year in grades 8 through 11. While we do not yet use this data in our model, I decided to pull it in hopes of future analysis and reporting.
  • SAT: In Michigan, all 11th graders are required to take the new SAT. Our community college partner accepts SAT scores for determining college course placement, so we use these scores as part of our readiness model.
  • Accuplacer: While this is technically a College Board product, we get this data from our college partner. Beginning in the 9th grade, our students take this college placement assessment each year until they place into college-level coursework.


ACT

  • ACT: Now that the state of Michigan has moved from the ACT to the SAT for its college readiness assessment, we only have a few students each year who take this assessment. For those who do, though, I need to consider their scores when determining readiness.
  • Compass: Until this year, our college partner used the ACT’s Compass assessment for determining college placement. This assessment was replaced by Accuplacer but we still consider Compass data in determining students’ college readiness.


Teachers

  • Agency Score: Each year, we ask our teachers to rate each student’s skill at exercising agency on a scale of 0-5. Agency, for those not familiar with the concept, is one’s ability to be an “agent” of his or her own learning. It consists of two components, both a part of our instructional model: 1.) the ability to complete tasks to specification and on time, and 2.) the ability to grow from challenging work and setbacks. I simply ask teachers to rate each student and take the average of their input. More on this measure of college readiness later.
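A minimal sketch of that averaging step (the teacher names and ratings are made up):

```python
# Hypothetical agency ratings (0-5 scale) from three teachers for one student.
ratings = {"Teacher A": 4, "Teacher B": 3, "Teacher C": 5}

# The student's agency score is the simple mean of the teachers' ratings.
agency_score = sum(ratings.values()) / len(ratings)
print(agency_score)  # 4.0
```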

When recording assessment data, I separate it by the year it was taken relative to the student: I want to know each student’s score for every year they took the assessment. This lets me see growth or stagnation in student performance, and makes analysis and reporting of the data much easier.
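One way to sketch that layout: keep a long-format record of every sitting, then pivot it so each student gets one row with a column per grade level. The field names here are assumptions, not our actual schema:

```python
import pandas as pd

# Hypothetical long-format assessment records: one row per student per sitting.
scores = pd.DataFrame({
    "student_id": [101, 101, 101, 102, 102],
    "grade_taken": [9, 10, 11, 9, 10],
    "score": [410, 450, 480, 520, 515],
})

# Pivot to one row per student with one column per grade level, so
# year-over-year growth or stagnation is visible at a glance.
by_year = scores.pivot(index="student_id", columns="grade_taken", values="score")
print(by_year)
```

A year in which a student did not test simply shows up as a blank cell rather than breaking the layout.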

Next up: what I do with this data once I have it all in one location.

Modeling Future Student Success

Over the next few weeks, I will be updating the mathematical model I created to predict students' future success in college. That model, which my school has been using and revising for the past four years, looks for patterns in academic and behavioral data to help predict individual students' likelihood of earning passing scores in college coursework.

I created the model in response to learning that standardized test scores alone left far too many edge cases to accurately predict future academic success. Too many students had previously scored well on tests yet did poorly in college classes. Similarly, some students we thought could handle college coursework did not score well on traditional measures of college "readiness."

Using this model, my school sends a third of its juniors and half of its seniors to college. Last year, these students passed 97% of the courses attempted. Ninety-three percent passed with a C or better.

To learn more about my school and why we send so many students to college while still in high school, I recommend reading my post from June titled Early College For All.

There is nothing magical about the model. It simply applies what is already known about past students' success to predict how well current students might do in college coursework.

The model uses three primary sources of data:

  1. Standardized college placement or college readiness scores: I have used data from different assessments over the years with relatively similar results (Compass, Accuplacer, ACT, and SAT).
  2. High school grade point average: in my school, the strongest predictor of future academic success is past student success.
  3. Teachers' subjective assessment of student "agency": each winter, I ask my faculty to evaluate each student on how well they are perceived to grow through challenging work and complete work on time.

Each year, the weight applied to each of these data sources has changed to reflect what we've learned about past student success. Last year, high school GPA and test scores were weighted about evenly. Agency, while found to be an accurate predictor, was weighted very little (approximately 10%) due to its subjective nature and the potential for perceived bias.
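Reduced to arithmetic, the combining step is just a weighted average. The weights below only mirror the rough proportions described here (GPA and test scores about even, agency around 10%); they are not the school's actual parameters, and the inputs are assumed to be pre-normalized to a 0-1 scale:

```python
# Illustrative weights only -- NOT the actual model parameters.
WEIGHTS = {"gpa": 0.45, "test": 0.45, "agency": 0.10}

def readiness(gpa: float, test: float, agency: float) -> float:
    """Combine inputs (each pre-scaled to 0-1) into a single readiness score."""
    return (WEIGHTS["gpa"] * gpa
            + WEIGHTS["test"] * test
            + WEIGHTS["agency"] * agency)

# Example: strong GPA, middling test score, good agency rating.
print(round(readiness(gpa=0.9, test=0.6, agency=0.8), 3))  # 0.755
```

Updating the model each year then amounts to re-fitting these weights against how past students actually performed.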

Over the coming weeks, as I update the model, I hope to share more of the details that go into its creation and revision. I see great value in having more schools analyze data in this way, and I think the process is simple enough to be replicated with a bit of time and effort.

Disclaimer: I am not a mathematician and do not claim to be an expert in inferential statistics. I am simply a practitioner with a good memory of his Statistics 101 class. I welcome any feedback from readers with stronger mathematical grounding.

If you have questions about this model that you would like me to expand upon or would simply like to learn more, feel free to leave a comment or reach out by email.