Last time, we looked at four insights from the NYU data breach that occurred on March 22nd. These were just some descriptive graphs and statistics, but for most of this installment we’re going to use a little regression analysis and figure out what’s actually important in being admitted into NYU as an undergrad.
The backbone of this poast will be the regression model that supports this application, where you can toggle through various student attributes NYU might use to judge undergraduate candidates and see how likely such a student is to be admitted. Keep in mind that all the data supporting this model is information that NYU itself maintains. I didn’t use anything apart from the data that were leaked.
For the statistically inclined, the model looks like this.
We’re using all of these variables: school type, ethnicity, standardized test scores, GPA and class rank to predict the likelihood of a student being admitted to NYU.
V. Does the type of high school you went to matter?
Before college, I was a product of 13 years of independent private education. I’ve always sort of suspected my parents wasted their money, but it was theirs to waste. If their only goal was to get me into a selective college, did it matter?
Maybe this is a feature of NYU’s richie-rich reputation, but it looks like they didn’t waste their money. Row 4 in Figure 1 shows that going to an independent (but not religious) school is the only significant school-related predictor of admission to NYU. Comparing the coefficients for test scores versus school type, my parents would have been better off investing in experimental tech to improve my SAT scores, but that’s for another poast.
VI. The admission officers have calipers
In this dataset, when you put the various tables together, unsurprisingly, some of the fields have missing values. There are a lot of students with no standardized test information; many, many of the applications don’t have even basic stuff like GPA or what school you went to. OK, but every single applicant over the last ten years has racial and ethnic information. The only way I can imagine this happening is if racial and ethnic information is a required field on every type of application NYU supports.
The PERSON table, which records basic stuff like name, address, etc. for applicants, has 78 fields. 14 of these fields record racial or ethnic data, often redundantly. Count them yourself in Figure 2.
VII. How large a factor is race in admissions likelihood?
It’s large. There’s no other way to put it. Three of the racial and ethnic categorical regressors are statistically significant: being Asian significantly and negatively affects the likelihood of admission, and being Black or Hispanic significantly and positively affects the likelihood. For the Asian, Black, and Hispanic racial categories, the average marginal effects on the likelihood of admission were -0.032, 0.233, and 0.176, respectively. This is not controversial and is broadly consistent with Harvard’s expert witness David Card’s analysis in the Students for Fair Admissions v. Harvard Supreme Court case. His findings for Asians are non-significant, and mine are significant, I guess?
What does it all mean? I’m leaving that up to the real lawyers to decide. If you want to get the intuition behind this, go use the app linked to above.
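For anyone wondering what an "average marginal effect" is for a yes/no regressor like a racial category, here is a minimal sketch. It assumes a logistic regression, which is my guess at a reasonable setup and not something this analysis spells out, and every name in it (`predict`, `average_marginal_effect`, the coefficient dictionary) is hypothetical. The idea is to flip the category on and off for each applicant, hold everything else fixed, and average the change in predicted probability.

```python
import math


def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(coefs, intercept, row):
    """Predicted admission probability under an ASSUMED logistic model.

    coefs and row are dicts keyed by (hypothetical) regressor names.
    """
    z = intercept + sum(coefs[name] * value for name, value in row.items())
    return sigmoid(z)


def average_marginal_effect(coefs, intercept, rows, dummy):
    """Average change in predicted probability when a 0/1 regressor
    is switched from 0 to 1, holding each applicant's other values fixed."""
    if not rows:
        raise ValueError("need at least one row")
    diffs = []
    for row in rows:
        p_on = predict(coefs, intercept, {**row, dummy: 1})
        p_off = predict(coefs, intercept, {**row, dummy: 0})
        diffs.append(p_on - p_off)
    return sum(diffs) / len(diffs)
```

Read this way, an effect of 0.233 would mean that switching the category on raises the predicted probability of admission by about 23 percentage points on average across the applicants in the sample.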
VIII. Do extracurriculars matter?
The analysis I wanted to do on this was to see which extracurriculars might be the most important using some cool natural language processing on the extracurricular descriptions. I instead looked at a more elementary question: regardless of what they are, do more activities like math club, model UN, and building homes for the impoverished matter at all in admission?
Broadly speaking, no. The last row in Figure 1 is simply the number of described extracurriculars in a given application and, astonishingly, the more stuff you put on your application, the less likely you are to get in, and this effect is statistically significant.
IX. So the regression indicates the more stuff you do the worse your chances of being admitted are. OK, so what are these activities that are harming you?
They look absolutely as joyless and mundane as you might expect and I definitely plan on bringing these findings up when my wife encourages our son to “participate” in an “organized” “community” “program” “etc” during “10th grade”.
Please. If you want your kid to join the ranks of entitled plutocrat children graduating from NYU, take it easy, cut the extras out, and let them run around outside for christ’s sake.
X. But your grades? Surely they matter?
I tried to normalize everyone’s reported GPA on a 0 to 1.5 scale like you see in the app. Back when I went to school, GPAs were out of 4, but then I noticed a ton of kids with GPAs above 4, and had no idea what that meant. I must not be the only one who thinks that’s weird, since a student’s grades were only marginally significant in the regression (p = .037). At first I thought I had messed something up with the normalization, but I also included the sort of pre-normalized measure of class rank in the regression and that also wasn’t very significant. NYU was always proud to have made standardized testing optional early on, but they still trust the testing much more than reported grades in deciding whom to admit.
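To make the 0 to 1.5 scale concrete, here is one way such a normalization could work. This is a sketch under my own assumptions and not necessarily the rule used for the app: divide by a conventional 4-point maximum so an unweighted 4.0 maps to 1.0, and cap weighted GPAs at 1.5. The function name and both default parameters are hypothetical.

```python
def normalize_gpa(gpa, scale_max=4.0, cap=1.5):
    """Rescale a reported GPA so that scale_max maps to 1.0.

    ASSUMPTION: weighted GPAs above scale_max are kept but capped at `cap`.
    """
    if gpa < 0:
        raise ValueError("GPA cannot be negative")
    return min(gpa / scale_max, cap)
```

Under these assumptions a weighted 5.0 becomes 1.25, and anything at or above 6.0 is pinned to 1.5.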
XI. You already did the job titles of the parents, but where do they work?
The apparent representation of the upper parts of the professional managerial class in the NYU matriculant pool astounded me. We also know where the parents work. Is there anything to be learned there?
This one’s a bit more subtle than the parents’ occupational analysis in the first part. A few individual companies make it into the cloud, like Morgan Stanley, Wells Fargo, Citigroup, IBM, and NYU itself, of course. I’d say the flavor here is international: “international” itself does appear, as does the UN, but the most interesting are the East Asian tokens: “Shanghai”, “Hong Kong”, “Beijing”, “China”. No other geographic region features as prominently in this graphic.
XII. How has the footprint of foreign students changed over time?
I want to be circumspect here, because some of the data is incomplete. You can break down the applicants’ origins by what appears to be a completely arbitrary “region” field. You can figure out how many students matriculate by region for many of the recorded years.
My guess is that this field was deprecated somehow, but it does indicate a stunning increase in matriculants from the UAE region, a petrostate awash in corruption and with a questionable human rights record. Maybe someone can take a closer look at what might have happened here?
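If you want to try the by-region tally yourself, it can be sketched roughly like this. The record layout (dicts with `year`, `region`, and `matriculated` keys) is made up for illustration and is not the actual schema of the leaked tables. Rows with a missing region or year are skipped instead of guessed at, which is one reason the counts should be read with caution.

```python
from collections import Counter


def matriculants_by_region_year(records):
    """Count matriculating students per (year, region) pair.

    ASSUMPTION: each record is a dict with hypothetical keys
    'year', 'region', and 'matriculated'.
    """
    counts = Counter()
    for rec in records:
        if not rec.get("matriculated"):
            continue  # applied but did not enroll
        if rec.get("region") is None or rec.get("year") is None:
            continue  # incomplete row: skip, do not guess
        counts[(rec["year"], rec["region"])] += 1
    return counts
```

Sorting the resulting keys by year for a single region gives the kind of over-time series discussed above.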
XIII. Call to action
I’m going to leave this section just as a call to action. If you have any other questions you think I can answer from this data, leave them in a comment below, and I’ll look into it.