The Great Mouse Detective: Return of the Variability Hypothesis
A mouse does not rely on just one hole.
In the last poast on the greater male variability hypothesis, we cast doubt on the idea that X-inactivation is a relevant explanatory factor for this hypothesis in mammals. Marsupials don’t seem to show any attenuation of greater male variability compared to placental mammals, yet only in placental mammals could the system of random somatic X-inactivation plausibly account for it.
I proposed a competing theory: females experience greater stabilizing selection than males in reproductive traits. If female reproduction has a large enough biological footprint, spanning enough biological pathways, then outlying female phenotypes in traits that share those pathways could be selectively penalized. Females who express extremity might be less successful at fault-intolerant tasks like bringing offspring to term or caring for them until they disperse.
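To make the mechanism concrete, here is a minimal sketch of how stronger stabilizing selection on females would show up as a variability ratio above 1 within a single generation. It assumes a normally distributed phenotype, a Gaussian fitness function centered on the mean, and identical pre-selection variance in both sexes. The function names and the `omega2` width values are illustrative choices, and the sketch deliberately ignores inheritance and the genes the sexes share.

```python
import numpy as np


def variance_after_selection(sigma2: float, omega2: float) -> float:
    """Variance of a N(0, sigma2) phenotype after Gaussian stabilizing selection.

    Fitness is w(z) = exp(-z**2 / (2 * omega2)); a smaller omega2 means stronger
    selection. Multiplying the two Gaussians gives a Gaussian whose precision is
    1/sigma2 + 1/omega2, i.e. variance sigma2 * omega2 / (sigma2 + omega2).
    """
    return sigma2 * omega2 / (sigma2 + omega2)


def monte_carlo_check(sigma2: float, omega2: float, n: int = 200_000, seed: int = 0) -> float:
    """Fitness-weighted sample variance, as a numerical check on the closed form."""
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, np.sqrt(sigma2), size=n)
    w = np.exp(-z**2 / (2.0 * omega2))
    mean = np.average(z, weights=w)
    return float(np.average((z - mean) ** 2, weights=w))


# Hypothetical widths: selection on females (omega2 = 1) is stronger than on males (omega2 = 4).
sigma2 = 1.0
male_var = variance_after_selection(sigma2, omega2=4.0)
female_var = variance_after_selection(sigma2, omega2=1.0)
variability_ratio = male_var / female_var  # 0.8 / 0.5 = 1.6
```

In this toy setting any sex difference in selection width pushes the ratio above 1 for the less constrained sex; whether such an effect persists across generations is what the indirect tests below are probing.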
Sometimes the most organized way to think about complicated hypotheses is in the language of causal graphs. In causal DAG terms, this idea looks like this.

There are some different ways this proposition could be tested.
In this explanation of the variability hypothesis, males express greater phenotypic variability because female reproduction is unforgiving, and males don’t have to do it.
…
Any test of this is indirect and difficult; you might look at male/female variability ratios in mammals having long pregnancies versus those having short ones—longer ones would predict more female stabilizing selection and higher variability ratios … Maybe we’ll do this in another poast.
That poast has arrived. Testing this required assembling a diverse set of data, and it turned into a really surprising analysis. Without further delay…
Whence the data?
We’re going with the strategy described above: measuring the differential complexity of mammalian pregnancies across species. At the very least we’ll need gestation time as a proxy for complexity. When I first started looking at the data, the investigation quickly took a turn. I realized the obvious fact that females could also be affected by selection pressures acting after they give birth, while weaning or caring for their offspring before they disperse, and that this would need to be accounted for. So we’d also like the total time before an offspring tends to part with its parent(s) as an additional proxy. OpenAI’s Deep Research was actually amazing at finding sourced species-level data like this, so credit to them.
Under this testing strategy, the causal diagram of figure 1 gets extended, as in figure 2.
If stabilizing selection in reproductive traits and long gestation/rearing are both caused by complicated female reproduction, then observing them together across species is evidence that the thing causing both actually exists, which is what we want to show.
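The logic of this test is the familiar common-cause fork: if one latent variable drives two observables, they correlate even though neither causes the other. A minimal simulation, assuming linear effects and Gaussian noise, with a hypothetical latent `complexity` score per species and far more species than any real dataset would have:

```python
import numpy as np

rng = np.random.default_rng(0)
N_SPECIES = 2000  # hypothetical: far more species than any real rodent dataset


def simulated_correlation(common_cause: bool) -> float:
    """Correlation between two observables, with or without a shared latent cause."""
    # Latent "complicated female reproduction" score for each simulated species.
    complexity = rng.normal(size=N_SPECIES)
    shared = complexity if common_cause else np.zeros(N_SPECIES)
    # Each observable = shared cause + independent noise; neither causes the other.
    log_gestation = shared + rng.normal(size=N_SPECIES)
    log_variability_ratio = shared + rng.normal(size=N_SPECIES)
    return float(np.corrcoef(log_gestation, log_variability_ratio)[0, 1])


with_fork = simulated_correlation(common_cause=True)      # close to 0.5 in expectation
without_fork = simulated_correlation(common_cause=False)  # close to 0
```

The converse is weaker: a missing correlation doesn’t rule out the fork if other causes push the two observables in opposite directions.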
Just like in the previous work looking at marsupials, we need sex-disaggregated mammalian morphological data to measure variability ratios: the male variance divided by the female variance. Behavioral data would also work, but it’s harder to collect, rarer, and more subject to measurement issues. The data doesn’t need to be at the level of individual animals; it can be aggregated by trait or phenotype, provided the variance for each sex is captured. Deep Research was not so great at this, and collecting it required a heavy human touch. The work of obsessive and meticulous data foraging isn’t being replaced by AI yet.
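A sketch of the per-trait computation from published summary statistics, assuming the source reports either standard deviations or standard errors of the mean for each sex. The helper names are hypothetical, and converting a standard error back to a standard deviation needs the per-sex sample size.

```python
import math
import statistics


def sd_from_se(se: float, n: int) -> float:
    """Recover a standard deviation from a standard error of the mean: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)


def variability_ratio(sd_male: float, sd_female: float) -> float:
    """Male variance divided by female variance."""
    return (sd_male / sd_female) ** 2


# With individual-level data the same ratio comes straight from sample variances;
# summary-level data give an identical answer as long as the SDs use n - 1.
males = [31.0, 35.5, 28.2, 40.1, 33.3]    # made-up measurements for illustration
females = [30.2, 32.0, 29.5, 33.8, 31.1]  # made-up measurements for illustration
vr_individual = statistics.variance(males) / statistics.variance(females)
vr_summary = variability_ratio(statistics.stdev(males), statistics.stdev(females))
```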
For this analysis, it’s not enough to just pick a bunch of mammalian species for which suitable data exists. We must be judicious. Gestation times in mammals span more than an order of magnitude, and mammals as a whole face a ton of diverse selection pressures that could confound the analysis. We need a single, reasonably diversified clade. Its members should be cosmopolitan but share similar life history strategies and niches; no mixing predators and prey, for example. They should be well studied and well characterized by humans. They should have limited, or preferably zero, male parental investment, which could also confound the analysis.
In other words, we’re going to be studying rodents.

The assembled data elements are from black and brown rats, Mongolian hamsters and Mongolian gerbils, golden hamsters, Ansell’s mole rats, guinea pigs, nutria, chinchillas, black-tailed prairie dogs, and of course, the house mouse. The data ranges from cranial measurements to neurological measurements to mass and length measurements to measurements of internal organs. One of the most striking things about this investigation is the shocking regularity of greater male variability regardless of which traits are actually being measured. If you are a rodent guru and have morphological, sex-disaggregated rodent data you’d like to add to this analysis, be sure to DM me.



Scatterplots
Just like with the possums and Tasmanian devils in the last poast, each of the species above will get a plot. Each plot will have a collection of dots, and each dot is one trait measured in a study over a collection of animals, both male and female. For each trait, we plot the variability ratio (the male variance divided by the female variance) against the male/female Cohen’s d effect size. The effect size quantifies how different the male and female means are; the variability ratio quantifies how different the male and female variances are.
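For the x-axis, a common way to compute Cohen’s d from per-sex summary statistics uses the pooled standard deviation. This sketch assumes that convention and a male-minus-female sign; which variant the original analysis used isn’t stated, and small-sample corrections such as Hedges’ g are left out.

```python
import math


def pooled_sd(sd_m: float, n_m: int, sd_f: float, n_f: int) -> float:
    """Pooled standard deviation of two groups, weighting each variance by n - 1."""
    return math.sqrt(((n_m - 1) * sd_m**2 + (n_f - 1) * sd_f**2) / (n_m + n_f - 2))


def cohens_d(mean_m: float, sd_m: float, n_m: int,
             mean_f: float, sd_f: float, n_f: int) -> float:
    """Male-minus-female mean difference in pooled-SD units (positive: males larger)."""
    return (mean_m - mean_f) / pooled_sd(sd_m, n_m, sd_f, n_f)


# Example: means differ by 2 units and both sexes have SD 2, so d = 1.
d = cohens_d(11.0, 2.0, 10, 9.0, 2.0, 10)
```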
We can then take the intercept b of the best-fit line through these points, at d = 0, as an aggregate estimate of each species’ greater male variability.
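A minimal sketch of the intercept estimate, assuming an unweighted ordinary least squares line of variability ratio on d across a species’ traits (whether the original analysis weights traits differently isn’t stated here). `numpy.polyfit` with `deg=1` returns the slope first and the intercept second; the function name is hypothetical.

```python
import numpy as np


def variability_at_zero_d(d_values, vr_values):
    """Fit VR = slope * d + b across traits and return (b, slope).

    b is the predicted variability ratio for a trait with no mean difference
    between the sexes (d = 0).
    """
    d = np.asarray(d_values, dtype=float)
    vr = np.asarray(vr_values, dtype=float)
    if np.unique(d).size < 2:
        raise ValueError("need at least two distinct d values to fit a line")
    slope, b = np.polyfit(d, vr, deg=1)
    return float(b), float(slope)
```

With only a handful of traits, as for some of the species here, b can swing a lot from one added trait to the next, so a few-point intercept deserves much less weight than one fit over dozens of traits.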
Some of the plots look really promising and reveal a certain amount of regularity, especially in the variability ratio estimate b. A lot of these values are right around 1.3, which is comparable to analogous aggregate statistics in humans. The code and data for this analysis are checked in here.
Some of them have only a few traits measured at all. It would be poor form to pretend I carefully assembled a ton of data for every one of these rodent species.
And a few are divergent and unexpected.
Analysis
The provisional conclusions from these data are as follows:
The relationship between the effect sizes and the variability ratios is weak, and unlike in humans, on net it’s probably close to zero in rodents.
Some rodents, like rats, chinchillas, and hamsters, show typical patterns of increased male variability.
Some rodents, like nutria, guinea pigs, and mice, don’t.
There’s in principle enough variation between rodent species to look at whether gestation times predict variability ratios. I could plot gestation times against variability ratios, but at first glance this idea is dead on arrival. As you can see above, the species with the lowest variability ratios are mice, nutria, and guinea pigs. Mice have among the shortest gestation times of everything I looked at, and nutria and guinea pigs among the longest. I didn’t actually do it, but I’d bet this holds even if you adjusted gestation time for the animals’ mass.
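If someone did want to run the mass-adjusted version, one conventional approach is to regress log gestation on log body mass across species, use the residuals as "gestation relative to size", and rank-correlate those with each species’ b. A sketch under those assumptions: the function names are hypothetical, the rank correlation assumes no ties, and with about a dozen species it has very little power.

```python
import numpy as np


def mass_adjusted_gestation(gestation_days, body_mass_g):
    """Residuals of log(gestation) regressed on log(body mass) across species.

    A positive residual means a longer gestation than expected for the species' size.
    """
    log_g = np.log(np.asarray(gestation_days, dtype=float))
    log_m = np.log(np.asarray(body_mass_g, dtype=float))
    slope, intercept = np.polyfit(log_m, log_g, deg=1)
    return log_g - (slope * log_m + intercept)


def spearman_rho(x, y) -> float:
    """Spearman rank correlation via Pearson correlation of ranks (assumes no ties)."""
    rank_x = np.argsort(np.argsort(x)).astype(float)
    rank_y = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rank_x, rank_y)[0, 1])


# Usage (hypothetical inputs, one entry per species):
# rho = spearman_rho(mass_adjusted_gestation(gestation, mass), intercepts_b)
```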
What’s going on here?
One interesting fact about the lifecycles of guinea pigs is that often when they die, they’re quickly reincarnated as a younger guinea pig that looks a lot like they did. This is especially true when they’re caged as the pets of small children.
A second, lesser-known, interesting fact about the lifecycles of guinea pigs is that, unlike most rodents, they give birth to precocial young. Precocial young are born with a substantial amount of autonomy from their parents: they aren’t completely bald, blind, and helpless. Nutria offspring are also precocial.
I’m not sure what’s going on with mice. I double-checked, and the data above checks out. There does appear to be a consensus that mice exhibit little sexual dimorphism. But it also appears that the rodents with longer, more complicated pregnancies, capable of producing developed and capable young, are the same ones that exhibit attenuated relative male variability. This is precisely the opposite of what the stabilizing selection explanation predicts.
And it’s actually even worse than that. I mentioned before that gestation time might be insufficient to capture all the selective pressures at play; mothers could be differentially good at taking care of their young after they’re born. However, you can’t even say “well actually, mothers of precocial young must be selected to do a good job after birth,” because precocial young, being more developed at birth, typically spend less time depending on their parents before they disperse.
We’re going to do some self-pitying, post-hoc rationalization here after apparently being completely wrong. Whether or not they’re rodents, precocial young might themselves experience more intense stabilizing selection than altricial young: their defenses against predation, and against plain incompetence at survival, need to be in place immediately, and they have less parental support to fall back on. There is an interesting lesson here for thinking about natural selection more generally: selection pressures on young offspring can differ from, and even oppose, those on their mature parents, even though the young eventually become mature parents themselves. When the same genes are involved at both life stages, this is called antagonistic pleiotropy, though our particular example has nothing to do with George Williams’ likely incorrect theory of senescence. You can’t simply look at data like the above and expect the test to be fruitful when selection pressures on parents are potentially mixed with selection pressures on their young. The causal graph here might look something terrible like this.
A digression about lemmings
Lemmings are the most interesting rodents in the world. In several species, including the wood lemming and the varying lemming, a selfish genetic element inverts and rearranges part of their X chromosome. A normal X chromosome is called X, but we’ll call this selfish, inverted X chromosome X*. Through a mostly unknown biophysical mechanism, if an individual inherits an X* from its mother and a Y from its father, the X*Y lemming physiologically develops into a fertile female instead of a male. There are therefore 3 viable female morphs: XX, X*X, and X*Y. YY zygotes are not viable, so when an X*Y female mates with an ordinary XY male, half the zygotes formed from her Y-bearing eggs die, two-thirds of the surviving offspring carry the selfish X* instead of the usual 50%, and X* preferentially spreads. Neat!
In the wood lemming, X*Y females produce no Y eggs at all, so 100% of their offspring carry X* and develop into females. Even neater!
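The offspring arithmetic in the two paragraphs above can be checked with a small Punnett-square enumeration. This sketch assumes the father is an ordinary XY male and treats the share of Y-bearing eggs as the only difference between the two cases; the function and parameter names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

FEMALE_GENOTYPES = {"XX", "X*X", "X*Y"}  # X*Y develops as a fertile female


def genotype(egg: str, sperm: str) -> str:
    """Name a zygote with the X-type chromosome first, e.g. ('Y', 'X') -> 'XY'."""
    return "".join(sorted((egg, sperm), key=lambda chrom: chrom == "Y"))


def offspring_of_x_star_y_mother(p_y_egg: Fraction) -> dict:
    """Surviving-offspring genotype fractions for an X*Y mother and an XY father.

    p_y_egg is the share of the mother's eggs carrying Y: 1/2 under ordinary
    segregation, 0 for wood lemming X*Y females as described above.
    YY zygotes are inviable and are removed before renormalizing.
    """
    eggs = {"X*": 1 - p_y_egg, "Y": p_y_egg}
    sperm = {"X": Fraction(1, 2), "Y": Fraction(1, 2)}
    zygotes = Counter()
    for egg, p_egg in eggs.items():
        for spm, p_sperm in sperm.items():
            zygotes[genotype(egg, spm)] += p_egg * p_sperm
    zygotes.pop("YY", None)  # YY embryos die
    total = sum(zygotes.values())
    return {g: p / total for g, p in zygotes.items() if p > 0}


for label, p_y in [("ordinary segregation", Fraction(1, 2)), ("wood lemming", Fraction(0))]:
    kids = offspring_of_x_star_y_mother(p_y)
    carry_x_star = sum(p for g, p in kids.items() if g.startswith("X*"))
    female = sum(p for g, p in kids.items() if g in FEMALE_GENOTYPES)
    print(label, dict(kids), "carry X*:", carry_x_star, "female:", female)
```

Exact fractions keep the two-thirds and 100% figures exact rather than approximate floating-point values.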
The drive of this selfish X* produces absurd sex imbalances, which periodically threaten the survival of the species itself. There’s even some evidence that the lemming Y chromosome has evolved its own counter-drive to preferentially produce males in the presence of X*. Lemmings can go through massive boom-and-bust cycles in population size, which are plausibly connected to their bizarre sex determination system. Think of a strange, slightly differently specified Lotka-Volterra system of equations. So where do they fit in?
Well, lemming offspring are not precocial like guinea pigs or nutria, but at the same time there’s little evidence of the morphological dimorphism seen in other rodents. Even the different female morphs are difficult to distinguish, though Saunders thinks X*Y females might be a bit more aggressive in nesting behavior. The high-level picture is very confusing, and how lemmings fit into it is even more so.
Overall, I’m a bit stymied. If you had asked me 6 months ago, I would have said X-inactivation or more intense stabilizing selection in females were the two most promising ideas for resolving the variability hypothesis. I now don’t believe either of those explanations is tenable.