The previous post in this series was a reproduction and robustness check of the self-proclaimed scientific refutation of the Oral Polio Vaccine hypothesis, the claim that major strains of extant HIV resulted from human-administered vaccines contaminated with infected chimpanzee tissue. The data analysis component of Michael Worobey’s paper seems to check out!
This installment will deal with the bad aspects of this publication: data sparsity, logical stretches, overconfidence in the conclusions.
In the interests of clarity and disclosure, I’ll state now that I don’t think the OPV hypothesis is likely to be correct, but I don’t believe it’s been completely refuted, and Worobey’s work doesn’t move the needle to this end. Yes, the fieldwork Worobey’s team undertook was impressive and the phylogenetic analysis stands, but one swallow does not make a summer. As with his conclusions concerning the origins of the COVID-19 pandemic, we should not be taking Michael Worobey’s reasoning or pronouncements on this subject seriously.
The Bad
The bad parts are data sparsity and geographic fallacies, delineated below.
As described in Part I, The Good, Worobey mounted an expedition into the jungle near Kisangani, a city in a central region of the Democratic Republic of Congo. It was here that Hilary Koprowski based his oral vaccine trials from 1957 to 1960. Worobey’s team collected nearly 100 samples of chimpanzee feces in order to test them for Simian Immunodeficiency Virus (SIV) precursors to HIV-1. Out of these samples, they found a single sample from which they were able to extract ~11% of an SIV genome for comparison to an assembled SIV and HIV sequence panel. He calls this partial SIV genome SIVcpzDRC1.
Sparsity
This is going to be a short section. Yes, the fundamental data element is a scrap of a single SIV genome. Compare this to Santiago et al.’s work where they based their findings on genetics from five separate SIV samples. Compare this to the ancient DNA research pioneered by Nick Patterson and David Reich. Even with a dozen or so ancient samples, many of them full human genomes, these investigators are much more cautious about what they claim. It is not normal to “refute” stuff in modern science based on such sparse data, and when it is done, the underlying reasoning is ironclad.
Any investigator would be on their guard about such strong claims coming from a single data point.
Geographical closeness
The argument presented tacitly in Worobey et al. is at its heart a geographical one, and flows as follows.
1. Chimpanzees living closer together are going to interact with each other and spread communicable diseases among themselves more than chimpanzees living further apart from each other, and likewise, their descendants after 50 years.
2. Given 1, SIV viral samples collected in the bush physically closer together are going to be more related to each other than samples collected further apart from each other, and likewise, their viral ancestors.
3. Chimpanzees living near Koprowski’s lab in Kisangani were more likely to be captured by humans in Kisangani than chimpanzees further away from Kisangani. (1, 2, and 3 are all applications of Tobler’s law of geography: “Everything is related to everything else, but near things are more related than distant things.”)
4. Since SIVcpzDRC1 was found close to Kisangani but clusters away from known HIV-1 strains phylogenetically, chimpanzees near Kisangani are unlikely to harbor SIV progenitors of HIV-1, given 2.
5. Given 3 and 4, Koprowski’s Kisangani lab was unlikely to be the source of HIV.
Completely straightforward, right?
To spell this out, I’m actually pretty on board with 1 and 2. Even 3 is plausible, but potentially incorrect. However, jumping from 2 to 4 is a stretch, and 3 + 4 does not equal 5.
OK. The basics. Where did Worobey et al. get their sample? It’s sort of hard to tell. They report
From these, we identified one SIVcpz vRNA-positive specimen from the Parisi forest by PCR amplification of gag (422 base pairs) and gp41/nef (699 base pairs) sequences.
but they don’t give distances or an exact location. They supply Figure 1 as a map in the supplementary information.
“Parisi Forest” doesn’t exist as far as Google Earth is concerned. This is not suspicious. This entire region is dense sub-Saharan rainforest. You can find Wanie-Rukula on Google Earth, which is 52 kilometers from central Kisangani, and keep drawing a line. From what I can tell, the Parisi forest is about 130 kilometers southeast of central Kisangani. See Figure 2.

I don’t think Parisi is much further than 150 kilometers from Kisangani because then you’re getting close to the Maiko National Park, and that would probably be reported as such. I’ve shared this Google Earth project here if you want to check it out.
Worobey states
Phylogenetic analysis of the newly derived sequences revealed that the Kisangani virus clustered with high statistical support with SIVcpz strains that were infecting chimpanzees of the same subspecies (Pan troglodytes schweinfurthii) that lived about 800 km to the south-east in Gombe National Park in Tanzania.
Gombe National Park is in the green circle in Figure 2. It’s only 760 km from Kisangani itself, and so it’s definitely much less than 800 km from the Parisi forest in the orange circle. It’s more like 630 km from Parisi.
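For anyone who wants to sanity-check these distances without opening Google Earth, here is a minimal sketch using the haversine formula. The coordinates for Kisangani and Gombe are my own approximations, and the Parisi forest position is purely a guess obtained by walking 130 km on a due-southeast bearing from Kisangani; neither comes from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Approximate (latitude, longitude) in degrees. These are assumptions
# for illustration, not coordinates reported by Worobey et al.
KISANGANI = (0.5153, 25.1910)
GOMBE = (-4.6667, 29.6333)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def destination(start, bearing_deg, distance_km):
    """Point reached from `start` after travelling `distance_km` on a bearing."""
    lat1, lon1 = map(math.radians, start)
    theta = math.radians(bearing_deg)
    delta = distance_km / EARTH_RADIUS_KM  # angular distance
    lat2 = math.asin(math.sin(lat1) * math.cos(delta)
                     + math.cos(lat1) * math.sin(delta) * math.cos(theta))
    lon2 = lon1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(lat1),
        math.cos(delta) - math.sin(lat1) * math.sin(lat2))
    return math.degrees(lat2), math.degrees(lon2)


# Hypothetical Parisi forest position: 130 km due southeast (bearing 135).
parisi_guess = destination(KISANGANI, 135.0, 130.0)

kisangani_to_gombe = haversine_km(KISANGANI, GOMBE)
parisi_to_gombe = haversine_km(parisi_guess, GOMBE)

print(f"Kisangani to Gombe: {kisangani_to_gombe:.0f} km")
print(f"Parisi (guess) to Gombe: {parisi_to_gombe:.0f} km")
```

Under those assumptions the two distances should land near the 760 km and 630 km figures quoted above. A different guess for the Parisi bearing will move the second number by a few tens of kilometers.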
Worobey is trying to call a virus he discovered 130 km from Kisangani “the Kisangani virus”. Driving north from Central Park will get you to the Catskills in less than 130 km. Would you feel comfortable calling a virus you discovered in the Catskills “the New York City virus”? Would you feel more comfortable doing so if there were not really any good roads from New York City to the Catskills and it was a lot of dense jungle? Would you call it “the New York City virus” only if you were trying to make a tendentious point?
To add more context to this, the very paper that Worobey et al. cite in the quotation above is Santiago et al. They state
Kibale chimpanzees may have never been exposed to SIVcpz because of biogeographic divisions of the eastern chimpanzee population. Kibale National Park is approximately 500 km north of Gombe National Park (Fig. 1A). Intervening lacustrine and forest gap barriers may have limited gene flow between these populations in the past (19, 20). It is possible that SIVcpz infection may not have spread to Kibale if it was introduced into eastern chimpanzee populations subsequent to the establishment of such gene flow barriers.
and
These results indicate that SIVcpz is unevenly distributed among P. t. schweinfurthii in east Africa, with foci or “hot spots” of SIVcpz endemicity in some communities and rare or absent infection in others.
So, there isn’t uniformity in SIV prevalence or genetics across hundreds or even across tens of kilometers in central Africa. The 130 km separating Koprowski’s erstwhile lab and Worobey’s sample in the Parisi Forest is not meaningfully small when trying to use a “closeness” assumption like premise 2 above, and his colleagues basically say as much.
But let’s assume such geographic uniformity existed and think about this a different way. The closest relative that Worobey finds to SIVcpzDRC1 is Santiago’s SIVcpzTAN3; in my alignment these samples have 36.1% ungapped nucleotide divergence (252 positions). The nef gene, which these sequences cover, apparently has a very high rate of evolution; most people seem to think SIV and HIV-1 evolve at similar rates, and this 2016 source puts the nef rate of evolution between 0.01 and 0.015 substitutions per nucleotide per year. The back of my envelope shows SIVcpzDRC1 and SIVcpzTAN3, separated by 630 km, diverged between 252 / (2 * 0.015) = 8,400 and 252 / (2 * 0.01) = 12,600 years ago. The Parisi forest is 130 km from Kisangani, so using this geographical closeness argument, we could expect SIVcpzDRC1 and any of its nearby unsampled viral relatives to have diverged between (130 km / 630 km) * 8,400 = 1,733 and (130 km / 630 km) * 12,600 = 2,600 years ago.
Obviously, these figures represent a divergence way, way before anyone believes HIV-1 spilled into humans.
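The envelope arithmetic can be replayed in a few lines. This is only a sketch that reproduces the figures exactly as written: it plugs in the raw count of 252 differing positions, splits the change across two lineages with the factor of 2, and scales time linearly with distance. The function and variable names are mine, and none of this is a calibrated molecular clock.

```python
DIFFERING_POSITIONS = 252          # from my alignment, as quoted above
RATE_LOW, RATE_HIGH = 0.01, 0.015  # nef rate range from the 2016 source
FAR_KM, NEAR_KM = 630, 130         # Parisi-Gombe and Parisi-Kisangani


def envelope_years(differences, rate):
    """Envelope estimate as used in the text: differences / (2 * rate).

    The factor of 2 reflects both lineages accumulating changes.
    """
    return differences / (2 * rate)


def scale_by_distance(years, near_km, far_km):
    """Linear distance scaling (the uniformity assumption being tested)."""
    return years * near_km / far_km


oldest = envelope_years(DIFFERING_POSITIONS, RATE_LOW)
youngest = envelope_years(DIFFERING_POSITIONS, RATE_HIGH)
scaled_oldest = scale_by_distance(oldest, NEAR_KM, FAR_KM)
scaled_youngest = scale_by_distance(youngest, NEAR_KM, FAR_KM)

print(f"630 km apart: {youngest:,.0f} to {oldest:,.0f} years")
print(f"130 km apart: {scaled_youngest:,.0f} to {scaled_oldest:,.0f} years")
```

Swapping in a different rate range or a different guess for the Parisi distance only requires changing the constants at the top.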
The point is you need to be careful with Tobler’s law of geography, and these sorts of geographical arguments only make sense in the presence of some type of geographical uniformity in what you’re trying to study. Santiago, a co-author of the “refutation” paper, doesn’t even believe in such uniformity in the case of these SIV viruses.
Geographical non-closeness
The above analysis, and all the described weaknesses, gives Worobey the benefit of the doubt and assumes premise 3 (Chimpanzees living near Koprowski’s lab in Kisangani were more likely to be captured by humans in Kisangani than chimpanzees further away from Kisangani) is correct. But what if that’s not correct?
The Congo River is a major trade route, and it has been for centuries. Myriad traded goods have moved between Kinshasa and Kisangani since the days of Belgian colonialism, and of course that continued well into the 20th century. A huge portion of all human activity is moving valuable things from less accessible places to more accessible places. Worobey et al. had to travel 130 km from Kisangani into the jungle to find any useful chimpanzee feces. Why are we to believe it was easier for chimpanzee hunters in the 1950s to capture chimpanzees for laboratory work in the jungle near Kisangani than it was for Worobey et al. to find chimpanzee shit near there? If there was an easier way to get chimps elsewhere on the Congo River and the chimps were needed in Kisangani, they would have gotten there from wherever that elsewhere was.
An irony here is that one of the main arguments that pro-zoonosis scientists use to explain why HIV-1 spread and became pandemic only in the early 20th century is the explosion in trade and urbanization in the DRC. If there was enough travel and trade to support an HIV-1 pandemic that hadn’t occurred for literally thousands of years, surely there could have been enough trade to bring infected chimpanzees from Cameroon or somewhere in the west up the Congo River to Kisangani for scientific research. Edward Hooper wrote in 2008 that
The precise source of the chimpanzees that bore the SIV (or SIVs) that crossed into humans to make HIV-1 is still unproven. Ms Lapidos asserts that they “probably came from Cameroon”, but the truth is that we still don’t know. However, there is documentary evidence that at least one Pan troglodytes troglodytes (Ptt) chimpanzee from the west central African region that includes Cameroon was present among Koprowski’s chimps in the Congo.
… Because these chimps were co-caged and group-caged together, an SIV introduced by one single chimpanzee could have infected many others in the camp.
I’m not saying I believe the documentary evidence he refers to, but movement of goods up a large trading river route, especially when white westerners need them in a colonial-era lab trying to do critical vaccination work, is not a wild-eyed scenario, and Hooper is right to point this out.
Coda
Like the geographical work using KDE analysis in Wuhan to supposedly identify the wet market origin of the COVID-19 pandemic, Worobey et al. 2004 uses similar fallacies (ignoring large-scale human movement and leaning on dubious geographic principles) to advance a specious and overconfident line of reasoning. The OPV hypothesis is a falsifiable proposition; for example, if a human HIV isolate dating from before 1957 were discovered, or even if there were epidemiological evidence of AIDS in central Africa before then, the theory would no longer be tenable.
Part III of this series will be entitled “Going Apeshit, The Ugly”. Stay tuned.