Thursday, 31 March 2011

Twisting ADMIXTURE's arm: ancient isolates as poles in Europe/ME

ADMIXTURE is an amazing program for ancestry analysis. The problem is that, in unsupervised mode, it picks stable old admixtures as "unadmixed" components; all populations are, after all, admixed if we dig far enough into the past.
It finds the Amerindian and European in a Mexican pretty easily, but it struggles to distinguish more ancient components unless a population that still corresponds mostly to that component is in the data. That's why unsupervised runs gave us some results at odds with recent research, such as an "Irula" South Asian component.

So how can we get it to uncover such ancient fossils?
Using the new supervised mode, I think my last analysis of Africans pointed out a method. Faced with a "childless" pole and an "orphan" component that doesn't exist as a modal component in any population in the data, ADMIXTURE tries to fit one to the other. Its algorithm presumably allows for the possibility that some of the variability of the "child" population is no longer present in the "parent" one. If we include poles such as forager populations that didn't contribute significantly to any other population in the data, ADMIXTURE will stretch that pole as much as it needs to include the "orphan". ADMIXTURE necessarily assumes that all variability in the analysed populations is accounted for by the poles, so whatever the poles don't actually contain is treated as variability that was once present in them.
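The "forced fit" idea can be illustrated with a toy model. This is only a sketch under loud assumptions: it uses a plain EM update for one individual's ancestry proportions, it keeps the pole allele frequencies fixed (so it does not show the "stretching" of a pole), and it is not ADMIXTURE's own optimiser. The frequencies, the genotypes and the name `estimate_q` are made up for illustration.

```python
# Toy data (hypothetical): alternate-allele frequencies of two poles at six SNPs.
POLE_FREQS = [
    [0.9, 0.8, 0.1, 0.2, 0.9, 0.1],  # pole A
    [0.1, 0.2, 0.9, 0.8, 0.1, 0.9],  # pole B
]
# Genotypes (0, 1 or 2 alternate alleles) of an "orphan" individual whose
# true source is assumed to be neither pole.
ORPHAN = [2, 2, 1, 0, 2, 1]


def estimate_q(genotypes, pole_freqs, iterations=200):
    """EM estimate of ancestry proportions with pole frequencies held fixed."""
    k = len(pole_freqs)
    m = len(genotypes)
    q = [1.0 / k] * k  # start from equal proportions
    for _ in range(iterations):
        new_q = [0.0] * k
        for j, g in enumerate(genotypes):
            # Mixture probability of the alternate / reference allele at SNP j.
            p_alt = sum(q[a] * pole_freqs[a][j] for a in range(k))
            p_ref = sum(q[a] * (1.0 - pole_freqs[a][j]) for a in range(k))
            for a in range(k):
                f = pole_freqs[a][j]
                # Expected number of this individual's two alleles drawn from pole a.
                new_q[a] += g * q[a] * f / p_alt + (2 - g) * q[a] * (1.0 - f) / p_ref
        q = [v / (2.0 * m) for v in new_q]
    return q


if __name__ == "__main__":
    print(estimate_q(ORPHAN, POLE_FREQS))
```

Whatever the orphan's real origin, the proportions always sum to 1, so all of its variability ends up attributed to the poles on offer, mostly to the nearest one.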

This is how, in the African analysis, West Africans/Bantus came to dominate "!Kung", even though "San", present in the Xhosa and Tswana, was kept local.
Thus it occurred to me that this is a great method to "fish out" components for which neither an unadmixed parent population nor anything close to it survives.

I set up a run with the following poles, all either relatively "unadmixed" populations or highly distinctive populations with no known close relatives, such as foragers, semi-foragers and recent former foragers:
1. San (African foragers)
2. Papuans + Melanesians (isolates that may pick up a distinctive "Out of Africa" Oceanic migration)
3. Nganasan (Siberian)
4. Koryak (Siberian)
5. Chukchi (Siberian)
6. !Kung (African foragers)
7. Maasai (this seemed reasonably unadmixed in the African run, and I suspected some amalgamation there)
8. Yoruba (representative of WAF Neolithic)
9. Pygmies (all) (African foragers)
10. Hadza (African foragers)
11. Evenki + the similar Yakut and Dolgan (Siberian)

I used Dienekes' run to pick the Siberian populations. I realise now that I amalgamated some of them, as I had used them that way in some more localized runs, but it shouldn't matter.
I purposely did not pick any Fertile Crescent populations, as I wanted to see if ADMIXTURE could discover that component by itself. I also analysed some African and Siberian populations in the same run as a sort of control.
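For anyone wanting to set up a similar run, here is a minimal sketch of how the pole labels could be prepared. As far as I understand the ADMIXTURE manual, supervised mode reads a `.pop` file next to the input, with one line per individual holding either a reference population label or `-` for individuals to be estimated. The population names in the mapping, the assumption that the first `.fam` column holds the population name, and the helper names are all hypothetical, not the actual files used here.

```python
# Hypothetical mapping from population name to pole label. Amalgamated
# poles (e.g. Papuans + Melanesians) simply share one label.
POLES = {
    "San": "San",
    "Papuan": "Papuan_Melanesian",
    "Melanesian": "Papuan_Melanesian",
    "Yoruba": "Yoruba",
}


def pop_labels(fam_lines, poles):
    """Return one label per individual: the pole label, or '-' if not a pole."""
    labels = []
    for line in fam_lines:
        fields = line.split()
        if not fields:
            continue  # blank lines do not correspond to an individual
        # Assumption: the first .fam column (family ID) is the population name.
        labels.append(poles.get(fields[0], "-"))
    return labels


def write_pop_file(fam_path, pop_path, poles):
    """Write the .pop file and return the number of distinct poles used."""
    with open(fam_path) as fam:
        labels = pop_labels(fam, poles)
    with open(pop_path, "w") as out:
        out.write("\n".join(labels) + "\n")
    return len(set(labels) - {"-"})
```

The number returned by `write_pop_file` is the K to pass to ADMIXTURE together with its `--supervised` flag; populations left as `-`, like the controls, get their proportions estimated against the poles.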

I divided the results into several tables, but they are all from the same analysis. Sorry for "San" and "Evenki" being the same colour; I don't know why Google Docs is doing this.

I'll offer an interpretation later, and rename the components. I intend to use this method in other regions as well, and if possible with the limited data available to me, design a run with a "master solution" for all populations together.

I'll also present collections of individuals from each population, to show that all significant components are not "chunky" and shouldn't be artefacts.


  1. Interesting work! The Papuan+Melanesian signal in the Buryat, Tuvan and Mongolian samples makes me wonder about Denisovan contribution to those populations.

  2. Well, these populations are indeed close to the Altai where Denisovans were found. But they also share other affinities with more Southern populations, not found in the other Siberians sampled. We'll see if it holds in a wider East Eurasian analysis later.

  3. Diogenes,

    Did you remove the Bantu-admixed !Kung samples? There are 2 or 3 out of the bunch who have clear recent Bantu ancestry.

    It is best to remove these and work with the 'pure' Khoisan.

  4. I didn't, but I don't think it matters much for my approach, since the "!Kung" components in each run are obviously not related to the !Kung. It is just a device to get ADMIXTURE to fish out Neolithic groups at a diversity level comparable to that between forager groups. It's like setting it up as a kind of "time machine" tuned to forager-level differences rather than to the much more similar groups within the same Neolithic wave.