Twisting ADMIXTURE's arm: ancient isolates as poles in Europe/ME

ADMIXTURE is an amazing program for ancestry analysis. The problem is that in unsupervised mode it picks stable old admixtures as "unadmixed" components (all populations are, after all, admixed if we dig far enough into the past).
It finds the Amerindian and European in a Mexican pretty easily, but it struggles to distinguish more ancient components, unless a population still corresponding mostly to that component is in the data. That's why we got some results at odds with recent research in unsupervised runs, such as an "Irula" South Asian component.

So how can we get it to uncover such ancient fossils?
Using the new supervised mode, I think my last analysis of Africans pointed out a method. Faced with a "childless" pole and an "orphan" component that doesn't exist as a modal component in any population in the data, ADMIXTURE tries to fit one to the other. Its algorithm presumably allows for the possibility that some of the variability of the "child" population is no longer present in the "parent" one. If we include poles such as forager populations that didn't contribute significantly to any other population in the data, ADMIXTURE will stretch that pole as much as it needs to include the "orphan". ADMIXTURE necessarily assumes that all variability in the analysed populations is accounted for by the poles, including variability that it is programmed to presume was once present but isn't actually represented in them.

This is how, in the African analysis, West Africans-Bantus came to dominate "!Kung", even though "San", present in Xhosa and Tswana, was kept local.
Thus it occurred to me that this is a great method to "fish out" components for which neither an unadmixed parent population nor anything close to it survives.

I set up a run with the following poles, all known relatively "unadmixed" populations or highly distinctive populations with no known close relatives, such as foragers, semi-foragers and recent former foragers:
1. San (African foragers)
2. Papuans+Melanesians (isolates may pick up "Out of Africa" distinctive oceanic migration)
3. Nganasan (Siberian)
4. Koryak (Siberian)
5. Chukchi (Siberian)
6. !Kung (African foragers)
7. Maasai (this seemed reasonably unadmixed in the African run, and I suspected some amalgamation there)
8. Yoruba (representative of WAF Neolithic)
9. Pygmies (all) (African foragers)
10. Hadza (African foragers)
11. Evenki+ (similar) Yakut and Dolgans (Siberian)
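A minimal sketch of how poles like these could be handed to ADMIXTURE's supervised mode. As far as I understand the ADMIXTURE manual, supervised mode expects a `.pop` file next to the genotype file, with one line per individual in the same order, giving the pole label for reference individuals and `-` for everyone else. The pole labels, the toy sample list, the file name `merged` and the helper `make_pop_lines` are all hypothetical, made up for illustration.

```python
# Hypothetical pole labels, one per pole in the list above (merged poles
# such as Papuans+Melanesians share a single label).
POLES = [
    "San", "Papuan_Melanesian", "Nganasan", "Koryak", "Chukchi", "Kung",
    "Maasai", "Yoruba", "Pygmy", "Hadza", "Evenki_Yakut_Dolgan",
]


def make_pop_lines(sample_labels, poles=POLES):
    """Return one .pop line per individual, in the same order as the input.

    Individuals belonging to a pole keep their label; all others get "-",
    meaning their ancestry is left to be estimated.
    """
    pole_set = set(poles)
    return [label if label in pole_set else "-" for label in sample_labels]


# Toy sample order, made up for illustration.
samples = ["San", "Basque", "Yoruba", "Lithuanian", "Hadza"]
pop_lines = make_pop_lines(samples)
print("\n".join(pop_lines))

# Saved as e.g. merged.pop next to merged.bed, the run would then be
# something like:
#   admixture --supervised merged.bed 11
# with K equal to the number of poles.
print("K =", len(POLES))
```

The point of the `-` lines is that only the poles are fixed in advance; every other population is free to be split between them.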

I used Dienekes' run to pick Siberian populations. I realise now that I amalgamated some, as I had used them before in some more localized runs, but it shouldn't matter.
I purposefully did not pick any Fertile Crescent populations, as I wanted to see if ADMIXTURE could discover that component by itself. I also analysed some African and Siberian populations in the same run as a sort of control.

I divided the results into several tables, but it's all from the same analysis. Sorry for "San" and "Evenki" being the same colour; I don't know why Google Docs is doing this.

I'll offer an interpretation later, and rename the components. I intend to use this method in other regions as well, and if possible with the limited data available to me, design a run with a "master solution" for all populations together.

I'll also present collections of individuals from each population, to show that all significant components are not "chunky" and shouldn't be artefacts.
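A rough sketch of how such a check could be done, assuming "chunky" means a component that shows up as large values in a few individuals instead of being spread evenly within a population. ADMIXTURE writes one row of ancestry fractions per individual; the helper `summarise_component`, the population names and all the numbers below are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def summarise_component(q_rows, pops, component):
    """Return {population: (mean, min, max)} for one ancestry component.

    q_rows: per-individual ancestry fractions (one list per individual).
    pops: population label of each individual, in the same order.
    component: column index of the component of interest.
    """
    by_pop = defaultdict(list)
    for row, pop in zip(q_rows, pops):
        by_pop[pop].append(row[component])
    return {pop: (mean(vals), min(vals), max(vals)) for pop, vals in by_pop.items()}


# Made-up ancestry fractions for K=3 (each row sums to 1).
q_rows = [
    [0.10, 0.85, 0.05],
    [0.12, 0.83, 0.05],
    [0.00, 0.95, 0.05],
    [0.40, 0.55, 0.05],
]
pops = ["PopA", "PopA", "PopB", "PopB"]

summary = summarise_component(q_rows, pops, 0)
for pop, (avg, low, high) in summary.items():
    print(f"{pop}: mean={avg:.2f} range={low:.2f}-{high:.2f}")
```

In this toy data the first component is evenly spread in PopA but comes almost entirely from one individual in PopB, which is the kind of pattern a wide range would flag.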

Fertile Crescent impact in Africa (plus East African loses quotation marks)

As a method to estimate Fertile Crescent admixture in Subsaharan Africa, as well as uncover any recent affinities between West African and East African populations, I set up a run including all African populations I have available right now, plus some Middle Eastern ones.
As poles I designated the Fertile Crescent (FC) plus all currently available forager populations:
1. Mbuti Pygmies (from the Congo region)
2. Biaka Pygmies (from the Congo region)
3. !Kung (from Southwest Africa)
4. San (from far Southwest Africa)
5. Hadza (from Tanzania)
6. Turks+Druze

The other populations analysed were as follows (Bantu = Niger-Congo B):
Maasai (Kenya/Tanzania, Nilo-Saharan speaking); Bantu Kenya; Luhya (Kenya, Bantu speaking); Alur (Uganda/East Congo, Nilo-Saharan); Hema (Uganda/East Congo, Nilo-Saharan or Bantu);
Fulani (West Africa, very dispersed group, Niger-Congo speaking); Bambaran (Mali, language of probable Niger-Congo A family); Bulala (Chad, Nilo-Saharan); Kaba (Chad/CAR, Nilo-Saharan); Dogon (Mali, Niger-Congo); Mandenka (West Africa, Niger-Congo); Brong (Ghana, Niger-Congo); Hausa (West Africa, Afro-Asiatic); Yoruba (West Africa, Niger-Congo); Igbo (West Africa, Niger-Congo); Mada (West Africa, Niger-Congo);
Bamoun (Cameroon, Niger-Congo); Fang (Cameroon/Congo, Bantu); Kongo (Congo/Angola, Bantu); Nguni (South Africa, Bantu); Pedi (South Africa, Bantu); Sotho/Tswana (South Africa, Bantu); Xhosa (South Africa, Bantu); BantuSouthAfrica

Please bear in mind that the results don't necessarily imply close affinity of any agricultural population with any particular group. Forager ancestors of current agriculturalists obviously don't exist as foragers anymore and likely were substantially different from current ones. This is just an experiment to check if unsupervised ADMIXTURE was hiding anything (i.e. old stabilized admixture events) behind East and West African modal components.

Even though Foragers likely have some admixture with agriculturalists, East African appears to be a distinct component. Yoruba have no visible Fertile Crescent contribution, so I will use them as a pole again in the future.

This was a highly experimental run, so I'm not sure detailed interpretations should be made. Still, I think ADMIXTURE is lumping together groups as well as it can, even if it has to put pole populations at extreme positions within the component based on them. So the !Kung component is, I believe, more West African here, possibly stretched to include the !Kung themselves. The relationships are still valid though. And so is the conclusion that, as long as Hadza have no significant real Fertile Crescent admixture (as seems likely), there is little Fertile Crescent genetic influence in Subsaharan Africa except for Ethiopians, Maasai and Fulani.

Clarifying my Point of Departure

I've started the current series of ADMIXTURE runs with a hypothesis I'm trying to test. I want to state it clearly and simply, because that's decisive for readers to understand the poles/experiments I'm guiding ADMIXTURE to do.

Here is the hypothesis put as a simple model:
1. The Neolithic Revolution was about a significant change in Humans. It happened due to selection over thousands of years in very few very special and uniquely rich and unstable environments around the world. This change gradually influenced, and was in turn influenced by, the technology and culture of the originally Forager people subjected to it.
2. Agricultural lifestyles allow for at least 10x higher population densities. Imagine this theoretical, unreal situation: a land divided like a chess board into 10 squares; 5 inhabited exclusively by Neolithic people at 10 people per square and 5 exclusively by Forager people at 1 person per square. Now imagine they mix. Total resulting population: 10x5+5x1=55. Total resulting Forager contribution to the gene pool: 5/55, or <10%. Once Neolithics dominate some regions, Foragers are a minority in their land even without wars and genocides (even though these last likely occurred).
3. Foragers don't invent or adapt to agricultural lifestyles because they do not possess such changes and can't develop them fast enough in most circumstances. They're predisposed to fight or flee instead. Their much lower densities make them prone to high-density diseases they didn't evolve immunity to (but Neolithics did).
4. Foragers thus get swamped by agriculturalists, and populations become dominated by the Neolithic Core Area population. Since mostly men migrate, in ever larger numbers, almost all Y-DNA is Neolithic; much of the mt-DNA is forager (since early women were taken from forager populations, and later women descend from these); but autosomes are overwhelmingly Neolithic too.
5. Established Neolithic populations live at the Malthusian limit. Extra food means extra surviving children eating it.
6. The only major changes in such a setting involve invasion by populations with food producing advantages. Like better seeds, tools (techs); better organization for irrigation works (culture); major genetic advantages like better digestion of food products (lactose tolerance). Advantageous alleles still diffuse, but neutral genes remain overwhelmingly local.
7. "Invasions" and "Conquests" in such a setting are about militarized elites subjugating the peasants. Travelling is difficult and they are few compared to peasants. They live off rents collected from them, rather than join in the miserable peasant life. Since they despise peasants as serfs, they refer to themselves and their land by their minority identities. Thus we get "Roman" Tunisia; "Gothic" Ukraine; "Celtic" Anatolia, "British" Jamaica. This has no meaning as far as actual genetic constitution of the majority peasant population, but it's all contemporary authors talk about, as well as most contemporary luxury works.
8. If subjected populations live for long enough under the alien elite, they mix with it, appropriate their prestige tags, assimilate their prestige language with their substrate one, and much of material culture too. Thus French "Latins" with Celtic substrate, Bulgarian "Slavs", Anatolian "Turks", Egyptian "Arabs", etc. Elites do make a contribution to Y-DNA, since their societies transmit prestige patriarchally. But almost no mt-DNA. And little autosomal DNA, since elite Y-DNA bearers persist but successive wives are mostly local.
9. Slaves: slaves in settled agricultural societies do not have the impact they have in mostly unsettled Forager inhabited frontiers. Demographic success of agricultural slaves in the Americas is the exception not the rule for slave owning societies. Just like Agricultural minority success at Forager inhabited frontiers is the exception, and elite assimilation into settled populations the rule. African genes in the Americas expanded because they were able to join the early "Neolithic" gene pool there. They had high density disease immunity, agricultural knowledge, social and genetic adaptations, and the right crops versus Forager Amerindians in some regions. Slaves functioned as new rural "peasants" in untilled land from whom Barons extracted rents.
In Old World regions, where large peasant populations lived at the Malthusian limit, there was no advantage in bringing slaves to till the land and replace the peasants. If a Baron could move the peasants off productive land, he could just as well make them work as hard as slaves. Slaves were useful for house work, for prestige, and for city and mine labour. They had very severe social disadvantages. They generally died at far higher rates and reproduced a lot less than local peasants and so had to be continuously imported. They could not make large contributions to gene pools of very dense agricultural peoples, except if they carried food producing advantages, such as crops specially adapted to local circumstances. Slaves as a rule contribute a little mt-DNA, almost no Y-DNA and residual autosomal DNA in such societies.
10. Over time, once a functioning advanced agricultural community is established, no matter how many elite invasions and conquests occur, or how many slaves are brought in, there might be small but significant changes to Y-DNA and mt-DNA respectively. But autosomes are likely to remain the same, with only residual contribution (except if food producing improvements are brought by migrants, as noted above). At least until such times when machines till the soil, people don't live at the Malthusian limit, and they don't continue to reproduce even though food is available.
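The chess board arithmetic in point 2 can be written out as a tiny calculation. This is only a sketch of that toy example; the function name `forager_share` and its parameters are mine, and the densities are the made-up ones from the text, not estimates.

```python
def forager_share(farmer_squares, farmer_density, forager_squares, forager_density):
    """Fraction of the pooled population contributed by foragers."""
    farmers = farmer_squares * farmer_density
    foragers = forager_squares * forager_density
    return foragers / (farmers + foragers)


# 5 squares at 10 people each versus 5 squares at 1 person each.
share = forager_share(5, 10, 5, 1)
print(f"Forager contribution: {share:.1%}")  # 5/55, about 9.1%

# With an even larger density gap the forager share shrinks further.
print(f"At 50x density: {forager_share(5, 50, 5, 1):.1%}")
```

Even with equal amounts of land on both sides, the density ratio alone pushes the forager share below 10%.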

Both presently, and in the Neolithic Revolutions, these assumptions don't hold. But in the intervening 10,000 years or so, they likely do.

"East African" components, what are they all about?

In unsupervised ADMIXTURE analysis, one usually gets a division in Africa between three major components:
1. A West African centered one (Yoruba, Mandinka)
2. An East African one (Maasai)
3. A Fertile Crescent one (most Egypt, part of Ethiopia)
Two of them derive from well known, wholly or at least mostly independent, local Neolithic Revolutions: the West African one (with its own special tropics-adapted plants) and the Fertile Crescent one.

The mystery is about the quite widespread "East African", Maasai centred one. Is it a third Neolithic Revolution (fourth with the Malagasy migration) affecting Africa? Is it simply an ancient stable mix of the former two, mistaken by ADMIXTURE for an "unadmixed" component?

I'll be running some experiments trying to find clues about this. Then I'll return to Europe and the Middle East and speculate on what exactly are those "Subsaharan African" segments about...

Debunking Mozabites

Mozabites in an unsupervised ADMIXTURE analysis are generally modal for their own component, which then dominates fellow North Africans, and is present in small amounts in some East Mediterranean populations with no likely contact with Northwest Africa (its presence in Iberians likely has other reasons, explored later). Under a theory of Neolithic Replacement, this doesn't make sense, since North Africans have been agricultural peoples since ancient times...
Thus I decided to try to break them up into Neolithic source components using the supervised mode in ADMIXTURE.

Three likely sources of Neolithic techs and genes are present in the region:

1. Northern Fertile Crescent: In order to distinguish a first wave from secondary waves from other regions in the Fertile Crescent area, I decided to use Basques as a pole. In retrospect maybe Basques absorbed some small but not insignificant secondary wave component too, but since it was likely much less important than in North Africa and the Middle East, the choice is still valid (may have to use populations from further North as a first wave pole next time though).

2. Northeast Africa: Egypt is obviously a major candidate for a secondary "Southern Fertile Crescent" wave. Their fast appearing shining civilization indicates extra surplus food production allowing exceptionally large elites. Surplus food techs would likely expand along with the people using them, since they would allow for higher densities of agriculturalists versus less advanced "first wave"-tech using peoples. However it's expected that secondary wave people (but not techs or seeds) will travel much less far than first wave ones, since already agricultural peoples will effectively resist and learn since they share a similar mindset already.
A complication in using Egypt as a pole is that Egypt very likely received heavy gene flow from the Northern Fertile Crescent, as it was perhaps a more peripheral but integral part of the West Eurasian Neolithic Core Area from the start of the Revolution. The thing is, Egyptians also have very significant non-Western Asian components they share with other East Africans, namely Ethiopians and Maasai. Since Ethiopians also seem to have received major influence from the Arabian Peninsula in ancient times, a more southern pole must be sought to make things clearer with less overlap. If North African populations have East African components, we can use Egyptians directly later, but in this analysis I chose the Maasai.

3. Western Africa: Western Africa's Neolithic Revolution seems to have happened later than the Fertile Crescent one (which failed to expand into the region due to seed-package maladaptation to tropical conditions). Still, there was likely much gene flow between both regions so I included them as the third pole (Yoruba+Mandenka).

Here are the results:

A possible interpretation: Mozabites may be a compound of an earlier Southeastern-subset Fertile Crescent Egyptian wave superimposed by a dominant more advanced Northern-subset Fertile Crescent one. As seeds, cultural practices and genes mixed in the Nile, a second (or third for North Africa) Fertile Crescent wave would expand and Egyptian Civilization would arise in the Nile itself.
It's interesting that populations from more arid areas have more "West African" (Mozabites themselves, Libyans, South Moroccans). It may seem more likely that the West African minor component is derived from later expansion of West Africans and the caravan trade. However, little is known, as far as I'm aware, of local Forager genetics, since no forager populations remain in North Africa today. Foragers likely remained in regions not congenial to agriculture until the development of desert Pastoralist (camels, goats) lifestyles in Arabia much later. One tantalizing possibility is that it was more West African-like from the very beginning. The Saharan pump theory may offer some clues, since the Sahara during the Ice Age was much more congenial to (forager) gene flow from the south than today...

The model predicts this "Egyptian wave" would spread to Europe and the more distant Near East using the obvious expansion routes already travelled by first waves. It would however peter out relatively quickly, as Egyptian colonists failed to numerically overwhelm the already Neolithic first wave peoples, and would become more "first-wave-like" genetically the further one travels from Egypt. I will analyse European populations for evidence of Egyptian/East African admixture next.

A Third Eurasian Neolithic Revolution?

Recently there has been much talk in the ADMIXTURE-fiddling blogs about the origin of Ancestral North Indian (ANI) vs Ancestral South Indian (ASI). While ANI increasingly appears to be a Fertile Crescent derived population, ASI remains somewhat of a mystery, although some minor Yellow River-derived population may have also had a role.
Assuming an exceptionally fast and thorough cultural-biological Human Revolution followed by (near) complete replacement model, one would expect ASI to necessarily have undergone sufficient adaptations to be able to survive the onslaught of better adapted Fertile Crescenters and Yellow Riverers. It was necessarily more than just being at the right time and place to learn and get seeds from a neighbouring agricultural people. A whole new way of thinking and cooperating was likely needed... Since there is no archaeological evidence of an independent invention of agriculture in South/SouthEast Asia, and complete replacement from both known Core Areas is against the genetic evidence here, it would seem that the model fails to explain the facts and thus must be discarded or much modified.

There are however some indications of a 3rd, perhaps less complete from an archaeological point of view, Neolithic Revolution in Eurasia.
Firstly, one other area, in addition to the Fertile Crescent and Yellow River areas, is thought to have independently invented quite advanced agriculture.
Secondly, several crops still grown in South Asia and South East Asia were not brought over from China or the Middle East by agriculturalists, and it's likely they were domesticated locally. It's quite rare for an already "Neolithic" society to domesticate new plants. We still rely on ancient ones, and the first independent domestication of a food plant in Europe was apparently the strawberry and that happened only in the 18th century... (rye was adapted in Northern Europe but it probably already grew as a contaminant of better cereals back in the Near East).
South Asia's crops are also less productive crops that no farmer already possessing the likes of wheat, barley or rice would care about laboriously domesticating.
One, sugar cane, is well known today, as it's grown as a commodity crop. Interestingly, it is also grown in New Guinea (even though mostly in different varieties).
Others are maybe forgotten, since they compared unfavourably with the new crops.

Razib recently unveiled a 3D model he adapted from Zack's Harappa Project that suggests a 3rd centre of recent population expansion in Eurasia, possibly aborted and partially "eaten up" by the other two. Maybe there was a lack of suitable local native plants and the resulting less competitive crops led to a semi-forager lifestyle. This would perhaps explain the absence of archaeological evidence.

Based on these indications I thought it would be fun to run a supervised ADMIXTURE analysis on East+South Asia.
I used a Fertile Crescent pole, consisting of Adygei+Turks+Palestinians. A Yellow River pole based on Beijing Chinese and Japanese. And a hypothetical 3rd Revolution pole based on Papuans and Melanesians, for lack of better "unadmixed" populations (even though I'd expect Papuans to be more of a fringe group and not exactly the Core Area population).
Here are the results.

The results correlate to a surprisingly high degree with the estimates by Reich et al., as the table at Dienekes' shows.

Estimating "Basque" admixture in Balts

As Dienekes pointed out, in the absence of an "unadmixed" parent population an admixed population can appear in an ADMIXTURE analysis as if it was less mixed or "unadmixed" itself. And it can be very complicated to determine a reasonable estimate of suspected admixture proportions...
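A toy illustration of why this happens, under the simple assumption that a population's expected allele frequencies are a weighted average of its sources' frequencies. If parent B is missing from the data, a population on the cline between A and an old 50/50 mix M can be described exactly as a mixture of A and M, so M looks "unadmixed". All frequencies and proportions below are made up.

```python
def mix(weights, sources):
    """Expected allele frequencies of a mixture: sum of weight * frequency per SNP."""
    n_snps = len(sources[0])
    return [
        sum(w * src[j] for w, src in zip(weights, sources))
        for j in range(n_snps)
    ]


# Made-up allele frequencies at four SNPs.
A = [0.10, 0.80, 0.35, 0.60]  # parent A, present in the data
B = [0.70, 0.20, 0.55, 0.10]  # parent B, no longer sampled anywhere

M = mix([0.5, 0.5], [A, B])    # old, stable 50/50 mix that survives today
X = mix([0.75, 0.25], [A, B])  # a population that is truly 25% B

# The same X, described using only what is in the data (A and M).
X_as_A_and_M = mix([0.5, 0.5], [A, M])

same = all(abs(x - y) < 1e-12 for x, y in zip(X, X_as_A_and_M))
print("X equals 50% A + 50% M:", same)
```

So with B absent, X would be read as 50% "M" instead of 25% B, and M itself as 100% of its own component.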

Assuming a (near-) Complete Replacement model, Balts are an interesting population. Since neighbouring Scandinavia and Northern Poland present archaeological evidence of Megalithic Culture wave colonization, it seems likely that they would present non-zero Western Wave admixture. Additionally Baltic Finns seem to derive their "European" (originally Near Easterner) element much more from Scandinavian-like populations than from Baltic-like populations, suggesting that the Comb Ceramic culture was founded mainly by Neolithic immigrants from the Western Wave and not the Eastern one. The "Siberian"-like element found in low percentages in current Finns would, in this model, be derived from ancient Northeastern European Hunter-Gatherers, and not from any ancient undocumented Siberian expansion.

However it is difficult to ascertain this, since the main parent population of Balts is likely not represented in any currently available collections.
Adygei and other Caucasian populations were likely modified in ancient times by secondary more advanced waves coming from the Neolithic Core Area further south, the Fertile Crescent (including Egypt). Secondary Neolithic waves apparently are substantially different genetically, with more southern affinities (Egypt? I will explore this later) than the primary waves, and an analysis with them renders Balts "Basque-like".

As for intervening populations, Russia currently doesn't allow for data collection from its citizens, and any evidence from steppe populations, unlike agricultural settled ones, has presumably suffered much more erosion, since nomadic groups are highly mobile, herds can be stolen and it's much more difficult for conquerors to establish rent-drawing relationships in such environments.

So how can first Western Neolithic Wave ("Basque"-like) admixture in Balts be estimated?
Only by using surrogate populations obeying a set of strict conditions:
1. Likely strongly affected by the first Eastern Wave
2. Likely not affected by the Western waves or subsequent Eastern ones
3. Living in environments better suited to agriculture than to pastoralism

Turns out that there is such a population already genotyped: the Chuvash. They're not ideal because they have some significant non-neolithic component ("Siberian") but by running an "unadmixed" Siberian population as a control, reasonable insights may be gained.

I ran an admixture analysis of Europeans with the following populations as poles:
1. Chuvash: I'm using them as an attractor for all Eastern Neolithic influence. Debatable, but it's just an experiment.
2. Basques: representative of possibly little admixed Megalithic culture people? May have been subject to drift effects but Basques may be the closest thing available to a pre-Indo-European population. This pole may also catch some of the Balkan (Danubian Culture and related) first wave component, since sharing a common origin in Western Anatolia it was likely similar.
3. Yakuts: Siberian. Some superposition with Chuvash, but serves as a control for excess Siberian admixture in analysed populations, allowing better clues for actual Eastern wave affinity between the Chuvash and Northern Europeans.
4. Turks: this pole should draw all secondary wave elements plus a corresponding subsumed part of the primary wave one. It serves as a control for subsequent Southern influence.

Here are the results. Bear in mind that even if roughly correct, they are at best indications of actual ancient population admixture percentages. Basques likely have some non-Megalithic admixture; "Turkish" likely consists of a Megalithic substrate superimposed by secondary wave elements; Chuvash partially obfuscates "Yakut". Still, these results are the product of a simple set-up that I believe is based on reasonable assumptions and can be justified by well-grounded arguments.

Fst divergence between estimated populations

             "Yakut"    "Basque"   "Chuvash"
"Basque"     0.124
"Chuvash"    0.087      0.036
"Turkish"    0.101      0.028      0.032

I added my interpretation to the labels.
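For readers who want to play with the Fst table above, here is a small sketch that stores the lower triangle and looks up the closest and most divergent pairs. The values are copied from the table; the helper names `fst`, `closest_pair` and `farthest_pair` are mine.

```python
# Lower triangle of the Fst table, keyed by (row, column).
FST = {
    ("Basque", "Yakut"): 0.124,
    ("Chuvash", "Yakut"): 0.087,
    ("Chuvash", "Basque"): 0.036,
    ("Turkish", "Yakut"): 0.101,
    ("Turkish", "Basque"): 0.028,
    ("Turkish", "Chuvash"): 0.032,
}


def fst(a, b, table=FST):
    """Look up Fst between two components, in either order."""
    if a == b:
        return 0.0
    return table[(a, b)] if (a, b) in table else table[(b, a)]


def closest_pair(table=FST):
    """Pair of components with the smallest Fst."""
    return min(table, key=table.get)


def farthest_pair(table=FST):
    """Pair of components with the largest Fst."""
    return max(table, key=table.get)


print("Closest:", closest_pair(), fst(*closest_pair()))
print("Most divergent:", farthest_pair(), fst(*farthest_pair()))
```

On these numbers "Turkish" and "Basque" are the closest pair (0.028) and "Basque" and "Yakut" the most divergent (0.124).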

These results cannot be due to purely geographical clines. Components don't correlate with distance very well.

More speculatively, it seems strange to find some "Yakut" where it shouldn't be. It is most likely noise. However, one should expect to find a continuum of hunter-gatherer genotypes across Northern Eurasia (as opposed to just in Northern Europe; the Urals are a poor barrier to gene flow). This region was likely colonized late, and most of it from whichever area was settled first, as such populations necessarily had to acquire expensive cultural and genetic adaptations to the cold environment. Also, this element is found most significantly in Russians (0.6%, likely much subsumed into nearby "Chuvash"), as expected, and in White Americans (0.4%: Amerindian, less similar so likely underestimated). As for Spaniards and Romanians, it is interesting that these are the most mountainous countries whose populations were analysed. In Spain in particular, late surviving hunter-gatherer lifestyles are suggested by archaeology.

In the Northern European plains populations, "Basque"+"Chuvash" admixture brings to mind the Kurganized, originally Megalithic Corded Ware Culture. I think it's suggestive that the late arriving Eastern Wave achieved such success within an already Western Wave colonized region. This may have something to do not only with new advanced pastoralist lifestyles and lactose tolerance, but maybe also with the development of rye, which appears archaeologically in Central Europe around then. East Slavs may then have expanded from Central Europe to the East, absorbing the (Finnic) Comb Ceramic Culture, only after acquiring this cereal.

ADMIXTURE doesn't know better

I've been running ADMIXTURE recently, seeking evidence backing a mostly Neolithic origin of current populations.
Unsupervised ADMIXTURE, I believe, tends to allocate ancient clinal mixed populations to their own component in the absence of "unadmixed" populations, peaking in the less admixed ones. It produces artificial results that, though informative about population relationships, are not as informative about actual components corresponding to actual ancient populations that underwent admixture.
For instance, these runs in Europe tend to produce an apparently homogeneous "North European" component modal in Lithuania, as well as a similar "South European" component modal in Sardinia. I've always suspected this didn't make any sense in relation to what we know of actual history, since if North European populations are of local Paleolithic continuity, they are very old, and would not coalesce into a single component spanning Iberia to Russia. Also, a Neolithic component would hardly be concentrated in the North, since it would have to make its way from further south. There are no inventions of agriculture recorded in Northern Europe, and the region has been well investigated by field researchers.
Sardinia is another problem. If this component was Paleolithic, why would it be present in so many countries next to other less related components? If it's Neolithic why would it not have absorbed some amount of other waves present in nearby populations?
These results cannot actually correspond to any population that ever existed, according to any historical or archaeological knowledge we have. Something was wrong. The thing is, ADMIXTURE is a computer program and it doesn't know any history or archaeology. It can't test hypotheses by itself. Left alone it simply produces nice graphs of relatedness of populations, plus indications of admixture between more distant groups.

For more sensible results we have to direct the program according to historical data, and then see if the hypotheses are validated. We can then also apply our conclusions to new populations that may become available in the future.

I'm Diogenes and have recently been posting comments on several blogs I read about population genetics and related subjects, such as Razib's Gene Expression, Dienekes' Anthropology Blog, Davidski's Eurogenes Project and Zack's Harappa Project.

I named my blog Artemis since I believe the "Neolithic" which shaped our world for the last 10,000 years is now ending. Demeter's shackles are broken.

I'm starting my own Project playing with ADMIXTURE and other programs. I'm not a scientist (even though I work in a field related to biology), but I'll try to substantiate my thoughts whenever possible.

Just as politics and news blogs are not about bringing you 100% credible news from a specialist, this blog is just a podium for ideas and their discussion. I have many more or less substantiated ideas about several subjects. I intend only to contribute in my personal way. The indispensable and proper scientific process is best left for specialists.
But just as blogs can be less conventional and more imaginative, if less credible, sources of news and opinions, so can a blog dedicated to other areas. I don't have academic credibility. I'm not presenting a CV or publication history. But I don't have academic constraints either. I don't need to hide controversial but interesting ideas just because they may be seen as silly by some and may harm my ability to do research in the future.

The internet is connecting intelligent people from different places and allowing discussion of ideas. That's how you make, always made, knowledge. Except now perhaps it's beginning to happen on a different scale.