Tuesday 3 May 2011

Irula and Basques, Sardinians, Lithuanians?

I have stated before that I think unsupervised results of European populations are based on modal populations that are admixed themselves.

Just as the "Irula" component tends to pick up most of South Asian diversity in an artificial manner, contradicting some recent research (Reich et al.; see also Dienekes), and different supervised methods seem to confirm this, so I believe the same can be done to European populations.

The Irula are modal for the South Asian component for a simple reason. All Indian populations are a mix of Fertile Crescent incomers (Ancestral North Indian, ANI) and older established populations (Ancestral South Indian, ASI). Yet even though we have extant, relatively unadmixed Fertile Crescent populations (ANI), we do not have any extant ASI populations. So the Irula, in whom ASI peaks among publicly genotyped populations, appear as if almost 100% "South Asian", even though they are really roughly 1/2 ANI and 1/2 ASI. Most Indian populations tend to be dominated by this element as well, the remaining part being more conventional Fertile Crescent components. So the "South Asian" component is really ASI+ANI and doesn't correspond to either of them in any "unadmixed" form.
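As a rough worked example of that arithmetic: if the modal population is about half ANI and half ASI, any "South Asian" percentage can be split back into the two. This is only a sketch under that assumption. The function name and the 60%/40% input population are hypothetical, chosen purely for illustration.

```python
def split_south_asian(south_asian, fertile_crescent, asi_in_modal=0.5):
    """Split an ADMIXTURE-style "South Asian" share into rough ASI and ANI parts.

    Assumes the modal population (the Irula here) is `asi_in_modal` ASI and the
    rest ANI, and that the remaining Fertile Crescent components are ANI-like.
    """
    asi = south_asian * asi_in_modal
    ani = south_asian * (1.0 - asi_in_modal) + fertile_crescent
    return asi, ani


# Hypothetical population: 60% "South Asian", 40% Fertile Crescent components.
asi, ani = split_south_asian(0.60, 0.40)
print(f"ASI ~ {asi:.0%}, ANI ~ {ani:.0%}")
```

Under these assumptions a population that looks 60% "South Asian" would be only about 30% ASI, with the other 70% being ANI in one form or another.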

Zack has done interesting work using the Onge of the Andamans as a proxy for ASI, and I have run a simple analysis using Papuans. Neither really corresponds to the actual ASI.

Thus in Europe and nearby regions we similarly get Lithuanian-modal, Sardinian-modal, Basque-modal, and West-Asian-modal components. I've previously analysed the Mozabite-modal and Southwest Asian-modal components in a way that suggests they derive from peaks, in Mozabites and Bedouin, of "NW African" and "Nile Core" respectively.

So I'll be trying to do the same to European populations in the next few days, and reveal unsupervised results for what they probably mostly are: clines, in which ADMIXTURE picks the peak population as the modal one. Naturally, revealing intra-"Mesopotamian Core" clines is much harder and less exact than revealing clines between less closely related populations.

Here are the first runs. I've used my trick of using "restricted" poles in order to get more accurate estimates, but these are experimental runs, and the thing to note is what remains similar in both. I plan to develop better ways to fish out these components soon, so don't take these first results too literally. I've used European populations, excluding most Middle Easterners, since I think "inner-Core" Mesopotamian subsets different from the "frontier" ones may be hiding there and I don't want to make the run more complex than it already is. For the same reason I excluded Cypriots. I did include a handful of participant samples from the ME and Southeast Europe: being few, they won't "demand" their own component and confuse results, yet they can give me some clues as to how to proceed when expanding the set.
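A minimal sketch of how such restricted poles might be set up for ADMIXTURE's supervised mode. That mode reads a .pop file with one line per individual, holding a population label for reference individuals and "-" for everyone whose ancestry is to be estimated. The file names, the helper name, the population labels and the assumption that the first .fam column holds the population name are all hypothetical, not a record of the actual runs.

```python
import random


def write_restricted_pop(fam_path, pop_path, poles, n_per_pole=5, seed=0):
    """Write a .pop file that labels only a few individuals per pole.

    fam_path: PLINK .fam file; column 1 is ASSUMED to hold the population name.
    poles:    dict mapping population name -> pole label to write.
    Every individual not picked as a pole gets "-" (ancestry to be estimated).
    """
    rng = random.Random(seed)
    with open(fam_path) as fh:
        pops = [line.split()[0] for line in fh if line.strip()]
    labels = ["-"] * len(pops)
    for pop, label in poles.items():
        members = [i for i, p in enumerate(pops) if p == pop]
        # Keep only a small random subset of each population as the pole.
        for i in rng.sample(members, min(n_per_pole, len(members))):
            labels[i] = label
    with open(pop_path, "w") as out:
        out.write("\n".join(labels) + "\n")
    return labels


# Hypothetical usage (the Siberian and Amerindian poles are omitted here):
# write_restricted_pop("europe.fam", "europe.pop",
#                      {"Basque": "Basque5", "Chuvash": "Chuvash5",
#                       "Egyptian": "2ndWave", "Mozabite": "2ndWave"})
# then: admixture --supervised europe.bed K   (K = number of distinct labels)
```

The remaining Basques and Chuvash stay unlabelled, which is why their population averages can be reported excluding the five pole individuals.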

Firstly, "Basque5" versus "Chuvash5" versus 5 Egyptians + 5 Mozabites. Bear in mind that having an element doesn't mean ancestry from one of these modern groups, only that portions of DNA tend to cluster together with these available poles.
Admixture proportions for Chuvash and Basque obviously exclude the 5 pole individuals in each.
FST genetic distance estimates by ADMIXTURE:
"Basque5" to "Chuvash5": 0.035
"2nd Wave" to "Basque5": 0.048
"2nd Wave" to "Chuvash5": 0.040
Siberian to "Basque5": 0.133
Siberian to "Chuvash5": 0.104
Siberian to "2nd Wave": 0.120
Amerindian to "Basque5": 0.231
Amerindian to "Chuvash5": 0.194
Amerindian to "2nd Wave": 0.216
Amerindian to Siberian: 0.164

As FST shows, "2nd Wave" is much more of a "Mesopotamian Core" element than a Northeast African element here. It contains a measure of the Nile Core admixture found in Europeans, though. Since I think the Egyptian expansion itself had more Mesopotamian than Nile Core, this is not surprising. The smaller, presumably forager, components are merely indicative, since this run is too rough and they might partially subsume, or be subsumed by, bits of other elements.

Now with "Sardinian5" substituted for "Basque5", the remaining poles being the same. The Sardinian population averages presented exclude the 5 Sardinian samples used as the pole.
FST genetic distance estimates by ADMIXTURE:
"Sardinian5" to "Chuvash5": 0.035
"2nd Wave" to "Sardinian5": 0.049
"2nd Wave" to "Chuvash5": 0.045
Siberian to "Sardinian5": 0.128
Siberian to "Chuvash5": 0.111
Siberian to "2nd Wave": 0.122
Amerindian to "Sardinian5": 0.225
Amerindian to "Chuvash5": 0.203
Amerindian to "2nd Wave": 0.218
Amerindian to Siberian: 0.163
Notice that the largest distances are not the same in the two runs, so the smaller ones can't be exactly compared between them. I think "2nd Wave" is slightly more concentrated in this last run, but still mostly Mesopotamian.
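To make that caveat concrete, here is a small sketch that lines up the FST values for the pole pairs present in both runs and shows how much each one moves. The numbers are copied from the two lists above. The variable names are just for illustration.

```python
# FST values from the two runs above, for the pole pairs that appear in both.
run_basque5 = {
    ("2nd Wave", "Chuvash5"): 0.040,
    ("Siberian", "Chuvash5"): 0.104,
    ("Siberian", "2nd Wave"): 0.120,
    ("Amerindian", "Chuvash5"): 0.194,
    ("Amerindian", "2nd Wave"): 0.216,
    ("Amerindian", "Siberian"): 0.164,
}
run_sardinian5 = {
    ("2nd Wave", "Chuvash5"): 0.045,
    ("Siberian", "Chuvash5"): 0.111,
    ("Siberian", "2nd Wave"): 0.122,
    ("Amerindian", "Chuvash5"): 0.203,
    ("Amerindian", "2nd Wave"): 0.218,
    ("Amerindian", "Siberian"): 0.163,
}

# Shift of each shared pair when "Sardinian5" replaces "Basque5" as a pole.
shifts = {pair: round(run_sardinian5[pair] - run_basque5[pair], 3)
          for pair in run_basque5}
for (a, b), d in shifts.items():
    print(f"{a} to {b}: {run_basque5[(a, b)]:.3f} -> "
          f"{run_sardinian5[(a, b)]:.3f} ({d:+.3f})")

largest_shift = max(abs(d) for d in shifts.values())
print(f"largest shift between runs: {largest_shift:.3f}")
```

The same pair of poles moves by up to 0.009 just from swapping a third pole, which is of the same order as the differences among the smallest distances, so those should not be over-read.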

Balanced "Chuvash5" + "Basque5" in more "inner-core"-influenced populations such as Assyrians and South Italians simply means that their "Mesopotamian" is the more diverse parent of both "outer-core" "Basque5" and "Chuvash5", which are perhaps particular West Anatolian and Northeast Anatolian subsets of it. I now see some evidence for a secondary "inner-core" expansion before the second wave. More on that later.
I think these results are a bit inexact, but the general components seem to hold in runs with other, quite different European poles (tomorrow I may present some of these). They also appear in a very confused and mixed way in unsupervised results. So they very likely represent something real. Yet these are obviously preliminary results. I'm not sure if Lithuanians are as much Eastern Wave-derived as these particular results seem to imply, although I now think my early estimate of "Basque Admixture" using the full-set supervised Basque and Chuvash poles overestimated the "Basque" or Western Wave element there (Basques seem to have quite a bit of the Eastern element themselves!). Still, "restricted poles" seem to estimate actual components better.

So firstly it is interesting that Sardinians do not have the Eastern Wave element in either run. This is as expected, if "Chuvash5" came from the Northeast (perhaps more remotely from the Steppe, Caucasus and originally Northeastern Anatolia in that order).
Sardinians do, however, have a large Second Wave element that is mostly absent in Basques. And Basques, even in the "Basque5" run, seem to have quite a bit of this "Chuvash5" component.

Northern Europeans seem to have more "Chuvash5" than the Basques, particularly towards the East, and a small Second Wave element in some regions.

This seems an adequate explanation for the results of unsupervised ADMIXTURE runs. Sardinians are modal for their "Sardinian" component since they lack "Chuvash5", and Basques for theirs because they lack "Second Wave" yet have much more West Wave than most. Lithuanians are also modal for their component since they have little of both Second Wave and "Basque5". And West Asians like the Assyrian Christians appear to have some "Chuvash5" (more on this later) but much more Second Wave. All other European peoples are intermediate between these 4 "extreme" extant populations. Thus they can be adequately reconstructed by unsupervised ADMIXTURE using components modal to the Basques, Sardinians, Lithuanians and West Asians. Unsupervised ADMIXTURE has no way to know whether actual Sardinians, Basques, Lithuanians or West Asians settled Europe. It assumes it was so since it was meant, I think, to determine admixture proportions for populations whose parent populations still exist (like Mexicans and African-Americans).

Faced with such a scenario in which unadmixed parent populations are mostly not there anymore, it picks the most extreme populations and uses these admixed populations as if they were the parent ones, generating historically illogical results.
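A toy illustration of that effect, under loudly stated assumptions: suppose there were three unadmixed parent components (West Wave, Eastern Wave, Second Wave) and that the only "poles" available are three admixed extreme populations. All proportions below are invented for illustration and are not results from any run. ADMIXTURE itself fits allele frequencies rather than solving this small system, so this only sketches the underlying linear-mixture idea.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def express_in_extremes(extremes, target):
    """Write `target` as a mixture of three extreme populations (Cramer's rule).

    extremes: dict name -> (west, east, second) true proportions (hypothetical).
    target:   (west, east, second) true proportions of the population to explain.
    """
    names = list(extremes)
    # Row r = parent component, column k = extreme population.
    A = [[extremes[n][r] for n in names] for r in range(3)]
    d = det3(A)
    result = {}
    for k, name in enumerate(names):
        Ak = [[target[r] if c == k else A[r][c] for c in range(3)]
              for r in range(3)]
        result[name] = det3(Ak) / d
    return result


# Invented "true" make-up of the extreme populations: (West, East, Second).
extremes = {
    "Basque-modal": (0.85, 0.15, 0.00),
    "Sardinian-modal": (0.65, 0.00, 0.35),
    "Lithuanian-modal": (0.15, 0.80, 0.05),
}
# Invented intermediate population: 50% West, 40% East, 10% Second Wave.
apparent = express_in_extremes(extremes, (0.50, 0.40, 0.10))
for name, share in apparent.items():
    print(f"{name}: {share:.1%}")
```

With these made-up numbers, a population that is really 50% West, 40% Eastern and 10% Second Wave comes out as roughly 34% "Basque", 22% "Sardinian" and 44% "Lithuanian": a perfectly good fit numerically, but a misleading history.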

Looking back at the results: why do Basques have this Eastern element in the 10-20% range? Why did it spread to Spain if it arrived AFTER the "Basque" or western element (archaeology strongly supports a model of agricultural spread from the South towards the Northeast)? And why do Middle Easterners and Southeast Europeans have both, even though they're the likely source? A few points:

From its smallish presence in Basques, I think it's clear this is a later intrusive element in an already agricultural population. I don't think these farmers were the carriers of Indo-European languages. Maybe they spoke distantly related languages. One argument concerns Y-haplogroups. R1a is prevalent, and R1b mostly absent, in populations with very high "Chuvash5" in the run. R1b is dominant in Basques and also common in Sardinians, with R1a absent. Languages spread with elites, but these people weren't passing much of their Y-chromosomes to their children, at least west of Central Europe, and yet West Europeans appear to have plenty of their genes. Some discrepancies between the mit-DNA of Danubian Neolithic remains and that of modern Europeans suggest to me that they perhaps did contribute plenty of mit-DNA together with autosomes. Language-imposing elites behave in exactly the opposite way, with high Y, low mit transmission.

"Chuvash5" people likely had high levels of R1a, "Basque5" people high levels of R1b. R1b is well characterized as originating from Western Anatolia and is present at high levels in Southern Europe. Another argument as to why it must have arrived first is that Sardinians are 20-30% R1b and have no "Chuvash5" (yet plenty of "Basque5"). So "Chuvash5" likely entered the "Basque5"-dominated Basque country, Spain, France, Ireland, the British Isles and the Central European river valleys, and had a major impact on autosomes and probably also on mitochondrial haplogroups, yet very little Y-chromosome haplogroup impact.

How can this pattern be explained? I have been thinking about a speculative model and now it's taking shape.
Northwestern Europeans (France and the British Isles in this context) have generally >60% R1b, but in these runs ~50% "Chuvash5". If the "Chuvash5" people were intruders specializing in cold-environment, poor-soil agriculture (with rye and other innovations/developments), they would have found the most fertile soils in the region already inhabited by the "Basque5" people and their wheat agriculture. So in a simple scenario, villages would form in a checkerboard pattern. "Basque5" villages would already be established near rivers and on the best soils at very high densities, and thus be impossible to remove or replace. However, there may have been plenty of other areas to which primitive wheat agriculture was not congenial, and these would still have been inhabited by foragers at much lower densities. Most of these areas would, however, allow for productive rye agriculture, at much higher densities than foraging. "Chuvash5" migrants could very well have used such niches. They would not have been able to settle in the "Basque5" areas, but they would have had major advantages over foragers in their mountain, inter-fluvial and sandy-soil regions. So "Chuvash5" villages would maybe become established not far from the "Basque5" villages downhill. "Chuvash5" is, I believe, really about a Mesopotamian subset-derived, cold-adapted Neolithic wave.

In such a scenario, good years would lead to large surpluses in "Basque5" villages, but smaller ones in "Chuvash5" ones. Densities in the former's areas would be quite a bit higher than in the latter's too. Lowland elites would form much more easily in "Basque5" villages, while "Chuvash5" ones would remain more egalitarian.

In good years, high levels of surplus would allow men in "Basque5" villages to find other occupations, for instance war, or trade for less basic resources. With time, social structure would make such arrangements permanent at the expense of peasants. Militarized and commercial elites from rich "Basque5" villages would come to dominate less wealthy "Chuvash5" ones. The result, over thousands of years and multiple elite forays from the agriculturally rich wheat areas into the less productive rye ones, would be exactly what you see: elite Y-DNA spreading while autosomes and mit-DNA remain balanced.
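The asymmetry in this speculative model can be sketched with a toy recurrence. Assume that in every generation a small fraction m of the fathers in a "Chuvash5" village are elite men from "Basque5" villages, while all mothers are local. The function name, the value of m and the number of generations are hypothetical. This is a simplification for illustration, not a fitted model.

```python
def male_mediated_flow(m, generations):
    """Lowland ("Basque5") share in an upland ("Chuvash5") village over time.

    m: ASSUMED fraction of fathers per generation who are lowland elite men
       (taken to be fully lowland in both Y and autosomes).
    All mothers are assumed local. Returns (Y, autosomal, mit-DNA) shares.
    """
    y = autosomal = mit = 0.0
    for _ in range(generations):
        # Average lowland ancestry among this generation's fathers.
        fathers_autosomal = m + (1 - m) * autosomal
        # Sons inherit the Y from fathers only.
        y = m + (1 - m) * y
        # Autosomes come half from fathers, half from (local) mothers.
        autosomal = 0.5 * fathers_autosomal + 0.5 * autosomal
        # mit-DNA comes from local mothers only, so it does not change.
    return y, autosomal, mit


y, autosomal, mit = male_mediated_flow(m=0.01, generations=100)
print(f"Y: {y:.0%}, autosomal: {autosomal:.0%}, mit-DNA: {mit:.0%}")
```

With 1% elite fathers per generation over 100 generations, the lowland Y share reaches about 63% while the autosomal share is only about 39% and mit-DNA is untouched. In this simple version the autosomes do shift, but at half the rate of the Y, which is the direction of the pattern described above.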

In even colder areas such as Scandinavia and Northeastern Europe, colonization by R1a agriculturalists had much greater advantages. Fewer wheat-adequate environments exist there and vast rye-congenial ones predominate, so Y-DNA would be more balanced or even the reverse.

So "Chuvash5" is I think really about a new subset of Fertile Crescenters, one adapted to marginal lands and cold-environment agriculture during their passage and evolution in the steppe. The spread of some more admixed people from the Corded Ware region and their explosion into the Eastern expanses of Eurasia suggests to me the final "critical point" may have been reached in this region and time.

Later on, perhaps related pastoralists came again out of the Steppe, not as farmers, but as horse-mounted warriors and conquerors. These maybe were the source of some small Y-DNA contribution. But without major food-producing improvements, I very much doubt they made a significant autosomal contribution. They may have had a major linguistic impact, as such horse-mounted militarized elites often have. Or maybe such tongues are a different story entirely.

Tomorrow I'll run a better version of the runs posted above, and will post individual participants results. I decided not to post spreadsheets for this one since people would be reading too much into individual results. Tomorrow's run will be very similar but more solid and valid I think.
