Tuesday, 24 May 2011

Back to Africa- individuals

These are some of the individuals from the run.

(There is likely a lot of small component noise here, and also between NMPC, WMPC, EMPC due to their high similarity and small number of European individuals included)
Although African-Americans appear to present some variability in their African elements, their African ancestry can probably be best described as the product of a New World melting pot of originally Niger-Congo-speaking, West African ethnic groups. Affinity seems to be higher with ethnic groups from the Gulf of Guinea region, and the more Northern Bantu-speaking groups.
Their non-SubSaharan African ancestry is mostly composed of elements found in Europe as expected. There is very little East African ("Nilo-Saharan").
As for the Fulani, it seems they have been somewhat isolated genetically from neighbouring populations, even though they live near non-Fulani peoples across a vast swath of the Sahel/West Africa region. Only a few individuals appear to be recently admixed to any degree. High levels of the Northwest African element, their general lack of any NCongo components except for NCongo1, together with NCongo1 being particularly high in presumably more isolated Sahel populations such as the Dogon, all point to an ancient ethnogenesis for this group, perhaps in the (last) Green Sahara itself at the time of the West Asian Neolithic colonization of North Africa.

Bear in mind this run attempted to distinguish very closely related components and a high amount of "noise" is likely at the individual level.
Also, Fertile Crescent components (FC) are based on a small sample, since this run was about Africa. So NMPC here is different from my previous run, since it is based on the Lithuanians, who have much more WMPC (Basques) than the Chuvash. EMPC was based on the Urkarah. So they don't have the exact same distributions as before.

Niger-Congo components are also very close and shouldn't be taken as exact at the individual level. "Nilo-Saharan" is not very distant from the NCongo components either; some people postulate a common origin for the language families, and thus perhaps some genetic similarity is expected. Indeed, if both Neolithic expansions are related and come from the "southern shore" of the Sahara, most modern African populations may derive from a demographic and linguistic expansion from the vicinity of the desert only a few thousand years ago, probably in association with a Neolithic Revolution associated with the Green Sahara.

Monday, 23 May 2011

Back to Africa- populations

I've been busy with unrelated stuff lately so didn't find time to post last week.
I've decided to take a new look into variation within African populations, using the restricted pole trick I used before in West Eurasian ones. I wanted to take a look at variation within the Niger-Congo or West African Neolithic Core variation in particular, by trying to split it up into 3 components.
I should point out I'm doing these runs seeking support for a theoretical model. It's hard if not impossible to provide definitive evidence (if such a thing exists at all) through ADMIXTURE results. But models which provide unexpected results or predictions which can then be tested by other methods or analysis of future data not available right now, have value in my opinion.

I've run a few unsupervised runs of the African samples available to me, compared with those in other projects, and used a few observations to guide me in pole choice. I was particularly interested in possible language-family-related components, since even though language groups don't always correlate exactly with genetics, there tends to be some relationship.
Restricted poles comprising only a few individuals are a good way of establishing one or more "centres" for a cline in the data. They don't "stick" however if there is no such cline, or if the cline extremes are not represented. For instance, I chose a "Palestinian5" pole using 5 Palestinians, yet it was immediately stolen by the Saudis, who present a similar and otherwise unrepresented component to Palestinians yet at a higher level.
However, for such clines as the Niger-Congo or "West African" cline, from Mandenka/Dogon to the South and East African Bantus, it is useful to establish some such "hand-picked centres" to differentiate subpopulations. The frontier between the 3 Niger-Congo components I've come up with is thus necessarily somewhat artificial. Choosing different "restricted poles" would result in different components to a much larger degree than in previous "Basque5" or "Lithuanian5" poles I've used, since these are much stabler.
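Whether a restricted pole "stuck" or was "stolen" can be checked by looking at which population a component peaks in. What follows is only a minimal sketch, not the procedure used for these runs: the function name and the numbers are hypothetical, and it assumes one proportion per individual for the component of interest, taken from individuals that were not themselves pole members.

```python
from collections import defaultdict


def modal_population(q_values, populations):
    """Return (population with the highest mean proportion, all population means).

    q_values: one proportion per individual for a single component.
    populations: population label for each individual, in the same order.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, pop in zip(q_values, populations):
        sums[pop] += value
        counts[pop] += 1
    means = {pop: sums[pop] / counts[pop] for pop in sums}
    return max(means, key=means.get), means


# Hypothetical proportions for a pole seeded with Palestinians,
# measured on individuals that were NOT used as pole members.
populations = ["Palestinian", "Palestinian", "Saudi", "Saudi", "Egyptian"]
q_values = [0.60, 0.70, 0.90, 0.80, 0.50]

peak, means = modal_population(q_values, populations)
if peak != "Palestinian":
    print(f"pole seeded with Palestinians now peaks in {peak}")
```

If the peak population differs from the one the pole was seeded with, the pole has drifted to an extreme of the cline that the seed individuals did not represent.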

These were the poles chosen:
1) Three Mandenka and two Dogon for a Northwestern Niger-Congo component. Dogon often appear as modal for the West-African component in unsupervised runs, indicating possible origin of the West African Neolithic in the Sahel. I named this component NCongo1 (Niger-Congo1)
2) Five Yoruba: NCongo2
3) One Pedi, two Fang, two Kongo and one Luhya to try to identify the Bantu component, which I named NCongo3 (Bantu)
4) Three Bulala and two Masai. Masai often claim their own component and at higher Ks even multiple ones, due to close family links between some of them. I removed some of these individuals, but in order to avoid such artefacts, I used Bulala trying to make the Masai pole more independent from the Masai themselves. This component turned out to be important in all Nilo-Saharan speaking populations (plus the Sandawe, and even these can be split away into their own pole) so I took the liberty of naming it so, which shouldn't be taken too seriously.
5) Five Biaka Pygmies (BPygmy)
6) Five Mbuti Pygmies (MPygmy)
7) Five Hadza
8) Five Namibian San- Khoisan pole
9) Five Palestinians. This pole was taken by the Saudi sample, as mentioned before. I named it FC(NEAfr). It seems to be important in Semitic language speaking populations, but it goes further into other populations as well. It roughly corresponds to the "second wave" or WMPC+NC component used in Fertile Crescent runs.
10) Two Mozabite, two Tunisians and 1 N Moroccan. This pole peaks in the Tunisian sample and was named FC(NWAfr)
11) Five Basques. Named FC(WMPC) - Fertile Crescent Western "Mesopotamian Core"
12) Five Lithuanians, named FC(NMPC)
13) Five individuals from Urkarah, Northern Caucasus. Named FC(EMPC)
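For readers who want to try something similar, here is a rough sketch of how such a pole list could be turned into the population file that ADMIXTURE's supervised mode reads. It assumes the documented convention of a .pop file with one line per individual, in the same order as the .fam file, holding a population label for reference individuals and "-" for everyone else. The sample IDs, pole labels and function name below are made up for illustration and are not the actual files from this run.

```python
def build_pop_labels(fam_rows, poles):
    """Return one label per .fam row: the pole name for pole members, '-' otherwise.

    fam_rows: lines of a PLINK .fam file (whitespace separated, individual ID in column 2).
    poles: mapping of pole label -> list of individual IDs chosen for that pole.
    """
    lookup = {}
    for pole_name, sample_ids in poles.items():
        for sample_id in sample_ids:
            if sample_id in lookup:
                raise ValueError(f"{sample_id} is assigned to two poles")
            lookup[sample_id] = pole_name

    labels = []
    for row in fam_rows:
        individual_id = row.split()[1]
        labels.append(lookup.get(individual_id, "-"))
    return labels


# Hypothetical .fam content and a two-pole subset of the list above.
fam_rows = [
    "Yoruba yoruba_01 0 0 1 -9",
    "Yoruba yoruba_02 0 0 2 -9",
    "Basque basque_01 0 0 1 -9",
    "Fulani fulani_01 0 0 2 -9",
]
poles = {"NCongo2": ["yoruba_01"], "FC_WMPC": ["basque_01"]}

labels = build_pop_labels(fam_rows, poles)
print("\n".join(labels))  # this text would be saved next to the .bed file with a .pop suffix
```

The run itself would then be started with something like `admixture --supervised data.bed 13`, where K equals the number of poles and `data` is a placeholder file prefix.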

I'm sorry for some repeated colours, Google graphs made the choices.
Populations are roughly distributed by language group in the graph.
1) Hausa is Afro-Asiatic from the Chadic group
2) Mandinka and Bambaran belong to the Mande group of Niger-Congo languages. Dogon languages may be related early offshoots of proto-Niger-Congo.
3) Brong, Yoruba and Igbo speak NC languages of the Atlantic-Congo group, the group which includes Bantu languages. Their languages are not Bantu though, and each belongs to a different subgroup distinct from the Bantoid one and from each other
4) Bamoun or Bamum speak an Atlantic-Congo language of the Bantoid subgroup, but their language, though related, is not considered part of the narrower Bantu-proper group.
5) Fang, Kongo, Luhya, Xhosa, Pedi, Nguni, Sotho, Tswana all speak Bantu-proper languages. These are a subgroup of Bantoid, itself a group within Atlantic-Congo, which is a major Niger Congo family.
6) Bulala or Bilala, Masai, Alur, Hema and Kaba all speak languages of the Nilo-Saharan family
7) Mada seem related genetically but speak an Afro-Asiatic language of the Chadic group

-NCongo1, NCongo2 and NCongo3 seem to exhibit clines as expected since these are more or less spurious components dividing a continuous cline of West African-like peoples from the Sahel to South Africa. Judging by fst to neighbouring forager components, they seem to have absorbed a bit of these elements into the components (eg NCongo3 seems to have absorbed a bit of San).
-There seems to be a remarkable concordance between component patterns and language families, including subgroups of such families. The model predicts such languages would expand in association with Neolithic peoples' colonization movements. Some discrepancies may be explained as later elite imposition of intrusive languages.
-Dogon languages seem to be the outgroup in the Niger-Congo languages, and they are the most distant to the Bantu genetically within the Niger-Congo group. They may represent an early offshoot, perhaps isolated since, of the original Neolithic Core/Revolution. In unsupervised runs, Dogon are often the "most West-African" of all these population samples.
-Nilo-Saharan-speaking populations share a common component, indicating spread of most Nilo-Saharan tongues not only by elites but more likely by another food-producing revolution. Bulala are from Chad, Masai live in Kenya. Both are pastoralist peoples.
- Alur seem to have some small but significant Mbuti Pygmy ancestry. These peoples live not far from each other and both speak Nilo-Saharan languages, probably adopted by the Pygmies in the last few thousand years in contact (and marginalization) with Alur-related peoples from the North.
-Similarly there are Biaka Pygmy segments in Bantus, and Biaka Pygmies today speak Bantu languages adopted from their agriculturalist Bantu neighbours.
- Both Chadic-speaking groups (Hausa and Mada) seem to have some "Nilo-Saharan" component. Hausa's Niger-Congo-speaking neighbours mostly lack it. Hausa may be a West-African population subjected to elite language-shift towards Afro-Asiatic after intrusion from AA-speaking herders from the East.
-It is possible the reverse may explain Luhya ethnogenesis, with an intrusion of Bantu agriculturalists into Nilo-Saharan pastoralist occupied land.
-Fertile Crescent ancestry of Masai and Ethiopians is mostly FC(NEAfr), or most like that of Saudis, which seems to be in agreement with linguistics. However, these populations also seem to have some Northwest African influence (FC(NWAfr)), just as in Egypt but unlike in Saudis (remember the pole was actually 5 Palestinians). I think it's possible that at the time of the Semitic-speaking migrations from Arabia, possibly some 2000-3000 years ago, there was already an older Egypt-derived Fertile Crescent element in the region, whose languages do not survive.
-Fertile Crescent elements found earlier in the Fulani seem to be wholly derived from North West African populations. Fulani speak a Niger-Congo language and have much affinity to other West African populations as well, so I'd say this admixture event is more likely very ancient.
-I don't know why FC(NMPC) elements appear in North African and Levantine populations here but not in my previous run. But I included very few European and Near Eastern populations in the analysis and these FC poles are very close in comparison with the African ones, so it's possibly just due to noise and lack of definition due to few individuals with actual FC(NMPC) and WMPC. Also my last NMPC was based on the Chuvash (Lithuanians have much more affinity to Basques), but I didn't want to include Siberian poles in order to keep things simpler. This run is complicated enough as it is and small components may not represent anything much.
-North African populations may have an aboriginal substrate more complicated than I thought earlier, with possibly aboriginal NorthWest African, Green-Saharan refugee African and possibly other elements in addition to the West-Asian (FC) dominant element, so small segments may be representing such hidden elements and not actual admixture here I think. I'm still convinced elements with SubSaharan African affinity mostly represent aboriginal populations and are not the result of the caravan slave trade.
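The Fst comparison between components mentioned in the first note above can be illustrated with a small calculation on per-component allele frequencies. This is only a sketch under assumptions: it uses a simple Hudson-style ratio on frequencies with no sample-size correction, which is not necessarily the estimator ADMIXTURE itself reports, and the frequencies are invented.

```python
def hudson_fst(freqs_a, freqs_b):
    """Ratio-of-sums Hudson-style Fst from two lists of allele frequencies.

    Numerator: squared frequency difference at each SNP.
    Denominator: probability that two alleles, one from each group, differ.
    No sample-size correction is applied (frequencies are treated as exact).
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("frequency lists must have the same length")
    numerator = 0.0
    denominator = 0.0
    for p, q in zip(freqs_a, freqs_b):
        numerator += (p - q) ** 2
        denominator += p * (1 - q) + q * (1 - p)
    if denominator == 0:
        raise ValueError("no variable SNPs between the two groups")
    return numerator / denominator


# Invented allele frequencies at four SNPs for three components.
ncongo3 = [0.10, 0.50, 0.80, 0.30]
ncongo2 = [0.12, 0.48, 0.78, 0.33]
san = [0.40, 0.20, 0.55, 0.60]

print(hudson_fst(ncongo3, ncongo2))  # close components give a small value
print(hudson_fst(ncongo3, san))      # more distant components give a larger value
```

A component that has "absorbed" some of a neighbouring forager element would be expected to show a lower value against that forager component than its sister components do.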

Tomorrow I'll present individual results.

Saturday, 14 May 2011

Restricted Pole Run: Part IV- experiments, conjectures

Due to Blogger problems I had to delay this last post.

This is a very experimental part of the run, not altering any other results. Also my interpretations here are quite speculative and I don't have very high confidence in them.

I found out a while back, that when including many individuals of similar background between themselves, but very different from the remaining samples, in one run, ADMIXTURE will pull one of the restricted poles towards the group, irrespective of any relation between actual pole individuals and this different population.
Thus if I had included all the South Asian data-set in my last run, one of the poles would simply become dominated by them, and would peak in the Irula. I didn't want to do that yet, since South Asia is a complex place genetically, and would only make results less clear. Still, I wanted some clues as to which Fertile Crescent elements made their way there.
By including just a few individuals from the area in the run, the pole-pulling problem can be avoided, and ADMIXTURE will instead try to fit them into the non-South Asian-dominated poles.
This means that some results are necessarily artificial for these samples. For instance no adequate pole for Ancestral South Indian (ASI) is present. Since ASI are somewhat "Asian" when compared to Fertile Crescent populations, I expected ASI elements to be mainly allocated to the Siberian poles.
So bear in mind in this run, "Siberian" in South Asians is mostly not actual Siberian or Turkic admixture. It is simply the least inadequate pole for the ASI element. It doesn't matter anyway for this experiment since what I really wanted to check was which Fertile Crescent elements were present -that is, which patterns are present in ANI.
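The idea of including only a handful of individuals from a distinctive group, so that they cannot pull a pole towards themselves, can be sketched as a simple cap on samples per population. This is a hypothetical helper rather than the script used here; the function name, population names and IDs are invented, and the selection is random with a fixed seed so it can be repeated.

```python
import random


def cap_per_population(samples, max_per_pop, seed=0):
    """Keep at most max_per_pop individuals from each population.

    samples: list of (sample_id, population) pairs.
    Returns a list of (sample_id, population) pairs in population order.
    """
    rng = random.Random(seed)
    by_pop = {}
    for sample_id, pop in samples:
        by_pop.setdefault(pop, []).append(sample_id)

    kept = []
    for pop, ids in by_pop.items():
        chosen = ids if len(ids) <= max_per_pop else rng.sample(ids, max_per_pop)
        kept.extend((sample_id, pop) for sample_id in sorted(chosen))
    return kept


# Invented sample list: a large South Asian set and a small Basque set.
samples = [(f"sasia_{i:02d}", "SouthAsian") for i in range(40)]
samples += [(f"basque_{i:02d}", "Basque") for i in range(5)]

kept = cap_per_population(samples, max_per_pop=5)
print(len(kept))
```

Capping applies to every population equally in this sketch; in practice one would probably cap only the groups that have no pole of their own.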

So this part is highly experimental, but the additional individuals analysed here don't alter the remaining results appreciably (if removed, other individuals in the run still retain their admixture patterns).
In addition you may have noticed I didn't include an Amerindian pole in this run. I didn't for two reasons, firstly "Amerindian"-components in Europeans tend to be absorbed by the NMPC since they exist in mostly NMPC populations (including Chuvash used as restricted pole). Siberian tends to detach because many NMPC-rich populations don't have much Siberian, but the same can't be said for the "Amerindian" I found earlier, so they tend to get mixed up (except if using a FC pole without any of it such as the Egyptians).
The other reason was I wanted to check which poles ADMIXTURE would allocate to Amerindians themselves, if denied an exclusive pole for them. Amerindians are quite distinctive in PCA/MDS and in unsupervised runs. They cluster far away from Western populations, further away even than Siberians.
If "Amerindian"-like populations were present in Paleolithic Europe, we would expect them to be more "western" than their very "eastern" plotting position would imply -and namely more "westerly" than East Siberians.
But what if Amerindians are plotting in the "far-east" because for some reason they had a few highly distinctive genetic variants, but were otherwise not so distinctive. When denied their own poles, these distinctive variants wouldn't be allowed to pull them away. ADMIXTURE would be forced to allocate the remaining more conventional variability to conventional poles.
Two things might happen:
1) Amerindians would be allocated 100% to some Far-Eastern Siberian pole- which would support their plotting position being derived from their assumed Far-Eastern departing position into the New World.
2) Amerindians would be split into more conventional poles and their more "western" position, if abstracting from the few exotic elements, would be revealed. This would support western routes into the Americas, or perhaps a fast sprint after the end of the Ice Age, through recently ice-cleared far Northern Eurasia (mostly bypassing then more southeastern Siberian populations).

I thus introduced 5 unadmixed (no significant Spanish or European elements) Totonac individuals. As expected just 5 individuals weren't enough to pull the remaining poles towards them too much- they didn't get any Amerindian pole.

I actually expected Amerindians to come out as some Far-Eastern Siberian+Nganasan pole pattern. But this is what I actually got:

Siberian1 peaks in the Nganasan. Siberian2 (blue) in Yakuts and Mongolians. Siberian3 peaks in Far-East Siberians (Chukchis and Koryaks). Siberian3 is actually based on "Mongolian5", but "ran away" from them.

EBengal1 is Razib Khan from Gene Expression.
UKIND is British (explaining high WMPC) with some Indian.
I picked also 3 random Kalash, who I'm not sure are distinctive mostly because of inbreeding or long term isolation.
Naturally for these populations part of the admixture components is artificial. There is no high "Siberian" in Indians, but it is the "least inadequate" pole to represent ASI in this run.
As for the Totonac, none of the 3 poles is actually adequate since Amerindians are a highly distinctive population. I should point out that the NMPC in Totonac does not correspond to European elements; the Totonac sample is quite homogeneous, with very little such admixture.

EMPC is the predominant Fertile Crescent element in India. There is no other likely reason for ADMIXTURE not to pick the most adequate FC element from all such poles it had to choose. There is some NMPC as well. The lack of WMPC+NC in these populations, which is present in the steppe pastoralists (even in the Kyrgyzstani) points IMO to distinct migrations from similar origins. The colonization of the Steppe with the development of advanced pastoralist lifestyles seems to have occurred after the Second, Out-of-Egypt, wave. The colonization of India, departing from the same region (Iranian plateau, Caucasus, South Mesopotamia?) seems to have happened before the Egyptian wave, but possibly after the EMPC one. The earliest Northwest Indian Neolithic settlements are dated approximately about 6000BC which is in accordance with this possibility.
The representation of ASI-like segments variously by Siberian3, Siberian1 and Siberian1+ WMPC+NWAf may be related to ASI diversity among these populations. If South India was mostly settled, and even then with a high aboriginal persistence, only after the secondary EMPC wave developed (as opposed to a possible Northwestern settlement by a "primary" NMPC wave) this could point to a native incipient Neolithic, at least in South India.

One conjectural model:
1) An earlier less advanced expansion by a high NMPC containing population influencing only "easy" Fertile Crescent toolkit niches in the Northwest
2) Later an advanced secondary Neolithic expansion containing high EMPC from a developed Near Eastern Neolithic Centre, with much improved seeds and techniques finally making some way into Southern and Eastern India, while mostly replacing the earlier wave in the Northwest?
3) Maybe followed by a small reexpansion of the Northern element from the periphery (now mostly EMPC but still with more NMPC than Southerners?)
I'm not sure why ADMIXTURE didn't find WMPC+NC small elements apparently typical of Central Asian populations here. But a possibility is that Central Asian demographic influence in India is overestimated in other models.

About the Totonac results. Indians had an "Eastern element" (ASI) that had to be assigned to an Eastern pole (Siberian poles). South Indians seem to have a more "Southerly" ASI variant which was perhaps artificially allocated to the WMPC+NWAf pole. This can be seen in PCA plots.
Amerindians on the other hand are "far eastern" even relative to Siberians. There could be a number of explanations for this, but I think it's interesting ADMIXTURE chose to represent them with Siberian3+NMPC+Siberian2. These don't correspond to actual admixture events (much like "Chinese Mexicans"). They could be partly due to much "Amerindian"-like admixture in North Europeans being allocated to the NMPC pole (since I didn't include an Amerindian pole and high NMPC populations all have residual "Amerindian"-like elements)-making it slightly more "Amerindian"-like than some other poles available.

This is pure speculation but it's as if ADMIXTURE, when forced to ignore some possible Amerindian exotic elements (due to having to pick exclusively from among pole populations without as much of them), is telling us that Amerindians are otherwise "more Western" than they seem to be in the PCA plots.
It was already strange that "Chinese Mexicans" had a smaller "Chinese" component than a Totonac one. Greater proximity between Chinese and Europeans in PCA plots would imply that a Chinese pole would tend to overestimate the Amerindian element not the reverse. East African overestimated the African component in African-Americans as predicted.

So summarily: Totonac obviously don't have any real NMPC. Possibly neither much of the Siberian admixture they seem to have. ADMIXTURE component patterns simply "plot" the Totonac's position relative to the poles available while excluding elements not present in any of the respective aggregated components.
They have some affinity to NMPC only because they're denied their proper poles in this run. This is I think because NMPC has some slight affinity (having "absorbed" them in this run) to possible "Amerindian"-like variants in North Europeans, plus the Totonac being more "Western" than they seem as long as a few conjectural exotic small elements are forcefully ignored by the run set-up.

Here is the participant's spreadsheet. Full run spreadsheet.
You may also be interested in checking out an interesting 3D PCA model at Harappa.

Inland Ocean. Restricted Pole Run: Part III

This is the third part of the restricted pole run of Western Eurasia. All results are from the same set-up and analysis. This part concerns Central Asian and Siberian populations. Central Asia is a sparsely populated Steppe expanse connecting all major Eurasian population centres: West Eurasia, East Eurasia, and South Eurasia, and these with remaining Siberian populations to the North (today mostly replaced by Russians).
Like finding ancient bones in tropical areas, finding ancient population genetic fossils in Steppe populations is probably more difficult. Unlike settled agriculturalist populations, who I think present much more continuity since the Neolithic Revolutions, there have likely been major changes in the sparsely populated, pastoralist inhabited, sand and grass oceans of the Eurasian interior.
For instance historically it can be presumed that such populations were more "West Eurasian" thousands of years ago than they are now, since a North or East Asian component has become important or even predominant. Still, looking at the Fertile Crescent-related components across the populations, we can perhaps get hints about early Western Neolithic influence. Component proportions preserved across all groups were likely derived from founding populations. Those exhibiting clines were more likely introduced more recently.

For more extended interpretations of the components please read my previous two posts.
In retrospect, I think using the "Chuvash5" as the NMPC pole may have underestimated it and overestimated a bit the "Basque5" one. Next time I may use a combination of Chuvash and Lithuanians and see what happens.

Regarding the above results, some considerations:
1) Uzbekistan Jews seem to have, like most Jewish populations, a predominant Levantine element. They give better contrast to patterns consistent between the other Central Asian populations.
2) All Central Asian populations have large Yakut and Mongolian-like components corresponding to a likely Turkic and Mongolic element.
3) All Central Asian populations also have a "Fertile-Crescent" element composed of EMPC+NMPC+2nd Wave in that order of importance. This element is quite similar to that of the Caucasus populations. These last don't have much Turkic/Mongolic. A good model explaining these patterns is that the Central Asia steppe (as opposed to the river valleys of the Ukraine and South Russia) was initially populated by a Fertile Crescent element coming from the Caucasus/Iranian plateau at a relatively later date.
4) The Turkic/Mongolic component varies widely in a cline between them. Removing it allows for a much better view of Fertile Crescent element patterns:
CAsia is a participant, mostly Kazakh in origin I believe. Hazara may have more EMPC due to admixture with Iranian plateau populations. PonticCaspian, a participant represented in the main graph above, with Moldavian Gagauz and some Ossetian ancestry presents much the same pattern except with higher WMPC (possibly due to proximity to Balkan/West Caucasus populations). Much of the same pattern can be observed in Altaians and Buryats, in a much smaller FC element.
I really don't see population changes after the initial invention/introduction of advanced pastoralism producing such a homogeneous pattern (bear in mind some of these are small components, indeed all of these in the Mongolian case are small, since they're mostly an Eastern population and can be overinflated by the exclusion of major "Siberian" ancestry). Certainly not from Mongolia to the far west. These patterns I think most likely roughly correspond to the original Kurgan pastoralist people, and possibly also to the ancient Tocharian peoples.
5) Northeast Europeans have much NMPC but little EMPC. This "Chuvash5" element could have come from the same region at an earlier time, before or in the beginning of the EMPC expansion/ intrusion. Some small EMPC elements in Northeast European populations may indicate that the expansion Northwards into more marginal lands of NMPC was driven by competition/conflicts with the EMPC intruders.
6) Another explanation for EMPC in Northeast Europeans is a secondary expansion of Steppe peoples into the region, after the NMPC primary expansion.
- WMPC+NMPC in Koryaks, Chukchis and other Siberians should correspond to recent Russian admixture. WMPC may be overestimated in this run, but proportions appear similar in Russians in my previous post.
7) Caucasus mountain valley populations appear to have preserved various demographic "pictures" of past admixture patterns (much like Basques and Sardinians), pointing to demographic changes in the Caucasus-North Fertile Crescent region: firstly mostly NMPC agriculturalists expanding into the rivers of the Ukraine and South Russia, as seen in Lithuanians (and Chuvash); then affected by the EMPC expansion as seen in the Urkarah; later affected by the "2nd wave" (WMPC+NC) as seen in Lezgins and Stalskoe. It seems it was at this last stage that pastoralist populations emerged into the steppes, otherwise it is difficult to explain the remarkable consistency in EMPC vs NMPC proportions in all pastoralist populations sampled.
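The "removing it" step in point 4 amounts to dropping the Siberian components and rescaling what is left so that it sums to 100% again. Below is a minimal sketch of that arithmetic, with a hypothetical function name and invented proportions rather than the actual values from this run.

```python
def renormalize_without(proportions, dropped):
    """Drop the named components and rescale the rest to sum to 1.

    proportions: mapping of component name -> proportion for one population.
    dropped: set of component names to remove before rescaling.
    """
    kept = {name: value for name, value in proportions.items() if name not in dropped}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("nothing left after dropping components")
    return {name: value / total for name, value in kept.items()}


# Invented proportions for one Central Asian population.
population = {"Siberian2": 0.50, "EMPC": 0.25, "NMPC": 0.15, "WMPC+NC": 0.10}

fc_only = renormalize_without(population, dropped={"Siberian2"})
for name, value in fc_only.items():
    print(f"{name}: {value:.2f}")
```

As noted above for the Mongolian case, when the dropped component is most of the ancestry, the rescaled values magnify whatever noise was in the small remaining components.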

So the current model I think most likely:
-NMPC was primarily an early (pre-EMPC expansion or simultaneous to it) agriculturalist expansion into the river valleys of South Russia and the Ukraine, likely not affecting the remaining marginal steppe to a large extent.
-Populations from the Caucasus and Iranian plateau were then heavily affected first by the EMPC and then to a smaller extent by the subsequent 2nd wave (WMPC+NC) secondary expansions. These had "higher drag" and didn't affect more northern NMPC populations much (possibly also due to much colder climate rendering secondary wave innovations less applicable).
-At some point not long after the WMPC+Nile Core expansion (so around 3000BC or so), people from the Caucasus and/or Iranian plateau expand into the Steppe with developed pastoralist lifestyles, probably identifiable archaeologically with the Kurgan culture.
-All these steppe populations are subsequently affected, in a more clinal thus more recent way, by Siberian/East Asian elements, corresponding to Turkic/Mongolic expansions.

I thought this would be the last post from this run, but I've decided to leave some South Asian and other individuals I included in the run (just a few samples, results without them aren't appreciably different) for later, since otherwise it's too much for just one post. I'll present the spreadsheet and individual participants then, possibly today if I can find the time.

Wednesday, 11 May 2011

Old New World. Restricted pole run: Part II

The European results of my "restricted pole" supervised run of the Fertile Crescent area follow. This is the exact same run as this one.
I didn't include most individual participants' results here (only regional averages and a few isolated representatives of populations not otherwise sampled) since they're too many to post with every run. I'll post the spreadsheet with Part III (Central Asia, Siberians and a few others), including all participants' results.

I have a few warnings to readers not familiar with the variability of ADMIXTURE results:
1) ADMIXTURE is a bit stretched figuring out patterns representing components as close as these, particularly over 30,000 or so SNPs. So small components aren't very reliable.
2) This run includes lots of extra populations and extra poles to my last "Basque5" vs "Chuvash5" one, so components, though related to those and named similarly, aren't the same and will differ.
3) In particular, the WMPC, WMPC+NC and WMPC+NWAf seem to vary at each other's expense a bit. WMPC+NC ended up being too much drawn to the Druze, and WMPC+NWAf too much to Tunisians. These samples may have multiple distant family links within them. I decided to keep these populations for the time being, but may remove them in a future run. So for some small elements in some populations, one of these components may be standing in for another one.
4) Any ancestry from regions not represented in the poles will tend to be pulled towards the "least inappropriate" pole. For example if some individuals have some East Asian ancestry it may appear as a Siberian segment.
5) My interpretations are obviously just conjecture, sometimes better argued than others.

You can read more extended interpretations of components in my previous post.
-WMPC: "West Mesopotamian Core", referred to before as "Western Wave". First wheat planting Neolithic colonization of the Mediterranean and Western Europe. I think they came from the Levant and Anatolia. I think it may correspond to Megalithic and Danubian archaeological horizons. In colder climes, this wave probably only occupied wheat-congenial regions, leaving less adequate ones to foragers. They're best represented today in Basques, Sardinians, and also at a lower level in Western Europeans in general. Possibly with high R1b (and I?) Y-haplogroups
-NMPC: referred to before as the "eastern wave". Cold, poor-soil-adapted first Neolithic wave, maybe due to innovations such as winter-rye. Expanded North into the Steppe rivers from a homeland possibly in the Eastern Iranian Plateau, Caucasus range or Northern Mesopotamia. Later, after adaptation, it would have spread throughout cold and sandy soils in all of Europe, especially in the North, bypassing rich agricultural areas already inhabited at high densities by WMPC people. Possibly with high R1a Y-haplogroup levels. mit-haplogroups introduced by this wave into WMPC areas might explain why there's a "mit-DNA gap" between Danubian remains and modern Central Europeans. I would tend to identify the beginnings of NMPC expansion into Central and Western Europe with the Corded Ware culture. This may have been also the "melting pot" from where East Slavs began their long expansion towards the Pacific.
-WMPC+NC: synthesis of WMPC early intruders into Egypt and local Nile Core elements. Referred to before as "Second Wave".
-WMPC+NWAfr: synthesis of Green-Sahara derived native North West Africans and WMPC. I think the expansion of this element into Iberia and beyond may have happened very early on (much earlier than the second wave in the East), at the time of initial WMPC colonization of the region via the Northern Mediterranean route.
-EMPC: East Mesopotamian Core. Patterns of NMPC and WMPC suggest to me this is a local, more eastern element that underwent expansion into the NMPC and WMPC homelands in ancient times, before the second wave from Egypt. I think a model of Neolithic developments generating higher surpluses and elites/specialists generally should coincide with a demographic expansion from the same region. That is any ancient people developing agricultural productivity high enough to enable them to live at much higher densities, and thus partially swamp out neighbouring related already agricultural peoples, must have produced enough food surpluses to allow relatively much larger elites/specialists and better social organization. Such secondary wave origin points should thus be identifiable archaeologically. The "Second Wave" I have identified with Egyptian Civilization (which begins at around 6000-5000 years ago, at the same time according to some studies as the early proto-Semitic expansion). Based also on ADMIXTURE patterns, I would tentatively relate the EMPC expansion with a Southern Mesopotamian homeland.
-All these components, except for the Siberian ones, derive I think from the ancient Near East. The Siberian ones correspond to perhaps ancient traces of European hunter-gatherers. I didn't include an Amerindian pole this time, since with multiple MPC components they tend to be identified and subsumed into the NMPC I think (since NMPC and forager residual segments tend to exist in the same populations-possibly due to NMPC late occupation of much of the colder, less fertile soil niche). More on that later.

Monday, 9 May 2011

Western, Eastern, Northern, Southern: Motherland?

In this post I'll tackle the last unsupervised component: the West Asian-modal component, which peaks in Georgians and exists in high amounts in all West Asians. This component appears much throughout Europe, and strangely, in unsupervised runs it is a bit closer by Fst distances to the Lithuanian-modal component than it is to the Sardinian or Basque-modal ones. What is going on here?

To recapitulate my previous results/interpretations:
1. Basque and Sardinian-modal. In MDS and PCA plots, Sardinians and Basques plot not far from each other, but in distinct clusters out of the European "mainstream". Sardinians further towards modern Near Eastern populations, Basques more towards modern North European populations. In supervised admixture runs, both populations seem to be dominated by the same element, with Sardinians having a smaller element with North African/Near Eastern affinities, and Basques mostly without this element but with another element predominant in Northeastern Europe. I interpreted these results as indicating "2nd wave" influence on Sardinians, and "Eastern Wave" influence on Basques, exclusively superimposed on a common "Western Wave" element. This element, also found in large percentages in Southern, Northwestern and Central Europeans, but lacking in Northeastern ones, would correspond to the first Wheat-planting Neolithic expansion into Europe. Less fertile/colder areas not congenial to Wheat would have been mostly left to remaining forager populations providing meat and fish in exchange for wheat. Indeed such an arrangement is documented in several places in the World, and is still found in some Southeast Asian regions, where demographically dominant agriculturalists in more fertile/rice adapted soils exchange agricultural products for protein from foragers in marginal lands.
2. Lithuanian-modal, also present mixed with Siberian elements in the Chuvash, seems to be prevalent in cold/poorer soil areas of Europe. I think it may correspond to an Eastern Neolithic Wave from the Northern Near East through the Caucasus and Steppe into Eastern Europe. After adoption of cold-adapted agricultural techs, such as winter-rye, it expanded into the vast niches left mostly unoccupied by the earlier Western Mediterranean-Atlantic and Danubian waves. It replaced the forager populations still present in those marginal lands. An analogy is to the settlement of Japan by the Yayoi people. Only after developing cold-adapted rice varieties and techniques were they able to perhaps migrate and completely replace the Ainu-like foragers in rich coastal foraging environments less suited to earlier Neolithic lifestyles.
3. Mozabite/Tunisian and Bedouin/Saudi/Egyptian components are mostly about peaks of NW African and NE African ("NileCore") components. The Green Sahara may have had semi or full pastoralist developments explaining the presence of such components, admixed with "Mesopotamian Core" ones in such a high degree in some nomadic populations.
4. "Siberian" and "Amerindian"-like small elements in Finns, Russians, Scandinavians, Balts, Irish and British are I think not derived from any undocumented migration of ancient Siberian peoples but are genetic traces of Native European populations. I've tentatively interpreted "Amerindian"-like segments in North Europeans and people from the Caucasus Mountains (which also appear in unsupervised runs) as survivals of an old "Amerindian"-like aboriginal population, making its way to the Americas through the Ice Age ice cap with a lifestyle similar to that of Inuit/Eskimos today.

So the last component still eluding explanation is the "West Asian" one. Why is this closer to North European than to any other unsupervised mode component?
I've stated before, I think this component is a Mesopotamian Inner Core element, very similar to the European Western and Eastern Wave ones. Indeed these last may be "frontier" elements with less diversity, subsets of the Inner Core one. This would explain why Near Easterner and Southeast European populations tend to have large balanced "Basque5" and "Chuvash5" elements in previous runs. So the Inner Core element may be the "Mother" of these less diverse frontier elements that expanded into Europe. Later, the Inner Core populations, perhaps in Mesopotamia, would achieve more evolved Neolithic capabilities and expand into the Western and Eastern Wave settled regions.

So "West Asian" peaks in Northern Near East populations, such as Armenians, Iranians and Georgians. However, using these populations as the pole is problematic, since these same populations have signatures of "Nile Core" influence. It also appears at high levels in Northern Caucasus populations such as Lezgins and Adyghei, and in the samples from the mountain Dagestan towns of Urkarah and Stalskoe.
Using my "restricted pole" supervised run trick, I used first Georgians as the pole. As expected this erased much "2nd Wave" influence from European populations, since the "Georgian5" pole attracted many such segments together with the North African one. Using "Urkarah5" I got higher percentages in Northern Europe than seemed reasonable and since in previous unsupervised runs, Lezgins, Urkarah and Stalskoe seem to have some "Northern European" in addition to predominant "West Asian" I thought it would be best to combine some Georgian, Iranian and Urkarah individuals to allow the pole better to "focus" on the element I was searching for.
Naturally this is debatable, but results using any of the poles are not very different.
I also decided to ditch Mozabites and Egyptians and replace them by Tunisians and Palestinians as NW African and NE African (Nile Core) poles respectively. Mozabites and Egyptians are more southern populations and any influence from North Africa in European populations is already represented in Tunisians and Palestinians. I excluded all other North African populations in order not to complicate things further, except for predominantly agricultural Northern Moroccans, which I wanted to use to pull the "Tunisian5" pole into the region (in order to find influences in South Western Europe).

I'm dividing this run into three parts: Near East; Europe; and Siberia+Central Asia. They are all part of the same run. First I'll present results for the Near East. I included some populations in Near East and European posts to link them, since it's all from the same analysis.

Poles used:
1. Siberian2: 5 random Yakut individuals "Yakut5" restricted pole. This captured most of the Siberian Turkic and Mongolic elements. It peaks in Buryats, with Yakuts and Mongols not far behind.
2. WMPC: "Basque5". This pole captured an element which peaks in Basques and Sardinians, but is also present in Southern and Northwestern Europe in important amounts. Somewhat surprisingly, it is also important in the Levant (unlike the other element present in Northwest Europe, "Chuvash5"). This is the Neolithic first Western Wave, perhaps associated with a Mediterranean-Atlantic migration route and its Megalithic monuments, depending on river-valley wheat agriculture. It also presumably represents here the probably very closely related Danubian wave.
3. NMPC: "Chuvash5". I renamed it Northern MPC since it seems to be Northern relative to the Fertile Crescent area. It peaks in Europe in the North East, and I've called it "Eastern Wave" before. Maybe a cold and poor soil-adapted Neolithic Wave, and rye agriculture (also perhaps occupying much of the "wheat niche" in some parts of Eastern Europe).
4. WMPC+NC. West MPC with some Nile Core, also referred before as "2nd Wave", or out-of-Egypt. Perhaps associated with proto-Semitic (and Exodus tales). A late expansion from perhaps 6000-5000 years ago. It was based on "Palestinian5" however it came to be dominated by the Druze, perhaps due to multiple family connections in this somewhat isolated population. It doesn't matter though, since the Druze are still adequate representatives even as a modal population. The Palestinians, Jordanians and others had their higher Nile Core component taken over by the Tunisian pole. I could remove the Druze, but things wouldn't change much and I prefer to keep all populations at this point. French Basques also seemed "inbred" before, but I'm now convinced their "isolated population pole-pulling" tendency is mostly due to their unique Neolithic wave mix.
The second wave element is closer to the "Basque5" element than to any other MPC element. I think the native element of the Levant may be closely related to the "Basque5" element, and this would be the subset that expanded into less advanced, incipiently Neolithic, Nile Core-dominated Egypt, synthesized with it, and re-expanded into its homeland and beyond as the "2nd Wave". This would be a more advanced Wheat-planting expansion, and the probably much larger food surpluses that made it possible also allowed higher organization levels, specialization, and elite formation in Egypt itself, a process leading to ancient Egyptian Civilization.
I seem to recall an R1b Y-haplogroup found in ancient Egyptian remains. Perhaps it was native after all, since ancient Egyptians might have been a synthesis of WMPC and NC?
5. EMPC: based on two Urkarah individuals + 1 Iranian + 2 Georgians. If based on "Urkarah5" or "Georgian5" it has a greater tendency to draw in too many non-pole individuals of said population, along with any 2nd Wave or NMPC elements also present there. By combining populations in the restricted pole, this problem can be reduced. I think EMPC is an inner-core MPC element perhaps present in Mesopotamia itself. It seems to have expanded against the WMPC and NMPC within the Near East after the latter two had expanded outside of it, but likely well before the second wave. It also seems to be the subset corresponding to Ancestral North Indians.
6. WMPC+NWAf: based on "Tunisian5". This component likely consists mostly of WMPC plus some of the Green Sahara-derived NW African element I found before. Tunisians were drawn at almost 100% into the pole, unlike Northern Moroccans, probably due to multiple distant family relations within the Tunisian sample. Still, this matters little, since it is representative of Western North Africans. Since I believe Tunisians are still representative of ancient Berber-speaking Neolithic populations I removed Mozabites. Using Tunisians may allow more sensitivity, since Mozabites are more distant.
7. Siberian1. "Nganasan5" using only unadmixed Nganasan. Representing a Western Siberian element.
8. Siberian3. "Mongolian5". Strangely this pole didn't pick up much in other Mongolians, but instead focused on Chukchis and Koryaks from the Far East. Mongolians turned out mostly "Siberian2". This happened presumably due to closeness of "Yakut5" and "Mongolian5" (just as before I used "Dogon5" to capture NWAfrican with the help of another West African pole). This happens frequently in "restricted pole" supervised runs, for instance in Northern Europe runs even a "Orcadian5" versus "Hungarian5" pole analysis reproduces imperfectly the "Basque5" vs "Chuvash5" scenario, since "Orcadian5" and "Hungarian5" shift away from their individuals' populations and peak in Basques/Sardinians and Lithuanians/Chuvash instead. So restricted pole runs in my opinion tend to shift towards real patterns in the data.
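For readers who want to try something similar, here is a minimal sketch of how a "restricted pole" set-up could be prepared. It assumes ADMIXTURE's supervised mode, which as far as I know reads a .pop file with one label per individual (in the same order as the .fam file) and "-" for individuals whose ancestry is to be estimated. The sample sheet, the function name `build_pop_labels` and the fixed seed are hypothetical, made up for illustration; only the pole recipes (5 Basques, 5 Chuvash, 2 Urkarah + 1 Iranian + 2 Georgians) come from the list above.

```python
import random

def build_pop_labels(fam_rows, poles, seed=0):
    """Return one .pop label per individual, in .fam order.

    fam_rows: list of (individual_id, population) tuples, in .fam order.
    poles: dict mapping pole name -> {population: number_of_individuals}.
    Individuals not drawn into a pole get "-" (ancestry to be estimated).
    """
    rng = random.Random(seed)  # fixed seed so the random draw is repeatable
    labels = {ind: "-" for ind, _ in fam_rows}
    for pole_name, recipe in poles.items():
        for population, n in recipe.items():
            candidates = [ind for ind, pop in fam_rows
                          if pop == population and labels[ind] == "-"]
            if len(candidates) < n:
                raise ValueError(f"not enough {population} samples for {pole_name}")
            for ind in rng.sample(candidates, n):
                labels[ind] = pole_name
    return [labels[ind] for ind, _ in fam_rows]

# Hypothetical sample sheet: 8 individuals per population.
fam_rows = [(f"{pop}_{i}", pop)
            for pop in ("Basque", "Chuvash", "Urkarah", "Iranian", "Georgian", "French")
            for i in range(8)]

# Pole recipes taken from the list above; a pole may mix populations.
poles = {
    "WMPC": {"Basque": 5},
    "NMPC": {"Chuvash": 5},
    "EMPC": {"Urkarah": 2, "Iranian": 1, "Georgian": 2},
}

pop_labels = build_pop_labels(fam_rows, poles)
pop_file_text = "\n".join(pop_labels) + "\n"  # contents to write to the .pop file
```

The remaining Basques, Chuvash and so on stay labelled "-", so they are estimated like everybody else instead of being forced to 100% of their own pole.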

This is still an imperfect run, so don't read too much into the smaller components. Still, results are similar using various different set-ups (and not too different from unsupervised ones).

Some speculative considerations:
-WMPC is present in the Levant and North Africa, whereas NMPC is not. It is also dominant over it in Turks, Armenians, different Jewish groups. This suggests that WMPC ("Basque5") has its ultimate origin here, in the Levant and Anatolia. It seems to have expanded not only into the Mediterranean and Balkans, but also into North Africa. I think the 2nd Wave likely is just WMPC admixed with some North Eastern African ("Nile Core").
-NMPC is dominant over WMPC in Iranians, Kurds, North Caucasus and Eastern Caucasus. I tend to think it derives from an ancient population living in the Northern Mesopotamian/Caucasus range/Western Iranian plateau region.
-EMPC may correspond to an expansion into both the WMPC homeland (Levant/Anatolia) and NMPC homeland (Northern Mesopotamia?/Eastern Anatolia?/Western Iran?) from an inner area, perhaps Southern Mesopotamia.

I'll present European results later.

Thursday, 5 May 2011

Individual Results for "Basque5"

I've decided to post the "Basque5" individual data after all. I think it's a good departing point and it helps to understand why "Chuvash5" (I'm interpreting it as the cold adapted "Rye"-farmer wave component) or "Basque5" (the warmer river valley dweller, earlier "Wheat"-farmer wave) percentages in one individual may vary quite a bit between runs. ADMIXTURE can give quite different results under different set-ups. This is not a reason for ignoring them, but in my opinion another source of very valuable information. If you keep in mind the framework, some result variability is sometimes enlightening.
"Mesopotamian Core" elements in particular tend to grow and decrease a bit at each other's expense in different runs. This happens due to their great similarity, sharing a common source (possibly in ancient Anatolia).
I want to reiterate that this is a first experimental run, and smaller percentages in an individual have high likelihood of being just noise, and even small percentages in populations are merely indicative.
Also forager elements may be eaten up by other components, or may be representing more exotic admixture in some cases as well (South Asian, East Asian may appear as "Siberian" in a few cases).
In addition, I included participants with only European/ME ancestry, but 2nd wave may be eating up any very small non Nile Core African elements in some New World, and perhaps other, individuals, for instance.
Particularly in the "Sardinian5" run, the "Chuvash5" element was I think overestimated in several populations, due to a slight deviation of this component towards what I've been calling the "2nd Wave" element, which Sardinians have in significant amounts.
So this is a very imperfect run. It is self-admittedly very chunky and will be improved in the future. Please don't assume for now any 1-2% of anything is actually something.

AJ2 is Dan Vorhaus from Genomes Unzipped.
Some considerations:
SWFrance is from Gascony and shows large "Basque5" as expected.
Germany1 has much Rhineland ancestry, and also has much "Basque5".
Southern Germany, Switzerland, Slovenia and Hungary appear to have more significant "2nd Wave" than more northern populations, which I assume would be present in neighbouring populations as well. Raetic, a relative of Etruscan, was spoken in the region in pre-Roman times.
PonticCaspian has mostly southern steppe and whereabouts origins. He has very high levels of "Chuvash5" relative to "Basque5".
Some Americans have Southern European ancestry.
Balanced "Chuvash5"+"Basque5" elements in Southeasterner populations closer to the Fertile Crescent may correspond to an "Inner-Core" component including diversity present in both "outer-core" components and thus not the product of West Wave/East Wave admixture.

I'll post more, better results in the near future.

Tuesday, 3 May 2011

Irula and Basques, Sardinians, Lithuanians?

I have stated before that I think unsupervised results of European populations are based on modal populations that are admixed themselves.

Just as the "Irula" component tends to pick up most of South Asian diversity in an artificial manner contradicting some recent research (Reich et al -see also Dienekes); and different supervised methods seem to confirm it, so I believe the same can be done to European populations.

The Irula are modal for the South Asian component for a simple reason. All Indian populations are a mix of Fertile Crescent incomers (Ancestral North Indian-ANI) and older established populations (Ancestral South Indian-ASI). Yet even though we have extant relatively unadmixed Fertile Crescent populations (ANI), we do not have any extant ASI populations. So Irula, in whom ASI peaks among public genotyped populations appear as if almost 100% "South Asian", even though they are really roughly 1/2 ANI- 1/2 ASI. Most Indian populations tend to be dominated by this element as well, the remaining part being more conventional Fertile Crescent components. So the "South Asian" component is really ASI+ANI and doesn't correspond to either of them in any "unadmixed" form.
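The arithmetic behind this can be sketched in a few lines. This is only a toy two-source model, not what ADMIXTURE actually computes: it assumes every population is a simple ANI/ASI mix and that the modal population (about 1/2 ASI, as for the Irula above) is treated as if it were a pure pole. The function name and the 25% example population are made up for illustration.

```python
def apparent_fractions(true_asi, modal_asi=0.5):
    """Re-express a population's ancestry using the modal population as a pole.

    true_asi: the population's real ASI fraction (the rest is ANI).
    modal_asi: real ASI fraction of the modal population (about 1/2 for Irula).
    Returns (apparent "South Asian" fraction, apparent remaining ANI fraction).
    """
    if not 0.0 <= true_asi <= modal_asi:
        raise ValueError("population must lie between pure ANI and the modal population")
    south_asian = true_asi / modal_asi  # the modal population is scored as 100%
    return south_asian, 1.0 - south_asian

# The modal population itself (1/2 ASI) comes out as entirely "South Asian".
irula_like = apparent_fractions(0.5)

# A hypothetical population with only 1/4 ASI still comes out half "South Asian".
northern_like = apparent_fractions(0.25)
```

In this toy model the "South Asian" percentage is always the real ASI fraction divided by the modal population's ASI fraction, which is why the component looks dominant even where ASI is a minority.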

Zack has done interesting work using the Onge of the Andamans as a proxy for them, and I have run a simple analysis using Papuans. Neither really correspond to the actual ASI.

In Europe and nearby regions thus we similarly get Lithuanian-modal, Sardinian-modal, Basque-modal, and West-Asian-modal components. Mozabite modal and Southwest Asian-modal populations I've previously analysed in a way that suggests they derive from peaks in Mozabites and Bedouin of "NW African" and "Nile Core" respectively.

So I'll be trying to do the same to European populations in the next few days, and reveal unsupervised results for what they probably mostly are: clines, in which ADMIXTURE picks the peak population as the modal one. Naturally revealing intra-"Mesopotamian Core" clines is much harder and inexact than clines between less closely related populations.

Here are the first runs. I've used my trick of using "restricted" poles in order to get more accurate estimates, but these are experimental runs, and the thing to note is what remains similar in either. I plan to develop better ways to fish out these components soon, so don't take these first results too literally. I've used European populations, excluding most Middle Easterners since I think "inner-Core" Mesopotamian subsets different from the "frontier" ones may be hiding there and don't want to make the run more complex than it already is. For the same reason I excluded Cypriots. I did include a handful of participant samples from the ME and Southeast Europe: being few, they won't "demand" their own component and confuse results, yet can give me some clues as to how to proceed when expanding the set.

Firstly "Basque5" versus "Chuvash5" versus 5 Egyptians+ 5 Mozabites. Bear in mind having an element doesn't mean ancestry from one of these modern groups, only that portions of DNA tend to cluster together with these available poles.
Admixture proportions for Chuvash and Basque obviously exclude the 5 pole individuals in each.
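A minimal sketch of how such population averages could be computed while leaving out the pole individuals. It assumes the per-individual proportions have already been read into plain lists (one row per individual, one column per component) and that pole membership is marked the same way as in a supervised .pop file, with "-" for non-pole samples. The function name and the toy numbers are hypothetical, not results from the run.

```python
from collections import defaultdict

def population_means(q_rows, populations, pop_labels):
    """Average component proportions per population, skipping pole samples.

    q_rows: per-individual lists of component proportions.
    populations: population name of each individual (same order as q_rows).
    pop_labels: pole label of each individual, "-" for non-pole samples.
    """
    sums = {}
    counts = defaultdict(int)
    for row, pop, label in zip(q_rows, populations, pop_labels):
        if label != "-":
            continue  # pole individuals are fixed at 100% and would bias the mean
        if pop not in sums:
            sums[pop] = [0.0] * len(row)
        sums[pop] = [s + x for s, x in zip(sums[pop], row)]
        counts[pop] += 1
    return {pop: [s / counts[pop] for s in sums[pop]] for pop in sums}

# Toy data: columns are ("Basque5", "Chuvash5"); values are made up.
q_rows = [[1.0, 0.0], [1.0, 0.0], [0.75, 0.25], [0.875, 0.125]]
populations = ["Basque", "Basque", "Basque", "Basque"]
pop_labels = ["WMPC", "WMPC", "-", "-"]

means = population_means(q_rows, populations, pop_labels)
```

With the two pole samples included, the toy Basque average would be pulled up towards 100% "Basque5"; excluding them gives the figure for the population as actually estimated.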
FST genetic distance estimates by ADMIXTURE:
"Basque5" to "Chuvash5": 0.035
"2nd Wave" to "Basque5": 0.048
"2nd Wave" to "Chuvash5": 0.040
Siberian to "Basque5": 0.133
Siberian to "Chuvash5": 0.104
Siberian to "2nd Wave": 0.120
Amerindian to "Basque5": 0.231
Amerindian to "Chuvash5": 0.194
Amerindian to "2nd Wave": 0.216
Amerindian to Siberian: 0.164

As fst shows, "2nd wave" is much more of a "Mesopotamian Core" element than a Northeast African element here. It contains a measure of the Nile Core-admixture found in Europeans though. Since the Egyptian expansion itself I think had more Mesopotamian than Nile Core, this is not surprising. The smaller presumably forager components are merely indicative since this run is too rough and they might be partially subsumed or be subsumed by bits of other elements.

Now with "Basque5" replaced by "Sardinian5", the remaining poles being the same. Sardinian population averages presented exclude the 5 Sardinian samples used as pole.
FST genetic distance estimates by ADMIXTURE:
"Sardinian5" to "Chuvash5": 0.035
"2nd Wave" to "Sardinian5": 0.049
"2nd Wave" to "Chuvash5": 0.045
Siberian to "Sardinian5": 0.128
Siberian to "Chuvash5": 0.111
Siberian to "2nd Wave": 0.122
Amerindian to "Sardinian5": 0.225
Amerindian to "Chuvash5": 0.203
Amerindian to "2nd Wave": 0.218
Amerindian to Siberian: 0.163
Notice that the largest distances are not similar so the smaller ones can't be exactly compared between different runs. I think "2nd wave" is slightly more concentrated in this last run, but still mostly Mesopotamian.
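One rough way to put the two tables side by side is to rescale each run by its own largest distance. This is only an illustrative device and an assumption on my part, not a proper correction, and the key "West" is just a placeholder name for whichever western pole the run used ("Basque5" or "Sardinian5"). The Fst values are the ones listed for the two runs.

```python
# Fst tables from the two runs; "West" stands for "Basque5" or "Sardinian5".
basque_run = {
    ("West", "Chuvash5"): 0.035,
    ("2nd Wave", "West"): 0.048,
    ("2nd Wave", "Chuvash5"): 0.040,
    ("Siberian", "West"): 0.133,
    ("Siberian", "Chuvash5"): 0.104,
    ("Siberian", "2nd Wave"): 0.120,
    ("Amerindian", "West"): 0.231,
    ("Amerindian", "Chuvash5"): 0.194,
    ("Amerindian", "2nd Wave"): 0.216,
    ("Amerindian", "Siberian"): 0.164,
}
sardinian_run = {
    ("West", "Chuvash5"): 0.035,
    ("2nd Wave", "West"): 0.049,
    ("2nd Wave", "Chuvash5"): 0.045,
    ("Siberian", "West"): 0.128,
    ("Siberian", "Chuvash5"): 0.111,
    ("Siberian", "2nd Wave"): 0.122,
    ("Amerindian", "West"): 0.225,
    ("Amerindian", "Chuvash5"): 0.203,
    ("Amerindian", "2nd Wave"): 0.218,
    ("Amerindian", "Siberian"): 0.163,
}

def relative_fst(run):
    """Divide every distance by the largest distance found in the same run."""
    largest = max(run.values())
    return {pair: fst / largest for pair, fst in run.items()}

def closest_pair(run):
    """Return the pair of components with the smallest Fst in a run."""
    return min(run, key=run.get)

rel_basque = relative_fst(basque_run)
rel_sardinian = relative_fst(sardinian_run)

# Print the raw and rescaled "2nd Wave" distances for both runs.
for pair in [("2nd Wave", "West"), ("2nd Wave", "Chuvash5")]:
    print(pair, basque_run[pair], round(rel_basque[pair], 3),
          sardinian_run[pair], round(rel_sardinian[pair], 3))
```

On this rescaled view the "2nd Wave" component sits somewhat further from both MPC poles in the "Sardinian5" run, while the western and "Chuvash5" poles remain the closest pair in both runs.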

Balanced "Chuvash5" + "Basque5" in more "inner-core" influenced populations such as Assyrians and South Italians simply means that their "Mesopotamian" is simply the more diverse parent of both "outer-core" "Basque5" and "Chuvash5", who are perhaps West-Anatolian and NorthEast Anatolian particular subsets of it. I now see some evidence for a secondary "inner-core" expansion before the second wave. More on that later.
I think these results are a bit inexact, but general components seem to hold in runs with other quite different European poles (tomorrow I may present some of these). They also appear in a very confused and mixed way in unsupervised results. So they very likely represent something real. Yet these are obviously preliminary results. I'm not sure if Lithuanians are so much Eastern Wave-derived as these particular results seem to imply, although I now think my early estimate of "Basque Admixture" using the full-set supervised Basque and Chuvash poles overestimated the "Basque" or Western Wave element there (Basques seem to have quite a bit of the Eastern element themselves!). Still "restricted poles" seem to estimate actual components better.

So firstly it is interesting that Sardinians do not have the Eastern Wave element in either run. This is as expected, if "Chuvash5" came from the Northeast (perhaps more remotely from the Steppe, Caucasus and originally Northeastern Anatolia in that order).
Sardinians do have, however a large Second Wave element mostly absent in Basques. And Basques, even in the "Basque5" run, seem to have quite a bit of this "Chuvash5" component.

Northern Europeans seem to have more "Chuvash5" than the Basque, particularly towards the East; and some small Second Wave element in some regions.

This seems an adequate explanation for unsupervised ADMIXTURE runs' results. Sardinians are modal for their "Sardinian" component since they lack the "Chuvash5" and Basques because they lack "Second Wave" yet have much more West Wave than most. Lithuanians are modal also for their component since they have both little Second Wave and "Basque5". And West Asians like the Assyrian Christians appear to have some "Chuvash5" (more on this later) but much more Second Wave. All other European peoples are intermediate between these 4 "extreme" extant populations. Thus they can be adequately reconstructed by unsupervised ADMIXTURE using components modal to the Basques, Sardinians, Lithuanians and West Asians. Unsupervised ADMIXTURE has no way to know if actual Sardinians, Basques, Lithuanians, West Asians actually settled Europe. It assumes it was so since it was meant, I think, to determine admixture proportions for populations whose parent populations still exist (like Mexicans and African-Americans).

Faced with such a scenario in which unadmixed parent populations are mostly not there anymore, it picks the most extreme populations and uses these admixed populations as if they were the parent ones, generating historically illogical results.

Looking back at the results. Why do Basques have this Eastern element in the 10-20% range? Why did it spread to Spain if it arrived AFTER the "Basque" or western element (archaeology strongly supports a model of agricultural spread from the South towards the Northeast)? And why do Middle Easterners and Southeast Europeans have both, even though they're the likely source? A few points:

From its smallish presence in Basques, I think it's clear this is a later intrusive element in an already agricultural population. I don't think these farmers were the carriers of Indo-European languages. Maybe they spoke distantly related languages. One argument concerns Y-haplogroups. R1a is prevalent, R1b mostly absent, in populations with very high "Chuvash5" in the run. R1b is dominant in Basques and also common in Sardinians, while R1a is absent. Languages spread with elites, but these people weren't passing many of their Y-chromosomes to their children, at least west of Central Europe, and yet West Europeans appear to have plenty of their genes. Some discrepancies between the mit-DNA of Danubian Neolithic remains and that of modern Europeans suggest to me that they perhaps did contribute plenty of mit-DNA together with autosomes. Language-imposing elites behave in exactly the opposite way, with high Y, low mit transmission.

"Chuvash5" people likely had high levels of R1a, "Basque5" people high levels of R1b. R1b is well characterized as originating from Western Anatolia and is present in high levels in Southern Europe. Another argument as to why it must have arrived first is that Sardinians are 20-30% R1b and have no "Chuvash5" (yet plenty of "Basque5"). So "Chuvash5" likely entered the "Basque5" dominated Basque country, Spain, France, Ireland, the British Isles and the Central European river valleys, and had a major impact on autosomes and probably also on mitochondrial haplogroups, yet very little Y-chromosome haplogroup impact.

How can this pattern be explained? I have been thinking about a speculative model and now it's taking shape.
Northwestern Europeans (France and British isles in this context) have generally >60% R1b, but in these runs ~50% "Chuvash5". If "Chuvash5" were intruders specializing in cold-environment, poor-soil agriculture (with Rye and other innovations/developments), they would find the most fertile soils in the region already inhabited by the "Basque5" people and their wheat agriculture. So in a simple scenario, villages would form in a checkerboard pattern. "Basque5" villages would be already established near rivers and the best soils at very high densities, and thus impossible to remove or substitute. However there may have been plenty of other areas to which primitive wheat agriculture was not congenial, and these would still be inhabited by foragers at much lower densities. Most of these areas would however allow for productive rye agriculture, at much higher densities than foraging. "Chuvash5" migrants could very well have used such niches. They would not be able to settle in the "Basque5" areas, but they would have major advantages versus foragers in their mountain, inter-fluvial and sandy soil regions. So "Chuvash5" villages would maybe become established not far from "Basque5" villages downhill. "Chuvash5" is really about a Mesopotamian subset-derived, cold-adapted Neolithic wave I believe.

In such a scenario, good years would lead to large surpluses in "Basque5" villages, but smaller ones in "Chuvash5" ones. Densities in the former's areas would be quite a bit higher than in the latter's too. Lowland elites would form much more easily in "Basque5" villages, but "Chuvash5" ones would remain more egalitarian.

In good years, high levels of surplus would allow men in "Basque5" villages to find other occupations, such as war or trade for less basic resources. With time social structure would make such arrangements permanent at the expense of peasants. Militarized and commercial elites from rich "Basque5" villages would come to dominate less wealthy "Chuvash5" ones. The result over thousands of years and multiple elite forays from the rich wheat-growing areas into the less productive rye-growing ones would be exactly what you see: elite Y-DNA spreading while autosomes and mit-DNA remain balanced.

In even colder areas such as Scandinavia and Northeastern Europe, colonization by R1a agriculturalists had much greater advantages. Fewer wheat-adequate environments exist there, vast Rye-congenial ones predominate, and Y-DNA would be more balanced or even the reverse.

So "Chuvash5" is I think really about a new subset of Fertile Crescenters, one adapted to marginal lands and cold-environment agriculture during their passage and evolution in the steppe. The spread of some more admixed people from the Corded Ware region and their explosion into the Eastern expanses of Eurasia suggests to me the final "critical point" may have been reached in this region and time.

Later on, perhaps related pastoralists came again out of the Steppe, not as farmers, but as horse-mounted warriors and conquerors. These maybe were the source of some small Y-DNA contribution. But without major food producing improvements, I very much doubt they had a significant autosomal contribution. They may have had a major linguistic impact, as such horse-mounted militarized elites often have. Or maybe such tongues are a different story entirely.

Tomorrow I'll run a better version of the runs posted above, and will post individual participants results. I decided not to post spreadsheets for this one since people would be reading too much into individual results. Tomorrow's run will be very similar but more solid and valid I think.

Monday, 2 May 2011

Hammering the iron

Trying to extend my last analysis Northwards I was faced with a few difficulties.
Finding 2nd ("African"-like) wave elements; or Aboriginal ("East Asian"-like -or perhaps Amerindian-like) in Europeans is easy, since contributor populations were quite different.
The problem lies, I think, with different subsets and possibly pre-2nd wave expansions, within what I've been calling the Mesopotamian Core. This is much harder to detect and differentiate reliably, since I don't have many if any still "unadmixed" populations from that time.
So I'll kind of have to find the unknown "metals" through comparisons of how differently mixed "alloys" relate to each other.

In supervised ADMIXTURE terms this means I generally get series of tightly related components coming from the Mesopotamian region but probably also from Eastern Anatolia and the Levant. ADMIXTURE seems to be picking a series of distinct but very close clusters in the MPC. When considering only North Africans, Basque poles stretched to become the local variety of MPC.
With more populations, there are sufficient differences that ADMIXTURE tends to fall into lumping them with aboriginal/2nd wave smaller elements into artificial components. For instance the Siberian/Amerindian-like element of some North Europeans often disappears into the local MPC; the Nile Core of Southerners and especially Middle Easterners into their local varieties. Thus in many runs "Basque" becomes dominant in Europeans, erasing most of Siberian, "Sandawe" expands to the point of having Mesopotamian Core-like fsts incompatible with North East African origins, and ends up including most Middle Easterner diversity. South Europeans end a mix of mostly "Basque" with some "Sandawe5". The "Sandawe5" in a few cases swaps places with the "Dogon5" pole everything else similar, proving IMO that these components are real and not directly related (although perhaps somewhat distantly akin) to actual Dogon or Sandawe. And the whole thing is often very "chunky".
Unsupervised ADMIXTURE on the other hand always picks out a supposedly, but in my opinion not generally, "unadmixed" group of individuals (see Irula in the "Indian cline"). It kind of makes its own poles up. If original populations are not present in the data, results are not reliable (in terms of actual progenitor populations; they're reliable for relationships if properly interpreted).

Neolithic Revolutions are quite uncompromising after a "critical point" in their maturation. Before they reach this point however, they consist of distinct if related populations or subsets competing with each other, and perhaps mixing and differentiating constantly via short breath regional expansions.

The MPC seems to be composed of such a group of subsets. Here are some I think may be hiding in the data:
1) A "Basque centred" Med-Atlantic Wave subset important from Italy to the British Isles and beyond, but also present in Central Europe. Probably from Western Anatolia via a maritime coastal village-to-village route. Very similar to another Western Anatolian wave via the river valleys of central Europe.
2) A "Chuvash centred" East wave subset predominant in the Northeast but present in much of Europe. Probably from the Caucasus and Northeast Anatolia via the Steppe river valleys. It may have later spread with new agricultural developments for colder environments or poorer soils (Rye).
3) A Levant-East Anatolian element, now centred in Armenians, Druze, Georgians. I'm not sure if this isn't the parent component to the Basque and Chuvash ones, and thus distinct due to its greater diversity. That would mean it is a first level MPC population, with the Basque and Chuvash ones being peripheral subsets of it. It could also, or simultaneously, be an "inner-core" MPC element that superseded the others in a later expansion. Y-haplogroup markers seem to support this possibility. This subset seems to be the one which made it all the way into India. It was probably present in Mesopotamia as well.

As I said, all these components are very closely related. That's why in unsupervised ADMIXTURE you get apparently surprising results, such as the Northeast Europe-modal component being closest to the Anatolian/Levantine one (which makes sense if you realise one is a subset of the other).
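"Closest" here is a statement about Fst between the components' allele frequencies. As a rough illustration only, the sketch below computes a simplified Hudson-style Fst (ratio of sums, with no sample-size correction) between per-SNP frequency vectors. The frequencies are invented toy numbers, the function name hudson_fst is hypothetical, and this is not necessarily the estimator ADMIXTURE itself reports.

```python
def hudson_fst(p1, p2):
    """Simplified Hudson-style Fst between two allele-frequency vectors.

    Uses sum((p1 - p2)^2) / sum(p1*(1 - p2) + p2*(1 - p1)) across SNPs,
    ignoring the sample-size correction terms (an assumption of this sketch).
    """
    if len(p1) != len(p2):
        raise ValueError("frequency vectors must cover the same SNPs")
    num = sum((a - b) ** 2 for a, b in zip(p1, p2))
    den = sum(a * (1 - b) + b * (1 - a) for a, b in zip(p1, p2))
    if den == 0:
        return 0.0  # both vectors fixed for the same alleles everywhere
    return num / den


# Invented toy frequencies: a "parent" component, a drifted subset of it,
# and an unrelated component.
parent = [0.5, 0.5, 0.5, 0.5]
subset = [0.6, 0.4, 0.5, 0.5]
other = [0.9, 0.1, 0.9, 0.1]

fst_parent_subset = hudson_fst(parent, subset)
fst_parent_other = hudson_fst(parent, other)
print(fst_parent_subset, fst_parent_other)
```

In this toy case the subset sits much nearer to its parent than the unrelated component does, which is the pattern one would expect if one MPC component really is a peripheral subset of another.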

I increasingly think it's likely that none of the above subsets emerged in an unadmixed form, in any population, from the settling of the Neolithic Revolution. Some seem to have admixed with elements from other non-MPC cores. Others mixed with other subsets of the same MPC core (for instance in the Corded Ware area). Still others mixed with possible forager elements. Middle Eastern populations all have this presumably Northeast African element I've named the Nile Core. From preliminary runs, I feel that some European populations are less admixed than others, but all seem to have significant admixture from other elements. This creates a problem when it comes to choosing appropriate pole populations.

I will dedicate the next few days to searching for different "Mesopotamian Core"-only subsets and expansions. For lack of suitable public sets, I will include participants' data, but please don't read too much into the first experimental, provisional results. When I have a model that makes more sense, it will be apparent, I think.