Artemis: 2011

Tuesday, 14 June 2011

Panasian-Part IV: Malaysia and Indonesia

Part IV (and last) of the unsupervised ADMIXTURE analysis of the Panasian set.

Fst distances provided by ADMIXTURE:

About the populations in this part:
The "Malay" population is composed of Malaysian Malays; "SGMalay" individuals are from Singapore; "MalayIndonesia" individuals also speak Malay dialects but are from South Sumatra, Indonesia (the official languages of Malaysia and Indonesia, Bahasa Malaysia and Bahasa Indonesia, are different registers of a dialect continuum called "Malay", historically used as a lingua franca across both countries).
Temuan speak a language from the "Aboriginal Malay" or "Proto-Malay" group. They are a small group nowadays, and still practice Animism and slash-and-burn agriculture.
Sundanese and Javanese are the main Austronesian-speaking inhabitants of Java and together represent almost half of the Indonesian population (20M and 80M respectively), dominating its politics. The Sundanese and Javanese languages are distinct from the official, Malay-derived Bahasa Indonesia, which most of them nowadays also speak (some exclusively).

Mentawai are Animist/Christian slash-and-burn (swidden) farmers who plant sago, yams and taro and raise pigs and chickens, much like the original Austronesians and other isolated Austronesian peoples today, supplemented by a good deal of hunting and fishing. They live in the Mentawai Islands just off West Sumatra.
Batak Karo and Batak Toba are also Austronesian speakers from inland Sumatra; they seem to be dry-rice agriculturalists, also cultivating paddies in some regions.

The "Dayak" sample is derived from an (Austronesian-speaking) Indonesian Dayak tribe from East Borneo whose members are slash-and-burn agriculturalists. Bidayuh are also Dayaks, but differ from the "Dayak" sample in that they are from Sarawak, Malaysia (Northwestern Borneo). They have presumably received more outside influence than the more Eastern Dayaks, and appear to be more involved in the current plantation economy. (The Iban, represented in a previous run with the main set, are also Dayaks from Western Borneo.)

Toraja are Austronesian speakers from Sulawesi's mountainous interior. They're farmers who remained Animist until recently (they are mostly Christian nowadays).

Kambera, Manggarai, Lamaholot, Lembata and Alorese live in the Lesser Sundas in Southeastern Indonesia. They speak Austronesian languages, and are mostly Christian today. These groups present visible Melanesian admixture.

Naasioi are Melanesians, speaking a Papuan language. They share Bougainville island in Papua New Guinea with Austronesian-speaking groups, and are slash-and-burn farmers.
Kensiu and Jehai are Malaysian Negrito peoples.

Some candid observations (and shameless speculations):
1) Austronesian tongues seem to correlate with the darkgreen component, which is modal in Taiwanese Aborigines. Earlier there seemed to be a weaker correlation with Tai-Kadai languages.
These linguistic families have important similarities, and it is controversial among linguists whether they are branches of a more ancient proto-language or whether both branches simply had geographically close homelands. Their present distribution suggests homeland(s) in Southern China, and the populations analysed from China, including all Han groups, also have darkgreen elements.
These elements are, however, largely lacking in the Austro-Asiatic speaking populations represented (though I would be surprised if the more Southern Chinese-influenced Vietnamese weren't an exception), in Burmese Sino-Tibetan speakers, in Uyghur and in Okinawans, and they are small in Koreans and mainland Japanese (perhaps due to an old Han admixture event or secondary wave?). All these groups do have important "red" elements, however.
This could be interpreted as suggesting an origin, or at least diffusion of Sino-Tibetan and Altaic with Neolithic Expansions directly from the Northern China/Yellow River region before the ethnogenesis of the Han (before admixture from populations from the Yangtze River valley?).
(At this moment I would tend to interpret the Altaic element, even in Central Asians, as being derived early on from the East Eurasian Neolithic just as I previously interpreted their Western admixture as being derived from the Fertile Crescent).

2) Austronesian-speaking groups closer to continental SE Asian influence, such as the Bidayuh (SW Borneo) and the populations of Java, tend to have smaller darkgreen and larger darkblue elements.
The darkblue component is important in continental Southeast Asian populations and in those from the Indonesian Archipelago with more historical SE Asian connections. It somewhat correlates with Austro-Asiatic languages; perhaps it indicates an early agricultural wave from China into SE Asia, before the Tai-Kadai-Austronesian one? Darkblue is a bit more distant in Fst from the other East Asian Neolithic components, perhaps because it includes a larger degree of stabilised ancient local variation.
Darkblue in Indonesians decreases with distance from SE Asia. In this context it may be associated with a later expansion of populations (possibly linked to wet rice and other agricultural innovations from SE Asia) out of the more developed and more densely populated Java, a development continuing to this day.
3) Groups further to the East show an increasing burgundy element, which is modal in the Papuan-speaking Melanesian Naasioi. These groups are swidden agriculturalists. Their relative success in resisting the farmer waves, compared with the distantly related Negrito groups of the Philippines and SE Asia, may hint at some primitive agriculture among coastal Papuan-speaking peoples prior to the Austronesian expansion.
Melanesians such as Naasioi probably have some Austronesian admixture, which may be invisible here due to the lack of highland Papuan reference populations. The burgundy component may be a fusion of mostly "Papuan" with a little Austronesian ancestry, a possible reason why it may pull in some of the Philippine Negrito variation in the more admixed tribes.
Polynesians also seem to have important Melanesian admixture.

Sorry I didn't include East Asian participants in this run, but the ~50,000 SNPs in this set do not overlap much with the 23andMe ones, so including them would reduce the resolution too much.
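Checking how much two SNP sets overlap before merging them can be sketched as below. This is a minimal sketch, assuming both datasets are available as PLINK .bim files and that SNPs can be matched by their rsID; `bim_snp_ids` and `overlap_report` are hypothetical helpers, not part of any tool, and the toy lines stand in for real files.

```python
from typing import Iterable, Set, Tuple


def bim_snp_ids(lines: Iterable[str]) -> Set[str]:
    """Collect SNP identifiers from the lines of a PLINK .bim file.

    A .bim line has six whitespace-separated columns:
    chromosome, SNP ID, genetic distance, base-pair position, allele 1, allele 2.
    """
    ids = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            ids.add(fields[1])
    return ids


def overlap_report(ids_a: Set[str], ids_b: Set[str]) -> Tuple[int, float, float]:
    """Return (shared count, fraction of set A shared, fraction of set B shared)."""
    shared = len(ids_a & ids_b)
    frac_a = shared / len(ids_a) if ids_a else 0.0
    frac_b = shared / len(ids_b) if ids_b else 0.0
    return shared, frac_a, frac_b


if __name__ == "__main__":
    # Toy .bim lines standing in for two hypothetical datasets.
    set_a = ["1 rs1 0 1000 A G", "1 rs2 0 2000 C T", "2 rs3 0 500 A C"]
    set_b = ["1 rs2 0 2000 C T", "2 rs3 0 500 A C", "3 rs9 0 800 G T"]
    print(overlap_report(bim_snp_ids(set_a), bim_snp_ids(set_b)))
```

With real data, an open file handle can be passed directly to `bim_snp_ids`. Matching by ID assumes both sources use the same rsIDs; matching by chromosome and position instead would also require the same genome build, and a merged set would still need allele and strand checks.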

Many thanks to Zack from Harappa for drawing my attention to this set; he also wrote and posted the code for converting it to bed format.

Saturday, 11 June 2011

Panasian-Part III: Philippines

Third part of the Panasian run (excluding the Yoruba and South Asian samples).
This post is about populations from the Philippine Archipelago:

About the populations:
Amis and Atayal are Taiwanese aborigines speaking Formosan Austronesian languages. It is generally believed the Austronesian language family derives from Taiwan, having spread into the Philippines, Indonesia and Polynesia as an agriculturally-driven expansion. Taiwanese "Aboriginals" are historically primitive farmers supplementing their diet with some hunting and fishing.
Until recently they were the main inhabitants of Taiwan, having been swamped by Fujianese Han groups (Minnan and Hakka) only in the last few centuries. There is some evidence they shared the island with a minority of Taiwanese Negritos (possibly the true Paleolithic forager native inhabitants), who no longer exist as a distinct group today.

Filipino populations are farmers, speaking various Austronesian dialects. Tagalog is the dialect on which the National Language of the Philippines, "Filipino", is based; its speakers live in Southern Luzon. Ilocano is spoken by related populations in more Northern regions of Luzon island. Visayan is a third Filipino group of dialects, spoken in the Visayas (the central group of Philippine islands) as well as in some parts of Mindanao. The sample represented here was from Visayan-speaking colonists in West Mindanao. These groups comprise the majority of the islands' population, and are at the core of self-identification with the modern "Filipino" nationality.

Minanubu (or Manobo) and Iraya are incipient farmers using slash-and-burn methods in Mindanao and Mindoro respectively. They speak Austronesian languages. These peoples have been under pressure from neighbouring, more agriculturally advanced and socially complex Filipino migrants (such as Visayans), and are being pushed off the more fertile land in their homelands.

Ayta, Agta, Ati and Mamanwa are Philippine Negritos, who are generally hunter-gatherers, at least until very recently. Ayta and Agta are from Luzon; Ati are from the Visayas; Mamanwa from Mindanao.

The diversity of populations in the Philippines seems to fit very well into the Neolithic expansion model I've been exploring. Three groups can be identified, with a degree of continuity between groups: advanced farmer "Filipinos"; slash-and-burn farmers such as Iraya and Minanubu; and forager Negritos.
1) Negritos probably represent the ancient (forager) population of the Philippines. They present varying hybridization with farmers (as seen in farmer-associated components).
The similar pattern of "forager components" (Kensiu are SE Asian Negritos; Naasioi are Papuans) in Agta, Ati and Ayta may represent uniquely Philippine-Negrito genetic patterns that ADMIXTURE didn't pick out in this analysis. Perhaps a more inbred Mamanwa sample took its own component, while unrepresented variation in the other Negritos was allocated to the related Naasioi- and Kensiu-modal components; something similar often happens in ADMIXTURE runs of Siberian peoples.
2) The earliest "First Wave" farmer intrusion in the Philippines is probably associated with Austronesian languages. Taiwanese Aboriginal groups speak the most divergent and diverse Austronesian tongues, and they are modal for the darkgreen component, which is found in much larger amounts in Philippine farmers than in foragers (Negritos). Iraya and Minanubu have large darkgreen components but mostly lack the red, blue and lightgreen ones; they may represent a "First Wave" stage of the Austronesian Expansion proper.
3) Han Fujianese and Cantonese migration into the Philippines is historically documented even before the Spanish Conquest. A "Second Wave" process may be interpreted as being in full swing in the Philippines in recent centuries and up to today. Slash-and-burn agriculturalists such as Iraya and Minanubu seem to be in the process of assimilation or expulsion to more marginal lands. These more primitive agriculturalists also present larger forager-modal components than "Filipinos", though less than Negritos, just as expected.
4) "Filipino" populations of advanced agriculturalists have more significant red and lightgreen elements, in which I tend to see the absorption of much Han admixture in the last few centuries. Interestingly, the mestizos de Sangley (of mixed Chinese and native Filipino descent) have been historically prominent as advanced farmers and plantation owners. Filipino ethnogenesis may derive from admixture between ancient slash-and-burn agriculturalists and migrants from China. Some admixture with Europeans (Spaniards, Americans) may also have occurred, to a smaller degree than with the Han, but to a much higher degree than in other regional populations.

Wednesday, 8 June 2011

Panasian-Part II: SE Asia

This part II is about continental Southeast Asian populations from the Panasian dataset K=9 analysis I ran earlier. Some populations from Part I that are useful for comparison are shown again; their ADMIXTURE component percentages are exactly the same, since this is the very same run.

About the populations:

Hmong live in Southern China, particularly in more mountainous regions often separated by Han-inhabited areas, suggesting they were once more widespread in China, but were possibly swamped by an agriculturally more advanced wave of Han in the lowlands. They also live in South East Asian countries, such as Thailand. Yao people likewise live in Southern China and SouthEast Asia (the Yao sample in this set is from Thailand). Both Hmong and Yao speak Hmong-Mien languages. These peoples in China are often lumped as "Miao" by the Han. Interestingly, Han Chinese foundational myths speak of the Miao as a people originally from the Yellow River, forced to migrate south after conflict with the Huaxia, another Yellow River people, from whom the Han claim to descend. Such stories may be just folklore, but they do resonate somewhat with a Neolithic model of East Asian population expansion. Interestingly, the Hmong-modal component (light green) is larger in populations from the Yellow River southwards, while the closely related red component is more important in Han Chinese and northeastwards from the Yellow River in Koreans and Japanese.

Wa or Va, Lawa, Blang or Plang, Paluang, Mal and Mon speak Austro-Asiatic tongues and live in pockets dispersed between Southern China, Southeast Asia, and Burma. In this set, the Wa sample is from China and the remaining ones from Thailand.
Vietnamese also belongs to this language family. These languages are generally thought of as the original languages of SE Asia, now mostly replaced by Tai-Kadai and Austronesian languages, but their geographical dispersion might also suggest expansion from a Chinese homeland, where they largely didn't survive.

Tai Yuan, Tai Khuen, Tai Lue, Tai Yong, Zhuang and Jiamao are Tai-Kadai speaking.
Tai-Kadai languages have some similarities to Austronesian tongues, suggesting a possible common origin in Southern China, with one family expanding West into continental SE Asia and the other East into Taiwan and insular SE Asia. Both would have largely become extinct in Southern China itself after the "Second Wave"-like Han expansion from the North.

Malays and Temuan speak Austronesian languages, as do the aboriginal Taiwanese Amis. SG Malays are from Singapore.

Jehai and Kensiu are Malaysian Negritos. They speak Austronesian languages as well, very likely adopted from their agriculturalist neighbours.

Considerations/speculation:
1) China Hmong appear to differ from Thailand Hmong only in some southern Han admixture in the former. This may also apply to the Yao.
2) Temuan speak an Austronesian dialect apparently somewhat mutually intelligible with Malay. They preserve many ancient traditions maybe lost among the Malay, such as religious Animist practices. They also have a larger "pink" element modal in local Negrito foragers. Their presumed greater isolation may help explain less dark green genetic admixture than in Malays.
3) Jehai have a substantial "dark blue" element that is absent in Kensiu. This may suggest an association of the dark blue element with agriculturalists (Fst distances to other components agree; see Part I).
4) Some "Burgundy" component starts to become visible in Malays, unlike in more Northern populations.
5) The presence of an element clustering with West Eurasians in Mon and Malays is interesting. It's small, so it may be just noise. But since Indian populations weren't included and there's no "South Asian"-modal component, I wouldn't find it strange if this element had a pattern similar to the Fertile Crescent ones found in South Asia, arriving perhaps with the Neolithic West Asians who reached India, or perhaps later?
6) ADMIXTURE patterns at this K correlate to some extent with language families. A common origin in an ancient Yellow River-Yangtze River Neolithic Core Area can also be argued for linguistically.

Next I'll post results from this run for Insular Southeast Asia (Philippines, then Indonesia).

Monday, 6 June 2011

Panasian-Part I: Altaic and Sino-Tibetan

I've been away from ADMIXTURE for a couple of weeks, too busy with other stuff.
This time I've decided to tackle East Asia. I got access to a new dataset from the Pan Asian SNP Consortium, with some 50,000 SNPs. Sadly few of them appear to overlap with my current dataset, so fusing the two would leave me with just a few thousand. I may try anyway in the future, but decided to play with the Panasian set on its own for the time being. I didn't use the South Asian or Yoruba samples in this series to simplify things, but I did include White Utahns to check for possible West Eurasian (Fertile Crescent) influence.
I intend to apply some old tricks to get components to be more informative and less "isolated group"-tied, but firstly I wanted to see how this set would behave in an unsupervised ADMIXTURE analysis; namely I intend to check which unsupervised components are interesting and coherent with ethnographic/historical data so I can pick them for supervised analysis and hopefully gain some further insights.
Over the next few days I'll present a series of regionally split results.

Unsupervised results are good for inter-population comparisons. Most components likely don't represent any particular ancient populations. A certain amount of small component noise is expected also.
The following results are at K=9.
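Population-level bars like these are commonly obtained by averaging the per-individual ancestry fractions that ADMIXTURE writes to its .Q output (one row per individual, in .fam order, one column per component). A minimal sketch of that averaging, assuming a population label is available for each individual (for example from the family-ID column of the .fam file); `read_q_rows` and `population_means` are hypothetical helpers, and it is an assumption that the graphs here were built exactly this way.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence


def read_q_rows(lines: Iterable[str]) -> List[List[float]]:
    """Parse the lines of an ADMIXTURE .Q file into lists of floats."""
    return [[float(x) for x in line.split()] for line in lines if line.strip()]


def population_means(q_rows: Sequence[Sequence[float]],
                     labels: Sequence[str]) -> Dict[str, List[float]]:
    """Average each component's fraction over the individuals of each population."""
    if len(q_rows) != len(labels):
        raise ValueError("need exactly one population label per .Q row")
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for row, pop in zip(q_rows, labels):
        if pop not in sums:
            sums[pop] = [0.0] * len(row)
        for k, frac in enumerate(row):
            sums[pop][k] += frac
        counts[pop] += 1
    return {pop: [s / counts[pop] for s in totals] for pop, totals in sums.items()}


if __name__ == "__main__":
    # Toy K=2 rows with hypothetical population labels.
    q = read_q_rows(["0.10 0.90", "0.30 0.70", "0.95 0.05"])
    print(population_means(q, ["PopA", "PopA", "PopB"]))
```

Averaging hides within-population spread, which is why individual-level plots can look quite different from the population bars.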

About the populations:
JapaneseML are from the mainland, as presumably are most "Japanese" without the ML qualification. They were separated in the set and I didn't fuse them. JapaneseRyukyu are from Okinawa.
SGChinese are Chinese from Singapore; BJG from Beijing.
Taiwan Hakka and Taiwan Minnan (or Hoklo) are Han Chinese, comprising the overwhelming majority of the island's current population. They are the people generally meant when speaking of "Taiwanese" nowadays. They are, however, recent arrivals, presumably overwhelming the native aboriginals (Taiwan Ami and Atayal) with more efficient agricultural/social technologies only in the last 500 years or so.

TaiwanAmi and TaiwanAtayal are much older Taiwanese populations, but some discontinuity in the Paleolithic-Neolithic transition in Taiwan may imply an exogenous origin (possibly from early Neolithic China). They speak "proto-Austronesian" languages, and the Austronesian wave of language and agricultural lifestyle seems to have spread from there (or perhaps from Southeast mainland China, with a side branch going to Taiwan?).
The Austronesian Expansion seems to have been sort of a "First Wave" of agriculturalists (maybe secondary in some regions). Much later, advanced agriculturalist "second wave" Han Chinese then had again a major demographic effect, going beyond to other Austronesian lands as well, and apparent even in the Philippines today, with some 20% of the population having recent Chinese ancestry. Without Western interference, this possible Han "secondary wave" might have spread further still, given the large amounts of land then still occupied by foragers in insular South East Asia and Oceania. Indeed Negrito forager and semi-forager tribes are still under pressure today from their agricultural neighbours.

Jinuo and Karen are Burmese populations with Sino-Tibetan tongues (the same family as Chinese languages such as Mandarin and Cantonese, and also Tibetan). Mon speak an Austro-Asiatic tongue but also live in Burma.

Mlabri, Mamanwa and Kensiu are all forager/semi-forager tribes. Mlabri live in South East Asia; Mamanwa are Philippine Negritos, while Kensiu are SouthEast Asian Negritos. Naasioi are Papuans.

Fst distances:

I'm not naming the components since I'm not sure they are historically informative.
It's interesting that "Red" is very close to "DarkGreen" and "LightGreen". The genetic distance is similar to that between closely related components from other Neolithic centres I've run before.
Some ancient, stabilized admixture with very different local forager groups, present in these unsupervised components, may even explain some of the distance, so the actual affinity may be higher than seen here.
On the other hand, "Forager-components" are much less similar both to the presumed Neolithic ones and with other forager components. Actually, since I strongly suspect these modern day foragers are hybridized with their agriculturalist neighbours, the distance between different "Negrito"-modal components may be even larger.
The distance between White Utahns (Westerners) and these groups seems roughly similar to the distances between foragers and agriculturalists and between different groups of foragers, but much larger than the distances between agriculturalists. A possible explanation is multiple waves of forager-swamping agriculturalists from a single centre or group of related centres in the region.
Some minor forager admixture in farmers, and major farmer admixture in foragers, would both be invisible to unsupervised ADMIXTURE if ancient, in the absence of "pure" control groups.
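To make the Fst comparisons above more concrete, one common way to measure differentiation between two components is Nei's ratio Fst = (H_T - H_S) / H_T, computed from their allele frequencies (for ADMIXTURE output, a pair of columns of the .P file, which holds one row per SNP and one column per component). This is a minimal sketch of that estimator only; ADMIXTURE's own reported Fst may use a different formula, so the numbers need not match its output. `nei_fst` is a hypothetical helper.

```python
from typing import Sequence


def nei_fst(p1: Sequence[float], p2: Sequence[float]) -> float:
    """Multi-locus Fst between two populations from per-SNP allele frequencies.

    Uses Nei's formulation, summing heterozygosities over loci before taking
    the ratio: Fst = (H_T - H_S) / H_T, where
      H_T = 2 * pbar * (1 - pbar)        (pooled expected heterozygosity)
      H_S = mean of 2 * p_i * (1 - p_i)  (mean within-population heterozygosity)
    """
    if len(p1) != len(p2) or len(p1) == 0:
        raise ValueError("need two equal-length, non-empty frequency lists")
    h_t_sum = 0.0
    h_s_sum = 0.0
    for a, b in zip(p1, p2):
        pbar = (a + b) / 2
        h_t_sum += 2 * pbar * (1 - pbar)
        h_s_sum += (2 * a * (1 - a) + 2 * b * (1 - b)) / 2
    if h_t_sum == 0.0:
        return 0.0  # every locus fixed for the same allele in both: no differentiation
    return (h_t_sum - h_s_sum) / h_t_sum


if __name__ == "__main__":
    # Toy frequencies: identical components give 0, opposite fixed alleles give 1.
    print(nei_fst([0.3, 0.6], [0.3, 0.6]), nei_fst([0.0], [1.0]))
```

Summing over loci before dividing (a ratio of averages) keeps nearly monomorphic SNPs from dominating the estimate, which averaging per-SNP ratios would not.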

I'll reserve more discussion for the supervised run; right now I'd venture to say the red, light green, dark green and blue components are all closely related, tend to exist in a gradient of admixture with one another in similar ethnic groups, and may correspond to different but related Neolithic waves, probably all from China. There is some interesting correlation with language groups.
Ryukyu Japanese may lack some components that are more important in mainland Japanese and Koreans due to greater geographic insulation from Chinese secondary waves, perhaps like Sardinians and Basques in the West.
Taiwanese Aborigines look promising as representatives of a vast farmer demographic wave. Darkgreen presence in China may indicate its origin there, since agriculture is older, and was presumably more advanced, on the continent.

In the next days I will post results for other Austronesian and Southeast Asian peoples. Then I'll do a restricted pole-supervised run.

Tuesday, 24 May 2011

Back to Africa- individuals

These are some of the individuals from the run.

(There is likely a lot of small component noise here, and also between NMPC, WMPC, EMPC due to their high similarity and small number of European individuals included)
Although African-Americans appear to present some variability in their African elements, their African ancestry can probably be best described as the product of a New World melting pot of originally Niger-Congo-speaking, West African ethnic groups. Affinity seems to be higher with ethnic groups from the Gulf of Guinea region, and the more Northern Bantu-speaking groups.
Their non-SubSaharan African ancestry is mostly composed of elements found in Europe as expected. There is very little East African ("Nilo-Saharan").
As for the Fulani, it seems they have been somewhat isolated genetically from neighbouring populations, even though they live near non-Fulani peoples across a vast swath of the Sahel/West Africa region. Only a few individuals appear to be recently admixed to any degree. High levels of the Northwest African element, their general lack of any NCongo components except NCongo1, and NCongo1 being particularly high in presumably more isolated Sahel populations such as the Dogon all point to an ancient ethnogenesis for this group, perhaps in the (last) Green Sahara itself, at the time of the West Asian Neolithic colonization of North Africa.

Bear in mind this run attempted to distinguish very closely related components and a high amount of "noise" is likely at the individual level.
Also, the Fertile Crescent (FC) components are based on a small sample, since this run was about Africa. So NMPC here differs from my previous run: it is based on the Lithuanians, who have much more WMPC (the Basque-based component) than the Chuvash. EMPC was based on the Urkarah. So these components don't have exactly the same distributions as before.

Niger-Congo components are also very close and shouldn't be taken as exact at the individual level. "Nilo-Saharan" is not very distant from the NCongo components either; some people postulate a common origin for the two language families, so some genetic similarity is perhaps to be expected. Indeed, if both Neolithic expansions are related and come from the "southern shore" of the Sahara, most modern African populations may derive from demographic and linguistic expansions from the surroundings of the desert only a few thousand years ago, probably in association with a Neolithic Revolution linked to the Green Sahara.

Monday, 23 May 2011

Back to Africa- populations

I've been busy with unrelated stuff lately so didn't find time to post last week.
I've decided to take a new look into variation within African populations, using the restricted pole trick I used before in West Eurasian ones. I wanted to take a look at variation within the Niger-Congo or West African Neolithic Core variation in particular, by trying to split it up into 3 components.
I should point out I'm doing these runs seeking support for a theoretical model. It's hard if not impossible to provide definitive evidence (if such a thing exists at all) through ADMIXTURE results. But models which provide unexpected results or predictions which can then be tested by other methods or analysis of future data not available right now, have value in my opinion.

I've run a few unsupervised runs of the African samples available to me, compared with those in other projects, and used a few observations to guide me in pole choice. I was particularly interested in possible language-family-related components, since even though language groups don't always correlate exactly with genetics, there tends to be some relationship.
Restricted poles comprising only a few individuals are a good way of establishing one or more "centres" for a cline in the data. They don't "stick" however if there is no such cline, or if the cline extremes are not represented. For instance, I chose a "Palestinian5" pole using 5 Palestinians, yet it was immediately stolen by the Saudis, who present a similar and otherwise unrepresented component to Palestinians yet at a higher level.
However, for such clines as the Niger-Congo or "West African" cline, from Mandenka/Dogon to the South and East African Bantus, it is useful to establish some such "hand-picked centres" to differentiate subpopulations. The frontier between the 3 Niger-Congo components I've come up with is thus necessarily somewhat artificial. Choosing different "restricted poles" would change the resulting components much more than with the "Basque5" or "Lithuanian5" poles I've used previously, since those are much more stable.

These were the poles chosen:
1) Three Mandenka and two Dogon for a Northwestern Niger-Congo component. Dogon often appear as modal for the West-African component in unsupervised runs, indicating possible origin of the West African Neolithic in the Sahel. I named this component NCongo1 (Niger-Congo1)
2) Five Yoruba: NCongo2
3) One Pedi, two Fang, two Kongo and one Luhya to try to identify the Bantu component, which I named NCongo3 (Bantu)
4) Three Bulala and two Masai. Masai often claim their own component, and at higher Ks even multiple ones, due to close family links between some of them. I removed some of these individuals, and to further avoid such artefacts I used Bulala, trying to make the pole more independent from the Masai themselves. This component turned out to be important in all Nilo-Saharan speaking populations (plus the Sandawe, and even these can be split away into their own pole), so I took the liberty of naming it so, which shouldn't be taken too seriously.
5) Five Biaka Pygmies (BPygmy)
6) Five Mbuti Pygmies (MPygmy)
7) Five Hadza
8) Five Namibian San- Khoisan pole
9) Five Palestinians. This pole was taken by the Saudi sample, as mentioned before. I named it FC(NEAfr). It seems to be important in Semitic language speaking populations, but it goes further into other populations as well. It roughly corresponds to the "second wave" or WMPC+NC component used in Fertile Crescent runs.
10) Two Mozabites, two Tunisians and one Northern Moroccan. This pole peaks in the Tunisian sample and was named FC(NWAfr)
11) Five Basques. Named FC(WMPC) - Fertile Crescent Western "Mesopotamian Core"
12) Five Lithuanians, named FC(NMPC)
13) Five individuals from Urkarah, Northern Caucasus. Named FC(EMPC)
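Assuming the restricted poles are fed to ADMIXTURE's supervised mode (the pole-supervised setup these posts refer to), the pole choices above translate into a .pop file with the same prefix as the .bed file: one line per individual in .fam order, holding the pole name for pole members and "-" for everyone else, with K equal to the number of distinct poles. A minimal sketch; `pop_file_lines` and the individual IDs in the example are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence


def pop_file_lines(fam_ids: Sequence[str],
                   poles: Dict[str, Iterable[str]]) -> List[str]:
    """Build the lines of an ADMIXTURE .pop file for a supervised run.

    fam_ids: individual IDs in the same order as the .fam file.
    poles:   pole name -> IDs of the individuals chosen for that pole.
    Pole members get their pole name; everyone else gets "-" (unknown ancestry).
    """
    assignment: Dict[str, str] = {}
    for pole, members in poles.items():
        for iid in members:
            if iid in assignment:
                raise ValueError(f"{iid} is assigned to two poles")
            assignment[iid] = pole
    unknown = set(assignment) - set(fam_ids)
    if unknown:
        raise ValueError(f"pole members missing from .fam: {sorted(unknown)}")
    return [assignment.get(iid, "-") for iid in fam_ids]


if __name__ == "__main__":
    # Hypothetical IDs; a real run would read them from the .fam file.
    fam = ["Mandenka1", "Dogon1", "Yoruba1", "Hausa1"]
    poles = {"NCongo1": ["Mandenka1", "Dogon1"], "NCongo2": ["Yoruba1"]}
    print("\n".join(pop_file_lines(fam, poles)))
```

Written to, say, africa.pop next to africa.bed (hypothetical names), these lines would go with `admixture --supervised africa.bed K`, where K is the number of poles (13 in the list above). Unlabelled individuals then get their ancestry estimated against those fixed poles.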

I'm sorry for some repeated colours, Google graphs made the choices.
Populations are roughly distributed by language group in the graph.
1) Hausa is Afro-Asiatic from the Chadic group
2) Mandinka and Bambaran belong to the Mande group of Niger-Congo languages. Dogon languages may be related early offshoots of proto-Niger-Congo.
3) Brong, Yoruba and Igbo speak NC languages of the Atlantic-Congo group, the group which includes Bantu languages. Their languages are not Bantu though, and each belongs to a different subgroup, distinct from the Bantoid one and from each other.
4) Bamoun or Bamum speak an Atlantic-Congo language of the Bantoid subgroup but their language though related is not considered part of the more narrow Bantu-proper group.
5) Fang, Kongo, Luhya, Xhosa, Pedi, Nguni, Sotho, Tswana all speak Bantu-proper languages. These are a subgroup of Bantoid, itself a group within Atlantic-Congo, which is a major branch of the Niger-Congo family.
6) Bulala or Bilala, Masai, Alur, Hema and Kaba all speak languages of the Nilo-Saharan family
7) Mada seem related genetically but speak an Afro-Asiatic language of the Chadic group

Considerations:
-NCongo1, NCongo2 and NCongo3 seem to exhibit clines, as expected, since these are more or less spurious components dividing a continuous cline of West African-like peoples from the Sahel to South Africa. Judging by Fst to neighbouring forager components, they seem to have absorbed a bit of these elements (e.g. NCongo3 seems to have absorbed a bit of San).
-There seems to be a remarkable concordance between component patterns and language families, including subgroups of such families. The model predicts such languages would expand in association with Neolithic peoples' colonization movements. Some discrepancies may be explained as later elite imposition of intrusive languages.
-Dogon languages seem to be the outgroup in the Niger-Congo languages, and they are the most distant to the Bantu genetically within the Niger-Congo group. They may represent an early offshoot, perhaps isolated since, of the original Neolithic Core/Revolution. In unsupervised runs, Dogon are often the "most West-African" of all these population samples.
-Nilo-Saharan-speaking populations share a common component, indicating spread of most Nilo-Saharan tongues not only by elites but more likely by another food-producing revolution. Bulala are from Chad, Masai live in Kenya. Both are pastoralist peoples.
- Alur seem to have some small but significant Mbuti Pygmy ancestry. These peoples live not far from each other and both speak Nilo-Saharan languages, probably adopted by the Pygmies in the last few thousand years in contact (and marginalization) with Alur-related peoples from the North.
-Similarly there are Biaka Pygmy segments in Bantus, and Biaka Pygmies today speak Bantu languages adopted from their agriculturalist Bantu neighbours.
- Both Chadic-speaking groups (Hausa and Mada) seem to have some "Nilo-Saharan" component. Hausa's Niger-Congo-speaking neighbours mostly lack it. Hausa may be a West-African population subjected to elite language-shift towards Afro-Asiatic after intrusion from AA-speaking herders from the East.
-It is possible the reverse may explain Luhya ethnogenesis, with an intrusion of Bantu agriculturalists into Nilo-Saharan pastoralist occupied land.
-Fertile Crescent ancestry of Masai and Ethiopians is mostly FC(NEAfr), i.e. most like that of Saudis, which seems to be in agreement with linguistics. However, these populations also seem to have some Northwest African influence (FC(NWAfr)), as Egypt does but unlike the Saudis (remember the pole was actually 5 Palestinians). I think it's possible that at the time of the Semitic-speaking migrations from Arabia, possibly some 2000-3000 years ago, there was already an older Egypt-derived Fertile Crescent element in the region, whose languages do not survive.
-Fertile Crescent elements found earlier in the Fulani seem to be wholly derived from North West African populations. Fulani speak a Niger-Congo language and have much affinity to other West African populations as well, so I'd say this admixture event is more likely very ancient.
-I don't know why FC(NMPC) elements appear in North African and Levantine populations here but not in my previous run. But I included very few European and Near Eastern populations in the analysis, and these FC poles are very close to each other compared with the African ones, so it's possibly just noise and poor definition due to the few individuals with actual FC(NMPC) and WMPC. Also, my last NMPC was based on the Chuvash (Lithuanians have much more affinity to Basques), but I didn't want to include Siberian poles in order to keep things simpler. This run is complicated enough as it is, and small components may not represent much.
-North African populations may have an aboriginal substrate more complicated than I thought earlier, with possibly aboriginal NorthWest African, Green-Saharan refugee African and possibly other elements in addition to the West-Asian (FC) dominant element, so small segments may be representing such hidden elements and not actual admixture here I think. I'm still convinced elements with SubSaharan African affinity mostly represent aboriginal populations and are not the result of the caravan slave trade.

Tomorrow I'll present individual results.

Saturday, 14 May 2011

Restricted Pole Run: Part IV- experiments, conjectures

Due to Blogger problems I had to delay this last post.

This is a very experimental part of the run, not altering any other results. Also my interpretations here are quite speculative and I don't have very high confidence in them.

I found out a while back that when a run includes many individuals who are similar to each other but very different from the remaining samples, ADMIXTURE will pull one of the restricted poles towards that group, irrespective of any relation between the actual pole individuals and this different population.
Thus if I had included all the South Asian data-set in my last run, one of the poles would simply become dominated by them, and would peak in the Irula. I didn't want to do that yet, since South Asia is a complex place genetically, and would only make results less clear. Still, I wanted some clues as to which Fertile Crescent elements made their way there.
By including just a few individuals from the area in the run, the pole-pulling problem can be avoided, and ADMIXTURE will instead try to fit them into the non-South Asian-dominated poles.
This means that some results are necessarily artificial for these samples. For instance no adequate pole for Ancestral South Indian (ASI) is present. Since ASI are somewhat "Asian" when compared to Fertile Crescent populations, I expected ASI elements to be mainly allocated to the Siberian poles.
So bear in mind in this run, "Siberian" in South Asians is mostly not actual Siberian or Turkic admixture. It is simply the least inadequate pole for the ASI element. It doesn't matter anyway for this experiment since what I really wanted to check was which Fertile Crescent elements were present -that is, which patterns are present in ANI.

So this part is highly experimental, but the additional individuals analysed here don't alter the remaining results appreciably (if removed, other individuals in the run still retain their admixture patterns).
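A simple way to check the claim in the last paragraph, sketched under my own assumptions, is to compare each shared individual's proportions between a run with the extra samples and a run without them. The function `max_q_shift` and the toy numbers below are hypothetical. The sketch also assumes both runs list components in the same order, which should be checked, since unsupervised runs can permute component labels.

```python
def max_q_shift(run_a, run_b):
    """Largest absolute change in any component among individuals present in both runs.

    run_a, run_b: dicts mapping individual ID -> list of component proportions,
    with components assumed to be in the same order in both runs.
    """
    shared = run_a.keys() & run_b.keys()
    if not shared:
        raise ValueError("the two runs have no individuals in common")
    return max(
        abs(a - b)
        for iid in shared
        for a, b in zip(run_a[iid], run_b[iid])
    )


# Toy proportions (hypothetical, not results from this run)
with_extras = {"ind1": [0.60, 0.40], "ind2": [0.20, 0.80], "extra1": [0.50, 0.50]}
without_extras = {"ind1": [0.62, 0.38], "ind2": [0.20, 0.80]}
print(round(max_q_shift(with_extras, without_extras), 3))  # 0.02
```

A small maximum shift supports the statement that the extra individuals don't alter the other results appreciably; what counts as "small" remains a judgment call.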
In addition, you may have noticed I didn't include an Amerindian pole in this run. I left it out for two reasons. Firstly, "Amerindian" components in Europeans tend to be absorbed by the NMPC, since they exist mostly in NMPC populations (including the Chuvash used as restricted pole). Siberian tends to detach because many NMPC-rich populations don't have much Siberian, but the same can't be said for the "Amerindian" I found earlier, so they tend to get mixed up (except when using an FC pole without any of it, such as the Egyptians).
The other reason was that I wanted to check which poles ADMIXTURE would allocate to Amerindians themselves if denied an exclusive pole for them. Amerindians are quite distinctive in PCA/MDS and in unsupervised runs. They cluster far away from Western populations, even further away than Siberians.
If "Amerindian"-like populations were present in Paleolithic Europe, we would expect them to be more "western" than their very "eastern" plotting position would imply -and namely more "westerly" than East Siberians.
But what if Amerindians plot in the "far east" because for some reason they have a few highly distinctive genetic variants, but are otherwise not so distinctive? When denied their own poles, these distinctive variants wouldn't be allowed to pull them away, and ADMIXTURE would be forced to allocate the remaining, more conventional variability to conventional poles.
Two things might happen:
1) Amerindians would be allocated 100% to some Far-Eastern Siberian pole- which would support their plotting position being derived from their assumed Far-Eastern departing position into the New World.
2) Amerindians would be split into more conventional poles and their more "western" position, if abstracting from the few exotic elements, would be revealed. This would support western routes into the Americas, or perhaps a fast sprint after the end of the Ice Age, through recently ice-cleared far Northern Eurasia (mostly bypassing then more southeastern Siberian populations).

I thus introduced 5 unadmixed (no significant Spanish or European elements) Totonac individuals. As expected, just 5 individuals weren't enough to pull the remaining poles much towards them, so none of the poles became an Amerindian one.

I actually expected Amerindians to come out as some Far-Eastern Siberian+Nganasan pole pattern. But this is what I actually got:

Siberian1 peaks in the Nganasan. Siberian2 (blue) in Yakuts and Mongolians. Siberian3 peaks in Far-East Siberians (Chukchis and Koryaks). Siberian3 is actually based on "Mongolian5", but "ran away" from them.

EBengal1 is Razib Khan from Gene Expression.
UKIND is British (explaining high WMPC) with some Indian.
I also picked 3 random Kalash; I'm not sure whether their distinctiveness is mostly due to inbreeding or to long-term isolation.
Naturally for these populations part of the admixture components is artificial. There is no high "Siberian" in Indians, but it is the "least inadequate" pole to represent ASI in this run.
As for the Totonac, none of the 3 poles is actually adequate, since Amerindians are a highly distinctive population. I should point out that the NMPC in the Totonac does not correspond to European elements: the Totonac sample is quite homogeneous, with very little such admixture.

EMPC is the predominant Fertile Crescent element in India. ADMIXTURE presumably picked the most adequate FC element from all the poles it could choose from; I see no other likely reason for this result. There is some NMPC as well. The lack of WMPC+NC in these populations, which is present in the steppe pastoralists (even in the Kyrgyzstani), points IMO to distinct migrations from similar origins. The colonization of the Steppe with the development of advanced pastoralist lifestyles seems to have occurred after the second, Out-of-Egypt, wave. The colonization of India, departing from the same region (Iranian plateau, Caucasus, South Mesopotamia?), seems to have happened before the Egyptian wave, but possibly after the EMPC one. The earliest Northwest Indian Neolithic settlements are dated to approximately 6000 BC, which is consistent with this possibility.
The representation of ASI-like segments variously by Siberian3, Siberian1 and Siberian1+WMPC+NWAf may be related to ASI diversity among these populations. If South India was mostly settled only after the secondary EMPC wave developed, and even then with high aboriginal persistence (as opposed to a possible Northwestern settlement by a "primary" NMPC wave), this could point to a native incipient Neolithic, at least in South India.

One conjectural model:
1) An earlier less advanced expansion by a high NMPC containing population influencing only "easy" Fertile Crescent toolkit niches in the Northwest
2) Later an advanced secondary Neolithic expansion containing high EMPC from a developed Near Eastern Neolithic Centre, with much improved seeds and techniques finally making some way into Southern and Eastern India, while mostly replacing the earlier wave in the Northwest?
3) Maybe followed by a small reexpansion of the Northern element from the periphery (now mostly EMPC but still with more NMPC than Southerners?)
I'm not sure why ADMIXTURE didn't find here the small WMPC+NC elements apparently typical of Central Asian populations. One possibility is that Central Asian demographic influence in India is overestimated in other models.

About the Totonac results. Indians had an "Eastern element" (ASI) that had to be assigned to an Eastern pole (Siberian poles). South Indians seem to have a more "Southerly" ASI variant which was perhaps artificially allocated to the MPC+NWAf pole. This can be seen in PCA plots.
Amerindians on the other hand are "far eastern" even relative to Siberians. There could be a number of explanations for this, but I think it's interesting ADMIXTURE chose to represent them with Siberian3+NMPC+Siberian2. These don't correspond to actual admixture events (much like "Chinese Mexicans"). They could be partly due to much "Amerindian"-like admixture in North Europeans being allocated to the NMPC pole (since I didn't include an Amerindian pole and high NMPC populations all have residual "Amerindian"-like elements)-making it slightly more "Amerindian"-like than some other poles available.

This is pure speculation but it's as if ADMIXTURE, when forced to ignore some possible Amerindian exotic elements (due to having to pick exclusively from among pole populations without as much of them), is telling us that Amerindians are otherwise "more Western" than they seem to be in the PCA plots.
It was already strange that "Chinese Mexicans" had a smaller "Chinese" component than a Totonac one. Greater proximity between Chinese and Europeans in PCA plots would imply that a Chinese pole would tend to overestimate the Amerindian element not the reverse. East African overestimated the African component in African-Americans as predicted.

So, in summary: the Totonac obviously don't have any real NMPC, and possibly not much of the Siberian admixture they seem to have either. ADMIXTURE component patterns simply "plot" the Totonac's position relative to the available poles, while excluding elements not present in any of the respective aggregated components.
They have some affinity to NMPC only because they're denied their proper poles in this run. I think this is because NMPC has some slight affinity (having "absorbed" them in this run) to possible "Amerindian"-like variants in North Europeans, and because the Totonac are more "Western" than they seem, as long as a few conjectural exotic small elements are forcibly ignored by the run set-up.

Here is the participants' spreadsheet. Full run spreadsheet.
You may also be interested in checking out an interesting 3D PCA model at Harappa.

Inland Ocean. Restricted Pole Run: Part III

This is the third part of the restricted pole run of Western Eurasia. All results are from the same set-up and analysis. This part concerns Central Asian and Siberian populations. Central Asia is a sparsely populated Steppe expanse connecting all major Eurasian population centres: West Eurasia, East Eurasia, and South Eurasia, and these with remaining Siberian populations to the North (today mostly replaced by Russians).
As with finding ancient bones in tropical areas, finding ancient population-genetic fossils in Steppe populations is probably more difficult. Unlike settled agriculturalist populations, which I think show much more continuity since the Neolithic Revolutions, the sparsely populated, pastoralist-inhabited sand and grass oceans of the Eurasian interior have likely seen major changes.
For instance, historically it can be presumed that such populations were more "West Eurasian" thousands of years ago than they are now, since a North or East Asian component has become important or even predominant. Still, by looking at the Fertile Crescent-related components across the populations, we can perhaps get hints about early Western Neolithic influence. Component proportions preserved across all groups were likely derived from founding populations; those exhibiting clines were more likely introduced more recently.
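The "preserved versus clinal" distinction above can be made a little more concrete by measuring how much each component varies across groups. This is a minimal sketch assuming the coefficient of variation (standard deviation divided by mean) as the measure; the function name and the table are hypothetical, and a low value is only suggestive of a founding origin, not proof of it.

```python
from statistics import mean, pstdev


def component_cv(table):
    """Coefficient of variation of each component across populations.

    table: dict mapping population -> dict of component -> average proportion.
    Returns dict of component -> population SD / mean (NaN if the mean is 0).
    """
    components = list(next(iter(table.values())).keys())
    result = {}
    for comp in components:
        values = [row[comp] for row in table.values()]
        m = mean(values)
        result[comp] = pstdev(values) / m if m > 0 else float("nan")
    return result


# Hypothetical averages: one component is flat across groups, one is clinal
example = {
    "PopA": {"FC": 0.30, "EastAsian": 0.20},
    "PopB": {"FC": 0.30, "EastAsian": 0.50},
    "PopC": {"FC": 0.30, "EastAsian": 0.80},
}
print(component_cv(example))
```

Here the flat component scores zero and the clinal one scores close to 0.5; in real data small components also carry run noise, so differences in this measure should be read loosely.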

For more extended interpretations of the components please read my previous two posts.
In retrospect, I think using the "Chuvash5" as the NMPC pole may have underestimated it and overestimated a bit the "Basque5" one. Next time I may use a combination of Chuvash and Lithuanians and see what happens.

Regarding the above results, some considerations:
1) Uzbekistan Jews seem to have, like most Jewish populations, a predominant Levantine element. They give better contrast to patterns consistent between the other Central Asian populations.
2) All Central Asian populations have large Yakut and Mongolian-like components corresponding to a likely Turkic and Mongolic element.
3) All Central Asian populations also have a "Fertile-Crescent" element composed of EMPC+NMPC+2nd Wave in that order of importance. This element is quite similar to that of the Caucasus populations. These last don't have much Turkic/Mongolic. A good model explaining these patterns is that the Central Asia steppe (as opposed to the river valleys of the Ukraine and South Russia) was initially populated by a Fertile Crescent element coming from the Caucasus/Iranian plateau at a relatively later date.
4) The Turkic/Mongolic component varies widely, in a cline between them. Removing it allows a much better view of Fertile Crescent element patterns:

CAsia is a participant, mostly Kazakh in origin I believe. Hazara may have more EMPC due to admixture with Iranian plateau populations. PonticCaspian, a participant represented in the main graph above, with Moldavian Gagauz and some Ossetian ancestry, presents much the same pattern except with higher WMPC (possibly due to proximity to Balkan/West Caucasus populations). Much the same pattern can be observed in Altaians and Buryats, in a much smaller FC element.
I really don't see population changes after the initial invention/introduction of advanced pastoralism producing such a homogeneous pattern (bear in mind some of these are small components, indeed all of these in the Mongolian case are small, since they're mostly an Eastern population and can be overinflated by the exclusion of major "Siberian" ancestry). Certainly not from Mongolia to the far west. These patterns I think most likely roughly correspond to the original Kurgan pastoralist people, and possibly also to the ancient Tocharian peoples.
5) Northeast Europeans have much NMPC but little EMPC. This "Chuvash5" element could have come from the same region at an earlier time, before or in the beginning of the EMPC expansion/ intrusion. Some small EMPC elements in Northeast European populations may indicate that the expansion Northwards into more marginal lands of NMPC was driven by competition/conflicts with the EMPC intruders.
6) Another explanation for EMPC in Northeast Europeans is a secondary expansion of Steppe peoples into the region, after the NMPC primary expansion.
- WMPC+NMPC in Koryaks, Chukchis and other Siberians should correspond to recent Russian admixture. WMPC may be overestimated in this run, but proportions appear similar in Russians in my previous post.
7) Caucasus mountain valley populations appear to have preserved various demographic "pictures" of past admixture patterns (much like Basques and Sardinians), pointing to demographic changes in the Caucasus-North Fertile Crescent region: firstly mostly NMPC agriculturalists expanding into the rivers of the Ukraine and South Russia, as seen in Lithuanians (and Chuvash); then affected by the EMPC expansion as seen in the Urkarah; later affected by the "2nd wave" (WMPC+NC) as seen in Lezgins and Stalskoe. It seems it was at this last stage that pastoralist populations emerged into the steppes, otherwise it is difficult to explain the remarkable consistency in EMPC vs NMPC proportions in all pastoralist populations sampled.
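Removing the Turkic/Mongolic component, as in point 4, amounts to dropping it and rescaling the remaining components so they again sum to 1. A minimal sketch follows; the function name and the example proportions are hypothetical. It also shows the caveat raised above: when the dropped component is large, whatever small Fertile Crescent residue remains gets inflated, noise included.

```python
def drop_and_renormalize(q, drop):
    """Remove the named components and rescale the rest to sum to 1.

    q: dict of component -> proportion for one individual or population average.
    drop: iterable of component names to exclude.
    """
    drop = set(drop)
    kept = {k: v for k, v in q.items() if k not in drop}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("nothing left after dropping components")
    return {k: v / total for k, v in kept.items()}


# Hypothetical Central Asian-like average (not taken from this run)
average = {"Siberian2": 0.50, "EMPC": 0.25, "NMPC": 0.15, "WMPC+NC": 0.10}
print(drop_and_renormalize(average, ["Siberian2"]))
# roughly EMPC 0.5, NMPC 0.3, WMPC+NC 0.2
```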

So the current model I think most likely:
-NMPC was primarily an early (pre-EMPC expansion or simultaneous to it) agriculturalist expansion into the river valleys of South Russia and the Ukraine, likely not affecting the remaining marginal steppe to a large extent.
-Populations from the Caucasus and Iranian plateau were then heavily affected first by the EMPC and then to a smaller extent by the subsequent 2nd wave (WMPC+NC) secondary expansions. These had "higher drag" and didn't affect more northern NMPC populations much (possibly also due to much colder climate rendering secondary wave innovations less applicable).
-At some point not long after the WMPC+Nile Core expansion (so around 3000 BC), people from the Caucasus and/or Iranian plateau expand into the Steppe with developed pastoralist lifestyles, probably identifiable archaeologically with the Kurgan culture.
-All these steppe populations are subsequently affected, in a more clinal thus more recent way, by Siberian/East Asian elements, corresponding to Turkic/Mongolic expansions.

I thought this would be the last post from this run, but I've decided to leave some South Asian and other individuals I included in the run (just a few samples, results without them aren't appreciably different) for later, since otherwise it's too much for just one post. I'll present the spreadsheet and individual participants then, possibly today if I can find the time.

Wednesday, 11 May 2011

Old New World. Restricted pole run: Part II

The European results of my "restricted pole" supervised run of the Fertile Crescent area follow. This is the exact same run as this one.
I didn't include most individual participants' results here (only regional averages and a few isolated representatives of populations not otherwise sampled) since they're too many to post with every run. I'll post the spreadsheet with Part III (Central Asia, Siberians and a few others), including all participants' results.

I have a few warnings to readers not familiar with the variability of ADMIXTURE results:
1) ADMIXTURE is a bit stretched figuring out patterns representing components as close as these, particularly over 30,000 or so SNPs. So small components aren't very reliable.
2) This run includes lots of extra populations and extra poles to my last "Basque5" vs "Chuvash5" one, so components, though related to those and named similarly, aren't the same and will differ.
3) In particular, the WMPC, WMPC+NC and WMPC+NWAf seem to vary at each other's expense a bit. WMPC+NC ended up being too much drawn to the Druze, and WMPC+NWAf too much to Tunisians. These samples may have multiple distant family links within them. I decided to keep these populations for the time being, but may remove them in a future run. So for some small elements in some populations, one of these components may be standing up for another one.
4) Any ancestry from regions not represented in the poles will tend to be pulled towards the "least inappropriate" pole. For example if some individuals have some East Asian ancestry it may appear as a Siberian segment.
5) My interpretations are obviously just conjecture, sometimes better argued than others.
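Warning 4 can be illustrated with a toy two-pole model. This is not what ADMIXTURE does (it maximizes a likelihood over individual genotypes); it is a least-squares analogue on allele frequencies, with hypothetical numbers, meant only to show that proportions are always forced to sum to 1 across the available poles, even when a target carries ancestry neither pole represents. The residual is what betrays the poor fit.

```python
def two_pole_fit(target, pole_a, pole_b):
    """Fit target ~ q * pole_a + (1 - q) * pole_b by least squares, with q in [0, 1].

    All arguments are equal-length lists of allele frequencies.
    Returns (q, residual sum of squares).
    """
    diff = [a - b for a, b in zip(pole_a, pole_b)]
    offset = [t - b for t, b in zip(target, pole_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("the two poles have identical frequencies")
    # Unconstrained minimizer of the quadratic, then clipped to [0, 1];
    # for a one-dimensional convex problem this gives the constrained optimum.
    q = sum(o * d for o, d in zip(offset, diff)) / denom
    q = min(1.0, max(0.0, q))
    fitted = [q * a + (1 - q) * b for a, b in zip(pole_a, pole_b)]
    rss = sum((t - f) ** 2 for t, f in zip(target, fitted))
    return q, rss


pole_a = [0.9, 0.1, 0.5]
pole_b = [0.1, 0.9, 0.5]
mixed = [0.3, 0.7, 0.5]      # exactly 25% pole_a + 75% pole_b
outsider = [0.5, 0.5, 0.95]  # third locus unlike either pole
print(two_pole_fit(mixed, pole_a, pole_b))     # q near 0.25, residual near 0
print(two_pole_fit(outsider, pole_a, pole_b))  # q about 0.5 anyway, residual about 0.2
```

The "outsider" still gets a clean-looking 50/50 split between the "least inappropriate" poles; only the residual shows that its third locus is explained by neither.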

You can read more extended interpretations of components in my previous post.
Summarily:
-WMPC: "West Mesopotamian Core", referred to before as "Western Wave". First wheat-planting Neolithic colonization of the Mediterranean and Western Europe. I think they came from the Levant and Anatolia. I think it may correspond to Megalithic and Danubian archaeological horizons. In colder climes, this wave probably only occupied wheat-congenial regions, leaving less adequate ones to foragers. They're best represented today in Basques, Sardinians, and also at a lower level in Western Europeans in general. Possibly with high R1b (and I?) Y-haplogroups.
-NMPC: referred to before as the "eastern wave". Cold, poor-soil-adapted first Neolithic wave, maybe due to innovations such as winter-rye. Expanded North into the Steppe rivers from a homeland possibly in the Eastern Iranian Plateau, Caucasus range or Northern Mesopotamia. Later, after adaptation, it would have spread throughout cold and sandy soils in all of Europe, especially in the North, bypassing rich agricultural areas already inhabited at high densities by WMPC people. Possibly with high R1a Y-haplogroup levels. mtDNA haplogroups introduced by this wave into WMPC areas might explain why there's an "mtDNA gap" between Danubian remains and modern Central Europeans. I would tend to identify the beginnings of NMPC expansion into Central and Western Europe with the Corded Ware culture. This may have been also the "melting pot" from where East Slavs began their long expansion towards the Pacific.
-WMPC+NC: synthesis of WMPC early intruders into Egypt and local Nile Core elements. Referred before as "Second Wave".
-WMPC+NWAfr: synthesis of Green-Sahara-derived native North West Africans and WMPC. I think the expansion of this element into Iberia and beyond may have happened very early on (much earlier than the second wave in the East), at the time of initial WMPC colonization of the region via the Northern Mediterranean route.
-EMPC: East Mesopotamian Core. Patterns of NMPC and WMPC suggest to me this is a local, more eastern element that underwent expansion into the NMPC and WMPC homelands in ancient times, before the second wave from Egypt. I think a model of Neolithic developments generating higher surpluses and elites/specialists generally should coincide with a demographic expansion from the same region. That is any ancient people developing agricultural productivity high enough to enable them to live at much higher densities, and thus partially swamp out neighbouring related already agricultural peoples, must have produced enough food surpluses to allow relatively much larger elites/specialists and better social organization. Such secondary wave origin points should thus be identifiable archaeologically. The "Second Wave" I have identified with Egyptian Civilization (which begins at around 6000-5000 years ago, at the same time according to some studies as the early proto-Semitic expansion). Based also on ADMIXTURE patterns, I would tentatively relate the EMPC expansion with a Southern Mesopotamian homeland.
-All these components, except for the Siberian ones, derive I think from the ancient Near East. The Siberian ones correspond to perhaps ancient traces of European hunter-gatherers. I didn't include an Amerindian pole this time, since with multiple MPC components they tend to be identified and subsumed into the NMPC I think (since NMPC and forager residual segments tend to exist in the same populations-possibly due to NMPC late occupation of much of the colder, less fertile soil niche). More on that later.

Monday, 9 May 2011

Western, Eastern, Northern, Southern: Motherland?

In this post I'll tackle the last unsupervised component: the West Asian-modal component, which peaks in Georgians and exists in high amounts in all West Asians. This component appears much throughout Europe, and strangely, in unsupervised runs it is a bit closer by Fst distances to the Lithuanian-modal component than it is to the Sardinian or Basque-modal ones. What is going on here?
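For readers unfamiliar with Fst distances between components: ADMIXTURE reports Fst divergences between the inferred components, but I'm not certain which estimator it uses internally, so the sketch below uses Hudson's Fst (ratio of averages, without sample-size corrections) as one common choice. The allele frequencies are hypothetical toy values, chosen only to show what "closer by Fst" means.

```python
def hudson_fst(p1, p2):
    """Hudson's Fst between two allele-frequency vectors, as a ratio of averages.

    Per locus: numerator (p1 - p2)^2, denominator p1(1 - p2) + p2(1 - p1);
    both are summed over loci before dividing. No sample-size correction.
    """
    num = sum((a - b) ** 2 for a, b in zip(p1, p2))
    den = sum(a * (1 - b) + b * (1 - a) for a, b in zip(p1, p2))
    if den == 0:
        raise ValueError("no between-population diversity at these loci")
    return num / den


# Hypothetical component frequencies at three loci
west_asian = [0.50, 0.40, 0.60]
lithuanian = [0.55, 0.35, 0.60]
sardinian = [0.30, 0.60, 0.80]
print(hudson_fst(west_asian, lithuanian) < hudson_fst(west_asian, sardinian))  # True
```

Identical frequency vectors give 0 and fixed differences at every locus give 1; real components sit far closer to 0, which is why small Fst differences between close components should be read with care.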

To recapitulate my previous results/interpretations:
1. Basque and Sardinian-modal. In MDS and PCA plots, Sardinians and Basques plot not far from each other, but in distinct clusters out of the European "mainstream". Sardinians further towards modern Near Eastern populations, Basques more towards modern North European populations. In supervised admixture runs, both populations seem to be dominated by the same element, with Sardinians having a smaller element with North African/Near Eastern affinities, and Basques mostly without this element but with another element predominant in Northeastern Europe. I interpreted these results as indicating "2nd wave" influence on Sardinians, and "Eastern Wave" influence on Basques, exclusively superimposed on a common "Western Wave" element. This element, also found in large percentages in Southern, Northwestern and Central Europeans, but lacking in Northeastern ones, would correspond to the first Wheat-planting Neolithic expansion into Europe. Less fertile/colder areas not congenial to Wheat would have been mostly left to remaining forager populations providing meat and fish in exchange for wheat. Indeed such an arrangement is documented in several places in the World, and is still found in some Southeast Asian regions, where demographically dominant agriculturalists in more fertile/rice adapted soils exchange agricultural products with protein from foragers in marginal lands.
2. Lithuanian-modal, also present mixed with Siberian elements in the Chuvash, seems to be prevalent in cold/poorer soil areas of Europe. I think it may correspond to an Eastern Neolithic Wave from the Northern Near East through the Caucasus and Steppe into Eastern Europe. After adoption of cold-adapted agricultural techs, such as winter-rye, it expanded into the vast niches left mostly unoccupied by the earlier Western Mediterranean-Atlantic and Danubian waves. It replaced the forager populations still present in those marginal lands. An analogy is to the settlement of Japan by the Yayoi people. Only after developing cold-adapted rice varieties and techniques were they able to perhaps migrate and completely replace the Ainu-like foragers in rich coastal foraging environments less suited to earlier Neolithic lifestyles.
3. Mozabite/Tunisian and Bedouin/Saudi/Egyptian components are mostly about peaks of NW African and NE African ("NileCore") components. The Green Sahara may have had semi or full pastoralist developments explaining the presence of such components, admixed with "Mesopotamian Core" ones in such a high degree in some nomadic populations.
4. "Siberian" and "Amerindian"-like small elements in Finns, Russians, Scandinavians, Balts and Irish, British are I think not derived from any undocumented migration of ancient Siberian peoples but genetic traces of Native European populations. I've tentatively interpreted "Amerindian"-like segments in North Europeans and people from the Caucasus Mountains (which also appear in unsupervised runs) as survivals of an old "Amerindian"-like aboriginal population, making its way to the Americas through the Ice Age ice cap with a lifestyle similar to that of Inuit/Eskimos today.

So the last component still eluding explanation is the "West Asian" one. Why is this closer to North European than to any other unsupervised mode component?
As I've stated before, I think this component is a Mesopotamian Inner Core element, very similar to the European Western and Eastern Wave ones. Indeed these last may be "frontier" elements with less diversity, subsets of the Inner Core one. This would explain why Near Easterner and Southeast European populations tend to have large balanced "Basque5" and "Chuvash5" elements in previous runs. So the Inner Core element may be the "Mother" of these less diverse frontier elements that expanded into Europe. Later, the Inner Core populations, perhaps in Mesopotamia, would achieve more evolved Neolithic capabilities and expand into the Western and Eastern Wave settled regions.

So "West Asian" peaks in Northern Near East populations, such as Armenians, Iranians and Georgians; however, using these populations as the pole is problematic, since these same populations have signatures of "Nile Core" influence. It also appears at high levels in Northern Caucasus populations such as Lezgins and Adyghei, and in the samples from the Mountain Dagestan towns of Urkarah and Stalskoe.
Using my "restricted pole" supervised run trick, I first used Georgians as the pole. As expected this erased much "2nd Wave" influence from European populations, since the "Georgian5" pole attracted many such segments together with the North African one. Using "Urkarah5", I got higher percentages in Northern Europe than seemed reasonable. Since in previous unsupervised runs Lezgins, Urkarah and Stalskoe seem to have some "Northern European" in addition to predominant "West Asian", I thought it would be best to combine some Georgian, Iranian and Urkarah individuals to allow the pole to better "focus" on the element I was searching for.
Naturally this is debatable, but results using any of the poles are not very different.
I also decided to ditch Mozabites and Egyptians and replace them by Tunisians and Palestinians as NW African and NE African (Nile Core) poles respectively. Mozabites and Egyptians are more southern populations and any influence from North Africa in European populations is already represented in Tunisians and Palestinians. I excluded all other North African populations in order not to complicate things further, except for predominantly agricultural Northern Moroccans, which I wanted to use to pull the "Tunisian5" pole into the region (in order to find influences in South Western Europe).

I'm dividing this run into three parts: Near East; Europe; and Siberia+Central Asia. They are all part of the same run. First I'll present results for the Near East. I included some populations in Near East and European posts to link them, since it's all from the same analysis.

Poles used:
1. Siberian2: 5 random Yakut individuals "Yakut5" restricted pole. This captured most of the Siberian Turkic and Mongolic elements. It peaks in Buryats, with Yakuts and Mongols not far behind.
2. WMPC: "Basque5". This pole captured an element which peaks in Basques and Sardinians, but is also present in Southern and Northwestern Europe in important amounts. Somewhat surprisingly, it is also important in the Levant (unlike the other element present in Northwest Europe, "Chuvash5"). This is the Neolithic first Western Wave, perhaps associated with a Mediterranean-Atlantic migration route and its Megalithic monuments, depending on river-valley wheat agriculture. It also presumably represents here the probably very closely related Danubian wave.
3. NMPC: "Chuvash5". I renamed it Northern MPC since it seems to be Northern relative to the Fertile Crescent area. It peaks in Europe in the North East, and I've called it "Eastern Wave" before. Maybe a cold and poor soil-adapted Neolithic Wave, and rye agriculture (also perhaps occupying much of the "wheat niche" in some parts of Eastern Europe).
4. WMPC+NC. West MPC with some Nile Core, also referred before as "2nd Wave", or out-of-Egypt. Perhaps associated with proto-Semitic (and Exodus tales). A late expansion from perhaps 6000-5000 years ago. It was based on "Palestinian5" however it came to be dominated by the Druze, perhaps due to multiple family connections in this somewhat isolated population. It doesn't matter though, since the Druze are still adequate representatives even as a modal population. The Palestinians, Jordanians and others had their higher Nile Core component taken over by the Tunisian pole. I could remove the Druze, but things wouldn't change much and I prefer to keep all populations at this point. French Basques also seemed "inbred" before, but I'm now convinced their "isolated population pole-pulling" tendency is mostly due to their unique Neolithic wave mix.
The second wave element is closer to the "Basque5" element than to any other MPC element. I think the native element of the Levant may be closely related to the "Basque5" element, and this would be the subset expanding into less advanced incipiently Neolithic, Nile Core dominated Egypt, synthesize with it, and reexpand into its homeland and beyond as the "2nd Wave". This would be a more advanced Wheat-planting expansion, and the probably much larger food surpluses making it possible also allowed higher organization levels, specialization, and elite forming in Egypt itself- a process leading to ancient Egyptian Civilization.
I seem to recall an R1b Y-haplogroup being found in ancient Egyptian remains. Perhaps it was native after all, since ancient Egyptians might have been a synthesis of WMPC and NC?
5. EMPC: based on two Urkarah individuals + 1 Iranian + 2 Georgians. If based on "Urkarah5" or "Georgian5" alone, the pole tends to draw in too many non-pole individuals of that population, along with any 2nd Wave or NMPC elements also present there. Combining individuals from several populations in the restricted pole reduces this problem. I think EMPC is an inner-core MPC element, perhaps present in Mesopotamia itself. It seems to have expanded against the WMPC and NMPC within the Near East after the latter's expansion out of it, but likely well before the second wave. It also seems to be the subset corresponding to the Ancestral North Indians.
6. WMPC+NWAf: based on "Tunisia5". This component likely consists mostly of WMPC plus some of the Green Saharan-derived NW African element I found before. Tunisians were drawn almost 100% into the pole, unlike Northern Moroccans, probably due to multiple distant family relations within the Tunisian sample. This still matters little, since the component is representative of Western North Africans. Since I believe Tunisians are still representative of ancient Berber-speaking Neolithic populations, I removed the Mozabites. Using Tunisians may allow more sensitivity, since Mozabites are more distant.
7. Siberian1. "Nganasan5", using only unadmixed Nganasan. It represents a Western Siberian element.
8. Siberian3. "Mongolian5". Strangely, this pole didn't pick up much in other Mongolians, but instead peaked in Chukchis and Koryaks from the Far East. Mongolians turned out mostly "Siberian2". This presumably happened due to the closeness of "Yakut5" and "Mongolian5" (just as I previously used "Dogon5" to capture NW African with the help of another West African pole). This happens frequently in "restricted pole" supervised runs. For instance, in Northern Europe runs even an "Orcadian5" versus "Hungarian5" analysis imperfectly reproduces the "Basque5" vs "Chuvash5" scenario, since "Orcadian5" and "Hungarian5" shift away from their individuals' populations and peak in Basques/Sardinians and Lithuanians/Chuvash instead. So in my opinion restricted-pole runs tend to shift towards real patterns in the data.

This is still an imperfect run, so don't read too much into the smaller components. Still, results are similar across various set-ups (and not too different from unsupervised ones).

Some speculative considerations:
-WMPC is present in the Levant and North Africa, whereas NMPC is not. WMPC also predominates over NMPC in Turks, Armenians and various Jewish groups. This suggests that WMPC ("Basque5") has its ultimate origin here, in the Levant and Anatolia. It seems to have expanded not only into the Mediterranean and the Balkans, but also into North Africa. I think the 2nd Wave is likely just WMPC admixed with some Northeast African ("Nile Core").
-NMPC predominates over WMPC in Iranians, Kurds, the North Caucasus and the Eastern Caucasus. I tend to think it derives from an ancient population living in the Northern Mesopotamia/Caucasus range/Western Iranian plateau region.
-EMPC may correspond to an expansion from an inner area, perhaps Southern Mesopotamia, into both the WMPC homeland (Levant/Anatolia) and the NMPC homeland (Northern Mesopotamia?/Eastern Anatolia?/Western Iran?).

I'll present European results later.

Thursday, 5 May 2011

Individual Results for "Basque5"

I've decided to post the "Basque5" individual data after all. I think it's a good starting point, and it helps explain why "Chuvash5" (which I interpret as the cold-adapted "Rye"-farmer wave component) or "Basque5" (the warmer river-valley dweller, earlier "Wheat"-farmer wave) percentages in one individual may vary quite a bit between runs. ADMIXTURE can give quite different results under different set-ups. This is not a reason to ignore them but, in my opinion, another source of very valuable information. If you keep the framework in mind, some variability in results is sometimes enlightening.
"Mesopotamian Core" elements in particular tend to grow and shrink a bit at each other's expense in different runs. This happens because of their great similarity, since they share a common source (possibly in ancient Anatolia).
I want to reiterate that this is a first experimental run: smaller percentages in an individual have a high likelihood of being just noise, and even small percentages at the population level are merely indicative.
Also, forager elements may be absorbed by other components, or in some cases may represent more exotic admixture (South Asian or East Asian ancestry may appear as "Siberian" in a few cases).
In addition, although I included only participants with European/ME ancestry, the 2nd Wave may be absorbing very small non-Nile Core African elements in some New World (and perhaps other) individuals.
Particularly in the "Sardinian5" run, I think the "Chuvash5" element was overestimated in several populations, due to a slight deviation of this component towards what I've been calling the "2nd Wave" element, which Sardinians have in significant amounts.
So this is a very imperfect run. It is admittedly very rough and will be improved in the future. For now, please don't assume that any 1-2% of anything is real.
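Since the caveats above come down to comparing the same individual across several runs and discounting tiny percentages, that bookkeeping can be sketched in Python. This is a minimal sketch, assuming each run's output for one individual has already been turned into a dict of component name to fraction; the function name `summarize_runs` and the 2% cut-off are illustrative choices of mine, not part of ADMIXTURE.

```python
from statistics import mean

# Illustrative assumption: anything averaging under ~2% is treated as likely noise.
NOISE_THRESHOLD = 0.02


def summarize_runs(runs):
    """Summarize one individual's component proportions across several runs.

    runs: list of dicts, one per run, mapping component name -> fraction (0..1).
    A component missing from a run is counted as 0 in that run.
    """
    components = sorted({name for run in runs for name in run})
    summary = {}
    for name in components:
        values = [run.get(name, 0.0) for run in runs]
        avg = mean(values)
        summary[name] = {
            "mean": avg,
            "min": min(values),
            "max": max(values),
            "likely_noise": avg < NOISE_THRESHOLD,
        }
    return summary
```

A wide min-max spread for two closely related components (such as "Basque5" and "Chuvash5") is the kind of run-to-run trade-off described above, rather than a reason to discard the results.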

AJ2 is Dan Vorhaus from Genomes Unzipped.
Some considerations:
SWFrance is from Gascony and shows large "Basque5" as expected.
Germany1 has much Rhineland ancestry, and also has much "Basque5".
Southern Germany, Switzerland, Slovenia and Hungary appear to have a more significant "2nd Wave" element than more northern populations; I assume it would be present in neighbouring populations as well. Raetic, a relative of Etruscan, was spoken in the region in pre-Roman times.
PonticCaspian has origins mostly in the southern steppe and nearby regions. He has very high levels of "Chuvash5" relative to "Basque5".
Some Americans have Southern European ancestry.
Balanced "Chuvash5"+"Basque5" elements in southeastern populations closer to the Fertile Crescent may correspond to an "inner-core" component that includes the diversity present in both "outer-core" components, and is thus not the product of West Wave/East Wave admixture.

I'll post more, better results in the near future.

Tuesday, 3 May 2011

Irula and Basques, Sardinians, Lithuanians?

I have stated before that I think unsupervised results of European populations are based on modal populations that are admixed themselves.

Just as the "Irula" component tends to pick up most of South Asian diversity in an artificial manner (contradicting some recent research: Reich et al.; see also Dienekes), something different supervised methods seem to confirm, I believe the same kind of analysis can be applied to European populations.

The Irula are modal for the South Asian component for a simple reason. All Indian populations are a mix of Fertile Crescent incomers (Ancestral North Indian, ANI) and older established populations (Ancestral South Indian, ASI). Yet even though we have extant, relatively unadmixed Fertile Crescent populations (ANI), we do not have any extant ASI populations. So the Irula, in whom ASI peaks among publicly genotyped populations, appear almost 100% "South Asian" even though they are really roughly 1/2 ANI and 1/2 ASI. Most Indian populations tend to be dominated by this element as well, the remainder being more conventional Fertile Crescent components. So the "South Asian" component is really ASI+ANI and doesn't correspond to either of them in any "unadmixed" form.
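The Irula argument can be made concrete with a simple two-source linear mixing model. This is a sketch under assumptions: allele frequencies mix linearly, $a$ stands for the unobserved ASI source, $b$ for an essentially unadmixed Fertile Crescent (ANI-like) population, $q_P$ is a population's true ASI fraction and $q_I \approx 1/2$ is the Irula's. Eliminating the missing source $a$ shows what happens when the Irula are used in its place:

```latex
\begin{aligned}
p_P &= q_P\,a + (1-q_P)\,b, \qquad p_I = q_I\,a + (1-q_I)\,b,\\
a &= \frac{p_I - (1-q_I)\,b}{q_I},\\
p_P &= \frac{q_P}{q_I}\,p_I + \left(1-\frac{q_P}{q_I}\right) b,\\
\hat q_P &= \frac{q_P}{q_I}, \qquad \hat q_I = 1 \;\text{ even though } q_I \approx \tfrac12 .
\end{aligned}
```

So under this toy model every population's apparent "South Asian" fraction is its true ASI fraction divided by $q_I$: a population with $q_P = 0.25$ would appear about 50% "South Asian", and the Irula themselves appear 100%.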

Zack has done interesting work using the Onge of the Andamans as a proxy for ASI, and I have run a simple analysis using Papuans. Neither really corresponds to the actual ASI.

Similarly, in Europe and nearby regions we get Lithuanian-modal, Sardinian-modal, Basque-modal, and West-Asian-modal components. The Mozabite-modal and Southwest Asian-modal components I've previously analysed in a way that suggests they arise from the peaks of "NW African" in Mozabites and of "Nile Core" in Bedouin, respectively.

So over the next few days I'll try to do the same for European populations, and reveal unsupervised results for what they probably mostly are: clines, in which ADMIXTURE picks the peak population as the modal one. Naturally, revealing intra-"Mesopotamian Core" clines is much harder and less exact than revealing clines between less closely related populations.

Here are the first runs. I've used my trick of "restricted" poles in order to get more accurate estimates, but these are experimental runs, and the thing to note is what remains similar across them. I plan to develop better ways to fish out these components soon, so don't take these first results too literally. I've used European populations, excluding most Middle Easterners, since I think "inner-core" Mesopotamian subsets different from the "frontier" ones may be hiding there and I don't want to make the run more complex than it already is. For the same reason I excluded Cypriots. I did include a handful of participant samples from the ME and Southeast Europe: being few, they won't "demand" their own component and confuse the results, yet they can give me some clues as to how to proceed when expanding the set.

First, "Basque5" versus "Chuvash5" versus 5 Egyptians + 5 Mozabites. Bear in mind that having an element doesn't mean ancestry from one of these modern groups, only that portions of DNA tend to cluster together with these available poles.
Admixture proportions for Chuvash and Basque obviously exclude the 5 pole individuals in each.
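For readers who want to try restricted poles themselves, the bookkeeping can be sketched in Python. ADMIXTURE's supervised mode (`--supervised`) reads a `.pop` file with the same prefix as the `.bed` file, one line per individual in `.fam` order, holding a pole label or `-` for individuals whose ancestry should be estimated. The helper names below, and listing pole members by individual ID, are illustrative assumptions of mine; `population_means` averages proportions per population while skipping pole individuals, as done for the population figures here.

```python
from pathlib import Path


def build_pop_labels(fam_lines, pole_members):
    """One .pop label per .fam line: the pole name for pole members, '-' otherwise.

    pole_members: dict mapping individual ID -> pole label, e.g. {"bas01": "Basque5"}.
    """
    labels = []
    for row in fam_lines:
        fields = row.split()
        if not fields:
            continue  # skip blank lines
        individual_id = fields[1]  # .fam column 2 is the individual ID
        labels.append(pole_members.get(individual_id, "-"))
    return labels


def write_pop_file(fam_path, pop_path, pole_members):
    """Write the .pop file that sits next to the .bed used by `admixture --supervised`."""
    fam_lines = Path(fam_path).read_text().splitlines()
    labels = build_pop_labels(fam_lines, pole_members)
    Path(pop_path).write_text("\n".join(labels) + "\n")
    return labels


def population_means(ids, populations, q_rows, pole_members):
    """Average per-individual proportions by population, excluding pole individuals.

    ids, populations and q_rows are parallel lists; each q_row holds one
    individual's proportions in the column order of the .Q output file.
    """
    sums, counts = {}, {}
    for iid, pop, q in zip(ids, populations, q_rows):
        if iid in pole_members:
            continue  # pole individuals are assigned to their own pole by construction
        acc = sums.setdefault(pop, [0.0] * len(q))
        for k, value in enumerate(q):
            acc[k] += value
        counts[pop] = counts.get(pop, 0) + 1
    return {pop: [v / counts[pop] for v in acc] for pop, acc in sums.items()}
```

With, say, five "Basque5", five "Chuvash5" and ten Egyptian/Mozabite pole entries, the run would then be something like `admixture --supervised mydata.bed 3` (the file name is hypothetical), since K has to match the number of distinct pole labels.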

FST genetic distance estimates by ADMIXTURE
"Basque5" to "Chuvash5": 0.035
"2nd Wave" to "Basque5": 0.048
"2nd Wave" to "Chuvash5": 0.040
Siberian to "Basque5": 0.133
Siberian to "Chuvash5": 0.104
Siberian to 2nd Wave: 0.120
Amerindian to "Basque5": 0.231
Amerindian to "Chuvash5": 0.194
Amerindian to 2nd Wave: 0.216
Amerindian to Siberian: 0.164

As FST shows, "2nd Wave" is much more of a "Mesopotamian Core" element than a Northeast African element here. It does, however, carry some of the Nile Core admixture found in Europeans. Since I think the Egyptian expansion itself carried more Mesopotamian than Nile Core ancestry, this is not surprising. The smaller, presumably forager components are merely indicative, since this run is too rough and they might be partially subsumed into, or subsume bits of, other elements.
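To check that reading against the numbers, the FST table above can be loaded and each component's nearest neighbour looked up. This is a small Python sketch; the dictionary simply transcribes the reported values, and `distance` and `nearest` are hypothetical helpers of mine.

```python
# Pairwise FST estimates reported by ADMIXTURE for the "Basque5"/"Chuvash5" run.
FST_RUN1 = {
    ("Basque5", "Chuvash5"): 0.035,
    ("2nd Wave", "Basque5"): 0.048,
    ("2nd Wave", "Chuvash5"): 0.040,
    ("Siberian", "Basque5"): 0.133,
    ("Siberian", "Chuvash5"): 0.104,
    ("Siberian", "2nd Wave"): 0.120,
    ("Amerindian", "Basque5"): 0.231,
    ("Amerindian", "Chuvash5"): 0.194,
    ("Amerindian", "2nd Wave"): 0.216,
    ("Amerindian", "Siberian"): 0.164,
}


def distance(table, a, b):
    """Look up a symmetric pairwise distance regardless of key order."""
    return table.get((a, b), table.get((b, a)))


def nearest(table, component):
    """Return (neighbour, distance) for the closest other component."""
    others = {name for pair in table for name in pair} - {component}
    return min(((o, distance(table, component, o)) for o in others), key=lambda t: t[1])


print(nearest(FST_RUN1, "2nd Wave"))  # ('Chuvash5', 0.04)
```

In this run "2nd Wave" sits at 0.040 from "Chuvash5" and 0.048 from "Basque5", far closer than to either forager component (0.120 and 0.216).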

Now with "Sardinian5" replacing "Basque5", the remaining poles being the same. The Sardinian population averages presented exclude the 5 Sardinian samples used as the pole.

FST genetic distance estimates by ADMIXTURE
"Sardinian5" to "Chuvash5": 0.035
"2nd Wave" to "Sardinian5": 0.049
"2nd Wave" to "Chuvash5": 0.045
Siberian to "Sardinian5": 0.128
Siberian to "Chuvash5": 0.111
Siberian to 2nd Wave: 0.122
Amerindian to "Sardinian5": 0.225
Amerindian to "Chuvash5": 0.203
Amerindian to 2nd Wave: 0.218
Amerindian to Siberian: 0.163
Notice that the largest distances differ between the two runs, so the smaller ones can't be compared exactly across runs. I think "2nd Wave" is slightly more concentrated in this last run, but it is still mostly Mesopotamian.
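One hedged way to line the two runs up despite their different maximum distances is to rescale each table by its own largest value before comparing. This normalization is my own assumption, not something ADMIXTURE does, and it only removes an overall scale difference. The sketch transcribes both FST tables and pairs "Basque5" in the first run with "Sardinian5" in the second.

```python
FST_BASQUE_RUN = {
    ("Basque5", "Chuvash5"): 0.035, ("2nd Wave", "Basque5"): 0.048,
    ("2nd Wave", "Chuvash5"): 0.040, ("Siberian", "Basque5"): 0.133,
    ("Siberian", "Chuvash5"): 0.104, ("Siberian", "2nd Wave"): 0.120,
    ("Amerindian", "Basque5"): 0.231, ("Amerindian", "Chuvash5"): 0.194,
    ("Amerindian", "2nd Wave"): 0.216, ("Amerindian", "Siberian"): 0.164,
}
FST_SARDINIAN_RUN = {
    ("Sardinian5", "Chuvash5"): 0.035, ("2nd Wave", "Sardinian5"): 0.049,
    ("2nd Wave", "Chuvash5"): 0.045, ("Siberian", "Sardinian5"): 0.128,
    ("Siberian", "Chuvash5"): 0.111, ("Siberian", "2nd Wave"): 0.122,
    ("Amerindian", "Sardinian5"): 0.225, ("Amerindian", "Chuvash5"): 0.203,
    ("Amerindian", "2nd Wave"): 0.218, ("Amerindian", "Siberian"): 0.163,
}


def normalize_by_max(table):
    """Rescale every distance by the run's largest distance (assumed scaling choice)."""
    largest = max(table.values())
    return {pair: d / largest for pair, d in table.items()}


def side_by_side(run_a, run_b, rename):
    """Pair matching entries; `rename` maps component names of run_a onto run_b."""
    norm_a, norm_b = normalize_by_max(run_a), normalize_by_max(run_b)
    rows = []
    for (x, y), value in norm_a.items():
        key = (rename.get(x, x), rename.get(y, y))
        if key in norm_b:
            rows.append((key, round(value, 3), round(norm_b[key], 3)))
    return rows


for pair, first_run, second_run in side_by_side(FST_BASQUE_RUN, FST_SARDINIAN_RUN, {"Basque5": "Sardinian5"}):
    print(pair, first_run, second_run)
```

Rescaling by the shared Amerindian to Siberian distance (0.164 vs 0.163) would be an equally defensible alternative reference.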

Balanced "Chuvash5" + "Basque5" in more "inner-core" influenced populations such as Assyrians and South Italians simply means that their "Mesopotamian" element is the more diverse parent of both "outer-core" "Basque5" and "Chuvash5", which are perhaps particular West Anatolian and Northeast Anatolian subsets of it. I now see some evidence for a secondary "inner-core" expansion before the second wave. More on that later.
I think these results are a bit inexact, but the general components seem to hold in runs with quite different European poles (tomorrow I may present some of these). They also appear, in a very confused and mixed way, in unsupervised results. So they very likely represent something real. Yet these are obviously preliminary results. I'm not sure Lithuanians are as much Eastern Wave-derived as these particular results seem to imply, although I now think my early estimate of "Basque admixture" using the full-set supervised Basque and Chuvash poles overestimated the "Basque" or Western Wave element there (Basques seem to have quite a bit of the Eastern element themselves!). Still, "restricted poles" seem to estimate actual components better.

First, it is interesting that Sardinians do not have the Eastern Wave element in either run. This is as expected if "Chuvash5" came from the Northeast (perhaps more remotely from the Steppe, the Caucasus and originally Northeastern Anatolia, in that order).
Sardinians do, however, have a large Second Wave element that is mostly absent in Basques. And Basques, even in the "Basque5" run, seem to have quite a bit of this "Chuvash5" component.

Northern Europeans seem to have more "Chuvash5" than the Basques, particularly towards the East, and a small Second Wave element in some regions.

This seems an adequate explanation for the results of unsupervised ADMIXTURE runs. Sardinians are modal for their "Sardinian" component since they lack "Chuvash5", and Basques for theirs because they lack "Second Wave" yet have much more West Wave than most. Lithuanians are also modal for their component, since they have little of both Second Wave and "Basque5". And West Asians like the Assyrian Christians appear to have some "Chuvash5" (more on this later) but much more Second Wave. All other European peoples are intermediate between these 4 "extreme" extant populations, so they can be adequately reconstructed by unsupervised ADMIXTURE using components modal to the Basques, Sardinians, Lithuanians and West Asians. Unsupervised ADMIXTURE has no way of knowing whether actual Sardinians, Basques, Lithuanians and West Asians settled Europe. It assumes so since it was meant, I think, to determine admixture proportions for populations whose parent populations still exist (like Mexicans and African-Americans).

Faced with a scenario in which the unadmixed parent populations mostly no longer exist, it picks the most extreme populations and uses these admixed populations as if they were the parent ones, generating historically illogical results.
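The effect of treating extreme admixed populations as parents can be reproduced with toy numbers. The sketch below assumes simple linear mixing of allele frequencies and fits a single mixing fraction by least squares, which is a simplification of what ADMIXTURE actually does (it maximizes a binomial genotype likelihood). The two "extreme" populations are themselves 80/20 and 20/80 mixes of two unobserved sources, and all allele frequencies are made up for illustration.

```python
def mix(q, a, b):
    """Allele frequencies of a population deriving fraction q from a and 1 - q from b."""
    return [q * x + (1 - q) * y for x, y in zip(a, b)]


def fit_two_way(p, a, b):
    """Least-squares estimate of q in p ~ q*a + (1-q)*b, clipped to [0, 1]."""
    diff = [x - y for x, y in zip(a, b)]
    num = sum((pi - yi) * d for pi, yi, d in zip(p, b, diff))
    den = sum(d * d for d in diff)
    return min(1.0, max(0.0, num / den))


# Toy allele frequencies for two unobserved ancestral sources (made-up numbers).
SOURCE_1 = [0.9, 0.1, 0.5, 0.3]
SOURCE_2 = [0.2, 0.7, 0.4, 0.8]

extreme_1 = mix(0.8, SOURCE_1, SOURCE_2)  # most source-1-rich extant population
extreme_2 = mix(0.2, SOURCE_1, SOURCE_2)  # most source-2-rich extant population
middle = mix(0.65, SOURCE_1, SOURCE_2)    # an intermediate population

print(round(fit_two_way(middle, SOURCE_1, SOURCE_2), 2))       # 0.65 with the true parents
print(round(fit_two_way(middle, extreme_1, extreme_2), 2))     # 0.75 with extremes as parents
print(round(fit_two_way(extreme_1, extreme_1, extreme_2), 2))  # 1.0: the extreme looks unadmixed
```

The intermediate population is still fitted perfectly, just with proportions stretched relative to the extremes, which is why such results can look internally consistent while being historically misleading.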

Looking back at the results: why do Basques have this Eastern element in the 10-20% range? Why did it spread to Spain if it arrived AFTER the "Basque" or western element (archaeology strongly supports a model of agricultural spread from the South towards the Northeast)? And why do Middle Easterners and Southeast Europeans have both, even though their region is the likely source? A few points:

From its smallish presence in Basques, I think it's clear this is a later element intruding into an already agricultural population. I don't think these farmers were the carriers of Indo-European languages; maybe they spoke distantly related languages. One argument concerns Y-haplogroups. R1a is prevalent and R1b mostly absent in populations with very high "Chuvash5" in the run. R1b is dominant in Basques and also common in Sardinians, with R1a absent. Languages spread with elites, but these people weren't passing much of their Y-chromosomes to their children, at least west of Central Europe, and yet West Europeans appear to have plenty of their genes. Some discrepancies between the mtDNA of Danubian Neolithic remains and that of modern Europeans suggest to me that they perhaps did contribute plenty of mtDNA together with autosomes. Language-imposing elites behave in exactly the opposite way, with high Y and low mtDNA transmission.

"Chuvash5" people likely had high levels of R1a, "Basque5" people high levels of R1b. R1b is well characterized as originating from Western Anatolia and is present at high levels in Southern Europe. Another argument for why it must have arrived first is that Sardinians are 20-30% R1b and have no "Chuvash5" (yet plenty of "Basque5"). So "Chuvash5" likely entered the "Basque5"-dominated Basque Country, Spain, France, Ireland, the British Isles and the Central European river valleys, and had a major impact on autosomes, and probably also on mitochondrial haplogroups, yet very little impact on Y-chromosome haplogroups.

How can this pattern be explained? I have been thinking about a speculative model and now it's taking shape.
Northwestern Europeans (France and the British Isles in this context) generally have >60% R1b, but in these runs ~50% "Chuvash5". If the "Chuvash5" people were intruders specializing in cold-environment, poor-soil agriculture (with rye and other innovations), they would find the most fertile soils in the region already inhabited by the "Basque5" people and their wheat agriculture. So in a simple scenario, villages would form in a checkerboard pattern. "Basque5" villages would already be established near rivers and on the best soils at very high densities, and would thus be impossible to remove or replace. However, there may have been plenty of other areas where primitive wheat agriculture was not viable, and these would still be inhabited by foragers at much lower densities. Most of these areas would, however, allow for productive rye agriculture, at much higher densities than foraging. "Chuvash5" migrants could very well have used such niches. They would not be able to settle in the "Basque5" areas, but they would have major advantages over foragers in mountain, inter-fluvial and sandy-soil regions. So "Chuvash5" villages might become established not far from the "Basque5" villages downhill. "Chuvash5" is really, I believe, about a cold-adapted Neolithic wave derived from a Mesopotamian subset.

In such a scenario, good years would lead to large surpluses in "Basque5" villages but smaller ones in "Chuvash5" villages. Population densities would also be considerably higher in the former's areas than in the latter's. Lowland elites would form much more easily in "Basque5" villages, while "Chuvash5" ones would remain more egalitarian.

In good years, high levels of surplus would allow men in "Basque5" villages to take up other occupations, such as war or trade for less basic resources. With time, social structure would make such arrangements permanent at the expense of peasants. Militarized and commercial elites from rich "Basque5" villages would come to dominate less wealthy "Chuvash5" ones. The result, over thousands of years and multiple elite forays from the agriculturally rich wheat areas into the less productive rye ones, would be exactly what you see: elite Y-DNA spreading while autosomes and mtDNA remain balanced.
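The Y/mtDNA/autosome asymmetry in this scenario follows from how each is inherited, and a toy recurrence makes it visible. This sketch assumes constant sex-biased gene flow in which a fraction `m` of fathers and `f` of mothers each generation come from the elite source, and that the source itself stays unadmixed; the function name and rates are illustrative assumptions, not estimates.

```python
def sex_biased_gene_flow(m, f, generations):
    """Source-derived fractions (Y, mtDNA, autosomes) after recurrent gene flow.

    m: fraction of fathers drawn from the source population each generation.
    f: fraction of mothers drawn from the source population each generation.
    Starts from a recipient population with no source ancestry.
    """
    y = mt = auto = 0.0
    for _ in range(generations):
        y = m + (1 - m) * y            # Y passes only from father to son
        mt = f + (1 - f) * mt          # mtDNA passes only from mother to child
        father = m + (1 - m) * auto    # average autosomal source share of fathers
        mother = f + (1 - f) * auto    # average autosomal source share of mothers
        auto = 0.5 * (father + mother) # autosomes: half from each parent
    return y, mt, auto


y, mt, auto = sex_biased_gene_flow(m=0.05, f=0.0, generations=20)
print(round(y, 2), round(mt, 2), round(auto, 2))
```

With m = 0.05 and f = 0 for 20 generations, the source's Y share reaches about 64% (1 - 0.95^20), autosomes only about 40% (1 - 0.975^20), and mtDNA stays at zero: autosomes always lag behind Y under purely male-biased flow.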

In even colder areas such as Scandinavia and Northeastern Europe, colonization by R1a agriculturalists had a much greater advantage. Fewer wheat-suitable environments exist there, vast rye-congenial ones predominate, and the Y-DNA picture would be more balanced or even reversed.

So "Chuvash5" is, I think, really about a new subset of Fertile Crescenters, one adapted to marginal lands and cold-environment agriculture during their passage through and evolution in the steppe. The spread of some more admixed people from the Corded Ware region, and their explosion into the eastern expanses of Eurasia, suggests to me that the final "critical point" may have been reached in this region and time.

Later on, perhaps related pastoralists came out of the Steppe again, not as farmers but as horse-mounted warriors and conquerors. These may have been the source of some small Y-DNA contribution. But without major improvements in food production, I very much doubt they made a significant autosomal contribution. They may have had a major linguistic impact, as such horse-mounted militarized elites often have. Or maybe such tongues are a different story entirely.

Tomorrow I'll run a better version of the runs posted above, and will post individual participants' results. I decided not to post spreadsheets for this one, since people would read too much into individual results. Tomorrow's run will be very similar, but more solid and valid, I think.

Artemis