Tuesday 14 June 2011

Panasian-Part IV: Malaysia and Indonesia

Part IV (and last) of the unsupervised Panasian set ADMIXTURE analysis.

Fst distances provided by ADMIXTURE:
About the populations in this part:
The "Malay" population consists of Malaysian Malays; "SGMalay" individuals are from Singapore; MalayIndonesia are likewise Malay dialect speakers, but from South Sumatra, Indonesia (the official languages of Malaysia and Indonesia, Bahasa Malaysia and Bahasa Indonesia, are different registers of a dialect continuum called "Malay", historically used as a lingua franca spanning both countries).
Temuan speak a language from the "Aboriginal Malay" or "Proto-Malay" group. They are a small group nowadays, and still practice Animism and slash-and-burn agriculture.
Sundanese and Javanese are the main Austronesian-speaking inhabitants of Java and represent together almost half of the Indonesian population (20M and 80M respectively), dominating its politics. Sundanese and Javanese languages are distinct from official Malay-derived Bahasa Indonesia which most of them nowadays speak as well (some exclusively).

Mentawai are Animist/Christian slash-and-burn (swidden) farmers planting sago, yams, taro and raising pigs and chickens much as the original Austronesians and as other modern-day isolated Austronesian peoples; supplemented by much hunting and fishing as well. They live in the Mentawai islands just off West Sumatra.
Batak Karo and Batak Toba are also Austronesian speakers from inland Sumatra; they seem to be dry rice-agriculturalists, also cultivating paddies in some regions.

The "Dayak" sample comes from an (Austronesian-speaking) Indonesian Dayak tribe from East Borneo, whose members are slash-and-burn agriculturalists. Bidayuh are also Dayaks, but differ from the "Dayak" sample in that they are from Sarawak, Malaysia (Northwestern Borneo). They have presumably received more outside influence than the more Eastern Dayaks, and appear to be more involved in the current plantation economy. (The Iban, represented in a previous run with the main set, are also Dayaks from Western Borneo).

Toraja are Austronesian speakers from Sulawesi's mountainous interior. They're farmers who remained Animist until recently (they are mostly Christian nowadays).

Kambera, Manggarai, Lamaholot, Lembata and Alorese live in the Lesser Sundas in Southeastern Indonesia. They speak Austronesian languages, and are mostly Christian today. These groups present visible Melanesian admixture.

Naasioi are Melanesians, speaking a Papuan language. They share Bougainville island in Papua New Guinea with Austronesian-speaking groups, and are slash-and-burn farmers.
Kensiu and Jehai are Malaysian Negrito peoples.

Some candid observations (and shameless speculations):
1) Austronesian tongues seem to correlate with the darkgreen component modal in Taiwanese Aborigines. Earlier there seemed to be a weaker correlation with Tai-Kadai languages.
These linguistic families have important similarities, and it is controversial among linguists whether they are branches of a more ancient proto-language, or whether both branches shared geographically close homelands. Their present distribution suggests homeland(s) in Southern China, and populations analysed from China, including all Han groups, also have darkgreen elements.
These are however largely lacking in represented Austro-Asiatic speaking populations (though I would be surprised if more Southern Chinese-influenced Vietnamese weren't the exception); Burmese Sino-Tibetan speakers; Uyghur; Okinawans; and are small in Koreans and Japanese mainlanders (perhaps due to an old Han admixture event/secondary wave?)- all these groups have important "red" elements however.
This could be interpreted as suggesting an origin, or at least diffusion of Sino-Tibetan and Altaic with Neolithic Expansions directly from the Northern China/Yellow River region before the ethnogenesis of the Han (before admixture from populations from the Yangtze River valley?).
(At this moment I would tend to interpret the Altaic element, even in Central Asians, as being derived early on from the East Eurasian Neolithic just as I previously interpreted their Western admixture as being derived from the Fertile Crescent).

2) Austronesian-speaking groups closer to continental SE Asian influence tend to have smaller darkgreen and larger darkblue elements- such as Bidayuh (SW Borneo) and the populations from Java.
The darkblue component is important in continental Southeast Asian populations and those from the Indonesian Archipelago with more historical SE Asian connections. It somewhat correlates with Austro-Asiatic languages, perhaps it indicates an early agricultural wave from China into SE Asia, before the Tai-Kadai-Austronesian one? Darkblue is a bit more distant in fst from the other East Asian Neolithic components, perhaps due to having a larger degree of stabilised ancient local variation included in it.
Darkblue in Indonesians decreases with distance to SE Asia. It may have some association in this context with later expansion of populations- possibly associated with wet rice and other agricultural innovations from SE Asia- from more developed and more densely populated Java, a development continuing to this day.
3) Groups further towards the East present an increasing burgundy element modal in Papuan-speaking Melanesian Naasioi. These groups are swidden agriculturalists. Their relative success in resisting the farmer waves versus distantly related Negrito groups from the Philippines and SE Asia may hint at some primitive agriculture in coastal Papuan-speaking peoples prior to the Austronesian expansion.
Melanesians such as Naasioi probably have some Austronesian admixture, which may be invisible here due to lack of reference highland Papuan populations- the burgundy component may be a fusion of mostly "Papuan" with a little Austronesian- a possible reason why it may pull some of the Philippine Negrito variation in the more admixed tribes.
Polynesians also seem to have important Melanesian admixture.

Sorry I didn't include East Asian participants in this run, but the ~50,000 SNPs in this set do not overlap much with the 23andMe ones, so including them would reduce the resolution too much.

Many thanks to Zack from Harappa for drawing my attention to this set; he also wrote and posted the conversion code to bed format.

Saturday 11 June 2011

Panasian-Part III: Philippines

Third part of the Panasian run (excluding Yoruba, South Asians).
This post is about populations from the Philippine Archipelago:

About the populations:
Amis and Atayal are Taiwanese aborigines speaking Formosan Austronesian languages. It is generally believed the Austronesian language family derives from Taiwan, having spread into the Philippines, Indonesia and Polynesia as an agriculturally-driven expansion. Taiwanese "Aboriginals" are historically primitive farmers supplementing their diet with some hunting and fishing.
Until recently they were the main inhabitants of Taiwan, having been swamped by Fujianese Han groups only in the last few centuries (Minnan and Hakka). There is some evidence they shared the island with a minority of Taiwanese Negritos- possibly the true Paleolithic forager native inhabitants- who don't exist anymore as a distinct group today.

Filipino populations are farmers, speaking various Austronesian dialects. Tagalog is the dialect on which the National Language of the Philippines, "Filipino", is based. Its speakers live in Southern Luzon. Ilocano is spoken by related populations in more Northern regions of Luzon island. Visayan is a third Filipino group of dialects spoken in the Visayas (central group of Philippine islands) as well as in some parts of Mindanao. The sample represented here was from Visayan-speaking colonists from West Mindanao. These groups comprise the majority of the islands' population, and are at the core of self-identification with the modern "Filipino" nationality.

Minanubu or Manobo; Iraya are incipient farmers using slash and burn methods in Mindanao and Mindoro respectively. They speak Austronesian languages. These peoples have been suffering pressure from neighbouring more agriculturally advanced and socially complex Filipino migrants (such as Visayans), and are being pushed out of the more fertile soil in their homelands.

Ayta, Agta, Ati and Mamanwa are Philippine Negritos, who are generally hunter-gatherers, at least until very recently. Ayta and Agta are from Luzon; Ati are from the Visayas; Mamanwa from Mindanao.

The diversity of populations in the Philippines seems to fit very well into the Neolithic expansion model I've been exploring. Three groups can be identified, with a degree of continuity between groups: advanced farmer "Filipinos"; slash-and-burn farmers such as Iraya and Minanubu; and forager Negritos.
1) Negritos probably represent the ancient (forager) population of the Philippines. They present varying hybridization with farmers (as seen in farmer-associated components).
The similar pattern of "forager-components" (Kensiu are SE Asian Negritos. Naasioi are Papuans) in Agta, Ati and Ayta may represent Philippine-Negrito unique genetic patterns that ADMIXTURE didn't pick out in this analysis (maybe due to a more inbred Mamanwa sample taking its own component and non-represented variety in the other Negritos being allocated to related Naasioi and Kensiu-modal components; something similar often happens in Siberian peoples' ADMIXTURE runs).
2) The earliest "First Wave" farmer intrusion in the Philippines is probably associated with Austronesian languages. Taiwanese Aboriginal groups speak the most divergent and diverse Austronesian tongues, and they are modal for the darkgreen component found in much larger amounts in Philippine farmers than in foragers (Negritos). Iraya and Minanubu have large such components but mostly lack the red, blue and lightgreen ones - they may represent a "First Wave" stage of Austronesian Expansion proper.
3) Han Fujianese and Cantonese migration into the Philippines is historically documented even before the Spanish Conquest. A "Second Wave" process may be interpreted as being in full swing in the Philippines in recent centuries and up to today. Slash-and-burn agriculturalists such as Iraya and Minanubu seem to be in the process of assimilation or expulsion to more marginal lands. These more primitive agriculturalists also present larger forager-modal components than "Filipinos", though less than Negritos, just as expected.
4) "Filipino" populations of advanced agriculturalists have more significant red and lightgreen elements, in which I tend to see absorption of much Han admixture in the last few centuries. Interestingly, mestizos de Sangley have been historically prominent as advanced farmers and plantation owners. Filipino ethnogenesis may derive from admixture between ancient slash-and-burn agriculturalists and migrants from China. Some admixture with Europeans (Spaniards, Americans) may also have occurred, to a smaller degree than with the Han, but a much higher degree than in other regional populations.

Wednesday 8 June 2011

Panasian-Part II: SE Asia

Part II covers continental Southeast Asian populations from the Panasian dataset K=9 analysis I ran earlier. Some useful populations for comparison from part I are represented again; their ADMIXTURE component percentages are exactly the same since this is the very same run.

About the populations:

Hmong live in Southern China, particularly in more mountainous regions often separated by Han-inhabited areas, suggesting they were once more widespread in China, but were possibly swamped by an agriculturally more advanced wave of Han in the lowlands. They also live in South East Asian countries, such as Thailand. Yao people likewise live in Southern China and SouthEast Asia (the Yao sample in this set is from Thailand). Both Hmong and Yao speak Hmong-Mien languages. These peoples in China are often lumped as "Miao" by the Han. Interestingly, Han Chinese foundational myths speak of the Miao as a people originally from the Yellow River, forced to migrate south after conflict with the Huaxia, another Yellow River people, from whom the Han claim to descend. Such stories may be just folklore, but they do resonate somewhat with a Neolithic model of East Asian population expansion. Interestingly, the Hmong-modal component (light green) is larger in populations from the Yellow River southwards, while the closely related red component is more important in Han Chinese and northeastwards from the Yellow River in Koreans and Japanese.

Wa or Va, Lawa, Blang or Plang, Paluang, Mal and Mon speak Austro-Asiatic tongues and live in pockets dispersed between Southern China, Southeast Asia, and Burma. In this set, the Wa sample's from China, the remaining ones from Thailand.
Vietnamese also belongs to the linguistic family. These languages are generally thought of as the original languages of SE Asia, being mostly replaced by Tai-Kadai and Austronesian languages today, but their geographical dispersion might also suggest expansion from a Chinese homeland, where they largely didn't survive.

Tai Yuan, Tai Khuen, Tai Lue, Tai Yong, Zhuang and Jiamao are Tai-Kadai speaking.
Tai-Kadai languages have some similarities to Austronesian tongues, suggesting a possible common origin in Southern China, with one expanding West into continental SE Asia, the other East into Taiwan and insular SE Asia. Both would have largely become extinct in Southern China itself after the "Second Wave"-like Han expansion from the North.

Malays and Temuan speak Austronesian languages, as do the aboriginal Taiwanese Amis. SG Malays are from Singapore.

Jehai and Kensiu are Malaysian Negritos. They speak Aslian languages, which belong to the Austro-Asiatic family, very likely adopted from agriculturalist neighbours.

Considerations/speculation:
1) China Hmong appear to differ from Thailand Hmong only in some southern Han admixture in the former. This may also apply to Yao.
2) Temuan speak an Austronesian dialect apparently somewhat mutually intelligible with Malay. They preserve many ancient traditions maybe lost among the Malay, such as religious Animist practices. They also have a larger "pink" element modal in local Negrito foragers. Their presumed greater isolation may help explain less dark green genetic admixture than in Malays.
3) Jehai have substantial "dark blue" element absent in Kensiu. This may suggest an association of the dark blue element with agriculturalists (fst distances to other components are in agreement-see part I).
4) Some "Burgundy" component starts to become visible in Malays, unlike in more Northern populations.
5) The presence of an element clustering with West Eurasians in Mon and Malays is interesting. It's small, so it may be just noise. But since Indian populations weren't included and there's no "South Asian"-modal component, I wouldn't find it strange if this element would have a similar pattern to Fertile Crescent ones as found in South Asia- either from the time of the arrival of Neolithic West Asians to India, perhaps also later?
6) ADMIXTURE patterns at this K and language families have some correlation. A common origin in an ancient Yellow River-Yangtze River Neolithic Core Area can also be argued for linguistically.

Next I'll post results from this run for Insular Southeast Asia (Philippines, then Indonesia).

Monday 6 June 2011

Panasian-Part I: Altaic and Sino-Tibetan

I've been away from ADMIXTURE for a couple of weeks, too busy with other stuff.
This time I've decided to tackle East Asia. I got access to a new dataset from the Pan Asian SNP Consortium, with some 50,000 SNPs. Sadly few of them appear to overlap with my current dataset, so fusing them together means I'd have to work with just a few thousand. I may try anyway in the future, but decided to play with the Panasian set on its own for the time being. I didn't use the South Asian or Yoruba samples in this series to simplify things, but I did include White Utahns to check for possible West Eurasian (Fertile Crescent) influence.
I intend to apply some old tricks to get components to be more informative and less "isolated group"-tied, but firstly I wanted to see how this set would behave in an unsupervised ADMIXTURE analysis; namely I intend to check which unsupervised components are interesting and coherent with ethnographic/historical data so I can pick them for supervised analysis and hopefully gain some further insights.
I'm presenting in the next few days a series of regionally-split results.
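To make the overlap problem mentioned above concrete, here is a minimal Python sketch of how one might count shared SNPs between two datasets before merging. It assumes both sets are in PLINK format and that SNP IDs (the second column of a .bim file) are named consistently in both; the function names and the toy rows are hypothetical, not the actual data.

```python
def read_bim_ids(lines):
    """Collect SNP IDs (2nd whitespace-separated column) from PLINK .bim lines."""
    ids = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            ids.add(fields[1])
    return ids


def snp_overlap(bim_a, bim_b):
    """Return (SNPs in A, SNPs in B, SNPs shared by both)."""
    ids_a = read_bim_ids(bim_a)
    ids_b = read_bim_ids(bim_b)
    return len(ids_a), len(ids_b), len(ids_a & ids_b)


# Toy .bim rows (made up): chromosome, SNP ID, cM, bp position, allele 1, allele 2
toy_set_a = ["1 rs1 0 100 A G", "1 rs2 0 200 C T", "2 rs3 0 300 A C"]
toy_set_b = ["1 rs2 0 200 C T", "2 rs9 0 900 G T"]

print(snp_overlap(toy_set_a, toy_set_b))
```

With real files one would pass open file handles instead of lists, e.g. `snp_overlap(open("a.bim"), open("b.bim"))` (hypothetical file names). Matching on ID alone ignores strand and position mismatches, so the true usable overlap may be smaller still.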

Unsupervised results are good for inter-population comparisons. Most components likely don't represent any particular ancient populations. A certain amount of small component noise is expected also.
The following results are at K=9.
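For readers who want to reproduce the per-population summaries, the following is a rough Python sketch of averaging individual ancestry fractions by population and finding the population where a component is "modal" (highest on average). It assumes the fractions come from an ADMIXTURE .Q output (one row per individual, K columns) and that population labels are available in the same order; the labels and numbers below are invented purely for illustration and use K=3 rather than 9.

```python
from collections import defaultdict


def population_means(q_rows, pop_labels):
    """Average the ancestry fractions of all individuals in each population."""
    sums = {}
    counts = defaultdict(int)
    for row, pop in zip(q_rows, pop_labels):
        if pop not in sums:
            sums[pop] = [0.0] * len(row)
        for k, value in enumerate(row):
            sums[pop][k] += value
        counts[pop] += 1
    return {pop: [s / counts[pop] for s in sums[pop]] for pop in sums}


def modal_population(means, k):
    """Population with the highest mean fraction of component k (0-based)."""
    return max(means, key=lambda pop: means[pop][k])


# Made-up example: 4 individuals, 3 components, 2 hypothetical populations
toy_q = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.0, 0.5, 0.5],
]
toy_labels = ["PopA", "PopA", "PopB", "PopB"]

toy_means = population_means(toy_q, toy_labels)
print(toy_means)
print(modal_population(toy_means, 0))
```

A modal population is only a label of convenience here; as noted above, most components likely don't represent any particular ancient population.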


About the populations:
JapaneseML are from the mainland, as presumably are most "Japanese" without the ML qualification. They were separated in the set and I didn't fuse them. JapaneseRyukyu are from Okinawa.
SGChinese are Chinese from Singapore; BJG from Beijing.
Taiwan Hakka and Taiwan Minnan or Hoklo are Han Chinese, comprising the overwhelming majority of the island's current population. They represent the people generally meant when speaking of "Taiwanese" nowadays. They are however recent arrivals, presumably overwhelming the native aboriginals (Taiwan Ami and Atayal) with more efficient agricultural/social technologies only in the last 500 years or so.

TaiwanAmi and TaiwanAtayal are much older Taiwanese populations, but some discontinuity in the Paleolithic-Neolithic transition in Taiwan may imply an exogenous origin (possibly from early Neolithic China). They speak "proto-Austronesian" languages, and the Austronesian wave of language and agricultural lifestyle seems to have spread from there (or perhaps from Southeast mainland China, with a side branch going to Taiwan?).
The Austronesian Expansion seems to have been sort of a "First Wave" of agriculturalists (maybe secondary in some regions). Much later, advanced agriculturalist "second wave" Han Chinese then had again a major demographic effect, going beyond to other Austronesian lands as well, and apparent even in the Philippines today, with some 20% of the population having recent Chinese ancestry. Without Western interference, this possible Han "secondary wave" might have spread further still, given the large amounts of land then still occupied by foragers in insular South East Asia and Oceania. Indeed Negrito forager and semi-forager tribes are still under pressure today from their agricultural neighbours.

Jinuo and Karen are Burmese populations with Sino-Tibetan tongues (same family as Chinese languages such as Mandarin and Cantonese, and also Tibetan). Mon speak an Austro-Asiatic tongue but also live in Burma.

Mlabri, Mamanwa and Kensiu are all forager/semi-forager tribes. Mlabri live in South East Asia; Mamanwa are Philippine Negritos, while Kensiu are SouthEast Asian Negritos. Naasioi are Papuans.

Fst distances:
I'm not naming the components since I'm not sure they are historically informative.
It's interesting that "Red" is very close to "DarkGreen" and "LightGreen". The genetic distance is similar to that between closely related components from other Neolithic centres I've run before.
Some ancient stabilized admixture with very different local forager groups, present in these unsupervised components may even explain some of the distance, so the affinity may be higher than seen here.
On the other hand, "Forager-components" are much less similar both to the presumed Neolithic ones and to other forager components. Actually, since I strongly suspect these modern day foragers are hybridized with their agriculturalist neighbours, the distance between different "Negrito"-modal components may be even larger.
The distance between Westerner White Utahns and these groups seems to be roughly similar to the distance between foragers and agriculturalists, and different groups of foragers, but much larger than distance between agriculturalists. A possible explanation is multiple waves of forager-swamping agriculturalists from a single centre or group of related centres in the region.
Some minor forager admixture in farmers and major farmer admixture in foragers, would both be invisible to unsupervised ADMIXTURE if ancient in the absence of "pure" control groups.
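To give a feel for what a distance between two components means, here is a small Python sketch that computes a simple ratio-of-sums Fst from per-SNP allele frequencies, such as the columns of an ADMIXTURE .P file (one row per SNP, one column per component). This is a basic Hudson-style estimator without sample-size correction, chosen for clarity; it is an assumption for illustration and not necessarily the formula ADMIXTURE itself uses. The frequencies are made up.

```python
def fst_pair(freqs_1, freqs_2):
    """Ratio-of-sums Fst between two components from per-SNP allele frequencies."""
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_1, freqs_2):
        numerator += (p1 - p2) ** 2
        denominator += p1 * (1.0 - p2) + p2 * (1.0 - p1)
    # If no SNP is polymorphic in either component, report zero divergence
    return numerator / denominator if denominator > 0 else 0.0


def fst_matrix(p_rows):
    """Pairwise Fst for all components; p_rows has one row per SNP, K columns."""
    k = len(p_rows[0])
    columns = [[row[j] for row in p_rows] for j in range(k)]
    return [[fst_pair(columns[i], columns[j]) for j in range(k)] for i in range(k)]


# Made-up frequencies: 2 SNPs, 3 components
toy_p = [
    [0.1, 0.3, 0.9],
    [0.5, 0.5, 0.2],
]

for row in fst_matrix(toy_p):
    print(["%.3f" % value for value in row])
```

In this toy matrix the first two components are much closer to each other than either is to the third, which is the kind of pattern described above for the presumed Neolithic components versus the forager ones.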

I'll reserve more discussion for the supervised run, right now I'd venture to say the red, light green, dark green and blue components are all closely related, tend to exist in a gradient of admixture with one another in similar ethnic groups and may correspond to different but related Neolithic waves probably all from China. There is some interesting correlation with language groups.
Ryukyu Japanese may lack some components more important in mainland Japanese and Koreans due to greater geographic insulation from Chinese secondary Waves. Perhaps like Sardinians and Basques in the West.
Taiwanese Aborigines look promising as the representatives of a vast farmer demographic wave. Darkgreen presence in China may indicate its origin there, since agriculture is older and was presumably more advanced in the continent.

In the next few days I will post results for other Austronesian and Southeast Asian peoples. Then I'll do a restricted pole-supervised run.