Monday, 6 June 2011

Panasian-Part I: Altaic and Sino-Tibetan

I've been away from ADMIXTURE for a couple of weeks, too busy with other stuff.
This time I've decided to tackle East Asia. I got access to a new dataset from the Pan Asian SNP Consortium, with some 50,000 SNPs. Sadly, few of them appear to overlap with my current dataset, so fusing the two would leave me with just a few thousand. I may try anyway in the future, but I decided to play with the Panasian set on its own for the time being. To simplify things I didn't use the South Asian or Yoruba samples in this series, but I did include White Utahns to check for possible West Eurasian (Fertile Crescent) influence.
I intend to apply some old tricks to make the components more informative and less tied to isolated groups, but first I wanted to see how this set would behave in an unsupervised ADMIXTURE analysis. Namely, I want to check which unsupervised components are interesting and coherent with ethnographic/historical data, so I can pick them for supervised analysis and hopefully gain some further insights.
Over the next few days I'll present a series of regionally split results.
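
A minimal sketch of how the overlap between two SNP sets could be checked, assuming both are in PLINK format with the variant ID in the second column of each .bim line. The function names and the toy lines are illustrative, not taken from the actual datasets.

```python
def read_snp_ids(bim_lines):
    """Collect variant IDs (second column) from PLINK .bim-style lines."""
    ids = set()
    for line in bim_lines:
        fields = line.split()
        if len(fields) >= 2:
            ids.add(fields[1])
    return ids


def shared_snps(bim_a, bim_b):
    """Return the variant IDs present in both marker lists."""
    return read_snp_ids(bim_a) & read_snp_ids(bim_b)


# Toy .bim lines (chromosome, ID, cM, bp, allele 1, allele 2); made-up values.
panasian = ["1 rs100 0 1000 A G", "1 rs200 0 2000 C T", "2 rs300 0 500 A C"]
current = ["1 rs200 0 2000 C T", "2 rs300 0 500 A C", "3 rs400 0 700 G T"]

common = shared_snps(panasian, current)
print(len(common), sorted(common))
```

With real files, open file handles can be passed in place of the lists, and the shared IDs written one per line to a text file that PLINK's `--extract` option can read before merging.
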

Unsupervised results are good for inter-population comparisons. Most components likely don't represent any particular ancient population. A certain amount of noise from small components is also expected.
The following results are at K=9.
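
For readers who want to summarise such a run themselves, here is a rough sketch of how per-population averages and the "modal" population of each component could be derived. It assumes the ancestry fractions come from an ADMIXTURE .Q file (one row per individual, K columns) and that population labels are supplied separately in the same order. The helper names, labels and numbers are hypothetical.

```python
def population_means(q_rows, labels):
    """Average the ancestry fractions of all individuals sharing a label."""
    grouped = {}
    for row, pop in zip(q_rows, labels):
        grouped.setdefault(pop, []).append(row)
    return {pop: [sum(col) / len(rows) for col in zip(*rows)]
            for pop, rows in grouped.items()}


def modal_population(means):
    """For each component, name the population with the highest mean fraction."""
    k = len(next(iter(means.values())))
    return [max(means, key=lambda pop: means[pop][c]) for c in range(k)]


# Toy Q matrix with K=3 and made-up labels (not real results).
q = [[0.9, 0.1, 0.0],
     [0.8, 0.2, 0.0],
     [0.1, 0.9, 0.0],
     [0.0, 0.05, 0.95]]
labels = ["PopA", "PopA", "PopB", "PopC"]

means = population_means(q, labels)
print(modal_population(means))
```

A component is then called "PopA-modal" when PopA has the highest average share of it, which is how the term is used in this post.
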

About the populations:
JapaneseML are from the mainland, as presumably are most "Japanese" without the ML qualification. They were separated in the set and I didn't fuse them. JapaneseRyukyu are from Okinawa.
SGChinese are Chinese from Singapore; BJG from Beijing.
Taiwan Hakka and Taiwan Minnan or Hoklo are Han Chinese, comprising the overwhelming majority of the island's current population. They represent the people generally meant when speaking of "Taiwanese" nowadays. They are, however, recent arrivals, presumably overwhelming the native aboriginals (Taiwan Ami and Atayal) with more efficient agricultural/social technologies only in the last 500 years or so.

TaiwanAmi and TaiwanAtayal are much older Taiwanese populations, but some discontinuity in the Paleolithic-Neolithic transition in Taiwan may imply an exogenous origin (possibly from early Neolithic China). They speak "proto-Austronesian" languages, and the Austronesian wave of language and agricultural lifestyle seems to have spread from there (or perhaps from Southeast mainland China, with a side branch going to Taiwan?).
The Austronesian Expansion seems to have been sort of a "First Wave" of agriculturalists (maybe secondary in some regions). Much later, advanced agriculturalist "second wave" Han Chinese then had again a major demographic effect, going beyond to other Austronesian lands as well, and apparent even in the Philippines today, with some 20% of the population having recent Chinese ancestry. Without Western interference, this possible Han "secondary wave" might have spread further still, given the large amounts of land then still occupied by foragers in insular South East Asia and Oceania. Indeed Negrito forager and semi-forager tribes are still under pressure today from their agricultural neighbours.

Jinuo and Karen are Burmese populations with Sino-Tibetan tongues (the same family as Chinese languages such as Mandarin and Cantonese, and also Tibetan). Mon speak an AustroAsiatic tongue but also live in Burma.

Mlabri, Mamanwa and Kensiu are all forager/semi-forager tribes. Mlabri live in South East Asia; Mamanwa are Philippine Negritos, while Kensiu are SouthEast Asian Negritos. Naasioi are Papuans.

Fst distances:
I'm not naming the components since I'm not sure they are historically informative.
It's interesting that "Red" is very close to "DarkGreen" and "LightGreen". The genetic distance is similar to that between closely related components from other Neolithic centres I've run before.
Some ancient stabilized admixture with very different local forager groups, present in these unsupervised components, may even explain some of the distance, so the affinity may be higher than seen here.
On the other hand, "Forager-components" are much less similar both to the presumed Neolithic ones and to other forager components. Actually, since I strongly suspect these modern-day foragers are hybridized with their agriculturalist neighbours, the distance between different "Negrito"-modal components may be even larger.
The distance between Westerner White Utahns and these groups seems to be roughly similar to the distance between foragers and agriculturalists, and between different groups of foragers, but much larger than the distance between agriculturalists. A possible explanation is multiple waves of forager-swamping agriculturalists from a single centre or group of related centres in the region.
Minor forager admixture in farmers and major farmer admixture in foragers would both be invisible to unsupervised ADMIXTURE if they are ancient and no "pure" control groups are available.
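
To make the notion of distance between components concrete, here is a simplified sketch of a pairwise Fst computed from per-SNP allele frequencies, such as the columns of an ADMIXTURE .P file. It uses a textbook Wright-style estimator (ratio of sums over loci, equal weights) and is not meant to reproduce the Fst values ADMIXTURE itself reports. The function names and toy frequencies are assumptions for illustration.

```python
def wright_fst(freqs_a, freqs_b):
    """Wright-style Fst between two groups from per-SNP allele frequencies."""
    num = 0.0
    den = 0.0
    for p1, p2 in zip(freqs_a, freqs_b):
        p_bar = (p1 + p2) / 2
        h_total = 2 * p_bar * (1 - p_bar)            # pooled heterozygosity
        h_within = p1 * (1 - p1) + p2 * (1 - p2)     # mean within-group value
        num += h_total - h_within
        den += h_total
    return num / den if den > 0 else 0.0


def pairwise_fst(p_matrix):
    """p_matrix has one row per SNP and one column per component."""
    k = len(p_matrix[0])
    cols = [[row[c] for row in p_matrix] for c in range(k)]
    return [[wright_fst(cols[i], cols[j]) for j in range(k)] for i in range(k)]


# Toy frequencies: components 0 and 1 are similar, component 2 is divergent.
p_toy = [[0.10, 0.12, 0.80],
         [0.50, 0.55, 0.20],
         [0.90, 0.85, 0.30]]

fst = pairwise_fst(p_toy)
for row in fst:
    print(["%.3f" % x for x in row])
```

In this toy case the two similar components come out much closer to each other than either is to the third, which is the pattern described above for the presumed Neolithic components versus the forager ones.
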

I'll reserve more discussion for the supervised run. Right now I'd venture to say that the red, light green, dark green and blue components are all closely related, tend to exist in a gradient of admixture with one another in similar ethnic groups, and may correspond to different but related Neolithic waves, probably all from China. There is some interesting correlation with language groups.
Ryukyu Japanese may lack some components that are more important in mainland Japanese and Koreans because of greater geographic insulation from Chinese secondary waves, perhaps like Sardinians and Basques in the West.
Taiwanese Aborigines look promising as the representatives of a vast farmer demographic wave. DarkGreen presence in China may indicate its origin there, since agriculture is older and was presumably more advanced on the continent.

In the next few days I will post results for other Austronesian and Southeast Asian peoples. Then I'll do a restricted pole-supervised run.


  1. After plotting the different "colours" as an MDS, I get the following impressions:

    -light green seems to be close to an unadulterated Neolithic core. Most likely somewhere between the Yangtze.
    -dark green is a mix of mostly light green Neolithic and some purple foragers.
    -dark blue seems to be a mix between a bit of light blue forager and an already mixed Neolithic population of light green Neolithic core and pink foragers (probably similar to contemporary Jinuo and Karen).
    -dark blue seems to be a mix between a very archaic forager population more related to both Burgundy and Yellow (Jomon-like people?) and an already mixed Neolithic population of light green Neolithic core and pink foragers (probably similar to contemporary Jinuo and Karen).

    It would be interesting if we could see a split in red/blue between a Jinuo-like population and their associated foragers (burgundy/pink), to confirm whether this seeming admixture is just an impression or not.
    Also, I think if we had some Siberian people like the Yakut in the run, we might have been able to see a split within the current red component, between a component with greater Melanesian affinities and one with greater Siberian affinities (Altaic?).

    To propose names for components:
    light green : Yangtze core
    dark green : proto-Austronesian or southeastern Yangtze expansion
    dark blue : Mekong core or southern Yangtze expansion
    red : northeastern Yangtze extension?
    light blue : SEAsian forager 1 (golden triangle centered)
    pink : proto-Mon-Khmer or SEAsian forager 2
    purple : SEAsian forager 3 (Southeastern China and Philippines centered)
    burgundy : Melanesian forager (possibly originally spread north of SEAsian forager 2 in continental Asia too, with greater paleo-caucasian affinities there?)

  2. I ran this unsupervised analysis to get hints for a future supervised run using "restricted poles" which I think give more informative components than supervised ones.
    I will name those supervised components, since I'll make significant set-up choices about them; but I don't want to name these unsupervised ones.

    With supervised runs I can create models through reasoned set-ups and then check for coherence and predictions. You make some interesting points. I hope the supervised run will give some clues.

  3. Where is HanMandarin from, geographically? The prominence of the Ph-Mamanwa- and WhiteUtahns-modal components here as compared to the other Chinese samples is interesting.

  4. Those are small elements, may be just noise...
    The sample is from the PR of China, comprising more southern Mandarin-speaking individuals than the ChineseBJG sample. On the Panasian site map it's pinpointed to Shanghai.
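
The MDS view of the components mentioned in the first comment can be approximated with classical (metric) MDS on a symmetric matrix of pairwise distances. The sketch below is a plain-Python version using power iteration; it treats the input as ordinary distances, which is only an approximation for Fst values, and it is not necessarily the procedure the commenter used. The function names are hypothetical, and the demo uses a simple 3-4-5 triangle rather than real Fst values.

```python
import math


def _matvec(m, v):
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def classical_mds(dist, dims=2, iters=500):
    """Classical MDS of a symmetric distance matrix via power iteration."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]          # equals column means (symmetry)
    grand = sum(row_mean) / n
    # Double-centre the squared distances to get the Gram matrix B.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [float(i + 1) for i in range(n)]     # fixed, non-symmetric start
        for _ in range(iters):
            w = _matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:                     # nothing left to extract
                break
            v = [x / norm for x in w]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        lam = sum(x * y for x, y in zip(v, _matvec(b, v)))  # Rayleigh quotient
        if lam > 1e-12:                          # keep positive axes only
            for i in range(n):
                coords[i][k] = v[i] * math.sqrt(lam)
        # Deflate so the next pass finds the next eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Demo: three points whose pairwise distances form a 3-4-5 triangle.
triangle = [[0.0, 3.0, 4.0],
            [3.0, 0.0, 5.0],
            [4.0, 5.0, 0.0]]
xy = classical_mds(triangle)
for point in xy:
    print(["%.3f" % c for c in point])
```

With real data the input would be the matrix of pairwise distances between components, and the two output columns would be the plotting coordinates. Axes whose eigenvalue is not positive are left at zero, which can happen when the distances are not Euclidean.
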