I've been busy with unrelated stuff lately so didn't find time to post last week.
I've decided to take a new look at variation within African populations, using the restricted pole trick I used before on West Eurasian ones. In particular, I wanted to look at variation within the Niger-Congo or West African Neolithic Core by trying to split it into 3 components.
I should point out I'm doing these runs seeking support for a theoretical model. It's hard if not impossible to provide definitive evidence (if such a thing exists at all) through ADMIXTURE results. But models that yield unexpected results, or predictions that can then be tested by other methods or by analysis of future data not available right now, have value in my opinion.
I've run a few unsupervised runs of the African samples available to me, compared with those in other projects, and used a few observations to guide me in pole choice. I was particularly interested in possible language-family-related components, since even though language groups don't always correlate exactly with genetics, there tends to be some relationship.
Restricted poles comprising only a few individuals are a good way of establishing one or more "centres" for a cline in the data. They don't "stick", however, if there is no such cline, or if the cline extremes are not represented. For instance, I chose a "Palestinian5" pole using 5 Palestinians, yet it was immediately stolen by the Saudis, who carry a component similar to that of the Palestinians and otherwise unrepresented, but at a higher level.
However, for clines such as the Niger-Congo or "West African" cline, from Mandenka/Dogon to the South and East African Bantus, it is useful to establish some such "hand-picked centres" to differentiate subpopulations. The frontier between the 3 Niger-Congo components I've come up with is thus necessarily somewhat artificial. Choosing different "restricted poles" would result in different components to a much larger degree than with the "Basque5" or "Lithuanian5" poles I've used before, since those are much more stable.
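One way to check whether a restricted pole has "stuck" is to see which population shows the highest mean share of its component among the individuals that were not used as pole members. The following is only a minimal sketch: it assumes the ancestry fractions are available as one row per individual with one column per component (the layout of an ADMIXTURE Q file), and the function name and the toy numbers are invented for illustration, not taken from any run.

```python
from collections import defaultdict

def modal_population(q_rows, labels, component):
    """Return (population, mean share) for the population whose
    individuals carry the highest mean share of `component`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row, pop in zip(q_rows, labels):
        totals[pop] += row[component]
        counts[pop] += 1
    means = {pop: totals[pop] / counts[pop] for pop in totals}
    best = max(means, key=means.get)
    return best, means[best]

# Toy values, invented for illustration only (not results from this run).
# Pole individuals themselves are assumed to be left out of these rows.
labels = ["Palestinian", "Palestinian", "Saudi", "Saudi", "Yoruba"]
q_rows = [
    [0.60, 0.40],
    [0.50, 0.50],
    [0.90, 0.10],
    [0.80, 0.20],
    [0.00, 1.00],
]

pole_population = "Palestinian"  # population the pole was drawn from
best, share = modal_population(q_rows, labels, component=0)
pole_was_taken = best != pole_population
print(best, round(share, 2), pole_was_taken)
```

In this toy case the component seeded with Palestinians peaks in the Saudi rows, which is the situation described above as the pole being stolen.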
These were the poles chosen:
1) Three Mandenka and two Dogon for a Northwestern Niger-Congo component. Dogon often appear as modal for the West African component in unsupervised runs, indicating a possible origin of the West African Neolithic in the Sahel. I named this component NCongo1 (Niger-Congo1).
2) Five Yoruba: NCongo2
3) One Pedi, two Fang, two Kongo and one Luhya to try to identify the Bantu component, which I named NCongo3 (Bantu)
4) Three Bulala and two Masai. Masai often claim their own component, and at higher Ks even multiple ones, due to close family links between some of them. I removed some of these individuals and, to avoid such artefacts, used Bulala to try to make the Masai pole more independent of the Masai themselves. This component turned out to be important in all Nilo-Saharan-speaking populations (plus the Sandawe, though these can be split away into their own pole), so I took the liberty of naming it Nilo-Saharan, which shouldn't be taken too seriously.
5) Five Biaka Pygmies (BPygmy)
6) Five Mbuti Pygmies (MPygmy)
7) Five Hadza
8) Five Namibian San: Khoisan pole
9) Five Palestinians. This pole was taken by the Saudi sample, as mentioned before. I named it FC(NEAfr). It seems to be important in Semitic language speaking populations, but it goes further into other populations as well. It roughly corresponds to the "second wave" or WMPC+NC component used in Fertile Crescent runs.
10) Two Mozabites, two Tunisians and one N Moroccan. This pole peaks in the Tunisian sample and was named FC(NWAfr)
11) Five Basques. Named FC(WMPC) - Fertile Crescent Western "Mesopotamian Core"
12) Five Lithuanians, named FC(NMPC)
13) Five individuals from Urkarah, Northern Caucasus. Named FC(EMPC)
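The pole list above can be written down as a small data structure and turned into per-individual labels. This is a minimal sketch under several assumptions: that the poles are fed to ADMIXTURE's supervised mode, which reads a `.pop` file with one label per individual and `-` for individuals left unassigned; that the first individuals of each population are taken as pole members (the actual members were hand-picked); and that the keys `NiloSaharan`, `Hadza` and `Khoisan` and the function name are placeholders chosen here. Whether ADMIXTURE accepts parentheses in labels is not checked.

```python
# Pole composition as listed above: pole name -> {population: individuals}.
POLES = {
    "NCongo1": {"Mandenka": 3, "Dogon": 2},
    "NCongo2": {"Yoruba": 5},
    "NCongo3": {"Pedi": 1, "Fang": 2, "Kongo": 2, "Luhya": 1},
    "NiloSaharan": {"Bulala": 3, "Masai": 2},   # placeholder key
    "BPygmy": {"Biaka": 5},
    "MPygmy": {"Mbuti": 5},
    "Hadza": {"Hadza": 5},                      # placeholder key
    "Khoisan": {"San": 5},                      # placeholder key
    "FC(NEAfr)": {"Palestinian": 5},
    "FC(NWAfr)": {"Mozabite": 2, "Tunisian": 2, "N_Moroccan": 1},
    "FC(WMPC)": {"Basque": 5},
    "FC(NMPC)": {"Lithuanian": 5},
    "FC(EMPC)": {"Urkarah": 5},
}

def pop_file_lines(individual_pops, poles):
    """Return one label per individual, in input order: the pole name for
    the first n individuals of each pole population, '-' otherwise."""
    quota = {}
    for pole, members in poles.items():
        for pop, n in members.items():
            quota[pop] = [pole, n]  # [pole name, slots still open]
    lines = []
    for pop in individual_pops:
        slot = quota.get(pop)
        if slot and slot[1] > 0:
            slot[1] -= 1
            lines.append(slot[0])
        else:
            lines.append("-")
    return lines

# Hypothetical sample order, for illustration only.
sample = ["Yoruba"] * 6 + ["Dogon"] * 3 + ["Igbo"]
lines = pop_file_lines(sample, POLES)
print("\n".join(lines))
```

Only five of the six Yoruba and two of the three Dogon receive a pole label here; everyone else, including all Igbo, is left as `-` to be estimated.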
Populations are roughly distributed by language group in the graph.
1) Hausa is Afro-Asiatic from the Chadic group
2) Mandinka and Bambaran belong to the Mande group of Niger-Congo languages. Dogon languages may be related early offshoots of proto-Niger-Congo.
3) Brong, Yoruba and Igbo speak NC languages of the Atlantic-Congo group, the group which includes Bantu languages. Their languages are not Bantu though, and each belongs to a different subgroup distinct from the Bantoid one and from each other
4) Bamoun or Bamum speak an Atlantic-Congo language of the Bantoid subgroup, but their language, though related, is not considered part of the narrower Bantu-proper group.
5) Fang, Kongo, Luhya, Xhosa, Pedi, Nguni, Sotho and Tswana all speak Bantu-proper languages. These are a subgroup of Bantoid, itself a group within Atlantic-Congo, which is a major Niger-Congo family.
6) Bulala or Bilala, Masai, Alur, Hema and Kaba all speak languages of the Nilo-Saharan family
7) Mada seem related genetically but speak an Afro-Asiatic language of the Chadic group
-NCongo1, NCongo2 and NCongo3 seem to exhibit clines, as expected, since these are more or less spurious components dividing a continuous cline of West African-like peoples from the Sahel to South Africa. Judging by Fst to neighbouring forager components, they seem to have absorbed a bit of these elements (e.g. NCongo3 seems to have absorbed a bit of San).
-There seems to be a remarkable concordance between component patterns and language families, including subgroups of such families. The model predicts such languages would expand in association with Neolithic peoples' colonization movements. Some discrepancies may be explained as later elite imposition of intrusive languages.
-Dogon languages seem to be the outgroup in the Niger-Congo languages, and they are the most distant to the Bantu genetically within the Niger-Congo group. They may represent an early offshoot, perhaps isolated since, of the original Neolithic Core/Revolution. In unsupervised runs, Dogon are often the "most West-African" of all these population samples.
-Nilo-Saharan-speaking populations share a common component, indicating spread of most Nilo-Saharan tongues not only by elites but more likely by another food-producing revolution. Bulala are from Chad, Masai live in Kenya. Both are pastoralist peoples.
- Alur seem to have some small but significant Mbuti Pygmy ancestry. These peoples live not far from each other and both speak Nilo-Saharan languages, probably adopted by the Pygmies in the last few thousand years through contact with (and marginalization by) Alur-related peoples from the North.
-Similarly there are Biaka Pygmy segments in Bantus, and Biaka Pygmies today speak Bantu languages adopted from their agriculturalist Bantu neighbours.
- Both Chadic-speaking groups (Hausa and Mada) seem to have some "Nilo-Saharan" component. Hausa's Niger-Congo-speaking neighbours mostly lack it. Hausa may be a West-African population subjected to elite language-shift towards Afro-Asiatic after intrusion from AA-speaking herders from the East.
-It is possible the reverse may explain Luhya ethnogenesis, with an intrusion of Bantu agriculturalists into Nilo-Saharan pastoralist occupied land.
-Fertile Crescent ancestry of Masai and Ethiopians is mostly FC(NEAfr), or most like that of Saudis, which seems to be in agreement with linguistics. However, these populations also seem to have some Northwest African influence (FC(NWAfr)), just as in Egypt but unlike in Saudis (remember the pole was actually 5 Palestinians). I think it's possible that at the time of the Semitic-speaking migrations from Arabia, possibly some 2000-3000 years ago, there was already an older Egypt-derived Fertile Crescent element in the region, whose languages do not survive.
-Fertile Crescent elements found earlier in the Fulani seem to be wholly derived from North West African populations. Fulani speak a Niger-Congo language and have much affinity to other West African populations as well, so I'd say this admixture event is more likely very ancient.
-I don't know why FC(NMPC) elements appear in North African and Levantine populations here but not in my previous run. But I included very few European and Near Eastern populations in the analysis, and these FC poles are very close to each other in comparison with the African ones, so it's possibly just noise and lack of definition, there being few individuals with actual FC(NMPC) and FC(WMPC). Also, my last NMPC was based on the Chuvash (Lithuanians have much more affinity to Basques), but I didn't want to include Siberian poles, in order to keep things simpler. This run is complicated enough as it is and small components may not represent anything much.
-North African populations may have an aboriginal substrate more complicated than I thought earlier, with possibly aboriginal Northwest African, Green-Saharan refugee African and other elements in addition to the dominant West Asian (FC) element. So I think small segments here may be representing such hidden elements and not actual admixture. I'm still convinced elements with Sub-Saharan African affinity mostly represent aboriginal populations and are not the result of the caravan slave trade.
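The Fst comparison between components mentioned in the first observation can be illustrated from per-component allele frequencies, such as the columns of an ADMIXTURE P file. The sketch below uses a simple ratio-of-sums form, with the squared frequency difference over the between-component heterozygosity and no sample-size correction. It is an assumption for illustration and not necessarily the estimator ADMIXTURE itself reports; the function name and frequencies are made up.

```python
def fst_from_frequencies(p1, p2):
    """Fst between two components from per-SNP allele frequencies,
    as sum((p1 - p2)^2) / sum(p1*(1 - p2) + p2*(1 - p1))."""
    num = sum((a - b) ** 2 for a, b in zip(p1, p2))
    den = sum(a * (1 - b) + b * (1 - a) for a, b in zip(p1, p2))
    return num / den if den else 0.0

# Toy allele frequencies at three SNPs, invented for illustration only.
farmer = [0.10, 0.50, 0.90]
near_forager = [0.20, 0.50, 0.80]
far_forager = [0.90, 0.50, 0.10]

fst_near = fst_from_frequencies(farmer, near_forager)
fst_far = fst_from_frequencies(farmer, far_forager)
print(round(fst_near, 3), round(fst_far, 3))
```

A component that has absorbed some forager ancestry would be expected to show a lower value against that forager component than a comparable component that has not.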
Tomorrow I'll present individual results.