Introduction

This project began as a short essay called “Indo-European Demic Diffusion Model”, published in April 2017 in the Department of Anatomy, Cell Biology, and Zoology of the University of Extremadura, in which I contended that recent genetic investigation suggested that the expansion of Indo-European languages from the steppe was linked to the expansion of R1b1a1b-M269 lineages in Eurasia. In particular, genetic data recovered from ancient individuals seemed to support that the expansion of R1b1a1b1-L23 lineages in Europe was associated with Yamna migrants, and thus also subsequently with the expansion of East Bell Beakers as North-West Indo-European-speakers in Europe, whereas the spread of the Corded Ware culture likely represented the expansion of Uralic speakers.

Some researchers had already expressed doubts about the traditional association of Corded Ware with the Indo-European expansion, although none of them had given an alternative model consistent with the current data, explaining the role of R1a1a1-M417 lineages spreading with Uralic speakers (Horváth 2014), or the recently described “Yamnaya ancestry” peaking among Uralic individuals (Heggarty 2015; Klejn et al. 2017) as the result of mixed Indo-European–Uralic communities.

The theory laid out in this text takes dialectal evolution as its stable framework, as the core which should underlie any Indo-European expansion model, and uses genetic investigation (of ancient and modern DNA samples) and its potential relationship with archaeological cultures to establish an expansion model step by step. It also takes into account the complex problems found in correlating languages with archaeological cultures (Meier-Brügger 2003) and human genetics (Campbell 2015).

Even though phylogenetic methods became popular in the early 2000s, and have been used intermittently since then, especially by non-linguists (Ringe, Warnow, and Taylor 2002; Anthony and Ringe 2015), it seems more reasonable to avoid such methods in scientific publications, due to their controversial pseudoscientific nature and questionable results (Pereltsvaig and Lewis 2015)[3]. Historical linguistics, however, can only provide a relative historical framework for individual proto-languages and their relationships.

Archaeology works with the concept of culture, and as such it is able to determine timelines. When these timelines complement the relative chronologies and wide guesstimates of proto-languages, both are able to provide a contextualised historical explanation of linguistic frameworks (Vander Linden 2015; Hänsel and Zimmer 1994). The model of Indo-European migrations set forth by Marija Gimbutas (Gimbutas 1963, 1977) has been impressively corrected and expanded recently by Volker Heyd (Heyd 2004; Harrison and Heyd 2007; Heyd 2007; Heyd 2011), Valentin A. Dergachev (2007), David W. Anthony (Anthony 2007; Anthony and Brown 2011; Anthony 2013), James P. Mallory (2013), or Christopher Prescott (Prescott and Walderhaug 1995; Prescott 2012), among others. Similarly, the models of early Uralic–Indo-European contacts (Koivulehto 1991; Koivulehto 2003) and Uralic migrations from eastern Europe recently advocated by Kallio (2002, 2014) or Parpola (2013) have paved the way for a clearer understanding of the cultures and peoples involved in steppe-related migrations, and constitute the basic starting point of this book.

Language and culture expansion have usually been explained by two main alternative models: the demic diffusion model, which involves mass movement of people; and the cultural diffusion model, which refers to cultural impact between populations, and involves limited genetic exchange between them. Language transfer since ancient times seems to be associated with an expansion of people (Mikhailova 2015).

The ancestry of any selected population is likely to be a mixture of several ancient groups, which is reflected in its genetic structure (Haak et al. 2010; Skoglund et al. 2012; Malmström et al. 2009; Lazaridis et al. 2014). However, the genetic landscape for ancient populations is limited by the number of ancient DNA samples and ancient populations studied (Hellenthal et al. 2014). Population expansions are often accompanied by a significant replacement of patrilineal lineages (see Figure 3), due to the formation of expanding kin groups and to the intergroup competition among them (Zeng, Aw, and Feldman 2018). This reduction in the variability of Y-chromosome haplogroups is exacerbated by migration and exogamy practices, and also by violent conflicts involving mainly males.
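The loss of patrilineal variability through a bottleneck can be illustrated with a minimal sketch. It assumes plain neutral drift only (each son inherits the haplogroup of a randomly chosen father), which is a deliberate simplification and not the kin-group competition model of Zeng, Aw, and Feldman (2018). The haplogroup labels, population sizes, and function names are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def next_generation(fathers, size, rng):
    """Each son inherits the Y haplogroup of a father drawn uniformly at random."""
    return [rng.choice(fathers) for _ in range(size)]


def simulate(initial, sizes, seed=0):
    """Return one haplogroup Counter per generation, starting with the initial one."""
    rng = random.Random(seed)  # seeded so that a run is reproducible
    males = list(initial)
    history = [Counter(males)]
    for size in sizes:
        males = next_generation(males, size, rng)
        history.append(Counter(males))
    return history


# Hypothetical starting community of 100 males with four haplogroups.
initial = ["hg_A"] * 40 + ["hg_B"] * 40 + ["hg_C"] * 10 + ["hg_D"] * 10

# 20 generations at 100 males, a founder event of 5 males lasting
# 3 generations, then 20 generations of recovery at 100 males.
sizes = [100] * 20 + [5] * 3 + [100] * 20
history = simulate(initial, sizes, seed=1)

print("haplogroups at start:", dict(history[0]))
print("haplogroups after bottleneck:", dict(history[23]))
print("haplogroups at end:", dict(history[-1]))
```

Because no new lineages enter the population in this sketch, the set of haplogroups can only shrink over time, and the founder event makes that loss much faster. Admixture, exogamy, and male-biased conflict are not modelled.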

Male-biased population expansions cannot explain all language expansions and replacements, though, and exceptions are frequent throughout history. Archaeological research combined with population genomics is showing how bilingualism and multilingualism were common among different prehistoric groups which interacted closely and often became symbiotically integrated with each other. This is usually associated with chiefdom-like systems and long-lasting exogamy practices, such as those found between Abashevo and Sintashta-Potapovka-Filatovka peoples, between Bell Beakers and the El Argar population, or among cultures of the Baltic Sea and expanding Akozino warrior-traders. Beyond that, population bottlenecks including founder effects often obscure the original expansion of a language with male-driven migrations, as happened in Finns with N1a1a1a1a-L392 subclades, and Basques with R1b1a1b1a1a2-P312 lineages.

Figure 3. Simplified model of evolution of an ethnolinguistic community in terms of Y-chromosome haplogroups and admixture: Community A1 invades the territory of Community B, eventually replacing most ‘indigenous’ haplogroups at the same time as certain paternal lineages of A1 have more reproductive success, developing Community A2 through exogamy (evidenced by 30% of admixture from Community B). A language change is very likely to have taken place in the territory of Community B. No language change is necessary, however: (1) in the case of population resurgence (such as the change in haplogroups and admixture from Community A2 to Community A3); (2) after the incorporation of Y-DNA haplogroups and admixture from neighbouring populations; or (3) through founder effects (represented in the evolution from Community A3 to Community A4). Graphic inspired by images in Zeng, Aw, and Feldman (2018).

Ancient DNA (aDNA) investigation allows us to disentangle complex human history (Slatkin and Racimo 2016). The most recent breakthrough in Indo-European migrations obtained thanks to population genomics, concerned with general population movements of Eurasians westwards from the steppe (Haak et al. 2015; Allentoft et al. 2015; Mathieson et al. 2015), suggested that a common so-called “Yamnaya ancestry” represented by individuals of the Yamna culture could also be found in Corded Ware, Bell Beaker, and Únětice in descending proportions, coherent with their radiocarbon dates, apparently connecting them to succeeding migrations. However, the strong reliance on ancestry to derive conclusions on potential population movements, suited for gross interpretations of Palaeolithic and Mesolithic movements over thousands of years based on few assessed samples, has proven much trickier when deriving models of ethnolinguistic change from movements of neighbouring and closely interacting populations lasting just hundreds of years. Proper assessment and interpretation of Y-DNA, which does not change with generations of admixture, has been demonstrated to be key when investigating the connection of certain groups, as is clear from the Iberian (Olalde et al. 2019) and South Asian cases (Narasimhan et al. 2018).

The massive migration of people of Yamna origin with the Bell Beaker culture, some four hundred years after the expansion of Corded Ware peoples (a group probably originating close to the north Pontic area), thus witnesses the latest language shift before the start of the Early European Bronze Age. Taking all data together, there is little room (if any at all) to relate the expansion of Corded Ware peoples to any Indo-European dialect surviving into historical times, because the partial genetic link of Corded Ware peoples with Yamna is probably earlier than the expansion of Proto-Anatolian, and there is a strong connection of surviving Corded Ware groups with Finno-Ugric populations. Even the 2015 papers on Indo-European migrations showed with their published data that haplogroups R1b1a1b-M269 and R1a1a1-M417 were absent from central and western Europe until after the expansion of Eurasian pastoralists. These data thus help trace most modern European languages to the Eneolithic Pontic–Caspian steppes, and therefore to a massive expansion starting at nearly the same time from eastern Europe after around 3000 BC. In these studies, R1a1a1-M417 was prevalent in Corded Ware samples and was absent from samples of the Yamna horizon, most of which belonged to haplogroup R1b-M269. Further publications on early European (Mathieson et al. 2018), Bell Beaker (Olalde et al. 2018), Turan/South Asian (Narasimhan et al. 2018), and ancient Eurasian samples (de Barros Damgaard, Marchi, et al. 2018; de Barros Damgaard, Martiniano, et al. 2018) further confirm a surprisingly long-lasting clear-cut division of patrilineal lineages among Eneolithic steppe communities, and recent studies on the prehistoric Caucasus (Lazaridis et al. 2018; Wang et al. 2019) are helping reconstruct the different fine-scale population structure of Corded Ware and Yamna peoples.

The recent genetic revolution is thus helping to support the mainstream view of a natural evolution of reconstructed languages, including their dialectal stages, with concrete prehistorical communities defined in time and space (Lehmann 1992). Population genomics has therefore cleaned up the comparatist’s desk, dismissing almost all models of cultural diffusion proposed to date, especially flexible frameworks such as the “constellation analogy” (Clackson 2007, 2013) of loosely interconnected prehistoric communities, sharing language and culture through unending waves of areal contact among dialect continua. These were probably the result of fashionable linguistic trends akin to the ‘pots, not people’ paradigm prevalent in archaeology since the mid–20th century, and most of them need to be rejected, even though there is still an ongoing controversy over many details of the potential expansion of peoples with certain cultures (Kristiansen et al. 2017; Heyd 2017; Sørensen 2017; Furholt 2017), and there is growing concern about the need for fine-scale studies (Lazaridis 2018; Veeramah 2018).

Even more interesting than the general genetic revolution is the more specific one regarding Indo-European and Uralic migrations. Recently published data is helping reject previously popular theories concerning the dialectal and cultural evolution of Late Proto-Indo-European, such as the Anatolian homeland (Renfrew 1987), or the prevalent identification of Corded Ware with Indo-Europeans (Gimbutas 1977; Kristiansen 1989; Anthony 2007). In this sense, David Reich’s words regarding the dismissal of the Anatolian homeland theory by genetic data have proven prescient for the dismissal of their preferred model of Corded Ware as expanding Indo-European dialects:

 “A great lesson of the ancient DNA revolution is that its findings almost always provide accounts of human migrations that are very different from preexisting models, showing how little we really knew about human migrations and population formation prior to the invention of this new technology” (Reich 2018).

Ancient DNA is helping locate different peoples in a very specific place, time, and route of expansion, supporting in turn the most appropriate models of dialectal splits, which closes the circle of interrelated connections between linguistics, archaeology, and genetics, and turns anthropological investigation into a shrinking helix that points more and more precisely to the true ancient ethnolinguistic picture.

While the picture is clearer today than it was just a year ago, the most recent genetic research is correcting not just the old technology and outdated anthropological interpretations of the 1990s or 2000s, but also genetic methods and results of just months or years ago. Some data that we believed could be breakthroughs in the field have been demonstrated with time to be most likely wrong, either in the radiocarbon dating (due to mixed archaeological layers) or because of errors in technique or recent improvements in technology.

So, for example, the finding of haplogroup N1a-F1206 in the Comb Ware culture (Chekunova et al. 2014) becomes more and more unlikely with each new paper, like the finding of R1a1a-M198 around Lake Baikal during the Early Neolithic (Mooder et al. 2005; Moussa et al. 2016). Similarly, reports based on modern populations, such as the estimated origin of R1b1a1b-M269 in Neolithic Europe (Myres et al. 2011), or of R1a-L146 in South Asia (Underhill et al. 2015), and many others have been proven repeatedly wrong with ancient DNA. Even today, errors in cultural attribution, radiocarbon dates, and estimated haplogroups or subclades are bound to happen, in addition to technical errors involving the processing and assessment of samples (very difficult to test without the resampling of specimens), as we have seen most recently in samples from Hajji Firuz in Narasimhan et al. (2018), a huge investigation including scattered Asian samples and necessitating an international collaboration of many different archaeological teams.

More than Kossinna’s smile (Heyd 2017) of equating prehistoric culture to population in a general sense, the most recent genetic investigation should probably represent the joy of Starostin’s Nostratic Eurasian Epipalaeolithic, Kortlandt’s Eurasiatic northern Eurasian Mesolithic, Vennemann’s and Villar’s Vasconic Mediterranean and western (and possibly central) European Neolithic, Wiik’s Uralic northern and eastern European Chalcolithic, and Krahe’s Old European Early Bronze Age. Beyond petty sociopolitical and ethnolinguistic grievances of neighbouring Eurafrasian populations, and beyond the infinite pet theories on the potential ancestral population or language homelands, genetics is cutting to the chase and dismissing all theories but a few (usually related ones), or even just one, despite the obstinate defence of traditional theories by many academics.

Sadly, the field is plagued with unending setbacks: on the one hand, the eternal search for academic authority, and the need to publish and to collect as many citations and publications in journals of high impact factor as possible, are provoking all kinds of reactionary views, to fit previous models with the clear-cut picture emerging in some cases from genetic investigation. On the other hand, modern political and ethnolinguistic views burden this field, ranging from modern Indian politics in favour of an “indigenous Sanskrit” opposed to the so-called “Aryan invasion theory”; through the interest of modern Russian politics in supporting “indigenous Slavs”, opposed to the known history of colonisation and Slavicisation of essentially all of their modern territory; to the interest of certain western European groups in supporting an “indigenous” Palaeolithic Vasconic-speaking population. Reactionary views and ‘nativist’ ethnolinguistic trends are slowly eroding this new anthropological subfield of population genomics, and I would not be surprised if some education systems were to reject it as a useful anthropological discipline, for one reason or another.

No one is free of personal or professional bias, and mine is clear: at Academia Prisca, Fernando López-Menchero and I have invested years supporting the reconstruction of North-West Indo-European as a Late Proto-Indo-European dialect, which draws a clear red line in this series of books against any interpretation of the data that challenges this dialectal scheme. Also, I am of haplogroup R1b1a1b1a1a2a-DF27, like many in south-western Europe. On the other hand, we have been publishing texts about Proto-Indo-European since 2005, and I have known my haplogroup since 2008, but until 2015 I supported the spread of North-West Indo-European with Corded Ware and a later Old European dialect continuum centred on the pan-European Únětice culture (Quiles 2012). These cultures were thought to be dominated by R1a-M420 lineages, so that R1b-M343 lineages (probably Vasconic speakers) would have acquired the language by way of cultural diffusion in western Europe, maybe by Bell Beakers along the Rhine.

Only after 2015 (when, paradoxically, genetic papers seemed to support my preferred model) did I realise that Corded Ware may not have been linked to the expansion of Proto-Indo-European, and Volker Heyd’s theories seemed to take the lead, with R1a-M420 lineages potentially expanding Indo-Uralic through North Eurasia, but not Indo-European from the steppe, which would have been hitchhiked by R1b-M343 lineages which expanded Afroasiatic from Anatolia into south-eastern Europe (Quiles 2017).

After the most recent papers of 2017 and 2018, it seems more and more unlikely that the early arrival into eastern Europe and lack of expansions of R1a-M420 lineages could be associated with the spread of Indo-Uralic or Eurasiatic through North Eurasia, and therefore R1b-M343 lineages, with a likely origin in (and multiple expansions from) eastern Europe, seem like the most appropriate lines to follow most of the time for the spread of Pre-Indo-European languages.

These are simplistic assessments, and it should be obvious to anyone involved in the field that 1) the current picture shown by available ancient DNA research is clearly shifted towards Europe and R1b-M343 samples, for different reasons, which may be distorting our view of ancient population movements; 2) uniparental markers cannot be linked in a simplistic way to ethnolinguistic communities and their movements, because other relevant linguistic, archaeological, and genetic data must be assessed in order to obtain proper migration models; and 3) stages before Indo-Uralic are at best speculative, and are used only to give a coherent account of migrations coupled with reconstructed languages.

Even if all potential biases seem to be under control, a word of caution is due: this book tries to reflect the state of the art of linguistics and archaeology coupled with the available information of population genomics as of the day of its publication. There is little in science that can be called definitive, and ethnolinguistic identification of prehistoric cultures is not even close to those discoveries and conventions that we could consider firmly established. I have no intention of investing myself in the defence of lost causes, so I would not mind changing any of my interpretations as new data is published: e.g. to argue that Ancient North Eurasian ancestry and Q1a2-M25 represent the Eurasiatic expansion; or to argue that R1a-M420 and Ancient North Eurasian or East Asian ancestry in eastern Europe connects all the necessary dots for the Indo-Uralic expansion, if the new data supports this.

David W. Anthony is a great example of an academic who has invested a lot of time and effort supporting an idea, and has nevertheless changed it as necessary: from a non-Indo-European Corded Ware culture unrelated to Indo-European-speaking Yamna, with certain neighbouring groups adopting the language through “patron–client relationships” (Anthony 2007; Anthony and Ringe 2015); to a Corded Ware culture that expanded with Yamna peoples from the steppe, based on the (then) recently described “Yamnaya ancestry” of genetic papers (Anthony and Brown 2017); to a Corded Ware culture that expanded from Yamna peoples in Hungary, at roughly the same time as it evolved into Bell Beaker, based on the R1a/R1b Y-chromosome bottleneck (Anthony 2017); this last one probably in need of a thorough revision today, as new data has appeared clearly contradicting it. Against this example of a dynamic researcher, there are dozens of known academics unwilling to change one iota of their previous theories, trying to adapt genetic data to their own models. I do not have many doubts about my intentions or interpretations today, but I do hope that I will be able to change what needs to be changed in the future, like Anthony; but also, to distinguish what is wrong from what is not and needs to be defended in spite of what is fashionable, comfortable, or politically correct.