Conventions used in this book

This text is not a simple essay anymore. Even though I conceived it initially as a mere fourth revision at the end of 2017, it grew rapidly out of hand, as I intended to include as much relevant information as possible on published (and reported) population movements supported by genetic investigation. The association of genetic data with potential prehistoric ethnolinguistic communities required in turn the addition of all potentially relevant archaeological data.

The first two volumes of this series must be understood as a detailed supplement to the main work, which is the third volume concerning linguistic data. This order of relevance reflects not only this series’ emphasis on languages over prehistoric cultures or genetics, but also the actual nature of the matter at hand: this is a comprehensive work on reconstructed languages and the peoples who might have spoken them.

The work follows simple rules in its aim to achieve clarity and coherence.

It is an encyclopaedia-like text, free and divided into more or less isolated linguistic, archaeological, and genetic sections, arranged so that anyone can revise them in the future to incorporate the latest research.

Unlike armchair work in linguistics or bioinformatics, where results and interpretations can be reviewed with knowledge and proper access to data, it is impossible to be an armchair archaeologist without ample experience in the specific field investigated. Therefore, secondary archaeological sources, which interpret and synthesise primary research and fieldwork, are preferred over primary sources in the archaeological section, with little or no personal additions in this part, although primary sources and proper connections of the data have also been added whenever necessary. All archaeological summaries included are referenced, with the main author or authors behind the content of each paragraph cited (at least the author of the secondary source, often more relevant than primary sources) to allow for identification of the original text and for further reading.

The full text is organised chronologically and regionally, to make its content easy to search and to allow for reading in either a linear or non-linear manner.

Names of samples, their cultures or groupings, ancestries, or clusters do not necessarily follow the nomenclature systems used by the different authors, papers, research labs, or archaeological teams, but are made to fit into the coherent picture of this book (Eisenmann et al. 2018).

Haplogroup (hg.) will be frequently used to refer to Y-chromosome haplogroups, unless otherwise expressly stated. Y-DNA haplogroups and subclades will also be referred to as line or lineage, whereas common admixture components defined in recent papers will be referred to as ancestry. The preferred nomenclature system of haplogroups is X-Y, where X is the standard name by ISOGG (2018), and Y is one or more SNP mutations defining the haplogroup, using whenever possible the one preferred by YFull. An asterisk X-Y* is used to represent a basal lineage, commonly understood as a subclade with different mutations from the most common, ‘successful’ ones.

Additional positive SNPs (Y+) reported online in non-peer-reviewed publications are represented in this text as X-Y+. The originally published haplogroup for the samples, other reported positive and negative SNPs, as well as the author or authors of the additional information, can be found in the online supplementary materials of this book.
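The labelling scheme described above can be illustrated with a minimal sketch. The function name, its parameters, the '/' separator for several SNPs, and the decision to treat the asterisk and plus markers as mutually exclusive are assumptions made for illustration only; none of them is specified in this book.

```python
def format_haplogroup(name, snps, basal=False, unreviewed=False):
    """Build a label following the X-Y convention (hypothetical helper).

    name       -- haplogroup name (X), e.g. the standard ISOGG name
    snps       -- one SNP name or a list of SNP names (Y)
    basal      -- mark a basal lineage as X-Y*
    unreviewed -- mark an additional positive SNP reported outside
                  peer-reviewed publications as X-Y+
    """
    if isinstance(snps, str):
        snps = [snps]
    if not name or not snps:
        raise ValueError("a haplogroup name and at least one SNP are required")
    if basal and unreviewed:
        # Assumption: the text does not define a combined marker.
        raise ValueError("basal and unreviewed markers are not combined here")

    # Assumption: several defining SNPs are joined with '/'.
    label = f"{name}-{'/'.join(snps)}"
    if basal:
        label += "*"
    if unreviewed:
        label += "+"
    return label


if __name__ == "__main__":
    print(format_haplogroup("X", "Y"))                   # X-Y
    print(format_haplogroup("X", "Y", basal=True))       # X-Y*
    print(format_haplogroup("X", "Y", unreviewed=True))  # X-Y+
```

The placeholders X and Y are those used in the text; real haplogroup and SNP names would be substituted in practice.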

For the sake of consistency, only YFull estimates for year formed and time to most recent common ancestor (TMRCA) of Y-chromosome haplogroups have been used[1], unless other sources are expressly stated. Estimates were obtained by Vladimir Tagankin by applying the method published in Adamov et al. (2015) to the data received from voluntary users[2]. Also for the sake of consistency, dates expressed as years before present (ybp) have been simplistically approximated to BC by assuming a difference of 2,000 years, so as to round out estimates.
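The approximation of ybp to BC amounts to a single subtraction, sketched below. The function names are hypothetical, and rejecting dates of 2,000 ybp or later is a simplification of this sketch (the text does not say how such dates are handled).

```python
YBP_TO_BC_OFFSET = 2000  # difference assumed in the text


def ybp_to_bc(ybp, offset=YBP_TO_BC_OFFSET):
    """Approximate a date in years before present (ybp) as a year BC."""
    if ybp <= offset:
        # Assumption: this sketch only covers dates that fall in BC.
        raise ValueError("date does not fall in BC under the assumed offset")
    return ybp - offset


def format_bc(ybp):
    """Return the approximated date as a string such as '2500 BC'."""
    return f"{ybp_to_bc(ybp)} BC"


if __name__ == "__main__":
    print(format_bc(4500))  # 2500 BC
```

For instance, an estimate of 4,500 ybp is rendered as 2500 BC under this convention.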

TMRCA dates are used as rough approximations of the expansions of Y-DNA lineages (see Figure 1). They can give an inaccurate idea of a lineage’s evolution because a) the actual rate of mutation is unknown, and b) TMRCA estimates are based on the lineages that survived, and may therefore miss earlier expansions in the same trunk.

Figure 1. Simplistic example of SNP mutations in a haplogroup. Lines represent diverging male lineages. Haplogroup 1 is only successful after the third mutation, and is thus defined by mutations M1, M2, and M3, with M1 representing its formation date, and M3 its TMRCA. Haplogroup 2 is successful during its formation, and thus M1 defines it, as well as its formation and TMRCA date.

Modern physical maps are used to illustrate potential expansion routes of ancient cultures, peoples, and languages, even though they pose a significant danger to the development of a sound model, since they almost invariably involve “a concatenation of weakly supported links that corporately form an ‘arrow’ of dispersion” (Mallory 2014). Map routes are only depicted as a visual aid, to add movement to the otherwise stationary maps of ancient cultures, peoples, languages, and ancient DNA obtained from scattered burials. Eurasian biomes (Figure 2 and Suppl. Fig. 19) are commonly referred to in this book to delimit cultural groups and migration routes.

Figure 2. Simplified map of the distribution of steppes and forest-steppes (Pontic and Pannonian) and xeric grasslands in Eastern Central Europe (with adjoining East European ranges). Modified from Kajtoch et al. (2016).