Cardiovascular Therapy and Prevention

Population-based biobank for analyzing the frequencies of clinically relevant DNA markers in the Russian population: bioinformatic aspects

https://doi.org/10.15829/1728-8800-2020-2732

Abstract

One task of population-based biobanks is to determine the frequencies of clinically relevant genetic polymorphisms in the population. The population of Russia is highly heterogeneous, both ethnically and genetically. Marker frequencies are therefore needed not for a single sample but for a series of samples that reflect the heterogeneity of the gene pool across different peoples and regions.

Aim. To divide the population of Russia and neighboring countries into population groups that meet specified criteria and are representatively sampled in existing datasets and biobanks.

Material and methods. We developed a method for combining populations into larger groups while maintaining intragroup homogeneity. It is based on principal component analysis with K-means clustering, followed by refinement of the clusters using FST distances to achieve higher homogeneity and more even group sizes. The method was tuned on the Biobank of Northern Eurasia; accordingly, the material comprised genome-wide data on 4.5 million genetic markers for 1,883 samples representing 247 populations of Russia and neighboring countries from this biobank. The developed approach, the resulting set of populations, and the associated map can be applied to other collections of biomaterials from Russian populations.
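
The general idea of principal component analysis followed by K-means clustering, with an FST distance computed between the resulting groups, can be illustrated on simulated data. The sketch below is not the authors' pipeline: the genotypes are simulated, the function names are hypothetical, K-means is a plain Lloyd's iteration, and FST is estimated with Hudson's ratio-of-averages estimator, which is only one of several possible choices. The refinement step that rebalances group sizes is not reproduced because the abstract does not specify it.

```python
import numpy as np

def pca_scores(genotypes, n_components=2):
    """Project samples onto the top principal components.

    genotypes: (n_samples, n_markers) array of 0/1/2 allele counts.
    """
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)            # center each marker
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def hudson_fst(g1, g2):
    """Hudson's FST between two groups of diploid genotypes (ratio of averages)."""
    n1, n2 = 2 * len(g1), 2 * len(g2)        # sampled chromosomes per group
    p1, p2 = g1.sum(axis=0) / n1, g2.sum(axis=0) / n2
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num.sum() / den.sum()

# Simulated data (an assumption for illustration): two populations of
# 20 samples each, 200 markers, allele frequencies 0.1 and 0.9.
rng = np.random.default_rng(42)
pop_a = rng.binomial(2, 0.1, size=(20, 200))
pop_b = rng.binomial(2, 0.9, size=(20, 200))
genotypes = np.vstack([pop_a, pop_b])

scores = pca_scores(genotypes, n_components=2)
labels = kmeans(scores, k=2)
fst = hudson_fst(genotypes[labels == 0], genotypes[labels == 1])
print(labels, round(float(fst), 3))
```

On real data the number of clusters and the number of retained components would have to be chosen and validated; here both are fixed at 2 only because the simulation contains two well-separated populations.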

Results. This approach divided the entire population of Russia and neighboring countries into 29 ethnogeographic groups characterized by relative genetic homogeneity. This set of groups is recommended as a baseline for population screening to determine the frequency of any genetic marker in the population of Russia. A map showing the division of the population into 29 ethnogeographic areas was constructed.
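
As a minimal illustration of how a marker frequency could be tabulated per group once samples are assigned to groups, the sketch below counts alleles in diploid genotypes. The group names and genotype values are invented for the example, and the function name is hypothetical; missing genotypes are simply skipped.

```python
from collections import defaultdict

def allele_frequency_by_group(samples):
    """Return the allele frequency of one marker in each group.

    samples: iterable of (group_name, allele_count) pairs, where
    allele_count is 0, 1, or 2 copies of the allele in a diploid
    genotype, or None if the genotype is missing.
    """
    allele_copies = defaultdict(int)
    chromosomes = defaultdict(int)
    for group, count in samples:
        if count is None:        # skip missing genotypes
            continue
        allele_copies[group] += count
        chromosomes[group] += 2  # each genotyped sample has two chromosomes
    return {g: allele_copies[g] / chromosomes[g] for g in chromosomes}

# Hypothetical toy input: group labels and genotypes are made up.
toy_samples = [
    ("Group A", 0), ("Group A", 1), ("Group A", 2),
    ("Group B", 0), ("Group B", 0), ("Group B", None), ("Group B", 1),
]
freqs = allele_frequency_by_group(toy_samples)
print(freqs)
```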

Conclusion. The gene pool of the population of Russia was zoned on the basis of reliable genome-wide data. We identified ethnogeographic groups whose allele frequencies contrast between groups but are relatively homogeneous within each group. The resulting map and register of groups can be used in population genetic, medical genetic, and pharmacogenetic studies.

About the Authors

I. O. Gorin
Vavilov Institute of General Genetics
Russian Federation
Moscow


V. S. Petrushenko
Vavilov Institute of General Genetics
Russian Federation
Moscow


Yu. S. Zapisetskaya
Vavilov Institute of General Genetics
Russian Federation
Moscow


S. M. Koshel
N. P. Bochkov Research Center of Medical Genetics; Lomonosov Moscow State University
Russian Federation
Moscow


O. P. Balanovsky
Vavilov Institute of General Genetics; N. P. Bochkov Research Center of Medical Genetics; Lomonosov Moscow State University; Biobank of Northern Eurasia
Russian Federation
Moscow


For citations:


Gorin I.O., Petrushenko V.S., Zapisetskaya Yu.S., Koshel S.M., Balanovsky O.P. Population-based biobank for analyzing the frequencies of clinically relevant DNA markers in the Russian population: bioinformatic aspects. Cardiovascular Therapy and Prevention. 2020;19(6):2732. (In Russ.) https://doi.org/10.15829/1728-8800-2020-2732


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1728-8800 (Print)
ISSN 2619-0125 (Online)