Bioinformatics approach to processing data from high-throughput sequencing of small RNA molecules
https://doi.org/10.15829/1728-8800-2024-4195
EDN: IRNCAQ
Abstract
High-throughput sequencing of small ribonucleic acid (RNA) molecules is widely used to search for markers of various diseases and to study the regulation of gene expression. The data processing protocol consists of many stages, including assessment of raw data quality, evaluation of sequencing results, read mapping, and analysis of the expression profile of the detected small RNA molecules. A wide range of programs and specialized packages has already been developed for each step. The choice of tools in the final bioinformatics protocol is critically important for correct data processing and for the reproducibility of the study. This review describes a universal protocol for processing the results of high-throughput sequencing of small RNA molecules, covering all the main stages and the most widely used programs.
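To make the early preprocessing stages more concrete, the sketch below parses FASTQ records, decodes Phred+33 quality scores, clips a 3' adapter, keeps reads of typical small RNA length, and collapses identical sequences into counts before mapping. It is a simplified illustration, not the protocol described in the review: the adapter prefix `TGGAATTCTCGG`, the mean-quality threshold of 20, the 18 to 30 nt length window and all function names are assumptions chosen for the example, and only exact adapter matches are clipped. In practice, dedicated trimming and QC tools (for example Trimmomatic) handle partial adapters and sliding-window quality trimming.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Illustrative parameters (assumptions, not values from the review).
ADAPTER = "TGGAATTCTCGG"   # assumed 3' adapter prefix; depends on the library prep kit
MIN_MEAN_QUALITY = 20      # assumed minimum mean Phred score per read
MIN_LEN, MAX_LEN = 18, 30  # assumed length window for small RNA inserts


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        yield header[1:], seq, qual


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in qual]


def clip_adapter(seq: str, qual: str, adapter: str = ADAPTER) -> Tuple[str, str]:
    """Cut the read at the first exact adapter match; partial matches are ignored."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def preprocess(lines: Iterable[str]) -> Counter:
    """Count unique insert sequences that pass the length and quality filters."""
    counts: Counter = Counter()
    for _read_id, seq, qual in parse_fastq(lines):
        seq, qual = clip_adapter(seq, qual)
        if not MIN_LEN <= len(seq) <= MAX_LEN:
            continue  # too short or too long for a typical small RNA
        scores = phred_scores(qual)
        if sum(scores) / len(scores) < MIN_MEAN_QUALITY:
            continue  # low-quality read
        counts[seq] += 1
    return counts


if __name__ == "__main__":
    mir21 = "TAGCTTATCAGACTGATGTTGA"  # hsa-miR-21-5p written as DNA
    read = mir21 + ADAPTER + "GTG"
    demo = [
        "@r1", read, "+", "I" * len(read),           # Q40, kept
        "@r2", read, "+", "I" * len(read),           # Q40, kept
        "@r3", read, "+", "#" * len(read),           # Q2, removed by quality filter
        "@r4", "ACGTACGT" + ADAPTER, "+", "I" * 20,  # 8-nt insert, removed by length filter
    ]
    print(preprocess(demo))  # Counter({'TAGCTTATCAGACTGATGTTGA': 2})
```

For a real run, an open file handle can be passed directly (`with open(path) as fh: preprocess(fh)`); the collapsed unique sequences and their counts are what would then be mapped to the reference genome or to small RNA databases.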
About the Authors
A. A. Zharikova
Russian Federation
Moscow
Yu. V. Vyatkin
Russian Federation
Moscow
A. V. Kiseleva
Russian Federation
Moscow
A. N. Meshkov
Russian Federation
Moscow
What is already known about the subject?
- One of the important functions of small RNA molecules is the post-transcriptional regulation of gene expression.
- Differential expression analysis can identify RNA molecules of different classes, including small RNA molecules, that are potential markers of a wide range of diseases.
- Bioinformatics analysis programs differ in speed, ease of use, availability, the range of additional parameters they support, and other characteristics.
- Developing a bioinformatics protocol for analyzing high-throughput sequencing data contributes to the standardization of preanalytical factors, an important step toward the clinical application and research of small RNA molecules.
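Differential expression analysis compares normalized abundances of each small RNA between groups of samples. The sketch below is a deliberately simplified illustration: it converts raw read counts to counts per million (CPM) and computes a log2 fold change of group means with a pseudocount of 1 CPM. The sample counts, the miRNA names `miR-A` and `miR-B`, and the function names are hypothetical, and no statistical test is performed; dedicated packages such as DESeq2 or edgeR model count dispersion across replicates and assess significance, which this sketch does not.

```python
import math
from typing import Dict, List

Counts = Dict[str, int]


def cpm(counts: Counts) -> Dict[str, float]:
    """Counts per million: scale each feature by the sample's library size."""
    total = sum(counts.values())
    return {feature: c * 1e6 / total for feature, c in counts.items()}


def log2_fold_change(control: List[Counts], case: List[Counts],
                     pseudocount: float = 1.0) -> Dict[str, float]:
    """log2((mean CPM in case + pc) / (mean CPM in control + pc)) per feature.

    No dispersion model or significance test: this only estimates effect sizes.
    """
    norm_control = [cpm(s) for s in control]
    norm_case = [cpm(s) for s in case]
    features = set().union(*norm_control, *norm_case)
    result = {}
    for feature in sorted(features):
        # Features absent from a sample count as zero expression there.
        mean_control = sum(s.get(feature, 0.0) for s in norm_control) / len(norm_control)
        mean_case = sum(s.get(feature, 0.0) for s in norm_case) / len(norm_case)
        result[feature] = math.log2((mean_case + pseudocount) / (mean_control + pseudocount))
    return result


if __name__ == "__main__":
    # Hypothetical counts for two small RNAs in two control and two case samples.
    control = [{"miR-A": 100, "miR-B": 900}, {"miR-A": 200, "miR-B": 1800}]
    case = [{"miR-A": 400, "miR-B": 600}, {"miR-A": 800, "miR-B": 1200}]
    for name, lfc in log2_fold_change(control, case).items():
        print(f"{name}\t{lfc:.2f}")  # prints miR-A 2.00 and miR-B -0.58
```

CPM normalization removes differences in sequencing depth between samples (the second sample in each group has twice the reads but identical CPM values), while the pseudocount keeps the ratio defined for small RNAs that are absent in one group.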
What might this study add?
- The review describes a universal protocol for processing the results of high-throughput sequencing of small RNA molecules, including its main stages and the most widely used programs.
Review
For citation:
Zharikova A.A., Vyatkin Yu.V., Kiseleva A.V., Meshkov A.N. Bioinformatics approach to processing data from high-throughput sequencing of small RNA molecules. Cardiovascular Therapy and Prevention. 2024;23(11):4195. (In Russ.) https://doi.org/10.15829/1728-8800-2024-4195. EDN: IRNCAQ