Cardiovascular Therapy and Prevention

Bioinformatics approach to processing data from high-throughput sequencing of small RNA molecules

https://doi.org/10.15829/1728-8800-2024-4195

EDN: IRNCAQ

Abstract

High-throughput sequencing of small ribonucleic acid (RNA) molecules is widely used to search for disease markers and to study the regulation of gene expression. Data processing involves many stages, including quality assessment of the raw data and of the sequencing results, read mapping, and analysis of the expression profile of the detected small RNA molecules. Many programs and specialized packages have already been developed for each of these steps. The choice of tools in the final bioinformatics protocol is critical for correct data processing and for the reproducibility of a study. This review describes the most broadly applicable protocol for processing the results of high-throughput sequencing of small RNA molecules, covering all the main stages and the most widely used programs.
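The stages named in the abstract can be illustrated with a minimal Python sketch. It is not the protocol described in the review: the adapter sequence, the thresholds, the toy reads and all function names are assumptions chosen for illustration, and the mapping stage is replaced by counting identical trimmed sequences, whereas a real protocol would align reads to a reference with a dedicated aligner.

```python
from collections import Counter

# Assumed 3' adapter for illustration only; use the sequence from the actual library kit.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def trim_adapter(seq, adapter=ADAPTER, min_overlap=8):
    """Cut the read at the first exact match of the adapter's first min_overlap bases."""
    pos = seq.find(adapter[:min_overlap])
    return seq if pos == -1 else seq[:pos]


def count_small_rnas(records, min_q=20, min_len=18, max_len=35):
    """Quality-filter, trim and length-filter reads, then count identical inserts.

    Counting identical sequences stands in for the mapping stage here.
    """
    counts = Counter()
    for seq, qual in records:
        if not qual or mean_phred(qual) < min_q:
            continue  # drop low-quality reads
        insert = trim_adapter(seq)
        if min_len <= len(insert) <= max_len:
            counts[insert] += 1
    return counts


def counts_per_million(counts):
    """Scale raw counts to counts per million retained reads."""
    total = sum(counts.values())
    return {seq: n * 1e6 / total for seq, n in counts.items()} if total else {}


# Toy reads (hypothetical data): two 22-nt inserts followed by the adapter.
INSERT_A = "TAGCTTATCAGACTGATGTTGA"
INSERT_B = "TGAGGTAGTAGGTTGTATAGTT"


def make_read(insert, qual_char):
    seq = insert + ADAPTER
    return seq, qual_char * len(seq)


records = [
    make_read(INSERT_A, "I"),  # high quality, kept
    make_read(INSERT_A, "I"),  # high quality, kept
    make_read(INSERT_B, "I"),  # high quality, kept
    make_read(INSERT_A, "#"),  # mean Phred 2, dropped by the quality filter
    make_read("ACGT", "I"),    # 4-nt insert, dropped by the length filter
]

counts = count_small_rnas(records)
cpm = counts_per_million(counts)
for seq, n in counts.most_common():
    print(seq, n, round(cpm[seq], 1))
```

Exact-prefix adapter matching and a mean-quality cutoff keep the sketch short; dedicated trimming tools tolerate mismatches and partial adapters at the read end, which matters for real data.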

About the Authors

A. A. Zharikova
National Medical Research Center for Therapy and Preventive Medicine; Lomonosov Moscow State University
Moscow, Russian Federation

Yu. V. Vyatkin
National Medical Research Center for Therapy and Preventive Medicine; Lomonosov Moscow State University
Moscow, Russian Federation

A. V. Kiseleva
National Medical Research Center for Therapy and Preventive Medicine
Moscow, Russian Federation

A. N. Meshkov
National Medical Research Center for Therapy and Preventive Medicine
Moscow, Russian Federation

What is already known about the subject?

  • One of the important functions of small RNA molecules is the post-transcriptional regulation of gene expression.
  • Differential expression analysis can identify RNA molecules of different classes, including small RNA molecules, that are potential markers of a wide range of diseases.
  • Bioinformatics analysis programs differ in speed, usability, availability, the set of additional parameters, and other characteristics.
  • Developing a bioinformatics protocol for analyzing high-throughput sequencing data contributes to the standardization of preanalytical factors, an important step for both clinical application and research on small RNA molecules.

What might this study add?

  • The review describes a universal protocol for processing the results of high-throughput sequencing of small RNA molecules, its main stages, and the most widely used programs.

Review

For citations:


Zharikova A.A., Vyatkin Yu.V., Kiseleva A.V., Meshkov A.N. Bioinformatics approach to processing data from high-throughput sequencing of small RNA molecules. Cardiovascular Therapy and Prevention. 2024;23(11):4195. (In Russ.) https://doi.org/10.15829/1728-8800-2024-4195. EDN: IRNCAQ

This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1728-8800 (Print)
ISSN 2619-0125 (Online)