LTR Predictor: HERV-oriented Alignment Tool

One of the top-5 mini-projects of the Biomedical Informatics 3 (BMI3) course. Project list: https://labw.org/bmi3-2021/miniprojects.

Introduction

The human genome is rich in retroviruses and retroviral elements that were integrated over the course of evolution (Garcia-Montojo et al., 2018). Among human endogenous retroviruses (HERVs), HERV-K is the most transcriptionally active group (Garcia-Montojo et al., 2018). It is implicated in embryogenesis and has also been linked to cancer and neurodegenerative diseases (Grow et al., 2015; Li et al., 2015; Argaw-Denboba et al., 2017). Because HERVs are repetitive elements, genome-wide annotation and analysis of them is difficult, and repeat-finding algorithms such as cross_match and WindowMasker have been reported to mask a large number of annotated exons (Li et al., 2008). Here, we developed the LTR predictor, an algorithm modelled on the Basic Local Alignment Search Tool (BLAST) that performs fast alignment of repeats and accepts user-supplied reference sequences. Our test on LTR5_Hs (an LTR family of HERV-K) shows that the LTR predictor achieves good accuracy and running speed, and it may inform approaches to predicting the genomic coordinates of repeats such as HERVs.
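BLAST-style tools generally work by "seed and extend": index short exact words (k-mers) of the reference, look up each k-mer of the query, and extend every seed hit outward while the alignment score stays close to its best value. The following is a minimal, generic Python sketch of that idea, not the LTR predictor's actual code; the function names, default k-mer size, +1/-1 scoring, and X-drop threshold are illustrative assumptions, and only ungapped extension is shown.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the list of its start positions."""
    index = defaultdict(list)
    ref = reference.upper()  # ignore soft-masking case when indexing
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def extend_ungapped(query, reference, q_start, r_start, k,
                    match=1, mismatch=-1, x_drop=5):
    """Extend an exact k-mer seed left and right without gaps (X-drop rule).

    Returns (query_begin, reference_begin, length, score).
    """
    q, r = query.upper(), reference.upper()
    # Extend to the right of the seed, remembering the best-scoring end.
    best_right, best_i, score, i = 0, 0, 0, 0
    while q_start + k + i < len(q) and r_start + k + i < len(r):
        score += match if q[q_start + k + i] == r[r_start + k + i] else mismatch
        i += 1
        if score > best_right:
            best_right, best_i = score, i
        elif best_right - score > x_drop:
            break  # score fell too far below the best; stop extending
    # Extend to the left of the seed in the same way.
    best_left, best_j, score, j = 0, 0, 0, 0
    while q_start - j - 1 >= 0 and r_start - j - 1 >= 0:
        score += match if q[q_start - j - 1] == r[r_start - j - 1] else mismatch
        j += 1
        if score > best_left:
            best_left, best_j = score, j
        elif best_left - score > x_drop:
            break
    total = k * match + best_left + best_right
    return (q_start - best_j, r_start - best_j, k + best_j + best_i, total)


def seed_and_extend(query, reference, k=11, min_score=20):
    """Return unique high-scoring segment pairs, best score first."""
    index = build_kmer_index(reference, k)
    q = query.upper()
    hsps = set()  # seeds on the same diagonal often yield the same HSP
    for q_start in range(len(q) - k + 1):
        for r_start in index.get(q[q_start:q_start + k], []):
            hsp = extend_ungapped(query, reference, q_start, r_start, k)
            if hsp[3] >= min_score:
                hsps.add(hsp)
    return sorted(hsps, key=lambda h: (-h[3], h[1], h[0]))


if __name__ == "__main__":
    hits = seed_and_extend("ACGTACGTTAGC", "TTTTACGTACGTTAGCGGGG", k=4, min_score=8)
    print(hits[0])  # (query_begin, reference_begin, length, score)
```

Here the query occurs once, exactly, starting at reference position 4, so the top hit is (0, 4, 12, 12). A practical repeat aligner would likely also need gapped extension, reverse-complement search, and a significance measure, which this sketch omits.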
The HERV consensus sequences (DF0000471, DF0000472, and DF0000558) were downloaded from the Dfam database (https://dfam.org/home). Soft-masked reference sequences of the human genome (Genome Reference Consortium Human Build 38, GRCh38/hg38) were downloaded from the UCSC Genome Browser (http://genome.ucsc.edu/).
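In the UCSC soft-masked hg38 sequences, bases identified as repeats by RepeatMasker and Tandem Repeats Finder are written in lowercase, while other bases are uppercase. As a minimal sketch, assuming a chromosome sequence has already been read into a Python string, the masked regions can be recovered as 0-based, half-open intervals (for example, to compare against predicted LTR coordinates). The helper names below are hypothetical and not part of the LTR predictor; note that lowercase `n` bases, if present, are also counted as masked.

```python
import re


def masked_intervals(seq):
    """Return 0-based, half-open (start, end) intervals of lowercase (soft-masked) bases."""
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", seq)]


def masked_fraction(seq):
    """Fraction of bases in seq that are soft-masked (lowercase)."""
    if not seq:
        return 0.0
    return sum(end - start for start, end in masked_intervals(seq)) / len(seq)


if __name__ == "__main__":
    example = "ACGTacgtNNaaTT"
    print(masked_intervals(example))  # [(4, 8), (10, 12)]
    print(masked_fraction(example))
```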

Source code

https://github.com/Haoninghui/BMI3_Project1

Acknowledgements

This project was carried out as group work with Hao Ninghui for the Biomedical Informatics 3 course. Yu Zhejian advised on and helped with the architecture and acceleration. Many thanks.
Program tests were run on the analysis server of the ZJE Institute. We thank Dr. Wanlu Liu, Dr. Hugo Samano-Sanchez, and teaching assistant Ziwei Xue for their guidance and feedback.

References

ARGAW-DENBOBA, A., BALESTRIERI, E., SERAFINO, A., CIPRIANI, C., BUCCI, I., SORRENTINO, R., SCIAMANNA, I., GAMBACURTA, A., SINIBALDI-VALLEBONA, P. AND MATTEUCCI, C. (2017) HERV-K activation is strictly required to sustain CD133+ melanoma cells with stemness features. Journal of experimental & clinical cancer research : CR, 36(1), p. 20. doi: 10.1186/s13046-016-0485-x.
GARCIA-MONTOJO, M., DOUCET-O’HARE, T., HENDERSON, L. AND NATH, A. (2018) Human endogenous retrovirus-K (HML-2): a comprehensive review. Critical reviews in microbiology, 44(6), pp. 715–738. doi: 10.1080/1040841X.2018.1501345.
GROW, E. J., FLYNN, R. A., CHAVEZ, S. L., BAYLESS, N. L., WOSSIDLO, M., WESCHE, D. J., MARTIN, L., WARE, C. B., BLISH, C. A., CHANG, H. Y., PERA, R. A. R. AND WYSOCKA, J. (2015) Intrinsic retroviral reactivation in human preimplantation embryos and pluripotent cells. Nature, 522(7555), pp. 221–225. doi: 10.1038/nature14308.
LI, W., LEE, M.-H., HENDERSON, L., TYAGI, R., BACHANI, M., STEINER, J., CAMPANAC, E., HOFFMAN, D. A., VON GELDERN, G., JOHNSON, K., MARIC, D., MORRIS, H. D., LENTZ, M., PAK, K., MAMMEN, A., OSTROW, L., ROTHSTEIN, J. AND NATH, A. (2015) Human endogenous retrovirus-K contributes to motor neuron disease. Science translational medicine, 7(307), p. 307ra153. doi: 10.1126/scitranslmed.aac8201.
LI, X., KAHVECI, T. AND SETTLES, A. M. (2008) A novel genome-scale repeat finder geared towards transposons. Bioinformatics (Oxford, England), 24(4), pp. 468–476. doi: 10.1093/bioinformatics/btm613.