Distribution of micropeptide-coding sORFs in transcripts

Xinqiang Yin Jialiang Hu Hanmei Xu

Citation:  Yin Xinqiang, Hu Jialiang, Xu Hanmei. Distribution of micropeptide-coding sORFs in transcripts[J]. Chinese Chemical Letters, 2018, 29(7): 1029-1032. doi: 10.1016/j.cclet.2018.04.027

  • A protein-coding open reading frame (ORF) comprises a start codon, a run of in-frame codons and a stop codon [1, 2]. Peptides, typically defined as chains of fewer than 50 amino acids, are often obtained by processing longer precursors. However, hundreds of thousands of previously non-annotated short (or small) open reading frames (sORFs) of fewer than 100 codons, which have the potential to encode peptides or small proteins, have been discovered in the genomes of many species. The products encoded by sORFs, which are shorter than 100 amino acids, are named micropeptides [3, 4]. Unlike classical bioactive peptides, micropeptides are released directly into the cytoplasm because they lack an N-terminal signal sequence [5].
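    The definition above (a start codon, in-frame codons, a stop codon, and fewer than 100 codons) can be illustrated with a small scanner. This is only a minimal sketch: the function name `find_sorfs`, the restriction to the forward strand, and the default AUG-only start are assumptions made for illustration, not a method described in this review.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_sorfs(seq, max_codons=100, start_codons=("ATG",)):
    """Return (start, end, frame) tuples for forward-strand ORFs
    with fewer than max_codons codons.

    start and end are 0-based and end-exclusive; the span includes the
    stop codon, but the codon count used for the cutoff does not.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] in start_codons:
                # Walk codon by codon until the first in-frame stop.
                j = i
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 <= len(seq):  # a stop codon was found
                    n_codons = (j - i) // 3
                    if n_codons < max_codons:
                        hits.append((i, j + 3, frame))
                    i = j + 3  # resume scanning after this ORF
                    continue
            i += 3
    return hits

if __name__ == "__main__":
    print(find_sorfs("CCATGAAATGACC"))
```

    A real annotation pipeline would also scan the reverse strand and weigh conservation or ribosome profiling evidence; the sketch only shows why short ORFs are so numerous that sequence alone cannot separate coding from non-coding candidates.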

    Micropeptides were overlooked for a long time for several reasons. First, short coding sequences were excluded from initial genome annotation on the assumption that most coding genes encode proteins longer than 100 amino acids, and because it is difficult to identify bona fide protein-coding sORFs accurately and to distinguish them from the large number of putative noncoding ORFs. Second, micropeptides are hard to detect because of their small size and low abundance. Furthermore, the use of alternative transcription start sites and processes such as alternative splicing, transcript editing, and post-translational modification makes identification even more challenging [6-8].

    Advanced bioinformatic and computational approaches have been successfully used to identify sORF-encoded peptides [9-18]. In addition, experimental techniques such as ribosome profiling [19-26], mass spectrometry and other proteomic methods [27-32] have been developed and fine-tuned to identify novel coding sORFs. Here, we present an overview of the wide distribution of sORFs in transcripts (Fig. 1) and of their functional roles in organisms. Since the aforementioned methods have been described in detail in many articles [9-32], we do not explore them in this review.

    Figure 1.  Overview of the distribution of small open reading frames (sORFs) in various transcripts

    Based on sequence length, non-coding RNAs (ncRNAs) are generally divided into two classes: short ncRNAs (sncRNAs), which are shorter than 200 nt, and long ncRNAs (lncRNAs), which are longer than 200 nt. Advances in transcriptomics have led to the discovery of a great number of lncRNAs, which play versatile roles in regulating gene expression [33].

    Although ncRNAs were originally thought to be non-coding, recent studies have found that many of them contain sORFs and that some even encode micropeptides. The gene annotated as noncoding RNA 003 in 2L (pncr003:2L) encodes two functional micropeptides of 28 and 29 amino acids, named SCL (Table 1). SCL regulates calcium transport and hence influences regular muscle contraction in the Drosophila heart [34].

    Table 1.  The sequences of the peptides discussed in the text

    Olson's team has found five micropeptides in mouse transcripts, two of which are encoded by lncRNAs. Myoregulin (MLN) (Table 1), a highly conserved 46-amino-acid micropeptide encoded by a skeletal muscle-specific RNA annotated as an lncRNA, interacts directly with the sarcoplasmic reticulum Ca2+-ATPase (SERCA) and impedes Ca2+ uptake into the sarcoplasmic reticulum, thereby regulating muscle contraction [35]. Another micropeptide, the 34-amino-acid DWORF, is encoded by a putative muscle-specific long noncoding RNA and enhances SERCA activity by displacing the SERCA inhibitors phospholamban (PLN), sarcolipin (SLN), and myoregulin (MLN) [36].

    Toddler, a gene previously annotated as a non-coding RNA in vertebrates, encodes a 58-amino-acid micropeptide that activates APJ/Apelin receptor signaling and promotes gastrulation movements [37]. The lncRNA LINC00961, conserved across species, encodes the 90-amino-acid polypeptide SPAR, which regulates mTORC1 activation and muscle regeneration [38]. A recent study showed that NoBody, a conserved micropeptide encoded by the LINC01420/LOC550643 RNA, interacts with mRNA decapping proteins through direct binding to EDC4 [39]. These examples underscore the likelihood that many transcripts currently annotated as noncoding RNAs encode peptides with important biological functions.

    2.2.1   5′-UTRs

    Small ORFs located in the 5′ untranslated region (5′-UTR) of mRNAs are named upstream ORFs (uORFs). Predicting uORFs with sequence-based methods is a major challenge, since nearly half of mammalian uORFs use non-AUG start codons. Ribosome profiling, however, can detect the various start codons directly by halting ribosomes at the start site [40, 41]. For a long time, uORFs were regarded as cis-acting elements that regulate the translation of downstream ORFs [42]. Recent studies have shown that nearly 50% of the uORFs in human mRNAs are translated and that this translation is necessary to regulate downstream ORF expression [43, 44]. In general, uORFs reduce the expression of the downstream protein by modulating translation efficiency [45] or by triggering mRNA decay [46, 47]. Under stress conditions, however, uORFs can facilitate protein expression [48].
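    To make the difficulty with non-AUG start codons concrete, the sketch below lists candidate uORFs in an mRNA whose main coding sequence starts at a known position. The function name `find_uorfs`, the dictionary layout, and the particular set of near-cognate codons (CUG, GUG, UUG) are assumptions chosen for illustration; they are not taken from the studies cited above.

```python
STOPS = {"TAA", "TAG", "TGA"}
# AUG plus three near-cognate codons. This particular set is an assumption
# made for illustration, not a list taken from the review.
DEFAULT_STARTS = ("ATG", "CTG", "GTG", "TTG")

def find_uorfs(mrna, cds_start, starts=DEFAULT_STARTS):
    """List candidate ORFs whose start codon lies in the 5' leader.

    cds_start is the 0-based index of the first base of the main start
    codon. Every candidate start upstream of it is reported together with
    the first in-frame stop, so nested candidates sharing a stop all appear.
    """
    seq = mrna.upper().replace("U", "T")
    uorfs = []
    for i in range(0, min(cds_start, len(seq) - 2)):
        if seq[i:i + 3] not in starts:
            continue
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOPS:
                uorfs.append({
                    "start": i,
                    "end": j + 3,  # end-exclusive, includes the stop codon
                    "start_codon": seq[i:i + 3],
                    "overlaps_cds": j + 3 > cds_start,
                })
                break
    return uorfs

if __name__ == "__main__":
    for orf in find_uorfs("GGATGCCCTAAGGATGAAATAG", cds_start=13):
        print(orf)
```

    Allowing near-cognate codons multiplies the number of candidates on real transcripts, which is one way to see why experimental start-site mapping is preferred over sequence scanning alone.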

    Some uORF-encoded peptides also have biological functions. A 31-codon uORF in the mRNA of the mammalian gene chop encodes a 31-amino-acid peptide. This uORF peptide reduces CHOP protein translation by interacting with the peptide exit tunnel of the ribosome, which pauses the ribosome or dissociates it from the mRNA and thereby disturbs expression of the chop gene [49]. Another example is the MKKS gene, which generates two types of transcripts: a long transcript that encodes both the uORFs and MKKS, and a short transcript that encodes only the uORFs and is produced by the use of alternative polyadenylation sites in the 5′-UTR. Multiple uORFs of the long MKKS transcript function as translational repressors of MKKS. Two products encoded by the uORFs are imported onto the mitochondrial membrane, but their function needs further study [50]. One more example is a peptide encoded by a 5′-upstream short open reading frame, which regulates angiotensin type 1a receptor production and signals via the β-arrestin pathway [51]. Together, these examples suggest that uORFs may be a rich source of peptides that play key roles in cells.

    2.2.2   Overlapping and downstream sORFs

    Mature mRNAs also contain unconventional ORFs that overlap the reference ORF in the non-canonical +2 and +3 reading frames, so a single mRNA can yield more than one completely different peptide [52, 53]. Around 41% of human mRNAs contain at least one alternative ORF within the reference ORF, and most of these encode small proteins of fewer than 90 amino acids [52]. Overlapping sORFs may lie within a known ORF or extend from the known ORF into the 3′ trailer sequence. They represent another source of alternatively translated products. Eighty short peptides encoded by overlapping sORFs have been identified in proteomic studies [52, 54]. Two characterized polypeptides, AltPrP [55] and AltATXN1 [56], are also encoded by overlapping sORFs.
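    Why the +2 and +3 frames give completely different peptides can be seen by translating one short sequence in all three forward frames. The sketch below uses the standard genetic code; the helper names `translate` and `three_frame_peptides` and the toy sequence are hypothetical, chosen only for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(seq, frame=1):
    """Translate seq in forward frame 1, 2 or 3; '*' marks a stop codon."""
    seq = seq.upper().replace("U", "T")
    offset = frame - 1
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    # Codons with ambiguous bases are shown as 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def three_frame_peptides(seq):
    """Map '+1', '+2', '+3' to the translation of seq in that frame."""
    return {f"+{frame}": translate(seq, frame) for frame in (1, 2, 3)}

if __name__ == "__main__":
    for frame, peptide in three_frame_peptides("ATGGCCTGATAA").items():
        print(frame, peptide)
```

    Shifting the frame by one or two nucleotides regroups every codon, so the alternative product shares no sequence with the reference protein even though both are read from the same stretch of mRNA.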

    Compared with 5′-UTRs, 3′-UTRs have attracted less attention, because they are generally assumed not to be translated. However, translation of peptides from downstream sORFs is supported by mass spectrometry and computational analyses, and several studies have already identified peptides from downstream sORFs [52, 54]. One study revealed that AltMRVI1, encoded by an sORF in the 3′-UTR of the MRVI1 gene, co-localizes and interacts with BRCA1, although its role is still unknown [52].

    Circular RNAs (circRNAs) are produced through non-canonical alternative splicing and form covalently closed RNA circles [57]. Because they lack the structures that are critical for efficient translation initiation, circRNAs have generally been regarded as non-coding. They are conserved across species and enriched in the nervous system [58]. Many studies have suggested that this new class of RNAs has a wide range of functions, including regulation of mRNA expression, protein sequestration and transcriptional regulation, and that circRNAs have potential roles in some diseases [57-62].

    However, experimental evidence reveals that circRNAs also have translation potential, and a few functional products encoded by circRNAs have been identified (Fig. 2). One example is the translation of circMbl [63]. A subset of circRNAs associated with translating ribosomes (ribo-circRNAs) was identified by ribosome footprinting of fly heads. CircMbl3, a protein encoded by a circRNA generated from the muscleblind (Mbl) gene, was detected by mass spectrometry. Further work showed that ribo-circRNAs allow cap-independent translation and that starvation and FOXO likely regulate the translation of a circMbl isoform. Identifiable domains in many ribo-circRNA-encoded proteins hint at their functions. Another example is circ-ZNF609 [64], which controls myoblast proliferation. Circ-ZNF609 contains an open reading frame that begins at the start codon shared with the linear transcript and terminates at an in-frame stop codon; it is associated with heavy polysomes and is translated into a protein in a splicing-dependent and cap-independent manner [64].
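    Because a circRNA is a closed circle, an ORF can continue past the junction where the two ends are joined, so a linear scan of the sequence would miss it. The sketch below is a minimal illustration under stated assumptions: the function name `circ_orfs`, the AUG-only start, and the output layout are hypothetical and are not taken from the cited studies. The circle is "unrolled" by repeating the sequence, and each start codon is followed for at most one full cycle through all three frames.

```python
STOPS = {"TAA", "TAG", "TGA"}

def circ_orfs(circ_seq, start_codon="ATG"):
    """Find ORFs on a circular sequence, including junction-spanning ones.

    Index 0 is taken as the first base after the junction. length_nt
    includes the stop codon and is None when no in-frame stop is met
    within 3 * len(circ_seq) nucleotides, i.e. the reading frame never
    terminates on the circle.
    """
    seq = circ_seq.upper().replace("U", "T")
    n = len(seq)
    unrolled = seq * 4  # long enough to read 3n nt from any start position
    orfs = []
    for i in range(n):
        if unrolled[i:i + 3] != start_codon:
            continue
        stop = None
        # After 3n nt the ribosome is back at the same base in the same frame.
        for j in range(i + 3, i + 3 * n, 3):
            if unrolled[j:j + 3] in STOPS:
                stop = j
                break
        orfs.append({
            "start": i,
            "length_nt": None if stop is None else stop + 3 - i,
            "crosses_junction": stop is None or stop + 3 > n,
        })
    return orfs

if __name__ == "__main__":
    print(circ_orfs("ATAGGCATGCC"))
```

    In the toy circle the AUG sits near the end of the linear representation and its stop codon is only reached after wrapping around, which is the situation a junction-aware search has to handle.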

    Figure 2.  Translation of circular RNAs

    As small regulatory RNA molecules, miRNAs inhibit the expression of specific target genes by binding to and cleaving their mRNAs or by otherwise inhibiting their translation into proteins. Since primary transcripts (pri-miRNAs) are the precursors of miRNAs and, like mRNAs, are produced by Pol Ⅱ, it is possible that they also encode proteins. Pri-miR171b and pri-miR165a contain sORFs that encode the regulatory peptides miPEP171b and miPEP165a, respectively [65]. Both peptides enhance the accumulation of their corresponding mature miRNAs and thereby down-regulate target genes involved in root development [65]. Five other active miPEPs encoded by pri-miRNAs of A. thaliana and M. truncatula have been found, which suggests that miPEPs are widespread in plants [65]. Whether such sORFs are present in animals is still unknown.

    Two studies have shown that mitochondrial ribosomal RNAs encode functional peptides, confirming that these RNAs have coding potential [66, 67]. Humanin, a 24-amino-acid polypeptide that is highly conserved across species, was found to be encoded in the mitochondrial 16S rRNA [66]. This peptide functions in a variety of biological processes, including cell survival, apoptosis, inflammatory response, substrate metabolism, oxidative stress, and starvation [66, 68-70]. Another mitochondrial-derived peptide, MOTS-c, is also encoded by a mitochondrial rRNA. It promotes metabolic homeostasis and reduces obesity and insulin resistance [67].

    sORFs have been found in various transcripts, and some functional sORF-encoded peptides have been identified in apparently non-coding transcripts in several organisms. Since ribosome profiling has demonstrated the coding potential of thousands of RNAs previously annotated as non-coding, these functional peptides could be just the tip of the iceberg. It is time to pay special attention to this new source of peptides. Discovering and characterizing all of these short peptides remains a big challenge. Are there sORFs in other RNAs? How many sORFs are actually translated? What are the functions of these small peptides? These questions remain to be answered.

    This work was supported by the Project Program of State Key Laboratory of Natural Medicines (No. SKLNMBZ201403) and the National Science and Technology Major Projects of New Drugs (Nos. 2012ZX09103301-004 and 2014ZX09508007) in China. This project was also funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).

    [1] M.A. Basrai, P. Hieter, J.D. Boeke, Genome Res. 7(1997) 768-771. doi: 10.1101/gr.7.8.768
    [2] M.A. Mumtaz, J.P. Couso, Biochem. Soc. Trans. 43(2015) 1271-1276. doi: 10.1042/BST20150170
    [3] M.I. Galindo, J.I. Pueyo, S. Fouix, S.A. Bishop, J.P. Couso, PLoS Biol. 5(2007) e106. doi: 10.1371/journal.pbio.0050106
    [4] Y. Hashimoto, T. Kondo, Y. Kageyama, Dev. Growth Differ. 50(2008) S269-S276. doi: 10.1111/j.1440-169X.2008.00994.x
    [5] J. Crappe, W.V. Criekinge, G. Menschaert, EuPA Open Proteom. 3(2014) 128-137. doi: 10.1016/j.euprot.2014.02.006
    [6] I.P. Ivanov, A.E. Firth, A.M. Michel, et al., Nucleic Acids Res. 39(2011) 4220-4234. doi: 10.1093/nar/gkr007
    [7] G. Menschaert, W. Van Criekinge, T. Notelaers, et al., Mol. Cell. Proteom. 12(2013) 1780-1790. doi: 10.1074/mcp.M113.027540
    [8] S.J. Andrews, J.A. Rothnagel, Nat. Rev. Genet. 15(2014) 193-204. doi: 10.1038/nrg3520
    [9] L. Kong, Y. Zhang, Z.Q. Ye, et al., Nucleic Acids Res. 35(2007) W345-W349. doi: 10.1093/nar/gkm391
    [10] Y. Ina, J. Mol. Evol. 40(1997) 190-226. doi: 10.1007/BF00167113
    [11] L.D. Hurst, Trends Genet. 18(2002) 486. doi: 10.1016/S0168-9525(02)02722-1
    [12] M.F. Lin, J.W. Carlson, M.A. Crosby, et al., Genome Res. 17(2007) 1823-1836. doi: 10.1101/gr.6679507
    [13] A. Stark, M.F. Lin, P. Kheradpour, et al., Nature 450(2007) 219-232. doi: 10.1038/nature06340
    [14] G. Butler, M.D. Rasmussen, M.F. Lin, et al., Nature 459(2007) 657-662. doi: 10.1007/s12275-011-1064-7
    [15] M. Clamp, B. Fry, M. Kamal, et al., Proc. Natl. Acad. Sci. U. S. A. 104(2007) 19428-19433. doi: 10.1073/pnas.0709013104
    [16] M. Guttman, I. Amit, M. Garber, et al., Nature 458(2009) 223-227. doi: 10.1038/nature07672
    [17] M. Guttman, M. Garber, J.Z. Levin, et al., Nat. Biotechnol. 28(2010) 503-510. doi: 10.1038/nbt.1633
    [18] M.F. Lin, I. Jungreis, M. Kellis, Bioinformatics 27(2011) i275-i282. doi: 10.1093/bioinformatics/btr209
    [19] N.T. Ingolia, S. Ghaemmaghami, J.R. Newman, et al., Science 324(2009) 218-223. doi: 10.1126/science.1168978
    [20] S. Lee, B. Liu, S.X. Huang, et al., Proc. Natl. Acad. Sci. U. S. A. 109(2012) E2424-E2432. doi: 10.1073/pnas.1207846109
    [21] N.T. Ingolia, G.A. Brar, S. Rouskin, et al., Nat. Protoc. 7(2012) 1534-1550. doi: 10.1038/nprot.2012.086
    [22] S. Iwasaki, N.T. Ingolia, Trends Biochem. Sci. 42(2017) 612-624. doi: 10.1016/j.tibs.2017.05.004
    [23] M.V. Gerashchenko, V.N. Gladyshev, Nucleic Acids Res. 45(2017) e6. doi: 10.1093/nar/gkw822
    [24] M. Guttman, P. Russell, N.T. Ingolia, et al., Cell 154(2013) 240-251. doi: 10.1016/j.cell.2013.06.009
    [25] N.T. Ingolia, G.A. Brar, N. Stern-Ginossar, et al., Cell Rep. 8(2014) 1365-1379. doi: 10.1016/j.celrep.2014.07.045
    [26] A.A. Bazzini, T.G. Johnstone, R. Christiano, et al., EMBO J. 33(2014) 981-993. doi: 10.1002/embj.201488411
    [27] J. Crappé, E. Ndah, A. Koch, et al., Nucleic Acids Res. 43(2015) e29. doi: 10.1093/nar/gku1283
    [28] L. Calviello, N. Mukherjee, E. Wyler, et al., Nat. Methods 13(2016) 165-170. doi: 10.1038/nmeth.3688
    [29] J.L. Aspden, Y.C. Eyre-Walker, R.J. Phillips, et al., eLife 3(2014) e03528. http://europepmc.org/articles/PMC4612599
    [30] S.A. Slavoff, A.J. Mitchell, A.G. Schwaid, et al., Nat. Chem. Biol. 9(2013) 59-64. doi: 10.1038/nchembio.1120
    [31] Q. Chu, J. Ma, A. Saghatelian, Crit. Rev. Biochem. Mol. Biol. 50(2015) 134-141. doi: 10.3109/10409238.2015.1016215
    [32] J.A. Vizcaino, A. Csordas, N. Del-Toro, et al., Nucleic Acids Res. 44(2016) 11033. doi: 10.1093/nar/gkw880
    [33] T.R. Cech, J.A. Steitz, Cell 157(2014) 77-94. doi: 10.1016/j.cell.2014.03.008
    [34] E.G. Magny, J.I. Pueyo, F.M. Pearl, et al., Science 341(2013) 1116-1120. doi: 10.1126/science.1238802
    [35] D.M. Anderson, K.M. Anderson, C.L. Chang, et al., Cell 160(2015) 595-606. doi: 10.1016/j.cell.2015.01.009
    [36] B.R. Nelson, C.A. Makarewich, D.M. Anderson, et al., Science 351(2016) 271-275. doi: 10.1126/science.aad4076
    [37] A. Pauli, M.L. Norris, E. Valen, et al., Science 343(2014) 1248636. doi: 10.1126/science.1248636
    [38] G. Menschaert, W. Van Criekinge, T. Notelaers, et al., Mol. Cell. Proteom. 12(2013) 1780-1790. doi: 10.1074/mcp.M113.027540
    [39] N.G. D'Lima, J. Ma, L. Winkler, et al., Nat. Chem. Biol. 13(2017) 174-180. doi: 10.1038/nchembio.2249
    [40] A.M. Michel, D.E. Andreev, P.V. Baranov, BMC Bioinform. 15(2014) 380. doi: 10.1186/s12859-014-0380-4
    [41] A. Matsumoto, A. Pasut, M. Matsumoto, et al., Nature 541(2017) 228-232. doi: 10.1038/nature21034
    [42] S.E. Calvo, D.J. Pagliarini, V.K. Mootha, Proc. Natl. Acad. Sci. U. S. A. 106(2009) 7507-7512. doi: 10.1073/pnas.0810916106
    [43] L.E. Cabrera-Quio, S. Herberg, A. Pauli, RNA Biol. 13(2016) 1051-1059. doi: 10.1080/15476286.2016.1218589
    [44] Y. Ye, Y. Liang, Q. Yu, et al., Hum. Genet. 134(2015) 605-612. doi: 10.1007/s00439-015-1544-7
    [45] S.E. Calvo, D.J. Pagliarini, V.K. Mootha, Proc. Natl. Acad. Sci. U. S. A. 106(2009) 7507-7512. doi: 10.1073/pnas.0810916106
    [46] J.T. Mendell, N.A. Sharifi, J.L. Meyers, et al., Nat. Genet. 36(2004) 1073-1078. doi: 10.1038/ng1429
    [47] H. Yepiskoposyan, F. Aeschimann, D. Nilsson, et al., RNA 17(2011) 2108-2118. doi: 10.1261/rna.030247.111
    [48] K.A. Spriggs, M. Bushell, A.E. Willis, Mol. Cell 40(2010) 228-237. doi: 10.1016/j.molcel.2010.09.028
    [49] C. Jousse, et al., Nucleic Acids Res. 29(2001) 4341-4351. doi: 10.1093/nar/29.21.4341
    [50] C. Akimoto, E. Sakashita, K. Kasashima, et al., Biochim. Biophys. Acta 1830(2013) 2728-2738. doi: 10.1016/j.bbagen.2012.12.010
    [51] G.L. Yosten, J. Liu, H. Ji, et al., J. Physiol. 594(2016) 1601-1605. doi: 10.1113/JP270567
    [52] B. Vanderperre, J.F. Lucier, C. Bissonnette, et al., PLoS One 8(2013) e70698. doi: 10.1371/journal.pone.0070698
    [53] H. Mouilleron, V. Delcourt, X. Roucou, Nucleic Acids Res. 44(2016) 14-23. doi: 10.1093/nar/gkv1218
    [54] S.A. Slavoff, et al., Nat. Chem. Biol. 9(2013) 59-64. doi: 10.1038/nchembio.1120
    [55] B. Vanderperre, et al., FASEB J. 25(2011) 2373-2386. doi: 10.1096/fj.10-173815
    [56] D. Bergeron, et al., J. Biol. Chem. 288(2013) 21824-21835. doi: 10.1074/jbc.M113.472654
    [57] L.J. Li, Q. Huang, H.F. Pan, et al., Exp. Cell Res. 346(2016) 248-254. doi: 10.1016/j.yexcr.2016.07.021
    [58] D. van Rossum, B.M. Verheijen, R.J. Pasterkamp, Front. Mol. Neurosci. 9(2016) 74. https://www.helmholtz-muenchen.de/ihg/publications/index.html
    [59] M. Cortés-López, P. Miura, Yale J. Biol. Med. 89(2016) 527-537. https://www.ncbi.nlm.nih.gov/labs/journals/yale-j-biol-med/
    [60] D. Rong, H. Sun, Z. Li, et al., Oncotarget 8(2017) 73271-73281.
    [61] S. Qu, Z. Liu, X. Yang, Cancer Lett. 414(2018) 301-309. doi: 10.1016/j.canlet.2017.11.022
    [62] M.M. Jiang, Z.T. Mai, S.Z. Wan, et al., J. Cancer Res. Clin. Oncol. 144(2018) 667-674. doi: 10.1007/s00432-017-2576-2
    [63] N.R. Pamudurti, O. Bartok, M. Jens, et al., Mol. Cell 66(2017) 9-21. doi: 10.1016/j.molcel.2017.02.021
    [64] I. Legnini, G. Di Timoteo, F. Rossi, et al., Mol. Cell 66(2017) 22-37. doi: 10.1016/j.molcel.2017.02.017
    [65] D. Lauressergues, J.M. Couzigou, H.S. Clemente, et al., Nature 520(2015) 90-93. doi: 10.1038/nature14346
    [66] Y. Hashimoto, T. Niikura, H. Tajima, et al., Proc. Natl. Acad. Sci. U. S. A. 98(2001) 6336-6341. doi: 10.1073/pnas.101133498
    [67] C. Lee, J. Zeng, G.B. Drew, et al., Cell Metab. 21(2015) 443-454. doi: 10.1016/j.cmet.2015.02.009
    [68] B. Guo, D. Zhai, E. Cabezas, et al., Nature 423(2003) 456-461. doi: 10.1038/nature01627
    [69] D. Zhai, F. Luciano, X. Zhu, et al., J. Biol. Chem. 280(2005) 15815-15824. doi: 10.1074/jbc.M411902200
    [70] C. Lee, K. Yen, P. Cohen, et al., Trends Endocrinol. Metab. 24(2013) 222-228. doi: 10.1016/j.tem.2013.01.005

Article information
  • Published:  2018-07-22
  • Received:  2018-03-19
  • Accepted:  2018-04-09
  • Revised:  2018-03-30
  • Published online:  2018-07-24