Hwang et al., J Proteome Res. 2017 Dec 1;16(12):4425-4434. doi: 10.1021/acs.jproteome.7b00223. Epub 2017 Oct 13.
Next Generation Proteomic Pipeline for Chromosome-based Proteomic Research Using NeXtProt and GENCODE databases
Hwang H(1), Park GW(1), Park JY(1,2), Lee HK(1,2), Lee JY(1), Jeong JE(1,2), Park SR(3), Yates JR 3rd(3), Kwon KH(1), Park YM(4), Lee HJ(5), Paik YK(5), Kim JY(1), Yoo JS(1,2).
2 Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon, Republic of Korea.
3 Department of Chemical Physiology, The Scripps Research Institute, La Jolla, California 92037, United States.
4 Center for Cognition and Sociality, Institute for Basic Science, Daejeon, Republic of Korea.
5 Yonsei Proteome Research Center and Department of Integrated OMICS for Biomedical Science, and Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea.
The Human Proteome Project aims to map all human proteins, including missing proteins as well as proteoforms arising from post-translational modifications, alternative splicing variants (ASVs), and single amino acid variants (SAAVs). The neXtProt and Ensembl databases are commonly used to provide curated information on human coding genes. To find these proteoforms, we (the Chr 11 team) introduce a streamlined pipeline that uses a customized, concatenated database built from neXtProt and GENCODE (which originates from Ensembl), with a controlled false discovery rate (FDR). Because of the large databases used in this pipeline, we found that more stringent FDR filtering (0.1% at the peptide level and 1% at the protein level) was needed to claim novel findings, such as GENCODE ASVs and missing proteins, from a human hippocampus data set (MassIVE MSV000081385; ProteomeXchange PXD007166). Using our next generation proteomic pipeline (nextPP) with the neXtProt and GENCODE databases, two missing proteins, activity-regulated cytoskeleton-associated protein (ARC, Chr 8) and glutamate receptor ionotropic, kainate 5 (GRIK5, Chr 19), were additionally identified with two or more unique peptides from human brain tissues. Additionally, by applying the pipeline to human brain related data sets such as cortex (PXD000067 and PXD000561), spinal cord, and fetal brain (PXD000561), seven GENCODE ASVs, namely ACTN4-012 (Chr 19), DPYSL2-005 (Chr 8), MPRIP-003 (Chr 17), NCAM1-013 (Chr 11), EPB41L1-017 (Chr 20), AGAP1-004 (Chr 2), and CPNE5-005 (Chr 6), were identified in two or more data sets. The identified peptides of the GENCODE ASVs mapped onto novel exon insertions, alternative translation in the 5′-untranslated region, or novel protein coding sequences. Applying the pipeline to male reproductive organ related data sets, 52 GENCODE ASVs were identified from two testis data sets (PXD000561 and PXD002179) and one spermatozoa data set (PXD003947). Four of the 52 GENCODE ASVs, namely RAB11FIP5-008 (Chr 2), RP13-347D8.7-001 (Chr X), PRDX4-002 (Chr X), and RP11-666A8.13-001 (Chr 17), were identified in all three samples.
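
The stringent peptide-level filter described above (0.1% FDR) can be illustrated with a minimal sketch. The abstract does not state how the FDR is estimated, so this example assumes a conventional target-decoy approach in which FDR is approximated as the ratio of decoy to target hits above a score cutoff. The names `PSM` and `filter_by_fdr`, the peptide labels, and the score values are hypothetical and are not taken from the nextPP pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """One peptide-spectrum match (hypothetical record layout)."""
    peptide: str
    score: float      # assumption: a higher score means a better match
    is_decoy: bool    # True if the match came from the decoy database


def filter_by_fdr(psms, max_fdr):
    """Keep target hits in the longest score-ranked prefix with decoys/targets <= max_fdr.

    Assumes target-decoy FDR estimation; the abstract does not specify the method.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    cutoff = 0  # number of top-ranked hits to keep
    for i, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Extend the accepted prefix whenever the estimated FDR is within bounds.
        if targets and decoys / targets <= max_fdr:
            cutoff = i
    return [p for p in ranked[:cutoff] if not p.is_decoy]


# Toy input: two strong target hits, one decoy, one weaker target hit.
example = [
    PSM("PEPTIDEA", 10.0, False),
    PSM("PEPTIDEB", 9.0, False),
    PSM("DECOYPEP", 8.0, True),
    PSM("PEPTIDEC", 7.0, False),
]

# At 0.1% FDR the single decoy excludes everything ranked at or below it.
accepted = filter_by_fdr(example, max_fdr=0.001)
print([p.peptide for p in accepted])  # prints ['PEPTIDEA', 'PEPTIDEB']
```

Under the same assumption, the protein-level threshold could be sketched by calling the function on protein-level scores with `max_fdr=0.01`. A larger concatenated database yields more random target matches, which is consistent with the tighter peptide-level cutoff reported above.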