TY - JOUR
T1 - RSARF: Prediction of residue solvent accessibility from protein sequence using random forest method
AU - Ganesan, Pugalenthi
AU - Kandaswamy, Krishna Kumar Umar
AU - Chou, Kuo-Chen
AU - Vivekanandan, Saravanan
AU - Kolatkar, Prasanna R.
N1 - KAUST Repository Item: Exported on 2020-10-01
PY - 2012/1/1
Y1 - 2012/1/1
AB - Prediction of protein structure from its amino acid sequence is still a challenging problem. A complete physicochemical understanding of protein folding is essential for accurate structure prediction. Knowledge of residue solvent accessibility gives useful insights into protein structure prediction and function prediction. In this work, we propose a random forest method, RSARF, to predict residue accessible surface area from protein sequence information. Training and testing were performed using 120 proteins containing 22006 residues. For each residue, the buried or exposed state was computed using five thresholds (0%, 5%, 10%, 25%, and 50%). The prediction accuracies for the 0%, 5%, 10%, 25%, and 50% thresholds are 72.9%, 78.25%, 78.12%, 77.57%, and 72.07%, respectively. Further, comparison of RSARF with other methods on a benchmark dataset of 20 proteins shows that our approach is useful for predicting residue solvent accessibility from protein sequence without using structural information. The RSARF program, datasets, and supplementary data are available at http://caps.ncbs.res.in/download/pugal/RSARF/.
UR - http://hdl.handle.net/10754/562051
UR - http://www.eurekaselect.com/openurl/content.php?genre=article&issn=0929-8665&volume=19&issue=1&spage=50
U2 - 10.2174/092986612798472875
DO - 10.2174/092986612798472875
M3 - Article
C2 - 21919860
SN - 0929-8665
VL - 19
SP - 50
EP - 56
JO - Protein & Peptide Letters
JF - Protein & Peptide Letters
IS - 1
ER -