TY - JOUR
T1 - Imbalance deep multi-instance learning for predicting isoform–isoform interactions
AU - Yu, Guoxian
AU - Zeng, Jie
AU - Wang, Jun
AU - Zhang, Hong
AU - Zhang, Xiangliang
AU - Guo, Maozu
N1 - KAUST Repository Item: Exported on 2021-03-01
Acknowledgements: This study is supported by the Natural Science Foundation of China (61872300 and 62031003).
PY - 2021/2/25
Y1 - 2021/2/25
N2 - Multi-instance learning (MIL) can model complex bags (samples) that are themselves made up of diverse instances (subsamples). In typical MIL, the labels of bags are known, while those of individual instances are unknown and remain to be determined. In this paper we propose an imbalanced deep multi-instance learning approach (IDMIL-III) and apply it to predict genome-wide isoform–isoform interactions (IIIs). This prediction task is crucial for precisely understanding the interactome between proteoforms and for revealing their functional diversity. Current solutions typically formulate the prediction of IIIs as a MIL problem by pairing two genes as a “bag” and any two isoforms spliced from these two genes as “instances.” The key instances (interacting isoform pairs) trigger the label of the positive (interacting) gene bags, which is important for identifying the IIIs. Furthermore, the prediction task has been simplified to a balanced classification problem, although in practice it is rather imbalanced. To address these issues, IDMIL-III fuses RNA-seq, nucleotide sequence, amino acid sequence and exon array data, and further introduces a novel loss function that models the loss of positive pairs and of negative pairs separately, so that the expected loss is not dominated by the majority negative pairs. In addition, it includes an attention strategy to identify positive isoform pairs within a positive gene bag. Extensive experimental results demonstrate the effectiveness of IDMIL-III in predicting IIIs. In particular, IDMIL-III achieves an F1 value of 95.4% at the gene level, at least 3.8% higher than those of competitive methods, and an F1 value of 29.8% at the isoform level, at least 2.4% higher than those of state-of-the-art methods. The code of IDMIL-III is available at http://mlda.swu.edu.cn/codes.php?name=IDMIL-III.
AB - Multi-instance learning (MIL) can model complex bags (samples) that are themselves made up of diverse instances (subsamples). In typical MIL, the labels of bags are known, while those of individual instances are unknown and remain to be determined. In this paper we propose an imbalanced deep multi-instance learning approach (IDMIL-III) and apply it to predict genome-wide isoform–isoform interactions (IIIs). This prediction task is crucial for precisely understanding the interactome between proteoforms and for revealing their functional diversity. Current solutions typically formulate the prediction of IIIs as a MIL problem by pairing two genes as a “bag” and any two isoforms spliced from these two genes as “instances.” The key instances (interacting isoform pairs) trigger the label of the positive (interacting) gene bags, which is important for identifying the IIIs. Furthermore, the prediction task has been simplified to a balanced classification problem, although in practice it is rather imbalanced. To address these issues, IDMIL-III fuses RNA-seq, nucleotide sequence, amino acid sequence and exon array data, and further introduces a novel loss function that models the loss of positive pairs and of negative pairs separately, so that the expected loss is not dominated by the majority negative pairs. In addition, it includes an attention strategy to identify positive isoform pairs within a positive gene bag. Extensive experimental results demonstrate the effectiveness of IDMIL-III in predicting IIIs. In particular, IDMIL-III achieves an F1 value of 95.4% at the gene level, at least 3.8% higher than those of competitive methods, and an F1 value of 29.8% at the isoform level, at least 2.4% higher than those of state-of-the-art methods. The code of IDMIL-III is available at http://mlda.swu.edu.cn/codes.php?name=IDMIL-III.
UR - http://hdl.handle.net/10754/667695
UR - https://onlinelibrary.wiley.com/doi/10.1002/int.22402
U2 - 10.1002/int.22402
DO - 10.1002/int.22402
M3 - Article
SN - 0884-8173
JO - International Journal of Intelligent Systems
JF - International Journal of Intelligent Systems
ER -