Abstract: In data-driven fault diagnosis, correctly labeled samples are a prerequisite for diagnostic accuracy, but for unavoidable reasons training samples are often corrupted by noisy labels. To address this problem, a noisy-label correction method based on an improved Stacked Auto-Encoder is proposed. The method assigns pseudo-labels to samples using a Stacked Auto-Encoder and Isolation Forest, and uses them to adjust how much attention the Stacked Auto-Encoder pays to each sample, so that it focuses on the correctly labeled samples. To account for the deviation caused by the data distribution, cross-validation based on random forest is used to obtain the sample entropy of each sample, which is then used to correct its label. Experiments on gears and bearings show that the method reduces the noisy-label rate of the samples, corrects noisy labels accurately, and improves fault classification accuracy under multiple noise ratios.
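The abstract does not state the exact correction rule, so the following is only a minimal sketch of one plausible reading of the entropy-based step. It assumes that the cross-validated random forest yields a class-probability vector for each held-out sample, that "entropy" means the Shannon entropy of that vector, and that a label is replaced only when the prediction is confident and disagrees with the given label. The function names and the `max_entropy` threshold are hypothetical and do not come from the paper.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def correct_labels(labels, probas, max_entropy=0.3):
    """Relabel samples whose held-out prediction is confident and disagrees.

    labels      -- given (possibly noisy) integer class labels
    probas      -- per-sample class probabilities, assumed to come from the
                   held-out folds of a cross-validated classifier
    max_entropy -- hypothetical confidence threshold (an assumption here)
    """
    corrected = []
    for label, probs in zip(labels, probas):
        # Index of the most probable class for this sample.
        predicted = max(range(len(probs)), key=probs.__getitem__)
        confident = shannon_entropy(probs) <= max_entropy
        corrected.append(predicted if confident and predicted != label else label)
    return corrected


labels = [0, 1, 0]
probas = [
    [0.97, 0.03],  # confident and agrees with label 0: kept
    [0.98, 0.02],  # confident and disagrees with label 1: relabeled to 0
    [0.45, 0.55],  # uncertain and disagrees with label 0: kept
]
print(correct_labels(labels, probas))  # prints [0, 0, 0]
```

In this toy input only the second sample is relabeled: its predicted distribution has low entropy and contradicts the given label, whereas the third sample also disagrees but is too uncertain to override.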