Triplet Entropy Loss: Improving The Generalisation of Short Speech Language Identification Systems
We present several methods to improve the generalisation of language identification (LID) systems to new speakers and new domains. These methods involve spectral augmentation, where spectrograms are masked in frequency or time bands during training, and the use of CNN architectures pre-trained on the ImageNet dataset. The paper also introduces the novel Triplet Entropy Loss training method, in which a network is trained simultaneously with Cross Entropy and Triplet loss. All three methods were found to improve the generalisation of the models, though not significantly. Even though the models trained with Triplet Entropy Loss showed a better understanding of the languages and achieved higher accuracies, it appears that the models still memorise word patterns present in the spectrograms rather than learning the finer nuances of a language. The research shows that Triplet Entropy Loss has great potential and should be investigated further, not only in language identification tasks but in any classification task.
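The combined objective described above can be sketched in minimal form. This is an illustrative reconstruction, not the paper's implementation: the weighting factor `alpha` and the margin value are assumptions, and a real training loop would compute these terms over batches with a deep-learning framework rather than in pure Python.

```python
import math

def cross_entropy(logits, label):
    # Softmax cross-entropy for a single example (numerically stable log-sum-exp).
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def euclidean(a, b):
    # Euclidean distance between two embedding vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Hinge loss: push the anchor-negative distance to exceed the
    # anchor-positive distance by at least `margin`.
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)

def triplet_entropy_loss(logits, label, anchor, positive, negative,
                         margin=0.2, alpha=1.0):
    # Triplet Entropy Loss as described: the classification term (Cross Entropy)
    # and the metric-learning term (Triplet loss) optimised jointly.
    # `alpha` (a hypothetical weighting between the two terms) is an assumption.
    return cross_entropy(logits, label) + alpha * triplet_loss(
        anchor, positive, negative, margin)
```

The intuition is that the cross-entropy term supervises the language label directly, while the triplet term shapes the embedding space so that utterances of the same language cluster together regardless of speaker.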