NLP Academic Digest
https://www.linglab.cn/news/2746 | June 3, 2021

cs.CL: 22 papers today

 

Transformer (1 paper)

【1】 Classifying Long Clinical Documents with Pre-trained Transformers
 

Authors: Xin Su, Timothy Miller, Xiyu Ding, Majid Afshar, Dmitriy Dligach
Affiliations: University of Arizona, Boston Children’s Hospital and Harvard Medical School, University of Wisconsin–Madison, Loyola University Chicago
Link: https://arxiv.org/abs/2105.06752
 

Abstract: Automatic phenotyping is a task of identifying cohorts of patients that match a predefined set of criteria. Phenotyping typically involves classifying long clinical documents that contain thousands of tokens. At the same time, recent state-of-the-art transformer-based pre-trained language models limit the input to a few hundred tokens (e.g. 512 tokens for BERT). We evaluate several strategies for incorporating pre-trained sentence encoders into document-level representations of clinical text, and find that hierarchical transformers without pre-training are competitive with task pre-trained models.
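To make the chunk-and-aggregate strategy concrete, here is a minimal sketch (not the authors' code): encode 512-token chunks with a pre-trained sentence encoder, then aggregate chunk vectors with a small document-level transformer. Model names, dimensions, and the pooling choices are illustrative assumptions.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
doc_layer = nn.TransformerEncoderLayer(d_model=768, nhead=8, batch_first=True)
doc_transformer = nn.TransformerEncoder(doc_layer, num_layers=2)
classifier = nn.Linear(768, 2)  # e.g., matches phenotype criteria or not

def classify_long_document(text: str) -> torch.Tensor:
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    chunks = [ids[i:i + 510] for i in range(0, len(ids), 510)]  # fit BERT's limit
    with torch.no_grad():
        # Use each chunk's first ([CLS]) hidden state as the chunk embedding.
        chunk_embs = torch.stack(
            [encoder(c.unsqueeze(0)).last_hidden_state[0, 0] for c in chunks]
        ).unsqueeze(0)                                    # (1, num_chunks, 768)
    doc_repr = doc_transformer(chunk_embs).mean(dim=1)    # (1, 768)
    return classifier(doc_repr)                           # document-level logits
```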

 

BERT (2 papers)

【1】 BERT Busters: Outlier LayerNorm Dimensions that Disrupt BERT
 

Authors: Olga Kovaleva, Saurabh Kulshreshtha, Anna Rogers, Anna Rumshisky
Affiliations: Department of Computer Science, University of Massachusetts Lowell; Center for Social Data Science, University of Copenhagen
Note: Accepted as a long paper at Findings of ACL 2021
Link: https://arxiv.org/abs/2105.06990
 

Abstract: Multiple studies have shown that BERT is remarkably robust to pruning, yet few if any of its components retain high importance across downstream tasks. Contrary to this received wisdom, we demonstrate that pre-trained Transformer encoders are surprisingly fragile to the removal of a very small number of scaling factors and biases in the output layer normalization (<0.0001% of model weights). These are high-magnitude normalization parameters that emerge early in pre-training and show up consistently in the same dimensional position throughout the model. They are present in all six models of the BERT family that we examined, and removing them significantly degrades both the MLM perplexity and the downstream task performance. Our results suggest that layer normalization plays a much more important role than usually assumed.
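As a rough illustration of the probe (a sketch under assumptions, not the authors' code), the intervention amounts to zeroing one scale/bias coordinate of the output LayerNorm in every layer and re-measuring MLM perplexity. The dimension index below is a placeholder; the paper locates the outliers empirically.

```python
import torch
from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained("bert-base-uncased")
OUTLIER_DIM = 308  # hypothetical dimension index

with torch.no_grad():
    for layer in model.bert.encoder.layer:
        ln = layer.output.LayerNorm      # output layer normalization
        ln.weight[OUTLIER_DIM] = 0.0     # scaling factor
        ln.bias[OUTLIER_DIM] = 0.0       # bias
# Comparing MLM perplexity before and after this edit shows how much a
# vanishingly small fraction of the weights can matter.
```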

 

【2】 Distilling BERT for low complexity network training
 

Authors: Bansidhar Mangalwedhekar
Link: https://arxiv.org/abs/2105.06514
 

Abstract: This paper studies the efficiency of transferring BERT learnings to low complexity models like BiLSTM, BiLSTM with attention, and shallow CNNs using sentiment analysis on the SST-2 dataset. It also compares the inference complexity of the BERT model with these lower complexity models, and underlines the importance of these techniques in enabling high performance NLP models on edge devices like mobiles, tablets, and MCU development boards like the Raspberry Pi, and in enabling exciting new applications.
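A standard way to realize such a transfer is knowledge distillation; the sketch below is an illustrative setup (hyperparameters, pooling, and loss weighting are assumptions, not the paper's exact recipe): a BiLSTM student trained to match BERT teacher logits on SST-2.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTMStudent(nn.Module):
    def __init__(self, vocab_size: int, emb: int = 128, hidden: int = 256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, 2)   # binary sentiment

    def forward(self, x):
        h, _ = self.lstm(self.emb(x))
        return self.out(h.mean(dim=1))        # mean-pool over time steps

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Temperature-scaled soft targets from the teacher plus hard-label CE.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```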

 

QA|VQA|Question Answering|Dialogue (1 paper)

【1】 QAConv: Question Answering on Informative Conversations
 

Authors: Chien-Sheng Wu, Andrea Madotto, Wenhao Liu, Pascale Fung, Caiming Xiong
Affiliations: †Salesforce AI Research, ‡The Hong Kong University of Science and Technology
Note: Data and code are available at this https URL
Link: https://arxiv.org/abs/2105.06912
 

Abstract: This paper introduces QAConv, a new question answering (QA) dataset that uses conversations as a knowledge source. We focus on informative conversations including business emails, panel discussions, and work channels. Unlike open-domain and task-oriented dialogues, these conversations are usually long, complex, asynchronous, and involve strong domain knowledge. In total, we collect 34,204 QA pairs, including span-based, free-form, and unanswerable questions, from 10,259 selected conversations with both human-written and machine-generated questions. We segment long conversations into chunks, and use a question generator and dialogue summarizer as auxiliary tools to collect multi-hop questions. The dataset has two testing scenarios, chunk mode and full mode, depending on whether the grounded chunk is provided or retrieved from a large conversational pool. Experimental results show that state-of-the-art QA systems trained on existing QA datasets have limited zero-shot ability and tend to predict our questions as unanswerable. Fine-tuning such systems on our corpus can achieve significant improvements of up to 23.6% and 13.6% in chunk mode and full mode, respectively.
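Schematically, the "full mode" protocol retrieves the most relevant chunk from a conversation pool and then answers from it; a hedged sketch where the retriever and reader are stand-in model choices, not the systems evaluated in the paper:

```python
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

retriever = SentenceTransformer("all-MiniLM-L6-v2")
reader = pipeline("question-answering", model="deepset/roberta-base-squad2")

def answer_full_mode(question: str, chunk_pool: list[str]) -> dict:
    q = retriever.encode(question, convert_to_tensor=True)
    c = retriever.encode(chunk_pool, convert_to_tensor=True)
    best = util.cos_sim(q, c).argmax().item()     # retrieve the grounded chunk
    return reader(question=question, context=chunk_pool[best])
```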

 

Machine Translation (2 papers)

【1】 Do Context-Aware Translation Models Pay the Right Attention?
 

Authors: Kayo Yin, Patrick Fernandes, Danish Pruthi, Aditi Chaudhary, André F. T. Martins, Graham Neubig
Affiliations: Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA; Instituto de Telecomunicações, Lisbon, Portugal; Unbabel, Lisbon, Portugal
Note: Accepted to ACL 2021
Link: https://arxiv.org/abs/2105.06977
 

Abstract: Context-aware machine translation models are designed to leverage contextual information, but often fail to do so. As a result, they inaccurately disambiguate pronouns and polysemous words that require context for resolution. In this paper, we ask several questions: What contexts do human translators use to resolve ambiguous words? Are models paying large amounts of attention to the same context? What if we explicitly train them to do so? To answer these questions, we introduce SCAT (Supporting Context for Ambiguous Translations), a new English-French dataset comprising supporting context words for 14K translations that professional translators found useful for pronoun disambiguation. Using SCAT, we perform an in-depth analysis of the context used to disambiguate, examining positional and lexical characteristics of the supporting words. Furthermore, we measure the degree of alignment between the model's attention scores and the supporting context from SCAT, and apply a guided attention strategy to encourage agreement between the two.
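One way to implement the guided-attention idea (a hedged sketch; the paper's exact objective may differ) is to regularize an attention distribution toward the human-annotated supporting-context mask:

```python
import torch
import torch.nn.functional as F

def attention_regularizer(attn, support_mask, eps=1e-8):
    # attn: (batch, tgt_len, src_len) attention weights from the model
    # support_mask: same shape, 1.0 where annotators marked supporting words;
    # assumes every target position has at least one marked source token
    ref = support_mask / (support_mask.sum(-1, keepdim=True) + eps)
    return F.kl_div((attn + eps).log(), ref, reduction="batchmean")

# total_loss = nll_loss + lambda_attn * attention_regularizer(attn, mask)
```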

 

【2】 Dynamic Multi-Branch Layers for On-Device Neural Machine Translation
 

Authors: Zhixing Tan, Maosong Sun, Yang Liu
Affiliations: Department of Computer Science and Technology, Tsinghua University; Institute for AI Industry Research, Tsinghua University; Institute for Artificial Intelligence, Tsinghua University; Beijing National Research Center for Information Science and Technology
Link: https://arxiv.org/abs/2105.06679
 

Abstract: With the rapid development of artificial intelligence (AI), there is a trend in moving AI applications such as neural machine translation (NMT) from cloud to mobile devices such as smartphones. Constrained by limited hardware resources and battery, the performance of on-device NMT systems is far from satisfactory. Inspired by conditional computation, we propose to improve the performance of on-device NMT systems with dynamic multi-branch layers. Specifically, we design a layer-wise dynamic multi-branch network with only one branch activated during training and inference. As not all branches are activated during training, we propose shared-private reparameterization to ensure sufficient training for each branch. At almost the same computational cost, our method achieves improvements of up to 1.7 BLEU points on the WMT14 English-German translation task and 1.8 BLEU points on the WMT20 Chinese-English translation task over the Transformer model, respectively. Compared with a strong baseline that also uses multiple branches, the proposed method is up to 1.6 times faster with the same number of parameters.
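A sketch of the shared-private reparameterization idea: each branch's effective weight is a shared matrix plus a private one, so the shared part is updated on every step even though only one branch is active. Shapes are illustrative and the hard gating below is a simplification of how the paper trains its gates.

```python
import torch
import torch.nn as nn

class DynamicMultiBranchLinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, num_branches: int = 4):
        super().__init__()
        self.shared = nn.Parameter(torch.empty(d_out, d_in))
        self.private = nn.Parameter(torch.empty(num_branches, d_out, d_in))
        nn.init.xavier_uniform_(self.shared)
        nn.init.xavier_uniform_(self.private)
        self.gate = nn.Linear(d_in, num_branches)   # chooses one branch

    def forward(self, x):                           # x: (batch, time, d_in)
        idx = self.gate(x.mean(dim=1)).argmax(-1)   # hard branch choice
        w = self.shared + self.private[idx]         # shared + private weights
        return torch.einsum("bij,btj->bti", w, x)
```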

 

Summarization|Information Extraction (2 papers)

【1】 EASE: Extractive-Abstractive Summarization with Explanations
 

Authors: Haoran Li, Arash Einolghozati, Srinivasan Iyer, Bhargavi Paranjape, Yashar Mehdad, Sonal Gupta, Marjan Ghazvininejad
Affiliations: Facebook
Link: https://arxiv.org/abs/2105.06982
 

Abstract: Current abstractive summarization systems outperform their extractive counterparts, but their widespread adoption is inhibited by the inherent lack of interpretability. To achieve the best of both worlds, we propose EASE, an extractive-abstractive framework for evidence-based text generation and apply it to document summarization. We present an explainable summarization system based on the Information Bottleneck principle that is jointly trained for extraction and abstraction in an end-to-end fashion. Inspired by previous research that humans use a two-stage framework to summarize long documents (Jing and McKeown, 2000), our framework first extracts a pre-defined amount of evidence spans as explanations and then generates a summary using only the evidence. Using automatic and human evaluations, we show that explanations from our framework are more relevant than simple baselines, without substantially sacrificing the quality of the generated summary.
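Schematically, the extract-then-abstract pipeline keeps a budget of evidence sentences as the explanation and summarizes only the evidence; in the sketch below, the sentence scorer and the BART summarizer are stand-ins, not the trained EASE components.

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def ease_style_summary(sentences: list[str], scores: list[float], budget: int = 3):
    ranked = sorted(zip(sentences, scores), key=lambda p: -p[1])
    evidence = [s for s, _ in ranked[:budget]]   # explanation spans
    summary = summarizer(" ".join(evidence))[0]["summary_text"]
    return evidence, summary
```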

 

【2】 DialSumm: A Real-Life Scenario Dialogue Summarization Dataset
 

Authors: Yulong Chen, Yang Liu, Liang Chen, Yue Zhang
Affiliations: ♠ Zhejiang University, ♥ School of Engineering, Westlake University, ♣ Microsoft Cognitive Services Research, ♦ College of Software, Jilin University, ♦ Institute of Advanced Technology, Westlake Institute for Advanced Study
Note: ACL Findings
Link: https://arxiv.org/abs/2105.06762
 

Abstract: Proposal of large-scale datasets has facilitated research on deep neural models for news summarization. Deep learning can also be potentially useful for spoken dialogue summarization, which can benefit a range of real-life scenarios including customer service management and medication tracking. To this end, we propose DialSumm, a large-scale labeled dialogue summarization dataset. We conduct empirical analysis on DialSumm using state-of-the-art neural summarizers. Experimental results show unique challenges in dialogue summarization, such as spoken terms, special discourse structures, coreferences and ellipsis, pragmatics and social commonsense, which require specific representation learning technologies to better deal with.

 

Reasoning|Analysis|Understanding|Explanation (2 papers)

【1】 Towards Navigation by Reasoning over Spatial Configurations
 

Authors: Yue Zhang, Quan Guo, Parisa Kordjamshidi
Affiliations: Michigan State University
Link: https://arxiv.org/abs/2105.06839
 

Abstract: We deal with the navigation problem where the agent follows natural language instructions while observing the environment. Focusing on language understanding, we show the importance of spatial semantics in grounding navigation instructions into visual perceptions. We propose a neural agent that uses the elements of spatial configurations and investigate their influence on the navigation agent's reasoning ability. Moreover, we model the sequential execution order and align visual objects with spatial configurations in the instruction. Our neural agent improves strong baselines on the seen environments and shows competitive performance on the unseen environments. Additionally, the experimental results demonstrate that explicit modeling of spatial semantic elements in the instructions can improve the grounding and spatial reasoning of the model.

 

【2】 A cost-benefit analysis of cross-lingual transfer methods
 

Authors: Guilherme Moraes Rosa, Luiz Henrique Bonifacio, Leandro Rodrigues de Souza, Roberto Lotufo, Rodrigo Nogueira
Affiliations: University of Campinas (UNICAMP); NeuralMind Inteligência Artificial; David R. Cheriton School of Computer Science, University of Waterloo
Link: https://arxiv.org/abs/2105.06813
 

Abstract: An effective method for cross-lingual transfer is to fine-tune a bilingual or multilingual model on a supervised dataset in one language and evaluate it on another language in a zero-shot manner. Translating examples at training time or inference time are also viable alternatives. However, there are costs associated with these methods that are rarely addressed in the literature. In this work, we analyze cross-lingual methods in terms of their effectiveness (e.g., accuracy), development and deployment costs, as well as their latencies at inference time. Our experiments on three tasks indicate that the best cross-lingual method is highly task-dependent. Finally, by combining zero-shot and translation methods, we achieve the state-of-the-art on two of the three datasets used in this work. Based on these results, we question the need for manually labeled training data in a target language. Code, models and translated datasets are available at https://github.com/unicamp-dl/cross-lingual-analysis

 

GAN|Adversarial|Attacks|Generation (3 papers)

【1】 Generating Empathetic Responses with a Large Scale Dialog Dataset
 

Authors: Yubo Xie, Pearl Pu
Affiliations: School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Switzerland
Link: https://arxiv.org/abs/2105.06829
 

Abstract: The task of empathetic response generation aims at generating syntactically correct and, more importantly, emotionally appropriate responses following previous dialog turns. Existing models either directly incorporate pre-defined emotion information to guide the response generation, or use deterministic rules to decide the response emotion, ignoring the subtle emotion interactions captured in human conversations. With the advent of advanced language models, it is possible to learn the nuanced emotional exchanges captured in natural language dialogs. To fully explore the range of emotions and dialog intents, it is important to curate a dataset large enough to shed light on the general understanding of human emotional interactions in our conversations. In this paper, we describe in detail the curation process of a large-scale dialog dataset where each utterance is labeled with one of 32 emotions and 9 intent categories. We then show how to build a multi-turn empathetic dialog model that performs well compared to its baselines over 6,000 human-evaluated instances.

 

【2】 Adversarial Learning for Zero-Shot Stance Detection on Social Media
 

Authors: Emily Allaway, Malavika Srikanth, Kathleen McKeown
Affiliations: Department of Computer Science, Columbia University, New York, NY
Note: To appear in NAACL 2021
Link: https://arxiv.org/abs/2105.06603
 

Abstract: Stance detection on social media can help to identify and understand slanted news or commentary in everyday life. In this work, we propose a new model for zero-shot stance detection on Twitter that uses adversarial learning to generalize across topics. Our model achieves state-of-the-art performance on a number of unseen test topics with minimal computational costs. In addition, we extend zero-shot stance detection to new topics, highlighting future directions for zero-shot transfer.
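Adversarial topic generalization of this kind is commonly realized with a gradient-reversal layer: a topic discriminator pushes the encoder toward topic-invariant features. A minimal sketch of that ingredient; the paper's full architecture is richer than this.

```python
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None   # flip gradients for the encoder

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# stance_logits = stance_head(features)
# topic_logits  = topic_head(grad_reverse(features))   # adversarial branch
# loss = ce(stance_logits, y_stance) + ce(topic_logits, y_topic)
```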

 

【3】 Joint Retrieval and Generation Training for Grounded Text Generation
 

Authors: Yizhe Zhang, Siqi Sun, Xiang Gao, Yuwei Fang, Chris Brockett, Michel Galley, Jianfeng Gao, Bill Dolan
Affiliations: Microsoft Corporation, Redmond, WA, USA
Link: https://arxiv.org/abs/2105.06597
 

Abstract: Recent advances in large-scale pre-training such as GPT-3 allow seemingly high quality text to be generated from a given prompt. However, such generation systems often suffer from problems of hallucinated facts, and are not inherently designed to incorporate useful external information. Grounded generation models appear to offer remedies, but their training typically relies on rarely-available parallel data where corresponding documents are provided for context. We propose a framework that alleviates this data constraint by jointly training a grounded generator and document retriever on the language model signal. The model learns to retrieve the documents with the highest utility in generation and attentively combines them in the output. We demonstrate that by taking advantage of external references our approach can produce more informative and interesting text in both prose and dialogue generation.
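One way the language-model signal can train a retriever is to marginalize generation likelihood over the top-k retrieved documents so that both modules receive gradients; the schematic loss below is an assumption in the spirit of the abstract, not the paper's exact formulation.

```python
import torch

def joint_retrieval_generation_loss(doc_scores, gen_logprobs):
    # doc_scores:   (batch, k) retriever scores for k candidate documents
    # gen_logprobs: (batch, k) generator log p(output | context, doc_i)
    log_p_doc = torch.log_softmax(doc_scores, dim=-1)
    # -log sum_i p(doc_i) * p(y | doc_i): one differentiable objective
    return -torch.logsumexp(log_p_doc + gen_logprobs, dim=-1).mean()
```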

 

Semi/Weakly/Un-supervised|Uncertainty (1 paper)

【1】 Shades of confusion: Lexical uncertainty modulates ad hoc coordination in an interactive communication task
 

Authors: Sonia K. Murthy, Robert D. Hawkins, Thomas L. Griffiths
Affiliations: Department of Psychology, Princeton University, Princeton, NJ; Allen Institute for Artificial Intelligence, Seattle, WA; Department of Computer Science, Princeton University, Princeton, NJ
Note: under review
Link: https://arxiv.org/abs/2105.06546
 

Abstract: There is substantial variability in the expectations that communication partners bring into interactions, creating the potential for misunderstandings. To directly probe these gaps and our ability to overcome them, we propose a communication task based on color-concept associations. In Experiment 1, we establish several key properties of the mental representations of these expectations, or lexical priors, based on recent probabilistic theories. Associations are more variable for abstract concepts, variability is represented as uncertainty within each individual, and uncertainty enables accurate predictions about whether others are likely to share the same association. In Experiment 2, we then examine the downstream consequences of these representations for communication. Accuracy is initially low when communicating about concepts with more variable associations, but rapidly increases as participants form ad hoc conventions. Together, our findings suggest that people cope with variability by maintaining well-calibrated uncertainty about their partner and appropriately adaptable representations of their own.

 

Recognition/Classification (2 papers)

【1】 Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition
 

Authors: Yongliang Shen, Xinyin Ma, Zeqi Tan, Shuai Zhang, Wen Wang, Weiming Lu
Affiliations: College of Computer Science and Technology, Zhejiang University; University of Science and Technology of China
Note: Accepted to ACL 2021, submission version
Link: https://arxiv.org/abs/2105.06804
 

Abstract: Named entity recognition (NER) is a well-studied task in natural language processing. Traditional NER research only deals with flat entities and ignores nested entities. The span-based methods treat entity recognition as a span classification task. Although these methods have the innate ability to handle nested NER, they suffer from high computational cost, ignorance of boundary information, under-utilization of the spans that partially match with entities, and difficulties in long entity recognition. To tackle these issues, we propose a two-stage entity identifier. First, we generate span proposals by filtering and boundary regression on the seed spans to locate the entities, and then label the boundary-adjusted span proposals with the corresponding categories. Our method effectively utilizes the boundary information of entities and partially matched spans during training. Through boundary regression, entities of any length can be covered theoretically, which improves the ability to recognize long entities. In addition, many low-quality seed spans are filtered out in the first stage, which reduces the time complexity of inference. Experiments on nested NER datasets demonstrate that our proposed method outperforms previous state-of-the-art models.
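A sketch of the two-stage interface (shapes and heads are illustrative, not the paper's exact design): stage one filters seed spans and regresses boundary offsets; stage two labels the boundary-adjusted proposals with entity categories.

```python
import torch
import torch.nn as nn

class TwoStageSpanIdentifier(nn.Module):
    def __init__(self, hidden: int = 768, num_types: int = 9):
        super().__init__()
        self.filter_head = nn.Linear(2 * hidden, 1)        # keep/drop seed span
        self.boundary_head = nn.Linear(2 * hidden, 2)      # (left, right) offsets
        self.type_head = nn.Linear(2 * hidden, num_types)  # entity category

    def forward(self, token_states, spans):
        # token_states: (seq, hidden); spans: (num_spans, 2) start/end indices
        reps = torch.cat([token_states[spans[:, 0]],
                          token_states[spans[:, 1]]], dim=-1)
        keep = torch.sigmoid(self.filter_head(reps)).squeeze(-1)
        offsets = self.boundary_head(reps)   # stage 1: locate entities
        types = self.type_head(reps)         # stage 2: label adjusted spans
        return keep, offsets, types
```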

 

【2】 Out-of-Manifold Regularization in Contextual Embedding Space for Text Classification
 

Authors: Seonghyeon Lee, Dongha Lee, Hwanjo Yu
Affiliations: Dept. of Computer Science and Engineering, POSTECH, Republic of Korea; Institute of Artificial Intelligence, POSTECH, Republic of Korea
Note: ACL 2021 main conference
Link: https://arxiv.org/abs/2105.06750
 

Abstract: Recent studies on neural networks with pre-trained weights (i.e., BERT) have mainly focused on a low-dimensional subspace, where the embedding vectors computed from input words (or their contexts) are located. In this work, we propose a new approach to finding and regularizing the remainder of the space, referred to as out-of-manifold, which cannot be accessed through the words. Specifically, we synthesize the out-of-manifold embeddings based on two embeddings obtained from actually-observed words, to utilize them for fine-tuning the network. A discriminator is trained to detect whether an input embedding is located inside the manifold or not, and simultaneously, a generator is optimized to produce new embeddings that can be easily identified as out-of-manifold by the discriminator. These two modules successfully collaborate in a unified and end-to-end manner for regularizing the out-of-manifold. Our extensive evaluation on various text classification benchmarks demonstrates the effectiveness of our approach, as well as its good compatibility with existing data augmentation techniques which aim to enhance the manifold.
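A minimal sketch of the recipe: synthesize a new embedding from two observed ones, train the discriminator to tell in- from out-of-manifold, and reward the generator when its outputs are flagged as out-of-manifold (a cooperative rather than adversarial pairing, per the abstract). Architectures and loss wiring here are schematic assumptions.

```python
import torch
import torch.nn as nn

hidden = 768
generator = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.Tanh())
discriminator = nn.Linear(hidden, 1)   # logit: 1 = in-manifold, 0 = out
bce = nn.BCEWithLogitsLoss()

def synthesize(e1, e2):
    # e1, e2: (batch, hidden) embeddings of actually-observed words
    return generator(torch.cat([e1, e2], dim=-1))

def manifold_losses(real_emb, fake_emb):
    d_loss = bce(discriminator(real_emb), torch.ones(len(real_emb), 1)) \
           + bce(discriminator(fake_emb.detach()), torch.zeros(len(fake_emb), 1))
    # Cooperative: the generator *wants* to be identified as out-of-manifold.
    g_loss = bce(discriminator(fake_emb), torch.zeros(len(fake_emb), 1))
    return d_loss, g_loss
```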

 

Representation (1 paper)

【1】 Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction
 

Authors: Shauli Ravfogel, Grusha Prasad, Tal Linzen, Yoav Goldberg
Affiliations: Computer Science Department, Bar Ilan University; Allen Institute for Artificial Intelligence; Cognitive Science Department, Johns Hopkins University; Department of Linguistics and Center for Data Science, New York University
Note: Equal contribution by SR and GP
Link: https://arxiv.org/abs/2105.06965
 

Abstract: When language models process syntactically complex sentences, do they use abstract syntactic information present in these sentences in a manner that is consistent with the grammar of English, or do they rely solely on a set of heuristics? We propose a method to tackle this question, AlterRep. For any linguistic feature in the sentence, AlterRep allows us to generate counterfactual representations by altering how this feature is encoded, while leaving all other aspects of the original representation intact. Then, by measuring the change in a model's word prediction with these counterfactual representations in different sentences, we can draw causal conclusions about the contexts in which the model uses the linguistic feature (if any). Applying this method to study how BERT uses relative clause (RC) span information, we found that BERT uses information about RC spans during agreement prediction using the linguistically correct strategy. We also found that counterfactual representations generated for a specific RC subtype influenced the number prediction in sentences with other RC subtypes, suggesting that information about RC boundaries was encoded abstractly in BERT's representation.
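An AlterRep-style intervention can be pictured as flipping a representation's coordinate along the direction a linear probe associates with the feature (e.g., "inside an RC span") while leaving the orthogonal components intact. The single-direction sketch below is a simplification of the method, under assumed notation.

```python
import torch

def alter_rep(h: torch.Tensor, w: torch.Tensor, alpha: float = 1.0):
    # h: (hidden,) contextual embedding; w: (hidden,) probe direction
    w = w / w.norm()
    coef = torch.dot(h, w)             # feature coordinate along the probe
    return h - (1 + alpha) * coef * w  # alpha=1 reflects across the boundary
```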

 

Other Neural Networks|Deep Learning|Models|Modeling (1 paper)

【1】 Thank you BART! Rewarding Pre-Trained Models Improves Formality Style Transfer
 

Authors: Huiyuan Lai, Antonio Toral, Malvina Nissim
Affiliations: CLCG, University of Groningen, The Netherlands
Link: https://arxiv.org/abs/2105.06947
 

Abstract: Scarcity of parallel data causes formality style transfer models to have scarce success in preserving content. We show that fine-tuning pre-trained language (GPT-2) and sequence-to-sequence (BART) models boosts content preservation, and that this is possible even with limited amounts of parallel data. Augmenting these models with rewards that target style and content, the two core aspects of the task, we achieve a new state of the art.
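Reward augmentation of this kind is often implemented as a policy-gradient term on sampled outputs; a hedged sketch in which the style/content scorers, weighting, and baseline are assumptions rather than the paper's exact design.

```python
import torch

def reward_loss(sample_logprobs, style_score, content_score, baseline=0.5):
    # sample_logprobs: (batch,) log-probability of each sampled output sequence
    reward = 0.5 * style_score + 0.5 * content_score   # the task's two aspects
    return -((reward - baseline) * sample_logprobs).mean()

# total = ce_loss + lambda_r * reward_loss(logp, style, content)
```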

 

Others (4 papers)

【1】 Plot and Rework: Modeling Storylines for Visual Storytelling
 

Authors: Chi-Yang Hsu, Yun-Wei Chu, Ting-Hao Huang, Lun-Wei Ku
Affiliations: Pennsylvania State University; Purdue University; Institute of Information Science, Academia Sinica
Note: Accepted by ACL'21 Findings; this is not the camera-ready version
Link: https://arxiv.org/abs/2105.06950
 

Abstract: Writing a coherent and engaging story is not easy. Creative writers use their knowledge and worldview to put disjointed elements together to form a coherent storyline, and work and rework iteratively toward perfection. Automated visual storytelling (VIST) models, however, make poor use of external knowledge and iterative generation when attempting to create stories. This paper introduces PR-VIST, a framework that represents the input image sequence as a story graph in which it finds the best path to form a storyline. PR-VIST then takes this path and learns to generate the final story via an iterative training process. This framework produces stories that are superior in terms of diversity, coherence, and humanness, per both automatic and human evaluations. An ablation study shows that both plotting and reworking contribute to the model's superiority.

 

【2】 Neural-Symbolic Commonsense Reasoner with Relation Predictors
 

Authors: Farhad Moghimifar, Lizhen Qu, Yue Zhuo, Gholamreza Haffari, Mahsa Baktashmotlagh
Affiliations: The School of ITEE, The University of Queensland, Australia; Monash University, Australia; School of CSE, The University of New South Wales, Australia
Note: ACL 2021
Link: https://arxiv.org/abs/2105.06717
 

Abstract: Commonsense reasoning aims to incorporate sets of commonsense facts, retrieved from Commonsense Knowledge Graphs (CKG), to draw conclusions about ordinary situations. The dynamic nature of commonsense knowledge postulates models capable of performing multi-hop reasoning over new situations. This feature also results in having large-scale sparse Knowledge Graphs, where such reasoning process is needed to predict relations between new events. However, existing approaches in this area are limited by considering CKGs as a limited set of facts, thus rendering them unfit for reasoning over new unseen situations and events. In this paper, we present a neural-symbolic reasoner, which is capable of reasoning over large-scale dynamic CKGs. The logic rules for reasoning over CKGs are learned during training by our model. In addition to providing interpretable explanations, the learned logic rules help to generalise prediction to newly introduced events. Experimental results on the task of link prediction on CKGs demonstrate the effectiveness of our model, which outperforms the state-of-the-art models.

 

【3】 DaLAJ - a dataset for linguistic acceptability judgments for Swedish: Format, baseline, sharing
 

Authors: Elena Volodina, Yousuf Ali Mohammed, Julia Klezl
Affiliations: University of Gothenburg, Sweden
Note: This is an extended version of an article accepted to the 10th NLP4CALL workshop (2021), Linköping Electronic Conference Proceedings 177, ISSN: 1650-3740 (online). In the extended version (available at arXiv) we have added a description of an experiment and baseline results to the dataset description accepted for the NLP4CALL publication.
Link: https://arxiv.org/abs/2105.06681
 

Abstract: We present DaLAJ 1.0, a Dataset for Linguistic Acceptability Judgments for Swedish, comprising 9,596 sentences in its first version; and the initial experiment using it for the binary classification task. DaLAJ is based on the SweLL second language learner data, consisting of essays at different levels of proficiency. To make sure the dataset can be freely available despite the GDPR regulations, we have sentence-scrambled learner essays and removed part of the metadata about learners, keeping for each sentence only information about the mother tongue and the level of the course where the essay has been written. We use the normalized version of learner language as the basis for the DaLAJ sentences, and keep only one error per sentence. We repeat the same sentence for each individual correction tag used in the sentence. For DaLAJ 1.0 we have used four error categories (out of 35 available in SweLL), all connected to lexical or word-building choices. Our baseline results for the binary classification show an accuracy of 58% for DaLAJ 1.0 using BERT embeddings. The dataset is included in the SwedishGlue (Swe. SuperLim) benchmark. Below, we describe the format of the dataset, first experiments, our insights and the motivation for the chosen approach to data sharing.
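The reported baseline can be pictured as frozen BERT sentence embeddings feeding a linear classifier; a sketch under assumptions (the Swedish checkpoint named below is a plausible stand-in, not necessarily the one the authors used).

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("KB/bert-base-swedish-cased")
bert = AutoModel.from_pretrained("KB/bert-base-swedish-cased")
clf = torch.nn.Linear(768, 2)   # acceptable vs. not acceptable

def embed(sentence: str) -> torch.Tensor:
    with torch.no_grad():
        out = bert(**tok(sentence, return_tensors="pt"))
    return out.last_hidden_state[:, 0]   # [CLS] sentence embedding

logits = clf(embed("En exempelmening att klassificera."))
```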

 

【4】 NLP is Not enough -- Contextualization of User Input in Chatbots
 

Authors: Nathan Dolbir, Triyasha Dastidar, Kaushik Roy
Affiliations: Artificial Intelligence Institute, University of South Carolina; BITS-Pilani Hyderabad
Link: https://arxiv.org/abs/2105.06511
 

Abstract: AI chatbots have made vast strides in technology improvement in recent years and are already operational in many industries. Advanced Natural Language Processing techniques, based on deep networks, efficiently process user requests to carry out their functions. As chatbots gain traction, their applicability in healthcare is an attractive proposition due to the reduced economic and people costs of an overburdened system. However, healthcare bots require safe and medically accurate information capture, which deep networks cannot yet guarantee due to variations in user text and speech. Knowledge in symbolic structures is more suited for accurate reasoning but cannot handle natural language processing directly. Thus, in this paper, we study the effects of combining knowledge and neural representations on chatbot safety, accuracy, and understanding.

 

