A Global Past-Future Early Exit Method for Accelerating Inference of Pre-trained Language Models

Abstract

The early exit mechanism aims to accelerate inference for large-scale pre-trained language models. The essential idea is to exit early without passing through all layers at inference time. To make accurate predictions on downstream tasks, the hierarchical linguistic information embedded in all layers should be considered jointly. However, most research to date has been limited to using the local representation of the exit layer. Such treatment inevitably discards information from the layers already passed, as well as the high-level features embedded in future layers, leading to sub-optimal performance. To address this issue, we propose a novel Past-Future method that makes comprehensive predictions from a global perspective. We first take into account all the hierarchical linguistic information embedded in the past layers, and then take a further step to engage the future states, which are otherwise inaccessible at prediction time. Extensive experiments demonstrate that our method outperforms previous early exit methods by a large margin, yielding more effective and more robust results.
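To make the idea concrete, below is a minimal sketch of an exit head that combines a pooled summary of past layers with an estimate of the unreached future layers. It is an illustration under assumptions, not the paper's exact architecture: the class name `PastFutureExitHead`, the attention pooling over past [CLS] states, and the linear `future_proj` approximator are all hypothetical choices.

```python
# A hedged sketch of a past-future early-exit classifier head, assuming a
# PyTorch encoder that exposes the [CLS] hidden state of every layer it has
# run so far. All module names and design details here are illustrative.
import torch
import torch.nn as nn


class PastFutureExitHead(nn.Module):
    """Exit classifier attached at layer `layer_idx` of an `n_layers` encoder.

    Past: attention-pool the [CLS] states of layers 1..layer_idx.
    Future: approximate the unreached states of layers layer_idx+1..n_layers
    from the pooled past summary, then classify from both.
    """

    def __init__(self, hidden_size: int, n_layers: int, layer_idx: int, n_classes: int):
        super().__init__()
        self.n_future = n_layers - layer_idx
        self.hidden_size = hidden_size
        self.score = nn.Linear(hidden_size, 1)  # pooling weights over past layers
        # Hypothetical future approximator: one linear map emitting all
        # future-layer estimates at once from the past summary.
        self.future_proj = nn.Linear(hidden_size, hidden_size * self.n_future)
        self.classifier = nn.Linear(hidden_size * 2, n_classes)

    def forward(self, past_cls_states: torch.Tensor) -> torch.Tensor:
        # past_cls_states: (batch, layer_idx, hidden), one [CLS] vector per past layer.
        weights = torch.softmax(self.score(past_cls_states).squeeze(-1), dim=-1)
        past = torch.einsum("bl,blh->bh", weights, past_cls_states)
        # Estimate the inaccessible future layers, then average them into one vector.
        future = self.future_proj(past).view(-1, self.n_future, self.hidden_size).mean(dim=1)
        return self.classifier(torch.cat([past, future], dim=-1))
```

At inference, each head's output could be compared against a confidence or entropy threshold to decide whether to exit at that layer, as in prior entropy-based early-exit work; that surrounding exit criterion is likewise an assumption here, not a detail taken from the abstract.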

Publication
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2021 (to appear)
Xuancheng Ren
