Cross-lingual Transfer with Word Order Differences

1 minute read

Cross-lingual transfer, which transfers models trained on one language to another, has tremendous practical value. It reduces the need for annotated data in the target language and is especially useful when the target language is resource-poor. Transferring across languages is challenging because it requires understanding and handling differences between languages at the levels of morphology, syntax, and semantics. One of the key challenges is the variation in word order across languages. For example, the Verb-Object order common in English is rarely found in Japanese, which places the verb after the object. Model design should take this challenge into account. We posit that order-free models transfer better than order-sensitive models because they are less prone to overfitting language-specific word order features.

In the family of deep neural models, recurrent neural networks (RNNs) and Transformers are widely adopted to learn contextual word representations. RNNs process tokens sequentially, which makes them reliant on word order. As a result, RNNs risk encoding language-specific order information that does not generalize across languages. We characterize this as the order-sensitive property. In contrast, the Transformer uses a self-attention mechanism to capture context, and self-attention by itself is insensitive to word order. With carefully designed position representations, self-attention can be more robust than RNNs to changes in word order. We refer to this as the order-free property.
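The contrast between the two properties can be made concrete with a small experiment. The sketch below is not from the paper; it is a minimal PyTorch illustration (assuming `torch.nn.MultiheadAttention` and `torch.nn.GRU` with randomly initialized weights) showing that self-attention without positional encodings is permutation-equivariant, whereas an RNN's hidden states change when the input order changes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

d_model, seq_len = 16, 5
x = torch.randn(1, seq_len, d_model)   # (batch, seq, feature): a toy "sentence"
perm = torch.randperm(seq_len)
x_perm = x[:, perm, :]                 # same tokens, shuffled order

# Self-attention with no positional encodings: permuting the input only
# permutes the output rows, so each token's representation is unchanged.
attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
attn.eval()
with torch.no_grad():
    out, _ = attn(x, x, x)
    out_perm, _ = attn(x_perm, x_perm, x_perm)
print(torch.allclose(out[:, perm, :], out_perm, atol=1e-6))  # True

# RNN: each hidden state depends on everything seen so far, so reordering
# the input changes the token representations themselves.
rnn = nn.GRU(d_model, d_model, batch_first=True)
rnn.eval()
with torch.no_grad():
    h, _ = rnn(x)
    h_perm, _ = rnn(x_perm)
print(torch.allclose(h[:, perm, :], h_perm, atol=1e-6))      # False
```

In practice the Transformer does add position information, but how that information is represented (e.g., relative rather than absolute positions) determines how much language-specific order it absorbs.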

In our work published at NAACL 2019, we systematically investigate RNNs and Transformers for cross-lingual representation learning. We show that Transformers, as order-free models, generally outperform order-sensitive RNNs for cross-lingual transfer, especially when the source and target languages are distant (e.g., English and Japanese).