15 | | 1. Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805v2 |
16 | | 1. Polosukhin, Illia, et al. "Attention Is All You Need". arXiv:1706.03762 |
17 | | 1. Alammar, Jay. "The Illustrated Transformer". jalammar.github.io |
| 15 | 1. Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". [[https://arxiv.org/abs/1810.04805v2|arXiv:1810.04805v2]] |
| 16 | 1. Polosukhin, Illia, et al. "Attention Is All You Need". [[https://arxiv.org/abs/1810.04805v2|arXiv:1706.03762]] |
| 17 | 1. Alammar, Jay (2018). The Illustrated Transformer [Blog post]. Retrieved from https://jalammar.github.io/illustrated-transformer/ |
| 18 | 1. Alammar, Jay (2018). The Illustrated BERT, ELMo, and co. [Blog post]. Retrieved from https://jalammar.github.io/illustrated-bert/ |
| 19 | |
| 20 | |
| 21 | |