The Operation and Training of Large Language Models and a Strategic Analysis of Their Application, Part I

DOI: 10.32561/nsz.2025.1.6

Abstract

Nowadays, an increasing number of workplace software solutions are emerging that are supported by artificial intelligence agents, whose core capabilities are provided by large language models. The observable development directions of these models (e.g., reduced resource consumption, compact architectures) point to a future in which this technology will likely become part of our everyday information-processing toolkit, including in national security tasks. Effective and responsible use requires understanding the fundamental principles and development needs of this technology, as well as uncovering both its positive and negative features in real-world applications. In the first part of this two-part study, which is aimed at knowledge dissemination, we focus on transformers and their most essential technological foundations within the broader fields of artificial intelligence and machine learning, in order to demystify this technology, often regarded as obscure and difficult to understand, and bring it closer to societal acceptance.

Keywords:

LLM, large language model, artificial intelligence, AI, ChatGPT, machine learning

References

AGÜERA Y ARCAS, Blaise (2022): Do Large Language Models Understand Us? Daedalus, 151(2), 183–197. Online: https://doi.org/10.1162/daed_a_01909

Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence and amending Regulations (EC) No 300/2008, (EU) No 167/2013, (EU) No 168/2013, (EU) 2018/858, (EU) 2018/1139 and (EU) 2019/2144 and Directives 2014/90/EU, (EU) 2016/797 and (EU) 2020/1828 (Artificial Intelligence Act).

BOMMASANI, Rishi et al. (2021): On the Opportunities and Risks of Foundation Models. Center for Research on Foundation Models. Stanford Institute for Human-Centered Artificial Intelligence, Stanford University. Online: https://arxiv.org/pdf/2108.07258.pdf

BOTTYÁN, Sándor (2024): Nagy nyelvi modellel támogatott nyílt forrású információgyűjtés a kibertérben [Large Language Model-Supported Open-Source Information Gathering in Cyberspace]. Student research paper. Budapest: Nemzeti Közszolgálati Egyetem Államtudományi és Nemzetközi Tanulmányok Kar.

CHERNYAVSKIY, Anton – ILVOVSKY, Dmitry – NAKOV, Preslav (2021): Transformers: "The End of History" for Natural Language Processing? In Machine Learning and Knowledge Discovery in Databases. Research Track: European Conference, ECML PKDD 2021, Bilbao, Spain, September 13–17, 2021, Proceedings, Part III 21. Springer, 677–693. Online: https://doi.org/10.1007/978-3-030-86523-8_41

CHOWDHARY, K. R. (2020): Fundamentals of Artificial Intelligence. 1st ed. Springer India. Online: https://doi.org/10.1007/978-81-322-3972-7

DEVLIN, Jacob et al. (2018): BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Online: https://doi.org/10.48550/arXiv.1810.04805

EHAB, Michael (2023): The Secrets of Large Language Models Parameters: How They Affect the Quality, Diversity, and Creativity of Generated Texts [With Examples]. Medium, 30 July 2023. Online: https://michaelehab.medium.com/the-secrets-of-large-language-models-parameters-how-they-affect-the-quality-diversity-and-32eb8643e631

FERRER, Josep (2024): How Transformers Work: A Detailed Exploration of Transformer Architecture. DataCamp, 9 January 2024. Online: https://www.datacamp.com/tutorial/how-transformers-work

HUMOR, Michael (2023): Understanding „Tokens” and Tokenization in Large Language Models. Online: https://blog.devgenius.io/understanding-tokens-and-tokenization-in-large-language-models-1058cd24b944

MALIK, Farhad (2019): Neural Networks – A Solid Practical Guide. Explaining How Neural Networks Work With Practical Examples. Medium, 16 May 2019. Online: https://medium.com/fintechexplained/neural-networks-a-solid-practical-guide-9f343594b02a

MUNOZ, Andres (2012): Machine Learning and Optimization. Online: https://www.semanticscholar.org/paper/Machine-Learning-and-Optimization-Munoz/7fbba79630b5a09dd66ab13f00c3aefaa56cf268

ONGSULEE, Pariwat (2017): Artificial Intelligence, Machine Learning and Deep Learning. In 15th International Conference on ICT and Knowledge Engineering (ICT&KE), Bangkok. Online: https://doi.org/10.1109/ICTKE.2017.8259629

RAFFEL, Colin et al. (2020): Exploring the Limits of Transfer Learning With a Unified Text-To-Text Transformer. The Journal of Machine Learning Research, 21(1), 1–67. Online: https://jmlr.org/papers/volume21/20-074/20-074.pdf

SAVIO, Jacob (2024): What Is Prompt Engineering? Definition, Elements, Techniques, Applications, and Benefits. Spiceworks, 2024. április 24. Online: https://www.spiceworks.com/tech/artificial-intelligence/articles/what-is-prompt-engineering

SHIN, Taylor et al. (2020): AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts. In WEBBER, Bonnie et al. (eds.): Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 4222–4235. Online: https://doi.org/10.18653/v1/2020.emnlp-main.346

VASWANI, Ashish et al. (2017): Attention Is All You Need. Online: https://doi.org/10.48550/arXiv.1706.03762
