An Introduction to Large Langue Models(LLMs) — A Guide for Busy Newbies

Donovan Rittenbach - Copywriter, AI Jockey

5 min readDec 1, 2023

Large language models (LLMs) have recently emerged as a promising new computing paradigm. In this intro, we will cover what LLMs are, how they work, how they are trained, their capabilities and limitations, and the security considerations around them.

At a high level, LLMs are machine learning models that are trained on vast amounts of text data to predict the next word in a sequence. The key idea is that in order to accurately predict the next word, the model needs to build up a rich understanding of language structure, semantics, knowledge about the world, and more.

Anatomy of an LLM

LLMs are composed of two key components — the model architecture and the model parameters or weights. The architecture defines the basic structure and operations of the neural network model. The weights, numbering in the billions, are learned during training and encode the specific knowledge and capabilities of that model. For example, the Lamda-2–7B model has over 175 billion parameters.

To run an LLM you also need code to load the parameters and execute the model architecture. So at minimum, an LLM is the combination of:
1) Parameters file
2) Model execution code
3) The text input

With just these pieces, even complex models with billions of parameters like Lamda can run on a single laptop. The parameters and code contain everything needed to generate intelligent text for any given prompt.

LLM Training

While model inference can run on a laptop, model training requires significant computational resources. Pre-training a large LLM on a broad text corpus requires multi-petaflop scale compute clusters, costing millions of dollars to run for weeks at a time.

The goal of pre-training is to “compress” a large amount of text, on the order of tens of terabytes from the internet, into the fixed size parameters of the model. This builds up broad abilities of the LLM across different domains.

After pre-training comes fine-tuning, which adapts the LLM to specific tasks like question answering by training on curated datasets. While still computationally intensive, fine-tuning is much less resource demanding than pre-training.

Emergent LLM Capabilities

Despite training only to predict the next word, LLMs exhibit a range of capabilities useful for real world applications:

- Text generation — LLMs can generate coherent, human like text continuations from a prompt across different styles and domains. The generated text can appear eerily on point, with the model “dreaming up” plausible sounding content from its training data.

- Question answering — After fine-tuning, LLMs can rapidly answer natural language questions by summarizing information retrieved from documents or even just relying on their internal knowledge.

- Classification — LLMs can categorize texts, like detecting if a movie review is positive or negative. They are state of the art across many NLP classification benchmarks.

- Translation — LLMs can translate between languages with higher quality than previous phrase-based or encoder-decoder models.

- Summarization — LLMs are excellent at distilling longer texts into concise yet comprehensive summaries while preserving key information.

These impressive capabilities result from the models gaining broad linguistic, semantic, common sense, and domain specific knowledge during pre-training. However there are still many shortcomings around reasoning, generalization, and robustness that require further research to address.

The LLM Computing Stack
Rather than just view LLMs as another AI technology, it is useful to consider them as the kernel of a new computing stack. The large neural network, with words flowing through it, orchestrates access to different forms of memory, data stores, and software tools in service of completing tasks.

An analogy can be made to today’s operating systems, with the LLM serving as an intelligent interface and flow controller, analogous to the kernel space, mediating requests between applications software (tools) in user space and underlying memory and mass storage (the model’s knowledge stores).

From this perspective, scaling model size (parameters) corresponds to expanding RAM capacity. Increasing pre-training data scales disk capacity. And fine-tuning acts similar to installing application software. Features like retrieval and self-supervised learning then provide dynamic linking to knowledge sources.

Over time, integrating other modalities like images, video, and audio will further enrich applications built on top of LLMs, powered by their foundational language understanding abilities.

Security Considerations

As with any rapidly advancing technology seeing widespread adoption, the openness and broad capabilities of LLMs also open up the potential for misuse. As such, the following security considerations around LLMs are important to take into account:

- Parameter tampering — Since LLMs rely so heavily on their parameters, injecting malicious behaviors by modifying the weights remains an area of concern. This is still mainly theoretical but solutions typically focus on code/data signing and enhanced verification.

- Data poisoning — If malign data enters pre-training or fine-tuning datasets, it risks teaching problematic behaviors. As assembling training data relies extensively on human raters, improving guidelines, auditing, and red team testing helps safeguard data quality.

- Jailbreak prompts — Carefully crafted prompts can sometimes unlock capabilities LLMs were intended to fence off. Adversarial filtering approaches combat this but constant prompt engineering cycles remain likely.

- Prompt injection — By piggybacking text directing model behavior within application data inputs, LLMs risk being turned against systems that employ them. Strictly separating control and input channels provides some mitigation.

While the full scope of security holes and attack vectors likely remains undiscovered as LLMs diffuse more widely, responsible design and engineering discipline, securing both data flows and learned model state, offer perhaps the best proactive approaches for now. Monitoring for misuse, collecting problematic cases, and iterating on solutions will further help strengthen LLM safety and security long term.

Conclusion

In this introduction to large language models we have looked across a range of topics, from their composition to capabilities to evolution and safe deployment. As research and investment accelerates, LLMs seem poised to impact every corner of human language understanding and manipulation, bringing new convenience but also complexity. Understanding their strengths and weaknesses, and crafting appropriate applications, remains the chief challenge and opportunity as they continue opening new frontiers in AI.

Based on Andrej Karpathy’s Exceptional YouTube Lecture