How the 🦊 does Generative AI Work?

It’s Not About Knowing. It’s About Guessing Smart.

Here’s the deal: GenAI doesn’t know stuff the way we do. It’s not reaching into a mental filing cabinet to pull out facts. What it is doing is making insanely good guesses based on patterns it’s seen before.

When you ask ChatGPT a question, it’s not grabbing a saved answer. It’s predicting what should come next — token by token — based on everything it’s learned. Basically, it’s next-level autocomplete. Like how your phone might follow “Hey!” with “How are you,” except GenAI is doing that, but on rocket fuel.

What is a token?

Think of tokens as the "vocabulary words" that AI models understand. Some tokens are full words, others are parts of words, and some are just single characters. When you chat with AI, everything you write gets broken down into these chunks!

Tokens are the basic units that language models process. A token can be a whole word (e.g., "cat"), a part of a word (e.g., "super" + "califragilistic"), a single character (especially for uncommon characters), or even a punctuation mark, space, or special symbol.
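To make that concrete, here's a minimal sketch of how a tokenizer might split text using greedy longest-match. The tiny vocabulary below is completely made up for illustration; real models learn vocabularies of tens of thousands of tokens, typically with algorithms like byte-pair encoding:

```python
# Toy tokenizer: greedy longest-match against a tiny hypothetical vocabulary.
# Real models use learned vocabularies of 50,000+ tokens; this one is
# invented purely for illustration.
VOCAB = {"super", "cali", "fragi", "listic", "cat", " "}

def tokenize(text, vocab):
    """Split text into tokens by always taking the longest vocab match."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible substring first.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Unknown text: fall back to a single-character token.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("supercali", VOCAB))  # ['super', 'cali']
```

Notice how a rare word gets broken into several smaller pieces while a common word stays whole, which is exactly the "some pieces are bigger than others" behavior described above.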

Most modern language models use around 50,000 to 100,000 tokens in their vocabulary. Imagine playing a word association game. If I say "peanut," you might think "butter." If I say "once upon a," you'll likely think "time." GenAI does something similar, but with mathematical precision across billions of patterns.

For every token the AI processes, it calculates probabilities for what might come next. It's like having a giant statistical lookup table of "what follows what" in language. When writing about rockets, words like "launch," "space," and "propulsion" become more probable. When writing a recipe, words like "bake," "tablespoon," and "mixture" rise to the top.
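That "statistical lookup table of what follows what" can be sketched with a toy bigram model. The tiny corpus here is invented to stand in for the billions of patterns a real model learns from:

```python
from collections import Counter, defaultdict

# A tiny made-up corpus standing in for billions of training patterns.
corpus = ("once upon a time there was a rocket . "
          "the rocket had a launch . "
          "once upon a time there was a cat .").split()

# Count what follows what: the simplest possible "statistical lookup table".
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_token_probs(token):
    """Return the probability of each token that followed `token` in the corpus."""
    counts = follows[token]
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}

print(next_token_probs("upon"))  # {'a': 1.0}
print(next_token_probs("a"))     # 'time' is most probable
```

Real models don't just look at the single previous token, of course; they condition on the whole context window. But the core move, turning counts of "what follows what" into probabilities, is the same idea.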

The magic (or rather, the math) happens when GenAI considers not just the last word, but potentially hundreds or thousands of tokens of previous context. This is why it can maintain themes, reference earlier points, and produce coherent text over long passages.

Is all that overwhelming?

GenAI Tokenization Demo

Let’s see it in action. Try our interactive tokenizer and see how an AI tool might tokenize your inputs.

Need some real world analogies?

  • Reading: Tokenization is similar to how children learn to read - first recognizing common letter combinations, then whole words, and finally understanding sentences.

  • Old Text Messages: Remember 160-character SMS limits? AI tokens work similarly - there's a limit to how much you can send at once.

  • Jigsaw Puzzle: The AI breaks your text into puzzle pieces (tokens) that it understands. Some pieces are bigger than others, depending on how common they are.

How GenAI Turns Words Into Meaning: Embeddings

Once GenAI breaks your text into tokens, it gives each one a kind of mathematical vibe check called an embedding. These are number-based representations that capture the meaning behind the words.

In this weirdly beautiful math-space, words that feel similar, like “happy,” “joyful,” and “delighted,” end up near each other. A word like “sad” floats off in another direction. It’s not doing old-school keyword matching; it’s navigating this wild, high-dimensional map of meaning to figure out how everything connects.
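A common way to measure "nearness" in that map is cosine similarity. Here's a minimal sketch with hand-made 3-dimensional vectors; real embeddings have hundreds or thousands of dimensions learned from data, and these numbers are invented purely to show the geometry:

```python
import math

# Hand-made "embeddings" — the numbers are invented for illustration only.
embeddings = {
    "happy":     [0.9, 0.8, 0.1],
    "joyful":    [0.85, 0.75, 0.2],
    "sad":       [-0.8, -0.7, 0.1],
}

def cosine_similarity(a, b):
    """1.0 means pointing the same way; negative means opposite directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["happy"], embeddings["joyful"]))  # near 1
print(cosine_similarity(embeddings["happy"], embeddings["sad"]))     # negative
```

"Happy" and "joyful" point in nearly the same direction, while "sad" points the opposite way, which is the high-dimensional map of meaning in action.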

Large language models are typically pre-trained on vast datasets scraped from the internet, including books, articles, websites, and social media. This training process involves predicting the next token in a sequence, helping the model learn language patterns, facts, and unfortunately, biases present in its training data.

This is why GenAI can sometimes produce incorrect information with great confidence. It's not accessing a curated knowledge database; it's predicting what text would statistically follow in a given context, based on what it's seen before. When that confident guess is wrong, it's called a “hallucination.” GenAI is not designed to retrieve verified facts but to generate the most plausible-sounding continuation it can.

What Does This Mean for You?

So, when GenAI answers your question, it’s not cheating off some internet database. It’s guessing—brilliantly, creatively, and sometimes a little too confidently. That’s why tools like this can feel like magic one minute and nonsense the next. But once you understand how it works under the hood, it stops being a black box and starts being a power tool. The more you play with it, the better you’ll get at using it.

Want to keep going? Try feeding the tokenizer your favorite poem, weirdest Slack message, or that email you rewrote three times. You might never look at language the same way again.

Don’t like reading? We got you!

Take a listen here ->

Want to chat about it?