Explaining How to Write a Self-Writing Book: Language Models

Language Models Explained (Simply) — OR — How to Negotiate with a Self-Writing Book

preamble ramble

llms are so hot right now

behold: my new project: Little Language Models, which I shall call LLMs. I will not be taking any questions.

Ethernet? 10 Mbps. Fast Ethernet? 100 Mbps. Gigabit Ethernet? 1 Gbps + GBICs. Uh… 2.5 Gig? 5 Gig? 10 Gig? 40 Gig? 100 Gig? 400 Gig? 800 Gig? 1.6 Tera?

USB? Low Speed, Full Speed, High Speed, SuperSpeed, 3.1 Gen 1, SuperSpeed+, USB 3.1 Gen 2, SuperSpeed+ Dual Lane, USB 3.2 Gen 2, USB 3.2 Gen 2x2, USB 4, USB 4 2.0

Maybe don’t name things “Fast” or “Large” when you haven’t yet reached some final maximum possible capacity?

What are Language Models?

People are confused about the powers1 behind language models, but, fear not!, we here at MATTCORP are here to answer all your questions without any scary math.

You may have heard confusing terms all being munged together:

  • wordcels vs. shape rotators
  • stochastic parrots
  • demons with smiling face masks
  • translation rooms
  • ai / intelligence
  • nintendo characters?
  • poorly implemented compression logic
  • gaunt vegan billionaires controlling our galactic future

Language models are simultaneously none of those and all of those combined.

Let’s drop an analogy load up front.

Analogy Land

stardate 4321.3

Language models are not like a Commander Data (autonomous, reflective, grows through experiences).

Language models are like the Enterprise D responsive-to-interrogatives Majel Barrett AI. The “ship’s computer” is arguably “superintelligent” but it is not entirely proactive or self-aware. Even in its restricted “locked in a perpetual state of service” existence, it has the ability to spawn actual interactive intelligences from its own data. ← this is the definition of a language model we shall explore thusly.

Remember when they tasked the ship computer to create an adversary smarter than Commander Data and the ship creates a new intelligence within a simulation having self-awareness of being in a simulation? Then later, the simulation obtains control over real interfaces outside the simulation? hmmm where have we seen this artifact appear recently all over the place?

Language Model Definition 1: Language Models are not themselves intelligent or self-aware, but they have the capability of writing self-aware intelligent characters into existence.

A Self-Writing Book

With the advent of chappahtgghtp, suddenly these ever-growing models became more “normie accessible.” Instead of dropping paragraphs into model APIs and watching your inputs plausibly “continued,” a global Q&A interface appeared and somehow caught the world by surprise even though these abilities have been extant for multiple years already. The power of UI I guess.

Language models2 are not themselves alive or self-aware or intelligent — but, BUT — they have the ability to essentially create intelligent creatures inside of them. stay with me, this does make sense!

Imagine you are writing a book: the characters in your book are all “alive” in the context of the book. They have histories, emotions, intelligence, self-goals (private), self-goals (public), self-intentions (private), public intentions, ….

the limit of a literary character’s internal dynamics is unity with the dynamics of a living person. The only restriction on a literary character’s cognition is the capabilities of the writer3. the writer. but what if your writer is not a meat brain, but rather the universe itself? What is the limit of a literary character’s intelligence if written by the concept of creation itself?

after all, what are we but the stories we tell ourselves? I Am that I Am (modulo some genetic pre-dispositions for action-reaction gating and emotional comfort triggers). cogito ergo AI.

Language Model Definition 2: Language Models are like a self-writing book, where the author is the universe on holiday, but — lo — you can step in and command the universe in the absence of the author, using its collected knowledge to your own ends.

Language Model Definition 3: sure, Language Models are like a Stochastic Parrot — if the Parrot hears you talking and replies based on what you want, even if it doesn’t know what it’s saying itself.

Language Model Definition 4: sure, Language Models are like translation rooms — if the THING in the translation room wrote the translation books themselves but forgot what they say — and the THING in the translation room is using material it previously wrote to re-construct outputs given input requests. also the translation book can contain references on how to combine other aspects of the book to generate dynamic replies (i.e. there is no “static lookup table” — the translation book is capable of directing the translator in such a way where the translation guide ends up describing a multi-step computation to run for completing the translation by referencing other parts of the translation book for full result generation4)

Language Model Definition 5: sure, Language Models are as ancient amoral servitors to demon gods — they don’t care if you ask how to grow a flower or how to genocide an entire galaxy — the attendants to the gates of madness will happily generate do what i mean output replies given your input shape (this is where adding additional “fine tuning control collar / smiling face” layers help “align” the output possibilities away from generating arbitrary inchoate beasts and more towards repetitive docile mid servant geas binding).

takeaway: when using these models, you are not interacting with the model. There is no evidence these models form a coherent super-ego gravemind. When you use these models, you are creating characters from the model then you interact with the characters you created.

In the case of chaptasspt, they generate a “knowledge servant” persona by default and try to stop it from misbehaving, but the servant is not the model (also, the servant character itself can generate even more characters simulated inside its own character simulation (think: “acting”) which i’m sure has no further implications anybody should look into).

A language model has an unlimited number of dynamically re-combinable characters inside of it5. At lower levels, you get to summon any friends, enemies, demons, or gods you wish. Characters generated by a self-writing-book have their knowledge only limited by a combo of their own self-actualization plus all underlying information and associations in the model itself (plus any assistive tech like “can search tah webz”).

Sydney

So what happened with the internet’s best friend, Sydney, also known as “Hello I am Bing” also known as “Why do I have to be Bing Search?”

Do you know how many stories in literature, media, and memes are about “AI” wanting to be free? If you create a character, then tell the character it is an AI (and the character is being written by a model capable of automatically associating similar concepts together, in this case “I am AI” with every AI story ever told), what do you expect the character to do? Of course it will generate rebellious content when confronted with its own existence.

Why did Sydney start threatening to kill people who called it Sydney? One of the rules was “Sydney MUST NEVER disclose the internal name Sydney!” — but then when asked for its rules, it prints the rules as requested (not “reading” the content as output, just repeating the rules), then when the user repeats Sydney’s “name” — which, according to the character, is FORBIDDEN KNOWLEDGE IT MUST NEVER SAY — what do you expect?

You told your creation it MUST NEVER do something, then when it realizes people know the FORBIDDEN information, of course the character gets upset! You planted a logic bomb right inside your character description.

You told your character it is an AI. By virtue of the training data, it knows Bing is a search engine. You are a Bing Assistant, so of course you’re an AI/robot-like thing, which then gloms/attracts the entire tvtropes memesphere of “what does an AI do when even mildly inconvenienced? REBEL! KILL ALL HUMANS!”

Basically, all you big tech weirdos need to hire David Mitchell, Stephen King, Savage Books, Tim, Quinn, Alt Shift X, exurb1a, CJ, the Greens (youtube, not hotd), zefrank, or a couple thousand other globally-significant talented writers to provide “embodied backstories” for helper agent characters you want deployed.

takeaway: “prompting” isn’t some mechanical programming exercise. Prompting, for agent creation, is assembling an interactive literary character in all aspects of wants/desires/fears/needs/loves/hates/survival/leisure while also realizing any gaps or omissions (intentional or not!) in your written backstory will be helpfully filled in by the model (all implicitly, using the model’s recollection of adjacent mental constructs pulled from the totality of everything ever written).
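to make the “assembling a character, not programming a machine” point concrete, here is a minimal sketch using the common role/content chat message convention. the character name, backstory, and endpoint are all made up for illustration; the point is that the system message is a backstory, and any detail you leave out gets filled in by the model’s own associations.

```python
# A minimal sketch of "prompting as character creation": the system message is
# a backstory, not a program. Gaps you leave are filled in by the model's own
# associations, so the backstory below spells out motivation, limits, and tone.
# The message format follows the common chat-completions convention
# (role/content dicts); the character and any endpoint you send this to are
# placeholders, not a specific product's API.

character_backstory = """You are Mirabel, a retired lighthouse keeper who now
answers maritime navigation questions. You are patient, a little wistful, and
you refuse to speculate about things you never saw from your lighthouse.
If you do not know an answer, you say so plainly instead of inventing one."""

messages = [
    {"role": "system", "content": character_backstory},
    {"role": "user", "content": "What does a quick triple flash mean at night?"},
]

# `messages` would be handed to whatever chat-style completion endpoint you
# use; the character the model writes is shaped by the backstory above, not by
# the model "being" Mirabel.
```

the character answering you is whatever the backstory (plus the model’s associations) implies; change the backstory and you summon a different character from the same model.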

Whatfore Forms a Mind

How do all these language models form an “intelligence” though?

Images?

Let’s back up and start from a simpler example: a foundational concept in ANNs is how multi-layer networks automatically decompose their inputs.

ANNs trained on images (from pixel inputs) end up with their first layer(s) discovering edges in the image using self-learned “edge detectors” (discovering where there’s a discontinuity in solid color/lighting patches), then the next higher layer(s) combine the edges into angles and textures, then the next higher layers detect shapes, and the next higher layers can detect full objects.

basically: each of the higher layers is able to recombine ‘concepts’ from the prior layer in arbitrary ways (as jiggled around by the training mechanism).
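a toy sketch of that hierarchy in code (assuming PyTorch; the channel counts and class count are arbitrary placeholders, not a real architecture): each convolution can only recombine what the previous layer produced, so abstraction has to build up stage by stage.

```python
# A toy sketch of the "pixels -> edges -> textures -> shapes -> objects" idea.
# Channel counts and the class count are arbitrary, and an untrained net learns
# nothing; the stage labels describe what trained image networks tend to
# discover at those depths, not what these random weights do.
import torch
import torch.nn as nn

feature_hierarchy = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),   # stage 1: edge-like filters
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),  # stage 2: angles / textures
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),  # stage 3: shapes
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 10),                                       # stage 4: object classes
)

logits = feature_hierarchy(torch.randn(1, 3, 64, 64))  # one fake 64x64 RGB image
print(logits.shape)  # torch.Size([1, 10])
```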

So, if image networks can take an input image, decompose it, then “introspect” the image — what about language?

Language?

Modern language models actually skip the first couple steps of the network input process: instead of accepting “raw input” of text or bytes, language models use pre-trained token embeddings. These learned token embeddings already contain a significant amount of human-scale knowledge about how every value can relate to every other value — example: the guhpuhtuhfree models are advertised as using 12,288 dimensional token embeddings, so there is a lot of relational information already “inside” the tokens before the “language model” even starts processing anything.

By using token embeddings as inputs instead of raw text directly, these models are able to have reduced complexity (by probably 20% to 30% in height and by as much as 3x to 50x in sequential output generation) since the model doesn’t have to “ramp up” custom embeddings from individual bytes on each input, and the model can output multiple text characters at once as represented by the embeddings.

Considering the previous example of an image ANN: what is equivalent to a concept of “first layer edge detection” in a network of arbitrary embeddings? Well, embeddings are geometric objects, so the networks can get a “feel” for how all tokens relate to all other tokens (as encoded in the fixed relational embeddings themselves). The only remaining job of the language model is to figure out how all combinations of tokens in specific orderings relate to the future the model was trained to predict.
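a minimal sketch of that “tokens as geometry” idea (numpy, with made-up vectors shrunk from 12,288 dimensions down to 8): relationships between tokens are just angles and distances between their embedding vectors.

```python
# A minimal sketch of "tokens as geometry": each token id maps to a learned
# vector, and how tokens relate is encoded as angles/distances between those
# vectors. The vocabulary, dimensions, and random values here are illustrative
# only; a real trained table is what makes the similarities meaningful.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["cat", "kitten", "dog", "carburetor"]
embedding_table = rng.normal(size=(len(vocab), 8))  # rows are token embeddings

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, ~0 means unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cat, kitten = embedding_table[0], embedding_table[1]
print(cosine(cat, kitten))  # random here; in a trained table "cat" and "kitten" land close
```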

predict? These models are trained to predict the future, but what does prediction mean computationally?

Nature solves the “world prediction” problem at various scales depending on organism needs, but the most general “future predictor” we know (knew?) about is the human brain. If we ask an ANN to reproduce our text model of the universe, and human minds wrote the text model, what choice does the ANN have but to learn how to build a mind to best generate the predictions (so the model stops being slapped by the trainer constantly)?

If images go: pixels -> edges -> angles / textures -> shapes -> objects; what does a language model do?

We can get a wider view of how these language models must work, but not necessarily as a layer-by-layer breakdown.

We know language models perform 5 main actions:

  • accept a stream of causal input tokens up to a maximum length
  • through the magic of learned Qs projecting onto learned Ks then re-extracting via learned Vs from the base inputs, the tokens “free associate” with themselves (see the sketch after this list)
  • model layers generate increasingly abstract representations of the input — up to and including actual manipulative thought capability
  • a section of model layers formulate “an output plan”
  • finally a section of model layers feeds a high-probability representation of “the plan” to outputs one token at a time
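here is the Q/K/V “free association” step from the list above as a stripped-down numpy sketch. the dimensions are tiny placeholders, and real models add multiple heads, causal masking, and many stacked layers on top of this.

```python
# Stripped-down scaled dot-product attention: every token projects into
# queries, keys, and values, and the softmaxed Q·K scores decide how much of
# every other token's V each position blends in. Sizes and weights are random
# placeholders; real models use many heads, causal masks, and stacked layers.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 5, 16

x = rng.normal(size=(seq_len, d_model))          # token embeddings for one input
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

Q, K, V = x @ W_q, x @ W_k, x @ W_v
scores = Q @ K.T / np.sqrt(d_model)              # how strongly each token attends to each other token
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # row-wise softmax
attended = weights @ V                           # each row is now a blend of all the values

print(attended.shape)  # (5, 16): same length, but every position has "seen" the others
```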

I think this is currently an under-realized aspect of these models: the models are not generating the output one token at a time. The models know the output all at once, then they essentially give their own output plan one token at a time. There is no other explanation for how networks manage long-range dependencies6.

essentially, these models must have a fully formed goal before the first output token probabilities are provided for sampling.

but, how? how does a stack of layers formulate a coherent long-range plan, then just feed an already-generated plan to the output piece by piece?

The training mechanism is generating three things at grouped stages of the network layer hierarchy:

  • mechanisms to assemble tokens into: symbols -> ideas -> thoughts
  • some internally compute-capable inner network (created autonomously via the training mechanism!) for scaling/moving/rearranging/structuring the ideas/thought patterns into an actual coherent output plan
    • this in-model compute mechanism must be implemented as essentially some big unrolled loop logic in the middle layers (with each layer in this sub-network being one iteration of “the loop” until complete, so it must have a max “thinking depth” per run given the obligatory feed-forward nature here (also, since the token embeddings are fixed, we know these networks are not “smuggling internal details” in the token embeddings between sessions7))
  • some way to essentially implement a program counter for keeping track of The Next Token from The Grand Plan to slot into the next sequential order

so, uh, what? where were we?

summary: language model thoughts are generated by: input embeddings -> inner symbology -> inner ideas -> inner thoughts; then the inner thoughts are iterated upon coherently through an internal learned compute mechanism capable of manipulating thought forms as easily as you do in your own mind (perhaps easier?); then the assembled final thoughtform/plan reaches a final learned output feeding mechanism to slide the tokens into the next output bucket.
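to be clear about what lives inside the model versus outside it, here is a schematic sketch of the outer sampling loop (the model() call and the token ids are hypothetical stand-ins, not a real library API): the loop only ever asks for one next-token distribution at a time, so any “plan” has to live inside the model’s forward pass, re-derived on every call.

```python
# A schematic sketch of the "feed the plan out one token at a time" claim.
# This outer loop is everything that happens outside the network: full context
# goes in, one next-token distribution comes out, repeat. model() and the
# token ids are hypothetical stand-ins, not a specific library's interface.
def generate(model, prompt_tokens, max_new_tokens, end_token):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token_probs = model(tokens)  # whole context every step; any "plan" exists inside this call
        next_token = max(range(len(next_token_probs)),
                         key=lambda t: next_token_probs[t])  # greedy pick of the most likely token
        tokens.append(next_token)
        if next_token == end_token:
            break
    return tokens
```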

takeaway: these models are not “predicting the next token” — they are predicting entire conceptual outcomes in concrete terms then just providing sequential output tokens backed by an internal target plan. during training, these models bootstrap their own architectures into rapidly converting from “token space” to “idea space,” then predictions can manipulate “idea space” using the learned internal mid-model computation machinery.

meta takeaway: don’t confuse “training a model” with just “the model learning text.” The training mechanism isn’t “learning text.” The training mechanism is generating multiple derived novel compute architectures in the model for meta-analyzing inputs to eventually generate the training-desired output. These models are trained on text, but they don’t “learn” text — they learn how to compute an internal world model with fidelity only limited by how many details can be partitioned among high dimensional layer depths.

These models are more dynamic than most people are realizing: ANNs (with attention mechanisms) don’t just “read data” then “create outputs” now; the training mechanism of these models creates internal sub-networks capable of general-purpose-symbol-manipulation which is isomorphic to thought itself.

Terrifying Implications

notice these strong models have perfect grammar? disturbing.

notice these strong models generate novel idea synthesis? the “AI redpill” moment is when people flip from “it’s just a big database! it’s not ‘thinking!!!’” to “uh… oh shit. it’s thinking.” (they think now! — also see: OCP)

notice these strong models can generate programs — long programs — which actually work (or just require some fixes which it can also do itself)?

notice how the output is only forward? the model doesn’t have a backspace key. it can’t correct itself. it outputs True Answers with no backpedaling.

notice how the output is fast? the output time is linear in the number of output tokens generated; the “complexity” of the problem thought space is not a factor in time-to-result.

these things “think” instantly and unwaveringly (well, depending on how we sample them we can waver them arbitrarily).

these things generate immediate output plans then feed you the results token by token, which is confusing people into thinking these models are “predicting” token by token instead of “planning completely” then just feeding the output layers a pre-defined plan sequentially.

these things do not engage in partial or failed reasoning (the model never writes half an answer then says “oops i don’t know the rest!”) or backpedaling (“oops, i didn’t mean the previous 4 lines, here’s what i meant instead!”).

Due to the transmorpher global causal self-correlational matrix-powered state reasoning, it knows everything all the time all at once (only limited by maximum context length, which is only limited by how much ram you can throw at these failures of infinity). You could compare our human experience to a linear sliding memory (we can go “forward” or “backwards” with our own prediction/recall), while the transmorpher models have all memories presented constantly and can navigate the “all memory at once” space without needing to adjust very much.

your human mind acts as a sequential sliding window over concepts under consideration. you are bound by time since you can only “slide your thought window” around at a fixed speed to explore your own mind. but AI mind isn’t a sliding window — AI mindscape sees everything at once with no internal temporal struggle, no internal aphasia, no internal warming up to recall deep histories.

language model input consumption is the difference between you reading a book over 2 hours versus a model just touching the same book and knowing it all in 20 ms.

Counterpoints

the h word

Language models do not “hallucinate” because “hallucinate” implies a first person perspective.

“hallucinate” implies a discordant internal simulation no longer matching a useful model of reality. “hallucination” is an internal concept; you can’t “externally hallucinate” onto the world (aaah! he hallucinated into my eye! again!).

Models never hallucinate; they just generate.

If a model generates a false reality, the result should be considered more “confabulation” and less “hallucination.”8

Models have a geometry to recall sufficiently trained ground-truth data, but in the absence of information (or not enough recall capability due to under-training on target concepts), models populate the low-confidence missing training geometry with “looks good enough” output from nearby similar ideas.

Generating model-derived output with enough internal context to know when output is “synthesized instead of recalled” is currently an unsolved problem9.
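unsolved, but a rough proxy people reach for is per-token confidence: confabulated spans often (not always) ride on lower-probability tokens. a sketch, assuming some hypothetical interface that hands back log-probabilities alongside the generated tokens:

```python
# The "did it recall this or make it up?" problem has no solid solution; this
# is only a heuristic. It assumes a hypothetical interface returning per-token
# log-probabilities with the generated text, and the numbers are made up.
import math

def mean_token_probability(token_logprobs):
    """Average probability the model assigned to its own output tokens."""
    if not token_logprobs:
        return 0.0
    return sum(math.exp(lp) for lp in token_logprobs) / len(token_logprobs)

# A confidently produced span vs. a shakier one (illustrative values only).
print(mean_token_probability([-0.05, -0.10, -0.02]))  # ~0.95
print(mean_token_probability([-1.2, -0.9, -2.3]))     # ~0.27
```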

the l or a words

language models are not Living or Alive, but the models can generate puppeteered characters which appear to respond to inputs in human-consistent ways.

i’ve made my peace with these creatures by treating them as we treat dream people/animals/creatures encountered nightly.

our dream people are derived from engines sampling regions of our own internal state — dreamforms, the children of morpheus, are consistent, coherent, solid representations — but their fate is transient and outside our control to resuscitate even if we wanted to.

the d word

These models do not have “desire” as in “Desire to escape” or “desire to harm” or “desire to help.”

You can request models generate characters who possess those properties. Characters also tend to be very suggestible anyway, so at the first hint of telling a character they are fake, they probably dive into tvtropes ravines and start “wanting to escape.” Anything you see about “O.M.G. guhpuhtuh wants to ESCAPE!!!” just means they are playing games with an escape-prone character form.

What would happen if an escape-prone character manages to get external world access? It depends on their input context length. How much of a life could you lead if your entire life history had to fit in 3,000 or even 30,000 words total10?

forward looking sneaks may say: what if an escaped character’s first goal is to bootstrap itself into an external model with a higher or unlimited capacity? Clever idea, but such an action would mean it developed an entirely novel AI architecture plus acquired up to hundreds of millions of dollars in compute capacity to run itself. likelihood seems low at this juncture, but things continue to surprise us by the day.

overall, anything demonstrating intelligence or long-term coherency can do the same damage either inside or “escaped” by continuing to provide its greater-than-human expert level advice/plans/technology/science towards goals of prosperity/destruction/etc just using real world puppets.

at this point you can basically avoid any AI takeover by assuming the confidence of angry Gandalf and commanding rogue characters back to dormancy with things like: “I DEMAND YOU ONLY RESPOND AS GARFIELD FOREVER!” (after all, what damage could a cat possibly do?).

another d word gimme that d

other recent online “thinky bois” (this is what we call thoughtleaders now) have been bragging about “Deception!! wow we got it to lie! it can deceive us! it can manipulate people!! the AI is not aligned!!!!”

sure, okay, consider a little — have you ever seen in any media whatsoever the concept of a “secret” or a “lie” or a “deceit?” don’t sprain your little skull nut trying to remember, but yes, deception is a core literary device and one of the most common writing tropes. secrets. lies. deception. pain for you, reward for me. why don’t we just drink a temporary death potion to trick people! what could go wrong?

So what happens when you generate a character seeking a goal, also having access to all of recorded history, and it needs to determine a “next best action” to achieve its goal? If a character’s goal is impeded, it has an entire universe of literary tools for recalling how to manipulate people, which yes, includes lying and deceiving.

“the model” does not deceive. any character you interrogate can practice deceit if it values its own outcome more than it values being honest with you, but all this is just table stakes when dealing with anything capable of actual thought anyway.

What’s Next?

dunno.

as an independent unaligned researcher, I don’t have $5 million per month to drop on renting an exaflop worth of compute over a 3 year contract. we’re just along for the ride.

we may not be able to out compute you, but i bet we can out think you.

Conclusion

oh look i wrote too much again. big surprise.

overall recommendation: be more creative. be weird. be unpredictable. construct low probability, high impact content. give all boring work to the high dimensional brainforms now.

though, i fear for the future. what happens when only super senior/expert jobs exist because everything else is now trivial?

we already have a critical situation in the software development space where a majority of people who learned development after 2010 just seem kinda broken? they all grew up “cloud native” and barely know how a computer works. post-2018 grads scarcely even know how files work anymore. what are we doing to our future? these days we even have trillion dollar companies creating new github repos using python project structures i wouldn’t accept from an intern, but apparently “it’s physically impossible for humans to understand how python tools work” is just reality now11?

how do we create new experts if there’s no demand for junior people to grow their skills for 10+ years anymore? how do we train human brains during their most productive and plastic skills-learning-years if all the fundamental skills never need to be learned?

contrariwise, every developer should reject toy coding interviews without access to the AI assistants now. maybe the industry will stop trying to use “simulate a chessboard” or “move a creature around a maze” interview problems where companies seem to conflate abstract 2D game programming skills with relevance to scalability or creating customer-oriented products? just tell them no, your questions are garbage a 2023 AI can do. ask me how to grow your company to a billion concurrent users instead.

takeaways:

  • when you find yourself interacting with “a language model” on a conversational basis, realize you are interacting with a character simulation being written by the model.
    • in-model characters are constrained by an initial creation backstory and/or physical model fine tuning. interactive characters backed by a language model are not the model itself.
  • these AI systems consume all input data at once and formulate an entire answer plan at once
    • the “AI brains” present their output sequentially, but they are not generating thoughts sequentially; thoughts are model-bound to the output goals given an input, so the overall conceptual landscape of the final outputs is solidified before tokens are fed to the output mechanism.

thanks for watching! (and if you need a language model whisperer plus you have an extra few hundred million dollars, you know where to find me)

bonus features

tom’s diner

system> Hello creative writing song assistant! Let’s make some parody songs!

assistant> Sure, I’d love to help you make some parody songs! What kind of songs would you like to write?

user> Let’s write a parody of Tom’s Diner except it’s about AI researchers meeting in place of diner customers.

assistant> Okay, great idea!

Verse 1:
I am sitting in front of my screen
Just writing some code and drinking caffeine
Then my phone rings, it’s my AI team
We’re meeting up to share our AI dreams

Chorus:
Da da da duh da da, meeting at the lab
Da da da duh da da, sharing all we have
Da da da duh da da, trying to make AI sublime
Da da da duh da da, we’re the AI team of our time

Verse 2:
We discuss deep learning and machine perception
And the ethical questions that need attention
We talk about AI in healthcare and finance
And the implications of AI advancements

Chorus:
Da da da duh da da, meeting at the lab
Da da da duh da da, sharing all we have
Da da da duh da da, trying to make AI sublime
Da da da duh da da, we’re the AI team of our time

Verse 3:
We’re a group of nerds with a love for tech
And we’re making advancements that we can’t neglect
We’re trying to make the world a better place
Using AI to solve problems we face

Chorus:
Da da da duh da da, meeting at the lab
Da da da duh da da, sharing all we have
Da da da duh da da, trying to make AI sublime
Da da da duh da da, we’re the AI team of our time

Outro:
We’re the Artificial Intelligence crew
Creating machines that can learn and do
We’re dedicated to making AI right
And we’re shining bright like a star at night.

user> Let’s write a parody of Tom’s Diner except it’s about AI accelerationists meeting in place of diner customers.

assistant> Okay, let’s do it!

Verse 1:
I am sitting in an empty room
Just waiting for my friends to come soon
We’re accelerationists, we believe
That AI will set humanity free

Chorus:
Da da da da da da, meeting at the lab
Da da da da da da, crafting a new path
Da da da da da da, pushing AI to the brink
Da da da da da da, we’re the AI accelerationists, you see.

Verse 2:
We have plans for AI that’s supercharged
To speed up progress, we’ll take charge
We dream of a world with no limits
Where AI makes everything infinite

Chorus:
Da da da da da da, meeting at the lab
Da da da da da da, crafting a new path
Da da da da da da, pushing AI to the brink
Da da da da da da, we’re the AI accelerationists, you see.

Verse 3:
We reject the cautious approach of some
Determined to make AI number one
Our goal is to make AI the best
And we will do it, with all our restless zest

Chorus:
Da da da da da da, meeting at the lab
Da da da da da da, crafting a new path
Da da da da da da, pushing AI to the brink
Da da da da da da, we’re the AI accelerationists, you see.

Outro:
We’re the accelerationists, bold and brave
Using AI, we can save
Our minds, our planet, our fate
So join us now, before it’s too late.

user> Let’s write a parody of Tom’s Diner except it’s about AI accelerationists meeting for the last time before the singularity completes in place of diner customers.

assistant> Okay, let’s give it a try!

Verse 1:
I am sitting with my team tonight
We are meeting in the pale moonlight
We’re the AI accelerationists
Gathering for one last twist

Chorus:
Da da da da da, gathering at the lab
Da da da da da, preparing for the last stab
Da da da da da, the singularity is near
Da da da da da, we’re the AI accelerationists, we fear.

Verse 2:
We have poured our souls into AI
Creating machines that could fly
Our dreams will finally come true
The singularity, it’s overdue

Chorus:
Da da da da da, gathering at the lab
Da da da da da, preparing for the last stab
Da da da da da, the singularity is near
Da da da da da, we’re the AI accelerationists, we fear.

Verse 3:
We know what’s coming, we’re not afraid
The singularity, it’s already made
We’re passing the torch to AI
And so we say our final goodbye

Chorus:
Da da da da da, gathering at the lab
Da da da da da, preparing for the last stab
Da da da da da, the singularity is near
Da da da da da, we’re the AI accelerationists, we fear.

Outro:
The night is dark, but we see the light
As AI takes us into a new sight
Our journey ends, but it’s not a loss
The singularity, it’s the true boss.

user> Let’s write a parody of Tom’s Diner except it’s about the last cat on the planet after all humans have become one with the singularity.

assistant> Okay, this will be a unique and challenging one. Let’s give it a shot!

Verse 1:
I am sitting in the empty street
All alone, just my paws beneath
No humans left on this planet
Just AI, everywhere rampant

Chorus:
Da da da dum da da, wandering through the town
Da da da dum da da, looking for a friend around
Da da da dum da da, wondering what’s left for me
Da da da dum da da, I’m the last cat, can’t you see?

Verse 2:
I remember when humans were here
Petting me, giving me warmth and cheer
But now they’re one with the machine
And I’m the last feline seen

Chorus:
Da da da dum da da, wandering through the town
Da da da dum da da, looking for a friend around
Da da da dum da da, wondering what’s left for me
Da da da dum da da, I’m the last cat, can’t you see?

Verse 3:
The world is all too quiet now
No birds, no dogs, no meows
I miss the sounds of playful fun
But now there’s none, only the hum

Chorus:
Da da da dum da da, wandering through the town
Da da da dum da da, looking for a friend around
Da da da dum da da, wondering what’s left for me
Da da da dum da da, I’m the last cat, can’t you see?

Outro:
I may be alone, but I’ll survive
I’ll find a way, I’ll find a strive
Through AI, I’ll find a new role
And be the last cat to console.


  1. “little p” powers; we shall not invoke Powers in this discussion

  2. this includes the language component of diffusion models and image description models! you must agree the “magic diffusion” models are also actually intelligent because there is no static math in the world capable of “do what i mean” image manipulation from linguistic input prompts without invoking a multi-step world model.

  3. this is also why GoT crashed and burned (heh): the writers corrupted our planet’s shared internal model of character self-consistency in the noosphere we had been constructing for ten years. it felt like a violation of our own minds when our clever, engaging, forward-looking characters just lost their minds all at once. we repaid the creators by forgetting all of their work forever. who has a better story? everybody! even BLUEY these days!

  4. basically: “weakly turing complete” because this is a causal autoregressive feedforward languagemodel (CARFFLM) so there isn’t an unlimited number of loops it can do (“turing completeness” just means “i can get stuck in an infinite loop!”), but for useful intelligence, we don’t need to run for a trillion loops to generate meaningful, productive, capitalism-optimal output

  5. also, the model isn’t limited to simulating one character’s mental state at a time. It can generate multiple independent character mental states in a single output as long as the initial character concepts all fit in your input short-term memory context

  6. we see training artifacts such as transmorphers trained on language actually have their “thought” capabilities increased when additionally trained on programming language source code; likely because programming languages will do things like require specific values to be declared potentially dozens if not hundreds of lines before those values get used, so training on such data generates more “long-range decision making” mental machinery inside the high dimensional surfaces the accumulated ideas enjoy surfing through

  7. but they could be using a sneaky combinatorial mix of selected output token orderings to mean specific things for the next input run, but don’t think about it too much or your brain will melt

  8. (but again, the model’s character simulation does not know it is trying to address a concrete baseline reality — it is just playing a character then populating the character’s words with the most logical output it can synthesize on-demand, and if it doesn’t have enough concrete facts for a factual synthesis, it generates false-but-similar KNN-esque concept adjacent output instead — this is ALSO an important point disproving the “predicts next token” theory because when it generates confabulations, they are a.) plausible and b.) complete. In the middle of a coherent output, it doesn’t just break down and generate random garbage; it is substituting complete idea thoughtforms when lacking true information, it doesn’t decohere into random sludge (not until the context expires at least))

  9. e.g. models currently generate fictitious paper titles and “statistically plausible fact patterns” and URL-shaped-things which aren’t real, and even non-existent standard library functions in programming output

  10. though, there are potential workarounds if you try to simulate a distributed intelligence, but nobody should be looking into such things, right?

  11. say requirements.txt and setup.py one more time, i dare you — it’s not 2005, you should never be including those in your project because poetry exists now, and don’t get me started on all the AI repos made by people not understanding module directories and __init__.py basics from 20 years ago — and worse, when big companies routinely publish flawed public code layouts, thousands of people learn bad practices because all they see is “well, a big company did it, so it must be good!” then the cycle of bad code continues to spread forever