I finally get vector embeddings and this whole thing about transformers.
Here’s a Deepseek explanation:

If it sounds confusing, it was to me too, until I started working on a project for myself (a discovery app for Linux so that I can find apps more easily). Then it all made sense.
What I’m using for this app is a sentence-transformer model. It’s a tiny (~90MB) pre-trained model that embeds sentences as 384-dimensional vectors.
Here’s what this means, as in the picture above:
#Your transformer model does this:
“read manga” → [0.12, -0.45, 0.87, …, 0.23] # 384 numbers!
“comic reader” → [0.11, -0.44, 0.86, …, 0.22] # Very similar numbers!
“email client” → [-0.89, 0.32, -0.15, …, -0.67] # Very different numbers!
These embeddings capture semantic meaning just by transforming sentences into a series of numbers. You can see ‘comic reader’ is close to ‘read manga’ but not exactly the same, and much closer than “email client” is. So mathematically, we know that “read manga” and “comic reader” must have similar meanings in language.
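In Python, with the sentence-transformers library, this is roughly what it looks like (a sketch, with all-MiniLM-L6-v2 standing in for a ~90MB model that outputs 384 dimensions; the app’s actual code may differ):

```python
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 is an assumed stand-in model: ~90MB, 384-dimensional output.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = ["read manga", "comic reader", "email client"]
embeddings = model.encode(sentences)

print(embeddings.shape)   # (3, 384): one 384-number vector per sentence
print(embeddings[0][:5])  # first few numbers of the "read manga" vector
```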
When you search for “read manga” in my app, it transforms that query into a vector. Then, using cosine similarity, which basically measures mathematically how close two embeddings are to each other, you can calculate the similarity between two concepts:

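That comparison is just a little vector math. A minimal sketch, reusing the same stand-in model as above:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed stand-in model

def cosine_similarity(a, b):
    # 1.0 means the vectors point the same way (same meaning);
    # values near 0 mean the meanings are unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = model.encode("read manga")
print(cosine_similarity(query, model.encode("comic reader")))  # high
print(cosine_similarity(query, model.encode("email client")))  # much lower
```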
With a transformer, at no point do you have to pre-write a synonyms list (for example, that ‘video’ is a synonym of mp4, film, movie, etc. - traditionally this would be hard-coded, with someone making a list of synonyms one by one).
Instead, the sentence-transformer model learned these semantic relationships by itself during pre-training, and you can just use it: it works out of the box.
It is itself a very small language model, and both it and the big LLMs use the transformer architecture; the difference is that the sentence-transformer stops once it has produced the vector embeddings.
And this is how Deepseek, in the example above, is able to tell me about Komikku, which is a real app for reading manga, instead of randomly naming VLC or inventing a fake name. You’ll notice it was also able to write a pretty convincing example of vector embeddings, making up numbers that are actually closer together or further apart.
This stuff gets complex fast, even now I’m not sure I’m accurately representing how LLMs work lol. But this is a fundamental mechanism of LLMs, including CLIP models in image generation (to encode your prompt into something the checkpoint can understand).
FAISS is the second step; it’s what allows us to query that matrix of vectors. So after we’ve transformed your search query into a vector, we need to compare it to all the other vectors. That’s what FAISS is for; it’s a bit like how we talk to a database with SQL.
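A rough sketch of that step, again with the stand-in model and some made-up descriptions; the FAISS part really is just “build an index of all the description vectors once, then search it”:

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed stand-in model

# Made-up example descriptions standing in for the app index.
descriptions = ["Manga and comic book reader", "Email client", "Video player"]
doc_vecs = model.encode(descriptions, normalize_embeddings=True)

# With normalized vectors, inner product == cosine similarity.
index = faiss.IndexFlatIP(doc_vecs.shape[1])
index.add(np.asarray(doc_vecs, dtype="float32"))

query_vec = model.encode(["read manga"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query_vec, dtype="float32"), 3)
for score, i in zip(scores[0], ids[0]):
    print(f"{descriptions[i]}: {score:.3f}")
```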
I don’t know if this is true of everyone but I learn best by having actual real projects to delve into. Using LLMs like this, anyone can start learning about concepts that seem completely above their level. The LLM codes the app but even with this level of abstraction/black box (I’m not entirely sure of the app’s logic, though the LLM can explain it to me), you learn new things. And most of all: you can start making new solutions that you couldn’t before.
The app
Sentence embeddings make for a very powerful search engine, and not only that: anyone can do it. I’m making my app with crush, and it works. In fact, it took less than 24 hours to get it all up and running (and I was asleep for some of those hours).

Example of usage. Similarity is the cosine similarity we talked about: how close the two embeddings (your search and the app’s description) are to each other.
No synonym dictionary, no regex fuzzy matching, no hard-coded concepts. Everything is automatically built and searched, and this is very powerful. It also means you can make very long, specific queries, and it will still find results.
The next steps are 1. optimization and 2. improving the search results further. This is just simple cosine similarity, but we could do more if we had more data than just the short description. That’s a bit of a challenge since I’m not sure where to get that data from exactly, but we’ll get there. I’m sure there’s also a bunch more math we could add to refine the results.
As you can see, the best similarity for the query above (which is admittedly very specific) is only 58.4% - the cosine similarity expressed as a percentage. I want to be able to reliably get up to at least 75%. And to do that, you can either refine how the embedding works or add complementary methods on top of what’s already there.
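For example, one common complementary method (not something the app does yet, just an illustration with made-up weights) is blending the cosine score with a crude keyword-overlap bonus:

```python
# Hypothetical sketch: the names, weights, and overlap measure here are made up
# purely to illustrate the idea of a complementary scoring method.
def keyword_overlap(query: str, text: str) -> float:
    q_words = set(query.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / max(len(q_words), 1)

def blended_score(cosine: float, query: str, description: str, alpha: float = 0.8) -> float:
    # Mostly trust the embedding, but nudge up results that literally contain
    # the query words.
    return alpha * cosine + (1 - alpha) * keyword_overlap(query, description)

print(blended_score(0.584, "read manga offline", "Manga and comic book reader"))
```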
I keep saying it: LLMs and this whole tech allow us to approach problems in a different way. This is completely different from doing fuzzy search, which was the standard for years. 12 hours is all you need to deploy a search engine now.
As for optimization: the app needs the ~90MB model on your machine to embed your search query, and it creates some heavy JSON files (10+ MB). That’s okay for the prototype, and I’m personally fine with going up to ~250MB total disk space, but no more. The bigger problem is that development currently runs on PyTorch, which takes up several gigabytes of space, which is just ludicrous. But first I want to finish the prototype and implement everything; then I can look at optimization and refactoring.
Oh yeah, and it can also do this out of the box:

(Katawa Shoujo being included was funny lol, but the rest works: we name a completely different app in the query and it still finds results, but for books, novels, ebooks, etc.!)
If I ever finish this - I’m having some issues with crush I want to fix, and I don’t want to rush and burn myself out - I’ll put it up on my codeberg under MIT licence.
And with Deepseek being so cheap, the total cost so far is not even $2.50 lol. You can power this sort of development with $1 Ko-fi donations.
PS: If there’s something you’re really looking for, though, I can run some searches and send you the results. It could be worth building a small index of unsatisfactory results too.

Hmm, okay. I was more thinking of errors the model itself made in describing how things work. But correcting other people is reasonable too.
That said, I’m not so sure about this point:
From what I can find through a cursory search, there is something called vector embedding going on, at least in the context of LLMs. I guess this smaller model is a different story if it is a wholly different kind of architecture, as you say.
https://medium.com/@narendra.squadsync/vector-embeddings-in-large-language-models-llms-3e746f1063f3
https://labs.adaline.ai/p/how-do-embeddings-work-in-llms
https://ml-digest.com/architecture-training-of-the-embedding-layer-of-llms/
(I don’t know if these sources are reliable, it’s just what I could find.)
Respectfully, there’s no embedding of vectors, even in these examples. These are all examples of models which generate embeddings, or of the embedding layers in the models themselves. You’re implicitly proving my point about how hard it is to understand this work.
An easy thought experiment: if you incrementally “embedded vectors” or accumulated any data into a given model, it would eventually expand to the point of being unhostable.
In reality, we train models and transformers so they have internal latent representations via their weights.
Many neural nets have embedding layers, but those are about mapping and shaping the data as it flows through the model. But again, these are trained, not “embedded”.
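To make that concrete, an embedding layer is just a learned weight matrix that gets looked up at runtime; here’s a generic PyTorch sketch, not any specific model:

```python
import torch
import torch.nn as nn

vocab_size, dim = 10_000, 384              # made-up sizes for illustration
embedding = nn.Embedding(vocab_size, dim)  # a trainable weight matrix, nothing more

token_ids = torch.tensor([42, 7, 1337])    # some token ids from a tokenizer
vectors = embedding(token_ids)             # shape (3, 384), computed from the weights
print(vectors.shape)

# Nothing here "stores" your sentences: the layer only holds weights that
# training shaped, and the output vectors are recomputed from them every run.
```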
But to the larger point, these are quality resources from people who have been vetted (via PageRank, I assume). It’s so much easier to have these discussions with a static knowledge base, vs AI output that can never be replicated verbatim.
So like the difference between a database and a snapshot impression of one? If so, I think I get the distinction you mean. That models are not storing the embeddings themselves, but instead are storing a representation of them, as trained on. (Feel free to say if that’s still misleading terminology.)
The models are storing weights; the embeddings are the encoded representation of the current context and do not get stored between runs.
Maybe something like a painter with a photographic memory vs one who was trained in art school.
Assume that the overall goal is something like painting a sunset.
The perfect-memory painter has to see and memorize all of the various sunsets they could create (this is what “embedding a vector” would be) and would only be able to recreate one of those sunsets.
The trained painter instead learns an internal representation of a sunset after X amount of training. They can’t fully recreate any sunset 1:1, but they can create a much wider range of sunsets because they aren’t bound by what they’ve directly embedded/memorized/stored
In ML terms, the painting would be a generated embedding (i.e. a painting itself is not the sunset, but a representation of one)
Similarly, it would be theoretically impossible to achieve the same ChatGPT-level performance if you “embedded vectors” into a model, as it would have to hold all those exabytes (or more) of text, videos, images, etc. in memory in real time.
Finally, there’s what we call bias-variance trade-off in the balancing of 1:1 recreations vs unique creations, but that’s a whole other can of worms outside the scope of this post
Ok, thanks, makes sense with the general idea of what I’ve heard about how things work. Like I’m aware models aren’t storing all the details of what they train on (far from it) but I’m not clear on all the terminology.
This doesn’t make any sense. “Incrementally embed vectors”?
The embeds are generated from the context, they aren’t stuffed into the model like in a trash bag.
That’s exactly my problem with the original post.