I finally get vector embeddings and this whole thing about transformers.
Here’s a Deepseek explanation:

If it sounds confusing, it was to me too, until I started working on a project for myself (a discovery app for Linux so that I can find apps more easily). Then it all made sense.
What I’m using for this app is a sentence-transformer model. It’s a tiny (~90MB) pre-trained model that embeds sentences as 384-dimensional vectors.
Here’s what this means, as in the picture above:
#Your transformer model does this:
“read manga” → [0.12, -0.45, 0.87, …, 0.23] # 384 numbers!
“comic reader” → [0.11, -0.44, 0.86, …, 0.22] # Very similar numbers!
“email client” → [-0.89, 0.32, -0.15, …, -0.67] # Very different numbers!
These embeddings capture semantic meaning just by transforming sentences into a series of numbers. You can see ‘comic reader’ is close to ‘read manga’ but not exactly the same, and much closer than “email client” is. So mathematically, we know that “read manga” and “comic reader” must have similar meanings in language.
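In Python, with the sentence-transformers library, this is roughly what it looks like (a sketch, with all-MiniLM-L6-v2 standing in for a ~90MB model that outputs 384 dimensions; the app’s actual code may differ):

```python
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 is an assumed stand-in model: ~90MB, 384-dimensional output.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = ["read manga", "comic reader", "email client"]
embeddings = model.encode(sentences)

print(embeddings.shape)   # (3, 384): one 384-number vector per sentence
print(embeddings[0][:5])  # first few numbers of the "read manga" vector
```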
When you search for “read manga” in my app, it transforms that query into a vector. Then, using cosine similarity, which basically measures mathematically how close two embeddings are to each other, you can calculate the similarity between two concepts:

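That comparison is just a little vector math. A minimal sketch, reusing the same stand-in model as above:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed stand-in model

def cosine_similarity(a, b):
    # 1.0 means the vectors point the same way (same meaning);
    # values near 0 mean the meanings are unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = model.encode("read manga")
print(cosine_similarity(query, model.encode("comic reader")))  # high
print(cosine_similarity(query, model.encode("email client")))  # much lower
```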
With a transformer, at no point do you have to pre-write a synonyms list (for example, that ‘video’ is a synonym of mp4, film, movie, etc. - traditionally this would be hard-coded, with someone making a list of synonyms one by one).
Instead, the sentence-transformer model learned these semantic relationships by itself during pre-training, and you can just use it: it works out of the box.
It is itself a very small language model, and both it and the big LLMs use the transformer architecture; the difference is that the sentence-transformer stops once it has produced the vector embeddings.
And this is how Deepseek, in the example above, is able to tell me about Komikku, which is a real app for reading manga, instead of randomly naming VLC or inventing a fake name. You’ll notice it was also able to write a pretty convincing example of vector embeddings, making up numbers that are actually closer together or further apart.
This stuff gets complex fast, even now I’m not sure I’m accurately representing how LLMs work lol. But this is a fundamental mechanism of LLMs, including CLIP models in image generation (to encode your prompt into something the checkpoint can understand).
FAISS is the second step; it’s what allows us to query that matrix of vectors. So after we’ve transformed your search query into a vector, we need to compare it to all the other vectors. That’s what FAISS is for; it’s a bit like how we talk to a database with SQL.
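A rough sketch of that step, again with the stand-in model and some made-up descriptions; the FAISS part really is just “build an index of all the description vectors once, then search it”:

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed stand-in model

# Made-up example descriptions standing in for the app index.
descriptions = ["Manga and comic book reader", "Email client", "Video player"]
doc_vecs = model.encode(descriptions, normalize_embeddings=True)

# With normalized vectors, inner product == cosine similarity.
index = faiss.IndexFlatIP(doc_vecs.shape[1])
index.add(np.asarray(doc_vecs, dtype="float32"))

query_vec = model.encode(["read manga"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query_vec, dtype="float32"), 3)
for score, i in zip(scores[0], ids[0]):
    print(f"{descriptions[i]}: {score:.3f}")
```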
I don’t know if this is true of everyone but I learn best by having actual real projects to delve into. Using LLMs like this, anyone can start learning about concepts that seem completely above their level. The LLM codes the app but even with this level of abstraction/black box (I’m not entirely sure of the app’s logic, though the LLM can explain it to me), you learn new things. And most of all: you can start making new solutions that you couldn’t before.
The app
Sentence embeddings make for a very powerful search engine, and not only that: anyone can do it. I’m making my app with crush, and it works. In fact, it took less than 24 hours to get it all up and running (and I was asleep for some of those hours).

Example of usage. Similarity is the cosine similarity we talked about: how close the two embeddings (your search and the app’s description) are to each other.
No synonym dictionary, no regex fuzzy matching, no hard-coded concepts. Everything is automatically built and searched, and this is very powerful. It also means you can make very long, specific queries, and it will still find results.
The next steps are 1. optimization and 2. improving the search results further. This is just simple cosine similarity, but we could do more if we had more data than just the short description. That’s a bit of a challenge since I’m not sure where to get that data from exactly, but we’ll get there. I’m sure there’s also a bunch more math we could add to refine the results.
As you can see, the best similarity for the query above (which is admittedly very specific) is only 58.4% - the cosine similarity expressed as a percentage. I want to be able to reliably get up to at least 75%. And to do that, you can either refine how the embedding works or add complementary methods on top of what’s already there.
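For example, one common complementary method (not something the app does yet, just an illustration with made-up weights) is blending the cosine score with a crude keyword-overlap bonus:

```python
# Hypothetical sketch: the names, weights, and overlap measure here are made up
# purely to illustrate the idea of a complementary scoring method.
def keyword_overlap(query: str, text: str) -> float:
    q_words = set(query.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / max(len(q_words), 1)

def blended_score(cosine: float, query: str, description: str, alpha: float = 0.8) -> float:
    # Mostly trust the embedding, but nudge up results that literally contain
    # the query words.
    return alpha * cosine + (1 - alpha) * keyword_overlap(query, description)

print(blended_score(0.584, "read manga offline", "Manga and comic book reader"))
```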
I keep saying it: LLMs and this whole tech allow us to approach problems in a different way. This is completely different from doing fuzzy search, which was the standard for years. 12 hours is all you need to deploy a search engine now.
As for optimization: the app needs the ~90MB model on your machine to embed your search query, and it creates some heavy JSON files (10+ MB). That’s okay for the prototype, and I’m personally fine with going up to ~250MB total disk space, but no more. The bigger problem is that development currently runs on PyTorch, which takes up several gigabytes of space, which is just ludicrous. But first I want to finish the prototype and implement everything; then I can look at optimization and refactoring.
Oh yeah, and it can also do this out of the box:

(Katawa Shoujo being included was funny lol, but the rest works: we name a completely different app in the query and it still finds results, but for books, novels, ebooks, etc.!)
If I ever finish this - I’m having some issues with crush I want to fix, and I don’t want to rush and burn myself out - I’ll put it up on my codeberg under MIT licence.
And with Deepseek being so cheap, the total cost so far is not even $2.50 lol. You can power this sort of development with $1 Ko-fi donations.
PS: If there’s something you’re really looking for, though, I can run some searches and send you the results. It could be worth building a small index of unsatisfactory results too.

Hmm, okay. I was more thinking of errors the model itself made in describing how things work. But correcting other people is reasonable too.
That said, I’m not so sure about this point:
From what I can find through a cursory search, there is something called vector embedding going on, at least in the context of LLMs. I guess this smaller model is a different story if it is a wholly different kind of architecture, as you say.
https://medium.com/@narendra.squadsync/vector-embeddings-in-large-language-models-llms-3e746f1063f3
https://labs.adaline.ai/p/how-do-embeddings-work-in-llms
https://ml-digest.com/architecture-training-of-the-embedding-layer-of-llms/
(I don’t know if these sources are reliable, it’s just what I could find.)
Respectfully, there’s no embedding of vectors, even in these examples. These are all examples of models which generate embeddings, or of the embedding layers in the models themselves. You’re implicitly proving my point about how hard it is to understand this work.
An easy thought experiment: if you incrementally “embedded vectors” or accumulated any data into a given model, it would eventually expand to the point of being unhostable.
In reality, we train models and transformers so they have internal latent representations via their weights.
Many neural nets have embedding layers, but those are about mapping and shaping the data as it flows through the model. But again, these are trained, not “embedded”.
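To make that concrete, an embedding layer is just a learned weight matrix that gets looked up at runtime; here’s a generic PyTorch sketch, not any specific model:

```python
import torch
import torch.nn as nn

vocab_size, dim = 10_000, 384              # made-up sizes for illustration
embedding = nn.Embedding(vocab_size, dim)  # a trainable weight matrix, nothing more

token_ids = torch.tensor([42, 7, 1337])    # some token ids from a tokenizer
vectors = embedding(token_ids)             # shape (3, 384), computed from the weights
print(vectors.shape)

# Nothing here "stores" your sentences: the layer only holds weights that
# training shaped, and the output vectors are recomputed from them every run.
```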
But to the larger point, these are quality resources from people who have been vetted (via PageRank, I assume). It’s so much easier to have these discussions with a static knowledge base, vs AI output that can never be replicated verbatim.
So like the difference between a database and a snapshot impression of one? If so, I think I get the distinction you mean. That models are not storing the embeddings themselves, but instead are storing a representation of them, as trained on. (Feel free to say if that’s still misleading terminology.)
The models are storing weights; the embeddings are the encoded representation of the current context and do not get stored between runs.
Maybe something like a painter with a photographic memory vs one who was trained in art school.
Assume that the overall goal is something like painting a sunset.
The perfect-memory painter has to see and memorize all of the various sunsets they could create (this is what “embedding a vector” would be) and would only be able to recreate one of those sunsets.
The trained painter instead learns an internal representation of a sunset after X amount of training. They can’t fully recreate any sunset 1:1, but they can create a much wider range of sunsets because they aren’t bound by what they’ve directly embedded/memorized/stored
In ML terms, the painting would be a generated embedding (i.e. a painting itself is not the sunset, but a representation of one)
Similarly, it would be theoretically impossible to achieve the same ChatGPT-level performance if you “embedded vectors” into a model, as it would have to hold all those exabytes (or more) of text, videos, images, etc. in memory in real time.
Finally, there’s what we call bias-variance trade-off in the balancing of 1:1 recreations vs unique creations, but that’s a whole other can of worms outside the scope of this post
Ok, thanks, makes sense with the general idea of what I’ve heard about how things work. Like I’m aware models aren’t storing all the details of what they train on (far from it) but I’m not clear on all the terminology.
This doesn’t make any sense. “Incrementally embed vectors”?
The embeds are generated from the context, they aren’t stuffed into the model like in a trash bag.
That’s exactly my problem with the original post.