I finally get vector embeddings and this whole thing about transformers.
Here’s a Deepseek explanation:

If it sounds confusing, it was to me too, until I started working on a project for myself (a discovery app for Linux, so that I can find apps more easily). Then it all made sense.
What I’m using for this app is a sentence-transformer model. It’s a tiny (~90MB) pre-trained LLM that embeds sentences into 384-dimensional vectors.
What this means, using the example in the picture above:
# Your transformer model does this:
“read manga” → [0.12, -0.45, 0.87, …, 0.23] # 384 numbers!
“comic reader” → [0.11, -0.44, 0.86, …, 0.22] # Very similar numbers!
“email client” → [-0.89, 0.32, -0.15, …, -0.67] # Very different numbers!
These embeddings capture semantic meaning, just by transforming sentences into a series of numbers. You can see ‘comic reader’ is close to ‘read manga’, but not exactly the same, and much closer than “email client” is. So mathematically, we know that ‘read manga’ and ‘comic reader’ must have similar meanings in language.
When you search for “read manga” in my app, it transforms that query into a vector. Then, using cosine similarity, which basically measures the angle between two embeddings (how closely their directions line up), you can calculate the similarity between two concepts:
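To make this concrete, here’s a minimal sketch in Python. The model name is my assumption (all-MiniLM-L6-v2 is a common ~90MB sentence-transformer that outputs 384-dimensional vectors); the app itself may use a different one:

```python
# Minimal sketch: embed sentences and compare them with cosine similarity.
# all-MiniLM-L6-v2 is an assumption (a common ~90MB, 384-dimensional model).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Each sentence becomes one 384-dimensional vector.
embeddings = model.encode(["read manga", "comic reader", "email client"])

print(util.cos_sim(embeddings[0], embeddings[1]))  # high: similar meaning
print(util.cos_sim(embeddings[0], embeddings[2]))  # low: unrelated meaning
```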

With a transformer, at no point do you have to pre-write a synonyms list (for example, that ‘video’ is a synonym of mp4, film, movie, etc. - traditionally this would be hard-coded, with someone making a list of synonyms one by one).
Instead, the sentence-transformer model is pre-trained and has learned these semantic relationships by itself, so you can just use that model and it works out of the box.
It is itself a very small LLM, and both this one and the big LLMs use the transformer architecture. Except the sentence-transformer stops once it has the vector embeddings, instead of going on to generate text.
And this is how Deepseek in the example above is able to tell me about Komikku, which is a real app for reading manga, instead of randomly naming VLC or inventing a fake name. You’ll notice it was also able to write a pretty convincing example of vector embeddings, with numbers that are actually closer together or further apart.
This stuff gets complex fast; even now I’m not sure I’m accurately representing how LLMs work lol. But this is a fundamental mechanism of LLMs, including CLIP models in image generation (to encode your prompt into something the checkpoint can understand).
FAISS is the second step; it’s what allows us to query that vector matrix. After we’ve transformed your search query into a vector, we need to compare it to all the other vectors. This is what FAISS is for; it’s like talking to a database with SQL.
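Here’s a hedged sketch of that second step, assuming the same 384-dimensional embeddings as above; the app descriptions are made up for illustration:

```python
# Sketch: index normalized embeddings in FAISS and search with a query vector.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model, see above

descriptions = ["Manga and comic book reader", "Email client", "Media player"]
vectors = model.encode(descriptions, normalize_embeddings=True)

# With normalized vectors, inner product equals cosine similarity.
index = faiss.IndexFlatIP(vectors.shape[1])
index.add(np.asarray(vectors, dtype="float32"))

query = model.encode(["read manga"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query, dtype="float32"), 2)
print(descriptions[ids[0][0]], float(scores[0][0]))  # best match and its similarity
```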
I don’t know if this is true of everyone but I learn best by having actual real projects to delve into. Using LLMs like this, anyone can start learning about concepts that seem completely above their level. The LLM codes the app but even with this level of abstraction/black box (I’m not entirely sure of the app’s logic, though the LLM can explain it to me), you learn new things. And most of all: you can start making new solutions that you couldn’t before.
The app
Sentence transforming makes for a very powerful search engine, and not only that: anyone can do it. I’m making my app with crush, and it works. In fact it took less than 24 hours to get it all up and running (and I was asleep for some of those hours).

Example of usage. Similarity is the cosine similarity we talked about: how close the two embeddings (your search and the app’s description) are to each other.
No synonym dictionary, no regex fuzzy-matching, no hard-coded concepts. Everything is built and searched automatically, and this is very powerful. It also means you can make very long, specific queries, and it will still find results.
The next steps are 1. optimization and 2. further improving the search results. This is just simple cosine similarity, but we could do more if we had more data than just the short description. That’s a bit of a challenge as I’m not sure where to get that data from exactly, but we’ll get there. I’m sure there’s also a bunch more math we could add to refine the results.
As you can see, the best similarity for that query above (which is admittedly very specific) is only 58.4% - the cosine similarity expressed as a percentage. I want to be able to reliably get up to at least 75%. And to do that, you can either refine how the embedding works or add complementary methods to what’s already there.
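As one purely illustrative example of a complementary method (not what the app actually does), you could blend the cosine score with a simple keyword-overlap score; the function and weights here are made up:

```python
# Illustrative only: blend cosine similarity with naive keyword overlap.
def blended_score(cosine: float, query: str, description: str, alpha: float = 0.8) -> float:
    query_words = set(query.lower().split())
    desc_words = set(description.lower().split())
    overlap = len(query_words & desc_words) / max(len(query_words), 1)
    return alpha * cosine + (1 - alpha) * overlap

# Prints roughly 0.53 with these made-up numbers.
print(blended_score(0.584, "read manga offline", "Manga reader and comic viewer"))
```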
I keep saying it: LLMs and this whole tech allow us to approach problems in a different way. This is completely different from doing fuzzy search, which was the standard for years. 12 hours is all you need to deploy a search engine now.
As for optimization: the app needs the ~90MB model on your machine to embed your search query, and it creates some heavy JSON files (10+ MB). This is okay for the prototype, and I’m personally fine with going up to ~250MB of total disk space, but no more. The bigger problem is that it runs on PyTorch for development, which takes up several gigabytes of space, which is just ludicrous. But first I want to finish the prototype and implement everything; then I can look at optimization and refactoring.
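One hedged idea for the JSON-size part (just a sketch, the filenames are hypothetical): store the vectors as a binary float32 array instead of JSON text, which is usually several times smaller and faster to load:

```python
# Sketch: convert a hypothetical embeddings.json into a compact binary .npy file.
import json
import numpy as np

with open("embeddings.json") as f:      # hypothetical current file
    vectors = np.asarray(json.load(f), dtype="float32")

np.save("embeddings.npy", vectors)      # reload later with np.load("embeddings.npy")
```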
Oh yeah, and it can also do this out of the box:

(Katawa Shoujo being included was funny lol, but the rest works: we name a completely different app in the query and still find results, but for books, novels, ebooks, etc.!)
If I ever finish this - I’m having some issues with crush I want to fix, and I don’t want to rush and burn myself out - I’ll put it up on my codeberg under MIT licence.
And with Deepseek being so cheap, the total cost so far is not even $2.50 lol. You can power this sort of development with $1 ko-fi donations.
PS: If you want me to search for some apps, though, I can send you some results if there’s something you’re really looking for. It could be worth building a small index of unsatisfactory results too.

I wonder what those “more issues” are when you want someone to “delete your post” over a bunch of nitpicks (some of them wrong even)…
An encoder model generates embeddings for the input. The embeddings are tensors. Vectors are 1-dimensional tensors. Most models use higher-dimensional tensors, but those could also be view-ed as 1-dimensional. So, every model with embeddings embeds vectors.
When LLMs were invented, 90MB models were large.
You’re contradicting yourself in your own paragraph fam
But also
Which one is it, do models generate embeddings or do they embed vectors?
And to be clear, I believe that your assertion that models “embed vectors” is incorrect; I just want you to clarify your rebuttal 🤷🏿‍♀️
To the last point, is it still the 1900s? Do we still call movies talkies? Language matters, especially when your intent is to educate
LLMs are integrally tied to transformer architectures, and transformers enabled devs to scale language models into LLMs
GPT-1, the first LLM, was 117 million parameters, which is much larger than OP’s tiny “LLM”