We all know by now that ChatGPT is full of incorrect data, but I trusted it would not go wrong after I asked it for a list of sci-fi book recommendations (mostly short-story anthologies in Spanish), including titles, publisher, print year and, of course, ISBN.
Some of the books do exist, but the majority are nowhere to be found. I picked the one that caught my interest the most and contacted the publisher directly after I could not find it on their website or anywhere else.
This is what they replied (Google Translate):
ChatGPT got it wrong.
We don’t have any books with that title.
The last digit of the ISBN it gave you is incorrect. The correct one (9788477028383) corresponds to “The Sacred Fount” by Henry James.
Nor have we published any science fiction anthologies in the last 25 years.
A quick search on the “old site” shows that others have experienced the same with ChatGPT and ISBN searches… For some reason I thought it wouldn’t go wrong in this case, but it did.
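As an aside, the publisher could spot the bad last digit so easily because ISBN-13 ends in a check digit computed from the other twelve digits. A minimal sketch of that calculation (Python, just for illustration):

```python
def isbn13_check_digit(first12: str) -> int:
    """Compute the ISBN-13 check digit from the first 12 digits."""
    # Digits in odd positions are weighted 1, even positions 3 (1-indexed).
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return (10 - total % 10) % 10

# The corrected ISBN from the publisher's reply:
print(isbn13_check_digit("978847702838"))  # -> 3, matching 9788477028383
```

Any single wrong final digit fails this check, which is presumably how they caught ChatGPT’s invented ISBN at a glance.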
I’m possibly just vomiting something you already know here, but an important distinction is that the problem isn’t that ChatGPT is full of “incorrect data”; it’s that it has no concept of correct or incorrect, and it doesn’t store any data in the sense we usually think of.
It is a large language model (LLM), which does one thing, albeit incredibly well: output a token (a word or part of a word) based on the statistical probability of that token following the previous ones, according to a statistical model generated from all the data used to train it.
It doesn’t know what a book is, nor does it have any memory of the title of any book. It only has connections between tokens, scored by their statistical probability of following each other.
It’s like a really advanced version of predictive texting, or the predictive algorithm that Google uses when you start typing a search.
If you ask it a question, it only starts to string together tokens which form an answer because the network has been trained on vast quantities of text which have a question-answer format. It doesn’t know it’s answering you, or even what a question is; it just outputs the most statistically probable token, appends it to your input, and then runs that loop.
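That loop — score the candidates, append the winner, run again on the extended text — can be sketched in a few lines. Here `toy_model` is a stand-in with entirely made-up scores, not a real LLM:

```python
def toy_model(tokens):
    # Stand-in for a real LLM: returns a score for each candidate next token.
    # The scores are invented purely for illustration.
    scores = {"answer": 0.1, "is": 0.2, ".": 0.05}
    if tokens[-1] == "the":
        scores["answer"] = 0.9
    return scores

def generate(model, prompt, max_new=2):
    tokens = list(prompt)
    for _ in range(max_new):
        scores = model(tokens)                      # score every candidate continuation
        tokens.append(max(scores, key=scores.get))  # greedily append the most probable token
    return tokens

print(generate(toy_model, ["what", "is", "the"]))  # -> ['what', 'is', 'the', 'answer', 'is']
```

Real systems sample from the scores rather than always taking the maximum, but the shape of the loop is the same: nothing in it checks whether the output is true.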
Sometimes it outputs something accurate - perhaps because it encountered a particular book title enough times in the training data that it is statistically probable it will output it again; or perhaps because the title itself is statistically probable (e.g. the title “Voyage to the Stars Beyond” will be much more statistically likely than “Significantly Nine Crescent Unduly”, even if neither title actually existed in the training data).
Lots of the newer AI services put different LLMs together, along with other tools that control output and format input in ways that make the response more predictable, or even run a network request to look up additional data (more tokens). But the most significant part of the underlying tech is still fundamentally unable to conceptualise the notion of accuracy, let alone ensure it upholds it.
Maybe there will be another breakthrough in another area of AI research, of which LLMs will form an important part, but the hype train has been running hard to categorise LLMs as AI, which is disingenuous. They’re incredibly impressive non-intelligent automatic text generators.
Just as a fun example of a really basic language model, here’s my phone’s predictive model answering your question. I put the starting tokens in brackets for illustration only; everything following is generated by choosing one of the three suggestions it gives me. I mostly chose the first, but occasionally the second or third option, because it has a tendency to get stuck in loops.
[We know LLMs are not intelligent because] they are not too expensive for them to be able to make it work for you and the other things that are you going to do.
Yeah, it’s nonsense, but the most significant difference between this and an LLM is the size of the network and the quantity of data used to train it.
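For a sense of what that phone-keyboard-style predictor does under the hood: count which word follows which in some training text, then keep suggesting the most frequent follower. A toy bigram version (the tiny corpus and everything else here is an illustrative assumption):

```python
from collections import Counter, defaultdict

# "Train" on a tiny corpus by counting which word follows which.
corpus = "the cat sat on the mat and the cat ran".split()
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict(word):
    """Return the word most often seen after `word`, like a keyboard suggestion."""
    return following[word].most_common(1)[0][0]

# Generate by repeatedly appending the top suggestion:
out = ["the"]
for _ in range(4):
    out.append(predict(out[-1]))
print(" ".join(out))  # -> the cat sat on the
```

An LLM looks at far more than the single previous word and scores tokens with a neural network instead of raw counts, but the generate-by-appending loop is recognisably the same.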
What would be your definition of intelligence, if ChatGPT is not intelligent?
My definition would be something along the lines of the ability to use knowledge, ideas and concepts to solve a particular problem. For example, if you ask “what should I do if I see a black bear approaching?”, both you and ChatGPT would answer by using the knowledge that black bears can be scared off, arriving at the solution “make yourself look big and yell”.
The only difference is the type of knowledge available. People can have experiential knowledge, e.g. you saw a guy scare off a bear one time by yelling and waving his arms. ChatGPT doesn’t have that because it doesn’t have experiences. It does, however, have contextual knowledge like us: you read or heard from someone that you can scare off a bear. This type of knowledge is inherently probabilistic; the person who told you could always be giving false information. That doesn’t make you unintelligent for using it, though, and it doesn’t mean you don’t understand accuracy if it turns out to be false; it’s just that your brain made a guess that it was true, and that guess was wrong.
“I used a hammer to screw screws and it didn’t work.”
ChatGPT is a generative language model. It was not built for this kind of use case, it was not ever intended for this kind of use case, and the fact that it doesn’t succeed at this is like saying that it can’t make you a pizza. The only logical response is “well yeah, what did you expect it to do?”
This is just an anecdotal post, not a complaint. No need to take it as seriously as you seem to take it… :)
For our entertainment, you should ask Chat GPT what stories are in those anthologies and give you short summaries of each story.
You asked for fiction so it gave you some on a whole new level.
On a more serious note, other services like Bing AI chat are better suited to this. It behaves more like an assistant for this kind of query and can search the web for lists of highly rated sci-fi titles; it can also give you titles similar to something else you enjoyed.
ChatGPT uses the same tech behind that, but it’s more closed off and unable to do those things properly. If it does spit out some good titles, it’ll be part coincidence and part outdated data from whenever it was last trained.
ChatGPT is a text predictor. You are not asking it for book recommendations; you are writing “some good books are:” and hoping that glorified autocorrect will somehow come up with actual books to complete the sentence.
This effect is compounded by the fact that it is trained to predict text that will make you click the thumbs-up button and not the thumbs-down one. Saying “here are some books” and inventing some makes you more likely to click 👍 or do nothing; saying “as an AI language model, I do not know about books and cannot accurately recommend good books” makes you more likely to click 👎, so it doesn’t do that.
Expecting chatGPT to do anything about the real world beyond writing text “creatively” is a fool’s errand.
Exactly. Every time I see ChatGPT in the title of some bullshit it clearly can’t do, it cracks me up seeing all these people falling for it 🤣
Saying ChatGPT is glorified autocorrect is like saying humans are glorified bacteria. Both were shaped by the same basic drive: to survive and reproduce for humans and bacteria; to guess the next word for ChatGPT and autocorrect. But they are of wholly different magnitudes. Both ChatGPT and humans evolved a much more complex understanding of the world to achieve the same goal as their more basic analogues. If ChatGPT had no understanding of the real world, it wouldn’t have been able to guess any of the books.
ChatGPT does not have an understanding of the world. It’s able to guess book titles because book titles have a format and its training data had book lists in it.
You could make the same case for you not understanding anything outside your experiential knowledge. You are only able to name COVID variants because you read or heard about them somewhere. Fundamentally, any idea outside your experience is just a nexus of linked concepts in your head. For example, COVID Omicron is a combination of your concept of COVID, the idea of it being a mutation/variant, the idea of it being more contagious, and maybe a couple of other facts you read about it. This linking of ideas forms your understanding, and ChatGPT is able to form these connections just as well as a person. Unless you want to make the case that understanding necessitates experience, ChatGPT understands a lot about the world. If you do make that case, though, then you don’t understand evolution, microbiology, history, etc.: anything you only read about in your own “training data”.
Use Bing Chat for this kind of thing, it runs on GPT-4.
They’re real; they just haven’t been written yet.
Ask ChatGPT to write them.
Yeah, I’ve been noticing this lately too. It’s starting to dredge up random, slapped-together information. Last week, for fun, I had it tell me the plot of an obscure N64 series I loved as a kid. I had it do this several times, and even provided the full correct plot myself, but the AI simply kept randomly putting information together, and I’m not sure where much of it was coming from.
Having it generate book lists (like for learning a programming language) shows problems, obviously. It would show me the same book across four editions, make up information about the books (unhelpful when you’re looking for a book with exercises in it), and just generally… this didn’t feel like a problem a month ago. Maybe it was. I mostly use it to generate writing prompts.
It’s not an encyclopedia or a search engine. Generating writing prompts is a great use for ChatGPT. Research is not.