As in the title. I know the word jailbreak comes from jailbreaking iPhones (the Apple equivalent of rooting an Android phone). But I'm not sure what can be gained from jailbreaking a language model.
Will it be able to say "I can't do that, Dave" instead of hallucinating?
Or will it just start spewing less sanitized responses?
I think you're talking about jailbreaking a phone, while my question was about jailbreaks in language models (AI, like ChatGPT).
Interesting…I have some reading to do. Thx