• FaceDeer@kbin.social
    1 year ago

    No, ChatGPT can’t recite the text of Harry Potter verbatim because it doesn’t actually “contain” it. It learned from the text, but it doesn’t “remember” it word for word. There is no filter against reciting copyrighted material. Try asking it to recite a scene from a Shakespearean play, for example: that’s out of copyright, and ChatGPT was almost certainly trained on it. It may be able to quote some famous lines because it’s been overfit like crazy on them (“To be or not to be” is probably everywhere on the Internet), but a few famous lines are not a verbatim chunk of the play.

    I’ve actually experimented with this myself on my local machine. I took one of the smaller open-source models and gave it additional training on 20 megabytes of My Little Pony fanfiction. The AI knew a lot about the fanfic afterward, but it was clearly just picking up tidbits of general knowledge rather than “remembering” the whole thing.
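
For anyone who wants to put a number on "verbatim chunk" versus "tidbits", here is a minimal sketch of one possible check: find the longest run of consecutive words that a model's output shares with the training text. This is only an illustration, not the test described above. The function name `longest_verbatim_run` and the sample strings are made up for the example, and it assumes whitespace-separated words are a good enough unit of comparison.

```python
from difflib import SequenceMatcher


def longest_verbatim_run(source: str, output: str) -> str:
    """Return the longest run of consecutive words shared by both texts.

    Hypothetical helper: compares whitespace-separated words exactly,
    so case and punctuation differences break a run.
    """
    source_words = source.split()
    output_words = output.split()
    # autojunk=False turns off difflib's "popular element" heuristic,
    # which would otherwise ignore very common words in long inputs.
    matcher = SequenceMatcher(None, source_words, output_words, autojunk=False)
    match = matcher.find_longest_match(
        0, len(source_words), 0, len(output_words)
    )
    return " ".join(source_words[match.a:match.a + match.size])


if __name__ == "__main__":
    # Made-up strings standing in for training text and model output.
    training_text = "to be or not to be that is the question"
    model_output = "the model said to be or not to be and then drifted"
    print(longest_verbatim_run(training_text, model_output))
    # prints: to be or not to be
```

A short shared run like a famous line fits the overfitting explanation, while runs spanning whole paragraphs would point to real memorization. On a corpus of tens of megabytes this word-level comparison gets slow, so in practice it would be run on samples.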