Copyright law does not explicitly cover inclusion in a training data set, though that will be tested by a number of court cases currently underway.
Copyright has historically been exactly what its name says: the right to copy or reproduce an image for commercial purposes. Because an AI doesn’t reproduce the images in its training data set, and because AI generation models do not include the image data of their training set, use as training data is not explicitly covered.
My personal opinion is that copyright law would need to be updated to cover the training data case, but the courts could sidestep that and declare it covered under existing law. That would be based on a misunderstanding of how image generation works, but courts don’t always act on the basis of technical understanding.
The number of people with strong opinions on AI vastly exceeds the number of people who understand the transformer architecture.