You must log in or # to comment.
Any data sets produced before 2022 will be very valuable compared to anything after. Maybe the only way we avoid this is to stick to training LLMs on older data and prompt inject anything newer, rather than training for it.