Recently, two different user agents started scraping my Lemmy instance at nearly the same time: AmazonBot and ClaudeBot.
I wonder if (and how) it may be related to this headline.
Training data. They’re absolutely desperate for it.
Each tiny incremental improvement to an LLM requires an exponential increase in the amount of training data, to the point where Earth is literally running out, and the most accessible sources like Reddit and Twitter have locked the doors and put up price lists.
They’re starving dogs hunting for scraps.