The Wikimedia Foundation, the umbrella organization behind Wikipedia and a dozen or so other crowdsourced knowledge projects, said Wednesday that bandwidth consumption for multimedia downloads from Wikimedia Commons has surged by 50% since January 2024.
The reason, the organization wrote in a blog post Tuesday, isn’t growing demand from knowledge-thirsty humans, but automated, data-hungry scrapers looking to train AI models.
“Our infrastructure is built to sustain sudden traffic spikes from humans during high-interest events, but the amount of traffic generated by scraper bots is unprecedented and presents growing risks and costs,” the post states.
Wikimedia Commons is a freely accessible repository of images, videos, and audio files that are available under open licenses or are otherwise in the public domain.
Digging deeper, Wikimedia says that nearly two-thirds (65%) of the most “expensive” traffic, meaning the traffic that consumes the most resources to serve, comes from bots, even though bots account for only 35% of overall pageviews. The reason for this disparity, per Wikimedia, is that frequently accessed content is cached close to the user, while less frequently accessed content is stored farther away in the “core data center,” from which it is more expensive to serve. That is exactly the kind of content bots typically go looking for.
“While human readers tend to focus on specific, and often similar, topics, crawler bots tend to ‘bulk read’ larger numbers of pages and also visit the less popular ones,” Wikimedia writes. “This means these types of requests are more likely to get forwarded to the core data center, which makes it much more expensive in terms of consumption of our resources.”
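To see why bulk reading is so much costlier, consider a minimal, purely illustrative Python sketch. The cache size, page counts, and cost ratio below are invented for illustration and are not Wikimedia’s actual figures; the point is only the hit/miss asymmetry between a nearby edge cache and the distant core data center.

```python
import random
from collections import OrderedDict

# Hypothetical relative costs, for illustration only: a request served
# from a nearby edge cache is cheap; a miss that must be fetched from
# the distant core data center is far more expensive.
EDGE_HIT_COST = 1
CORE_MISS_COST = 20

class EdgeCache:
    """A tiny LRU cache standing in for a CDN edge node."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.store: OrderedDict[str, bool] = OrderedDict()

    def serve(self, page: str) -> int:
        """Return the cost of serving `page`, caching it on a miss."""
        if page in self.store:
            self.store.move_to_end(page)    # refresh LRU position
            return EDGE_HIT_COST
        self.store[page] = True             # fetched from core, now cached
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used
        return CORE_MISS_COST

cache = EdgeCache(capacity=100)

# Humans mostly revisit a small pool of popular pages, so after a
# brief warm-up nearly every request is a cheap edge hit.
human_cost = sum(cache.serve(f"page-{random.randint(0, 49)}")
                 for _ in range(10_000))

# A crawler "bulk reads" every page exactly once, popular or not,
# so nearly every request misses the cache and hits the core.
bot_cost = sum(cache.serve(f"page-{n}") for n in range(10_000))

print(f"human traffic cost: {human_cost}")  # roughly 11,000: almost all hits
print(f"bot traffic cost:   {bot_cost}")    # roughly 200,000: almost all misses
```

Under these assumed numbers, the crawler’s 10,000 requests cost nearly twenty times as much as the humans’, which mirrors the disparity Wikimedia describes: bots generating 35% of pageviews but 65% of the most expensive traffic.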
The long and short of all this is that the Wikimedia Foundation’s site reliability team has to spend significant time and resources blocking crawlers to avert disruption for regular users. And all of that before even considering the cloud costs the Foundation faces.
In truth, this represents part of a fast-growing trend that threatens the very existence of the open internet. Last month, software engineer and open source advocate Drew DeVault bemoaned the fact that AI crawlers ignore “robots.txt” files, which are designed to ward off automated traffic. And “pragmatic engineer” Gergely Orosz complained last week that AI scrapers from companies like Meta have driven up bandwidth demands for his own projects.
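For context, robots.txt is a plain-text file served at a site’s root that asks crawlers not to fetch some or all pages. Compliance is entirely voluntary, which is the crux of DeVault’s complaint. A typical snippet targeting well-known AI crawlers might look like the following (the user-agent tokens shown are the publicly documented ones for OpenAI’s and Common Crawl’s bots; any real deployment should verify them against each vendor’s current documentation):

```
# robots.txt, served at https://example.com/robots.txt

User-agent: GPTBot     # OpenAI's training crawler
Disallow: /

User-agent: CCBot      # Common Crawl's crawler
Disallow: /

User-agent: *          # all other crawlers
Allow: /
```

Nothing enforces these directives. A scraper that simply ignores the file faces no technical barrier, which is why operators are increasingly turning to active countermeasures.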
Open source infrastructure in particular is in the firing line, with developers fighting back with “cleverness and vengeance,” as TechCrunch wrote last week. Some tech companies are doing their part to address the issue, too. Cloudflare, for example, recently launched AI Labyrinth, which uses AI-generated content to slow crawlers down.
However, it remains very much a cat-and-mouse game, one that could ultimately force many publishers to duck for cover behind logins and paywalls, to the detriment of everyone who uses the web today.

