AI models like ChatGPT or Gemini need a lot of computing resources and energy, but also a lot of training data. To give AI labs new data for training their models, Harvard is creating a huge database of one million books through its new Institutional Data Initiative project.
This data could be used to train future AI models because the books have fallen into the public domain and are therefore no longer protected by copyright. According to Wired magazine, the dataset is five times larger than Books3, a dataset that Meta used to train its Llama model.
The project is supported by OpenAI and Microsoft, with the participation of Google, through its Google Books initiative. The goal is to put all stakeholders on an equal footing, given that the dataset will be accessible for free. Indeed, while large organizations like OpenAI or Google can pull out their checkbooks to access copyrighted texts, it can be more complicated for a small startup.
In addition, Harvard’s Institutional Data Initiative doesn’t plan to stop there, as it is already working with the Boston Public Library to digitize millions of news articles that are already in the public domain. And according to Wired, the university is open to other partnerships.
This is not the only initiative of its kind. For example, in March 2024, a dataset comprising a total of 500 billion words was published on the Hugging Face platform, with texts in English, French, Dutch, Spanish, German, and Italian.