Two undergrads built an AI speech model to rival NotebookLM

Karla T Vasquez

12 months ago

A pair of undergrads, neither with extensive AI expertise, say they have created an openly available AI model that can generate podcast-style clips similar to those from Google’s NotebookLM.

The market for synthetic speech tools is vast and growing. ElevenLabs is one of the largest players, but there is no shortage of challengers (PlayAI, Sesame and more). Investors believe these tools have immense potential. According to PitchBook, startups developing voice AI tech raised more than $398 million in VC funding last year.

Toby Kim, a co-founder of Korea-based Nari Labs, the group behind the newly released model, says that he and his co-founder started learning about speech AI three months ago. Inspired by NotebookLM, they wanted to create a model that offers more control over generated voices and “freedom in the script”.

Kim says they used Google’s TPU Research Cloud program, which gives researchers free access to the company’s TPU AI chips, to train the model, Dia. Weighing in at 1.6 billion parameters, Dia can generate dialogue from a script, letting users customize the speakers’ tones and insert disfluencies, coughs, laughs and other nonverbal cues.

Parameters are the internal variables that models use to make predictions. Generally, models with more parameters perform better.
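To make the idea of a parameter concrete, the sketch below counts the weights and biases in a small stack of fully connected layers. The layer sizes are invented for illustration and say nothing about Dia’s actual architecture, which is not described here.

```python
def dense_layer_params(n_inputs: int, n_outputs: int) -> int:
    """Parameters in one fully connected layer.

    One weight per input-output pair, plus one bias per output.
    """
    return n_inputs * n_outputs + n_outputs


def total_params(layer_sizes: list[int]) -> int:
    """Sum the parameters across consecutive dense layers."""
    return sum(
        dense_layer_params(n_in, n_out)
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Hypothetical toy network: 4 inputs -> 8 hidden units -> 2 outputs.
toy_sizes = [4, 8, 2]
toy_total = total_params(toy_sizes)
print(toy_total)
```

A model like Dia has the same kind of learned numbers, only about 1.6 billion of them rather than a few dozen.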

Available from the AI dev platform Hugging Face and from GitHub, Dia can run on most modern PCs with at least 10 GB of VRAM. It generates a random voice unless prompted with a description of the intended style, and it can also clone a person’s voice.
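As a rough sanity check on the hardware requirement, the following sketch estimates how much memory the weights of a 1.6-billion-parameter model would occupy at a few common numeric precisions. The precisions listed are assumptions for illustration; which format Dia actually ships in is not stated here.

```python
PARAMS = 1_600_000_000  # 1.6 billion parameters, the size reported for Dia

# Bytes per parameter for common numeric formats.
# Assumed for illustration; the format Dia uses is not given in the article.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(n_params: int, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bytes_per_param / 1e9


estimates = {
    name: weight_memory_gb(PARAMS, size)
    for name, size in BYTES_PER_PARAM.items()
}

for name, gigabytes in estimates.items():
    print(f"{name}: {gigabytes:.1f} GB")
```

Under these assumptions the weights alone take about 6.4 GB in float32 and 3.2 GB in float16. Intermediate activations and framework overhead come on top of that, so the estimate is consistent with the 10 GB figure but does not derive it.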

In TechCrunch’s brief testing of Dia through Nari’s web demo, Dia worked quite well, generating two-way conversations about any topic. The quality of the voices seems competitive with other tools, and the voice cloning function is among the easiest this reporter has tried.


Like many voice generators, however, Dia offers little in the way of safeguards. It would be trivial to create disinformation or a scammy recording. On Dia’s project pages, Nari discourages abuse of the model to impersonate, deceive or otherwise engage in illicit campaigns, but the group says it is “not responsible” for misuse.

Nari has not disclosed which data it scraped to train Dia. It is possible that Dia was developed using copyrighted content: a commenter on Hacker News notes that one sample sounds like the hosts of NPR’s “Planet Money” podcast. Training models on copyrighted content is a widespread but legally dubious practice. Some AI companies claim that fair use shields them from liability, while rights holders assert that fair use does not apply to training.

In any event, Kim says that Nari’s plan is to build a synthetic voice platform with a “social aspect” on top of Dia and future models. Nari also intends to publish a technical report for Dia and to expand the model’s support to languages beyond English.
