June 4, 2024 (Bloomberg) -- Nvidia Corp. is co-leading a $50 million investment in Twelve Labs Inc., placing its latest bet on a pair of Korean-born engineers who want to help users quickly search and analyze troves of video.

US venture house New Enterprise Associates and existing backers including Radical Ventures, Index Ventures SA and Korea Investment Partners took part in the financing, the startup said in a statement. The deal sustains a frenetic pace of startup investments for Nvidia, which since 2023 has funneled capital into up-and-coming AI firms from Hugging Face and Cohere to Mistral AI.

Twelve Labs scores a big-name backer in Nvidia, whose chips are fundamental to the training and development of generative AI. The US chipmaker is building a portfolio of investments across pivotal AI spheres from hardware to models and apps. Its latest pick is a San Francisco-headquartered startup that provides foundational models to perform a variety of tasks, such as building chatbots or translating languages. The overarching aim is to make video searchable and understandable.

Twelve Labs was conceived in 2021 after co-founders Jae Lee and Aidan Lee met during basic military training in their native Korea. Its customers include social media influencers, sports leagues in the US and Europe and Hollywood movie studios — some with archives going back 75 years. The startup aims to make searches easier by retrieving precise moments within a sea of online content — say, when a particular football player celebrates a touchdown with a front flip, or the times Gordon Ramsay got mad at over-cooked eggs.

“Video has been a decades-old problem in the field of AI. It’s information-dense and challenging to leverage,” Jae Lee, who is also chief executive officer, told Bloomberg News. “Nearly 80% of the world’s data is in video. For us, video is the first language and we built our tech ground up.”

Twelve Labs aims to collaborate with Nvidia to put its Marengo and Pegasus platforms in front of more users. Unlike other models that mainly work with text, the startup's models were trained on video from the start, which in turn helps make visual-based searches more intuitive, Lee said. The AI model works with video, text, image and audio, allowing search across multiple types of data inputs such as text-to-video, text-to-audio and image-to-video.

“We started before multimodal was a thing,” the CEO said. “We began our work before foundational models were cool.”

Twelve Labs said its models are used by more than 30,000 developers across industries like media and entertainment, advertising, automotive and security. They use its models for semantic video search and in generating summaries. The startup expects its headcount to double in size to about 80 in 2024.

The startup’s latest model, Pegasus, which generates text from video, is in beta testing. It’s designed to understand and search through complex video content, helping users summarize, query and analyze footage and find answers in it. Twelve Labs trains multiple components of the foundational model simultaneously, reducing the model to about a fifth of its original size. That in turn boosts computing and energy efficiency.

The advancements make videos as easy to work with as text, and “don’t break the bank,” said Lee, the CEO.

