Indian-origin entrepreneurs lead generative AI advancements

DataCebo, a spinout from the Massachusetts Institute of Technology (MIT), has emerged as a trailblazer in the field of generative artificial intelligence. Established in 2020 by Principal Research Scientist Kalyan Veeramachaneni and MIT alumna Neha Patki, the company gained prominence for its innovative software system, the Synthetic Data Vault (SDV).

With more than a million downloads, DataCebo's SDV has become a widely used answer to the problem of limited or sensitive real-world data. The software generates realistic synthetic data, which organizations use to test software applications and train machine learning models.
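To make the idea concrete, here is a minimal sketch of what "generating synthetic data" can mean in its simplest form. It is not SDV's API or algorithm: it fits an independent Gaussian to each numeric column and samples new rows, ignoring correlations between columns that a real synthesizer would model. The function names (`fit_gaussian_columns`, `sample_synthetic`) and the patient-style records are hypothetical, chosen only for illustration.

```python
import random
import statistics


def fit_gaussian_columns(rows):
    """Estimate a (mean, standard deviation) pair for every numeric column."""
    params = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        params[name] = (statistics.mean(values), statistics.stdev(values))
    return params


def sample_synthetic(params, num_rows, seed=0):
    """Draw new rows from the fitted per-column Gaussians.

    Columns are sampled independently, so relationships between
    columns in the real data are NOT preserved (a deliberate
    simplification for this sketch).
    """
    rng = random.Random(seed)
    return [
        {name: rng.gauss(mu, sigma) for name, (mu, sigma) in params.items()}
        for _ in range(num_rows)
    ]


# Hypothetical "real" records standing in for sensitive data.
real_rows = [
    {"age": 34, "visits": 2},
    {"age": 51, "visits": 5},
    {"age": 29, "visits": 1},
    {"age": 62, "visits": 7},
    {"age": 45, "visits": 3},
    {"age": 38, "visits": 2},
]

params = fit_gaussian_columns(real_rows)
synthetic_rows = sample_synthetic(params, num_rows=1000)

real_mean_age = statistics.mean(row["age"] for row in real_rows)
synthetic_mean_age = statistics.mean(row["age"] for row in synthetic_rows)
print(f"real mean age: {real_mean_age:.1f}")
print(f"synthetic mean age: {synthetic_mean_age:.1f}")
```

The synthetic rows contain no original record, yet their summary statistics track the real ones closely enough to be useful for tasks such as application testing.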

DataCebo's impact extends across industries, from aviation to healthcare. The company's flight simulator, powered by SDV, helps airlines prepare for rare weather events. In healthcare, synthetic medical records generated by SDV have been used to predict outcomes for patients with conditions like cystic fibrosis. SDV has also been used to create synthetic student data for evaluating admissions policies. The software harnesses the power of AI to address real-world challenges.

Following efforts to incorporate SDV features for larger companies, Neha Patki shared her insights, saying, "It's common for industries to possess sensitive data, and dealing with such data often involves navigating regulations. Even in the absence of legal regulations, companies find it in their best interest to carefully manage access to sensitive information. Therefore, synthetic data is always superior from a privacy perspective."

Veeramachaneni emphasized the importance of synthetic data for testing software applications, stating, “It’s about ensuring organizations trust this new data. Our tools offer programmable synthetic data, which means we allow enterprises to insert their specific insight and intuition to build more transparent models.”

He added, “In the next few years, synthetic data from generative models will transform all data work. We believe 90 percent of enterprise operations can be done with synthetic data.”

The company’s recent releases, including the SDMetrics library for assessing data realism and SDGym for comparing model performance, advance its pursuit of transparency and responsible AI adoption.

Founded at MIT, DataCebo has a staff made up entirely of MIT alumni. The company's roots go back to 2016, when Veeramachaneni's Data to AI Lab released open-source generative AI tools that laid the groundwork for DataCebo's establishment in 2020.