Generative AI Watch: Text-based Data is the Next Logical Evolution of Synthetic Data


Summary Bullets:

R. Bhattacharyya

• Synthetic unstructured data, such as text, can be used to train and fine-tune large language models (LLMs) for customer support applications and chatbot conversations.

• Use of synthetic data, both tabular and unstructured, will continue to grow, driven by the need for additional training data and by concerns over data privacy.

On October 1, 2024, MOSTLY AI announced that its platform can help enterprises create synthetic text. The capability is timely, given growing enterprise interest in using GenAI to extract insights from unstructured data. Over the past several years, much of the conversation around synthetic data has focused on using GenAI to create synthetic tabular data. Tabular data is structured data that can be neatly organized into rows and columns, for example, information arranged in an Excel spreadsheet. The logical next step is to use GenAI to create synthetic text that can be used to customize LLMs.
