A non-profit research initiative advancing the frontiers of artificial intelligence. We focus on omni-modal AI systems, efficient architectures, and synthetic data at scale.
Retrieval-Based Multi-Turn Chat SFT Synthetic Data: a new 100k-entry multi-turn synthetic dialogue dataset for SFT, building on our work with CausalLM/Refined-Anime-Text.
We introduce our unique recipe for generating high-quality synthetic datasets to boost LLM performance, featuring our new 1M+-entry anime dataset as a proof of concept.