A non-profit research initiative advancing the frontiers of artificial intelligence. We focus on omni-modal AI systems, efficient architectures, and synthetic data at scale.
Developing large language models that understand and generate across text, images, audio, and video with near-zero latency.
Building systems that process multiple concurrent streams of audio, video, and data inputs without turn-taking constraints.
Creating large-scale synthetic datasets grounded in factual knowledge across languages, documents, and long-context scenarios.
Scaling efficient attention mechanisms to 1M+ tokens for all-day task memory and in-context learning.
A gauge-theoretic framework that reframes LLM inference from a process of physical data movement to one of dynamic coordinate transformation, leveraging the group properties of RoPE to rotate queries over a static KV cache manifold.
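As a rough illustration of the rotational identity behind that idea (not the framework's actual implementation), the NumPy sketch below shows why a static KV cache can be reused under a query-side coordinate change: because RoPE rotations compose as a group, the score obtained by rotating both the query and the cached key equals the score obtained by rotating only the query by the relative offset and leaving the key untouched. The `rope_rotate` helper and the positions chosen are illustrative assumptions.

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Rotate consecutive dimension pairs of x by the RoPE angles pos * theta_i."""
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)   # one frequency per 2-D pair
    angles = pos * theta
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
q, k = rng.normal(size=64), rng.normal(size=64)
m, n = 120, 37  # query position m, cached key position n (illustrative)

# Conventional RoPE score: rotate q at position m and k at position n, then dot.
score_rotate_both = rope_rotate(q, m) @ rope_rotate(k, n)

# Group property: R(m)^T R(n) = R(n - m), so the joint rotation collapses to a
# single relative rotation that can be absorbed into the query alone
# (rotating q by m - n gives q^T R(n - m) k), leaving the cached key static.
score_rotate_query_only = rope_rotate(q, m - n) @ k

assert np.allclose(score_rotate_both, score_rotate_query_only)
```

In this view, "moving" cached keys to new positions is unnecessary: the query is re-expressed in the cache's coordinate frame instead.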
Retrieval-Based Multi-Turn Chat SFT Synthetic Data, a new 100k-entry multi-turn synthetic dialogue dataset for SFT, building on our work with CausalLM/Refined-Anime-Text.