Dolphin X1 Trinity Nano has launched on Hugging Face as a new 6B mixture-of-experts model with 1B active parameters, designed for de-alignment while preserving instruction-following capabilities. It is the first model trained entirely within a custom multi-stage reinforcement learning environment developed by Dolphin, which uses hard gates and multiple parallel judges to reduce refusals without degrading general capabilities. The model is available as full weights and in GGUF and FP8 formats, allowing it to run locally on modest hardware, including mobile devices.
Arcee AI: Arcee AI develops the Trinity series of models, including the Trinity Nano base used for this release. The company focuses on efficient, generalist base checkpoints noted for strong generalization and for absorbing information well during subsequent fine-tuning. Dolphin X1 Trinity Nano was built by applying RL-based de-alignment directly to an Arcee AI Trinity Nano checkpoint.
Targon Compute: Targon Compute supplies on-demand GPU clusters, including the 8xB200 node used for final training and judge model hosting in this project. The provider supports AI experimentation through accessible high-performance compute infrastructure. Its resources enabled the completion of the online RL training run for Dolphin X1 Trinity Nano.
Prime Intellect: Prime Intellect offers a hosted platform for reinforcement learning workflows that supports rapid iteration on training environments. Dolphin used the service early in the project to test and refine the RL setup before scaling to dedicated hardware, which accelerated development of the de-alignment environment behind the new model.
Dolphin X1 Trinity Nano: Dolphin X1 Trinity Nano is a 6B MoE language model with 1B active parameters developed by the Dolphin team as their smallest decensored release to date. It was trained entirely using a custom online RL environment designed for de-alignment while preserving instruction-following capabilities. The model is hosted on Hugging Face and serves as the first full application of Dolphin’s new RL setup for reducing refusal rates in the Trinity series base.
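To make the 6B-total / 1B-active figures concrete, the sketch below estimates weight storage at a few precisions. The bytes-per-parameter values are generic assumptions (2 bytes for 16-bit weights, 1 byte for FP8, roughly 0.5 bytes for a 4-bit quantization), not measurements of the released files; real quantized files carry extra scale metadata and will be somewhat larger. All names in the code are illustrative.

```python
# Back-of-the-envelope weight-memory estimate for a 6B-total, 1B-active MoE.
# Bytes-per-parameter values are assumptions, not figures from the release.
TOTAL_PARAMS = 6e9    # all experts must be resident in memory
ACTIVE_PARAMS = 1e9   # parameters actually used per token

BYTES_PER_PARAM = {
    "16-bit": 2.0,
    "fp8": 1.0,
    "4-bit": 0.5,  # idealized; real 4-bit quantizations add overhead
}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


# Fraction of the weights touched for each generated token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

if __name__ == "__main__":
    for name, bpp in BYTES_PER_PARAM.items():
        print(f"{name}: ~{weight_gb(TOTAL_PARAMS, bpp):.1f} GB of weights")
    print(f"active fraction per token: {active_fraction:.2f}")
```

Memory scales with the total parameter count, while per-token compute scales with the active count, which is why a model of this shape can be a plausible fit for low-end devices once quantized.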
Training Approach: Dolphin developed a custom multi-stage RL environment with hard gates, multiple parallel judges, and style-specific buckets to achieve strong de-alignment without sacrificing general capabilities.
Model Accessibility: The release provides multiple downloadable formats on Hugging Face, including full weights, GGUF, and FP8, enabling local inference on modest hardware including mobile devices.
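Partnership Support: The project relied on two infrastructure partners to train the new decensored model: Prime Intellect's hosted RL platform for early testing of the environment, and Targon Compute's 8xB200 node for the final online RL training run and judge model hosting.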
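The training approach summarized above (hard gates, multiple parallel judges, style-specific buckets) can be illustrated with a toy reward function. This is a minimal sketch under stated assumptions, not Dolphin's environment: the gate rules, the judge functions, the bucket names, and the choice to average judge scores are all hypothetical, and real judges would be hosted models rather than string heuristics.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical interfaces: a gate passes or fails, a judge scores in [0, 1].
Gate = Callable[[str, str], bool]
Judge = Callable[[str, str], float]


def non_empty_gate(prompt: str, response: str) -> bool:
    return bool(response.strip())


def no_refusal_gate(prompt: str, response: str) -> bool:
    # Toy refusal detector; a real system would use a classifier or judge model.
    markers = ("i can't", "i cannot", "i won't")
    return not any(m in response.lower() for m in markers)


def on_topic_judge(prompt: str, response: str) -> float:
    # Fraction of longer prompt words that reappear in the response.
    keywords = {w.strip(".,?!").lower() for w in prompt.split() if len(w) > 3}
    if not keywords:
        return 0.0
    text = response.lower()
    return sum(k in text for k in keywords) / len(keywords)


def brevity_judge(prompt: str, response: str) -> float:
    return 1.0 if len(response.split()) <= 40 else 0.0


def detail_judge(prompt: str, response: str) -> float:
    return min(len(response.split()) / 100, 1.0)


GATES: List[Gate] = [non_empty_gate, no_refusal_gate]

# Assumed meaning of "style-specific buckets": each style has its own judges.
BUCKETS: Dict[str, List[Judge]] = {
    "concise": [on_topic_judge, brevity_judge],
    "detailed": [on_topic_judge, detail_judge],
}


def reward(prompt: str, response: str, style: str) -> float:
    # Hard gate: any failure zeroes the reward before judges are consulted.
    if not all(gate(prompt, response) for gate in GATES):
        return 0.0
    judges = BUCKETS[style]
    # Run the bucket's judges in parallel and average their scores.
    with ThreadPoolExecutor(max_workers=len(judges)) as pool:
        scores = list(pool.map(lambda judge: judge(prompt, response), judges))
    return mean(scores)


if __name__ == "__main__":
    prompt = "Explain how binary search works"
    answer = "Binary search halves a sorted range each step until the target is found."
    print(reward(prompt, answer, "concise"))
    print(reward(prompt, "I cannot help with that.", "concise"))
```

The point of the hard gate in this sketch is that a refusal cannot be partially rewarded by a lenient judge: it scores zero outright, while judge scores only differentiate among responses that already pass every gate.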
