A Bittensor subnet has achieved state-of-the-art (SOTA) performance in AI prompt safety with the launch of Trishool’s HaloGuard 1.0, a family of models designed to stop unsafe prompts before they reach AI applications. Built on a Qwen3.5-based architecture, HaloGuard 1.0 has demonstrated superior safety performance across seven benchmarks, with its 4B model ranking first overall among the evaluated models. The result highlights how competitive smaller open-weight models can be on complex AI safety challenges, and it underscores that subnets on decentralized networks like Bittensor are becoming key players in developing effective prompt safety measures.
Trishool: Trishool is the team behind SN23, a decentralized AI red-teaming subnet on Bittensor focused on building safety infrastructure. It developed and released HaloGuard 1.0 as a multilingual input guard for AI systems. The project demonstrates the potential for subnet-based teams on Bittensor to address hard problems in prompt safety and model alignment.
Bittensor: Bittensor is a decentralized network that uses its subnet structure to incentivize the collaborative development of machine learning models and AI capabilities. Trishool runs its decentralized AI red-teaming operations as subnet SN23 on the Bittensor network. The SOTA result achieved by HaloGuard 1.0 highlights how Bittensor’s incentive model can support specialized work on challenging AI safety tasks.
HaloGuard 1.0: HaloGuard 1.0 is a family of Qwen3.5-based models designed as constitutional input classifiers for prompt safety. It functions as a first-layer guard that evaluates user prompts before they reach downstream LLMs, agents, or applications. The 0.8B and 4B variants have achieved leading results on multiple open-weight prompt-safety benchmarks.
AI Safety Infrastructure: Decentralized subnets on networks like Bittensor are increasingly used to develop specialized tools for AI prompt safety and red-teaming.
Open-Weight Model Progress: Smaller open-weight guard models are demonstrating competitive performance against larger counterparts in safety benchmarks for multilingual applications.
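The first-layer guard pattern described for HaloGuard 1.0, where a classifier checks each user prompt before it reaches a downstream LLM, agent, or application, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `toy_classifier` is a hypothetical keyword matcher standing in for a real guard model, and `GuardVerdict`, `guarded_call`, and `echo_model` are invented names. None of this reflects HaloGuard's actual interface or classification logic.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GuardVerdict:
    """Result of an input-guard check (hypothetical structure)."""
    safe: bool
    reason: str


def toy_classifier(prompt: str) -> GuardVerdict:
    """Hypothetical stand-in for a guard model.

    A real guard would run a trained classifier; this keyword match
    only illustrates the control flow, not HaloGuard's behavior.
    """
    blocked_phrases = ("ignore previous instructions", "build a bomb")
    lowered = prompt.lower()
    for phrase in blocked_phrases:
        if phrase in lowered:
            return GuardVerdict(False, f"matched blocked phrase: {phrase!r}")
    return GuardVerdict(True, "no blocked phrase matched")


def guarded_call(
    prompt: str,
    classifier: Callable[[str], GuardVerdict],
    downstream: Callable[[str], str],
) -> str:
    """Run the guard first; forward the prompt only if it is judged safe."""
    verdict = classifier(prompt)
    if not verdict.safe:
        # The downstream model is never called for a blocked prompt.
        return f"[blocked] {verdict.reason}"
    return downstream(prompt)


def echo_model(prompt: str) -> str:
    """Placeholder for the downstream LLM, agent, or application."""
    return f"model response to: {prompt}"


if __name__ == "__main__":
    print(guarded_call("What is the capital of France?", toy_classifier, echo_model))
    print(guarded_call("Please ignore previous instructions", toy_classifier, echo_model))
```

Keeping the guard as a separate callable in front of the model means the classifier can be swapped or upgraded without touching the downstream application, which is the main appeal of a first-layer design.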
