According to BlockBeats, Nous Research has open-sourced Lighthouse Attention, a long-context training mechanism that achieves a 17x speedup when processing 512K-token sequences on a single B200 GPU and a 1.4–1.7x end-to-end training speedup at 98K tokens. The technique takes a coarse-to-fine approach: it first scans compressed summaries at multiple levels of granularity to identify the core segments, then passes only the selected text to FlashAttention for exact computation. In tests on a 5.3-billion-parameter model trained on 50 billion tokens, the approach not only cut training time but matched or exceeded the performance of a full-attention baseline.
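To make the coarse-to-fine idea concrete, here is a minimal, hypothetical PyTorch sketch of one selection step. It is not the released Lighthouse Attention code: the function name, mean-pooled block summaries, single compression level (the article describes multiple levels), and the block_size and top_k parameters are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def coarse_to_fine_attention(q, k, v, block_size=128, top_k=8):
    """One coarse-to-fine attention step over a long context (sketch).

    q: (num_q, d) queries; k, v: (seq_len, d) full-context keys/values.
    block_size and top_k are illustrative hyperparameters, not from the paper.
    """
    seq_len, d = k.shape
    n_blocks = seq_len // block_size  # assume seq_len divisible by block_size
    top_k = min(top_k, n_blocks)

    # Coarse stage: compress each key block into one summary vector.
    # Mean pooling stands in for whatever compression the real method uses.
    k_blocks = k[: n_blocks * block_size].view(n_blocks, block_size, d)
    summaries = k_blocks.mean(dim=1)                      # (n_blocks, d)

    # Score queries against the summaries, keep the top-k blocks per query.
    coarse = (q @ summaries.T) / d**0.5                   # (num_q, n_blocks)
    picked = coarse.topk(top_k, dim=-1).indices           # (num_q, top_k)

    # Fine stage: run exact attention over only the selected blocks.
    # scaled_dot_product_attention dispatches to FlashAttention when it can.
    offsets = torch.arange(block_size)
    out = torch.empty_like(q)
    for i in range(q.shape[0]):
        idx = (picked[i, :, None] * block_size + offsets).reshape(-1)
        out[i] = F.scaled_dot_product_attention(
            q[i].view(1, 1, 1, d),       # (batch, heads, q_len, head_dim)
            k[idx].view(1, 1, -1, d),
            v[idx].view(1, 1, -1, d),
        ).reshape(d)
    return out


# Smoke test on random data: 4 queries each attend to 8 of 32 blocks.
q, k, v = torch.randn(4, 64), torch.randn(4096, 64), torch.randn(4096, 64)
print(coarse_to_fine_attention(q, k, v).shape)  # torch.Size([4, 64])
```

The payoff of this structure is that the coarse pass scores only seq_len / block_size summary vectors per query, so the exact attention kernel then runs over top_k × block_size tokens rather than the full 512K context.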
Related News
OpenAI adds crisis-conversation detection to ChatGPT, improving its ability to flag self-harm and violence
Bittensor TAO Climbs Above $300 as AI Crypto Demand Surges
Experts Say ZK Proofs Give DePINs an Edge as AI Trust Demands Rise