News | Gate.com

2026-04-24

03:21

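DeepSeek's V4 training data doubles to 33T tokens, causing instability and delaying release

Gate News, April 24: DeepSeek's V4 technical report shows that V4-Flash and V4-Pro were pre-trained on 32T and 33T tokens respectively, roughly double the approximately 15T tokens used for V3. The report acknowledges "significant instability challenges" during training, with recurring loss spikes caused by anomalies in the Mixture-of-Experts (MoE) layers; the routing mechanism itself amplifies these anomalies, and simple rollbacks alone could not resolve the problem, loss spi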

03:04

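DeepSeek releases V4 open-source model series: 1.6T parameters and MIT license

AI Industry News

Gate News, April 24: DeepSeek has released its V4 series of open-source models under the MIT license, with weights now available on Hugging Face and ModelScope. The series includes two Mixture-of-Experts (MoE) models: V4-Pro has 1.6 trillion total parameters and activates 4.9 billion per token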

06:25

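ByteDance Seed team releases Seed3D 2.0: improved geometric precision and material generation

AI Tools and Applications

Gate News, April 23: ByteDance's Seed team has released Seed3D 2.0, a text-to-3D model that can generate textured 3D assets from a single image. The upgrade focuses on geometric precision and material realism, and the API is now available on Volcano Ark. Geometry generation uses a coarse-to-fine two-stage strategy: a large-parameter DiT model first establishes the coarse topology, then sharp edges and fine surface detail are recovered. Material generation uses a Mixture-of-Experts (MoE) architecture to improve high-resolution detail and introduces priors from a vision-language model (VLM) to make material decomposition more stable under unknown lighting, outputting complete PBR maps compatible with standard rendering pipelines. In a blind comparison, 60 evaluators with 3D modeling experience assessed roughly 200 test cases, comparing Seed3D 2.0 with Hunyuan3D-2.5/3.1, Tripo 3.0, Rodin Gen2, HiTem v2.0, and the earlier Seed3D 1.0. Preference rates for geometry generation ranged from 65.1% to 98.3%, and preference rates for textured 3D assets exceeded 69% in every comparison. For downstream applications, Seed3D 2.0 can decompose a 3D asset into independent components with joint information, with output that conforms to the URDF format and is compatible with Isaac Sim and other simulation engines, for dynamic interaction scenarios such as robotic grasping. At the scene level, it supports text, multi-view image, or video input and combines multiple assets to generate complete scenes.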

13:41

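Alibaba Qwen Lab releases Qwen3.6-35B-A3B model with sparse MoE architecture

AI Industry News

Alibaba's Qwen Lab has launched Qwen3.6-35B-A3B, an open-source large language model built on a sparse mixture-of-experts architecture. It offers agentic coding capabilities that make it easy to integrate with third-party coding assistants, and has 35 billion parameters.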

01:51

Meituan open-sources LongCat-Next: 3B parameters unifying visual understanding, generation, and speech

The Meituan LongCat team's open-source LongCat-Next is a multimodal model built on an MoE architecture that integrates five capabilities, including text, visual understanding, image generation, and speech. Its core design, DiNA, unifies task processing through discrete tokens, while the dNaViT used on the visual side enables strong image generation performance. Compared with similar models, LongCat-Next posts leading results across a range of benchmarks in both multimodal understanding and generation.

06:36

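Cursor releases Composer 2 technical report: RL environments fully simulate real user scenarios, base model score improves 70%

Cursor has released the Composer 2 technical report, describing the complete training recipe for its Kimi K2.5 MoE architecture, including two-stage training and its in-house benchmark, CursorBench. After training, Composer 2's performance improves markedly, and its inference cost is lower than that of other frontier models.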

06:27

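Cursor releases Composer 2 technical report, base model score improves 70%

Project Updates

Cursor released the Composer 2 technical report on March 25, revealing the training recipe for the Kimi K2.5 model, which uses an MoE architecture with 1.04 trillion parameters. Training proceeds in two stages and uses reinforcement learning in simulated real-world scenarios. The model ultimately scores 61.3 on the CursorBench benchmark, a 70% improvement, with inference cost lower than other large-model APIs.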

02:27

Meituan Open-Sources 560B-Parameter Theorem-Proving Model, Achieving 97.1% Pass Rate with 72 Inferences and Setting a New Open-Source SOTA

Meituan's LongCat team open-sourced LongCat-Flash-Prover on March 21, an MoE model with 560 billion parameters focused on Lean4 formal theorem proving. The model's work is split into three capabilities: automatic formalization, sketch generation, and complete proof generation, combining reasoning tools with the Lean4 compiler for real-time verification. Training employs the Hybrid-Experts Iteration Framework and the HisPO algorithm to prevent reward manipulation. Benchmark tests show that the model sets new records among open-weight models in automatic formalization and theorem proving.

06:55

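Mistral AI releases Leanstral: the first open-source Lean 4 code agent, able to output formal proofs automatically

Project Updates

Mistral AI has released Leanstral, an open-source code agent designed for Lean 4 formal verification that can generate automatically checkable code and proofs. The model uses a sparse MoE architecture, outperforms other top models, and is available as a free download and via API.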
