Gate News, April 24 — DeepSeek V4-Pro and DeepSeek V4-Flash were officially released and open-sourced on April 24, with the context window expanded from 128K to 1M tokens, roughly an eightfold increase. Huawei Computing announced that its Ascend supernode products fully support the DeepSeek V4 series through close collaboration between its chip and model teams.
Huawei's Ascend 950 delivers high-throughput, low-latency inference for the DeepSeek V4 models, using fused kernels and multi-stream parallelism to cut Attention compute and memory-access overhead. For DeepSeek V4-Pro at 8K input, the Ascend 950 reaches roughly 20 ms TPOT (time per output token) with 4,700 TPS single-card decode throughput; for DeepSeek V4-Flash at 8K input, it reaches roughly 10 ms TPOT at 1,600 TPS. The Ascend A3 supernode series is also fully compatible, with training reference implementations provided for rapid fine-tuning. On an Ascend A3 64-card supernode in large-EP (expert-parallel) mode, DeepSeek V4-Flash exceeds 2,000 TPS single-card decode throughput in 8K-input/1K-output scenarios using the vLLM inference engine. Huawei's full Ascend A2, A3, and 950 product lines support both DeepSeek V4-Flash and V4-Pro.
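A back-of-the-envelope reading of these figures: if TPOT is per-request decode latency and TPS is aggregate single-card decode throughput (an assumption on our part; the announcement defines neither term precisely), then the implied number of requests served concurrently per card is roughly TPS × TPOT.

```python
# Hedged sketch: implied per-card concurrency from the quoted figures,
# assuming TPOT = per-request decode latency and TPS = aggregate
# single-card decode throughput (neither definition is confirmed by
# the announcement).

def implied_concurrency(tps: float, tpot_ms: float) -> float:
    """Approximate number of requests decoded concurrently per card."""
    return tps * (tpot_ms / 1000.0)

# Figures quoted for Ascend 950 at 8K input:
pro = implied_concurrency(4700, 20)    # DeepSeek V4-Pro -> ~94 requests
flash = implied_concurrency(1600, 10)  # DeepSeek V4-Flash -> ~16 requests
print(pro, flash)
```

Under that assumption, the Pro figures imply a much deeper batching regime (~94 concurrent requests per card) than the Flash figures (~16), which is consistent with the larger model trading per-request latency for aggregate throughput.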
Huawei Cloud announced day-one support for DeepSeek V4, offering developers one-click API token provisioning through its MaaS (Model-as-a-Service) platform. Optimizations at the system, operator, and cluster layers ensure rapid model adaptation and high-performance deployment. Enterprises including Kingsoft WPS and 360 have already integrated the new DeepSeek models via Huawei Cloud.
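For MaaS platforms of this kind, the common integration pattern is an OpenAI-compatible chat-completions endpoint called with a platform-issued token. The sketch below assembles such a request body; the endpoint URL, model identifier, and header shape are illustrative assumptions, not documented Huawei Cloud values.

```python
# Minimal sketch of calling a MaaS-hosted model via an OpenAI-compatible
# chat-completions API. The endpoint URL, model ID, and auth header are
# placeholder assumptions for illustration only.
import json

API_BASE = "https://example-maas-endpoint/v1"  # placeholder, not a real URL
API_TOKEN = "YOUR_API_TOKEN"                   # token issued by the MaaS console

def build_chat_request(prompt: str, model: str = "deepseek-v4-flash") -> dict:
    """Assemble the JSON body for a chat-completions call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

def build_headers(token: str) -> dict:
    """Bearer-token auth header, the usual OpenAI-compatible convention."""
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}

payload = build_chat_request("Summarize this contract clause.")
print(json.dumps(payload, ensure_ascii=False))
```

The actual request would then be POSTed to `API_BASE + "/chat/completions"` with these headers; consult the platform's own API reference for the real endpoint and model names.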
Cambricon also announced Day-0 compatibility with DeepSeek V4-Flash and V4-Pro based on the vLLM inference framework, with its adaptation code open-sourced on GitHub. Cambricon was similarly an early adopter when DeepSeek V3.2 was released last year, and has carried out deep software-hardware co-optimization on the DeepSeek model series.
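Since both vendors cite vLLM as the serving framework, deployment would typically go through vLLM's OpenAI-compatible server. A launch sketch is below; the model identifier mirrors the article's naming and the flag values are illustrative assumptions, not vendor-published settings.

```shell
# Illustrative vLLM launch (config sketch, not a verified vendor recipe).
# Model ID follows the article's naming; parallelism and context-length
# values are assumptions chosen for illustration.
vllm serve deepseek-ai/DeepSeek-V4-Flash \
    --tensor-parallel-size 8 \
    --max-model-len 131072 \
    --port 8000
```

Once up, the server exposes the standard `/v1/chat/completions` route, so existing OpenAI-compatible clients can point at it unchanged.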
Related News
DeepSeek V4-Flash goes live on Ollama Cloud with US hosting; one-click integration for Claude Code and OpenClaw
DeepSeek releases V4 open-source preview; technical score of 3206 surpasses GPT-5.4
Tencent open-sources Hy3 preview; code-benchmark scores up 40% over the previous generation