Vol.01 · No.10 Daily Dispatch April 25, 2026

Latest AI News

DeepSeek’s 1M-context open model pushes costs down, challenges closed rivals

DeepSeek V4 combines a 1.6T-parameter MoE with million‑token prompts and cut‑rate API pricing. New papers show how to make MoEs cheaper to serve, reshoot videos in 4D, and rein in vision‑language hallucinations.

One-Line Summary

Open models push long‑context reasoning at far lower token costs while fresh research sharpens MoE efficiency, 4D video reshooting, and hallucination control in vision‑language systems.

LLM & SOTA Models

DeepSeek V4 preview narrows gap with top closed models

DeepSeek V4 lets teams load huge documents and codebases into a single prompt while charging less per token than most frontier systems. It ships in two versions, Flash and Pro, both mixture-of-experts models with one-million-token context windows. The Pro model totals 1.6 trillion parameters with 49 billion active per request, and TechCrunch notes it is positioned as the largest open-weight model, ahead of Kimi K2.6 (1.1T) and MiniMax M1 (456B); Flash is 284B total with 13B active. 1

On competitive coding and reasoning, coverage highlights a Codeforces rating of 3,206 and a 93.5 on LiveCodeBench for V4-Pro, with the V4 models described as comparable to GPT-5.4 in coding competitions. These wins sit alongside areas where rivals keep the lead (e.g., Opus on long-context retrieval, GPT-5.4 on Terminal Bench), underscoring that strengths vary task by task. 2

There are tradeoffs: media reporting points out that the preview is text-only (no audio, image, or video I/O) and that V4 trails GPT-5.4 and Gemini 3.1 Pro on some knowledge tests, suggesting a remaining gap of several months to the state of the art. 1

Pricing is the headline: V4 Flash lists $0.14 per million input tokens and $0.28 per million output tokens; V4 Pro lists $0.145 per million input tokens and $3.48 per million output tokens, undercutting peers at similar capability tiers. 1
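To make those rates concrete, here is a minimal Python sketch of what a single request costs at the listed prices; the request sizes are made-up examples, not benchmarks.

```python
# Cost of one request at the listed per-million-token rates.
# Prices are from the article; the token counts below are illustrative.

PRICES = {  # model: (input $/1M tokens, output $/1M tokens)
    "v4-flash": (0.14, 0.28),
    "v4-pro": (0.145, 3.48),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at the listed rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A full million-token prompt with a 4K-token answer:
print(f"Flash: ${request_cost('v4-flash', 1_000_000, 4_000):.4f}")  # ~$0.14
print(f"Pro:   ${request_cost('v4-pro', 1_000_000, 4_000):.4f}")    # ~$0.16
```

At these rates even a full-context Pro call costs well under a dollar; output-heavy workloads are where the $3.48 output rate starts to dominate.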

Research Papers

Temporally extended Mixture-of-Experts models reduce switching and memory churn

Serving sparse MoE models often means swapping experts almost every token, which thrashes memory and breaks offload/prefetch tricks; this work adds a learned controller (in the reinforcement‑learning options framework) that decides when to keep or switch expert sets across multiple tokens. In plain terms, it keeps a good “team” on the field for longer to cut GPU traffic. 3

Applied to a gpt‑oss‑20b base with low‑rank adapters and self‑distillation rewards, the approach reduces expert‑set switch rates from over 50% to below 5% while retaining up to 90% of base‑model accuracy on MATH, MMLU, and MMMLU — pointing to practical memory savings without a heavy quality hit. 3
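The core idea can be sketched in a few lines: only pay the cost of loading a new expert set when the currently held set has become clearly worse. Below is a toy hysteresis rule standing in for the paper's learned options controller; the `stick` threshold, the router scores, and all shapes are illustrative assumptions.

```python
# Toy "sticky" expert routing: a hysteresis rule in place of the paper's
# learned controller. All values here are illustrative assumptions.
import numpy as np

def route_sticky(router_scores, k=2, stick=0.9):
    """Choose an expert set per token, switching only when the held set's
    router score falls below `stick` times the fresh top-k score."""
    sets, switches, held = [], 0, None
    for scores in router_scores:                 # scores: (num_experts,)
        fresh = np.argsort(scores)[-k:]          # this token's top-k experts
        if held is None or scores[held].sum() < stick * scores[fresh].sum():
            held = fresh                         # switch: load a new set
            switches += 1
        sets.append(held)                        # otherwise reuse the old set
    return sets, switches / len(router_scores)

scores = np.random.rand(512, 32)                 # 512 tokens, 32 experts
_, rate = route_sticky(scores)
print(f"switch rate: {rate:.1%}")                # lower => less memory churn
```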

Context from concurrent studies: controlled comparisons find well‑tuned MoEs can surpass dense LLMs under strictly equal total parameters, compute, and data, with an optimal activation rate around 20% recurring across sizes — a design cue for future deployments. 4
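For scale, the activation rates implied by the V4 parameter counts above are easy to check (simple arithmetic on the article's numbers, nothing more):

```python
# Activation rate = active parameters / total parameters, per the article.
models = {"V4-Pro": (49e9, 1.6e12), "V4-Flash": (13e9, 284e9)}
for name, (active, total) in models.items():
    print(f"{name}: {active / total:.1%} active")  # ~3.1% and ~4.6%
```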

Vista4D reshoots videos from new angles using 4D point clouds

Vista4D lets you take an existing video and “reshoot” it from a new camera path, reconstructing the scene as a 4D point cloud (space over time) to preserve what was actually seen. This directly targets common failure modes where methods lose depth or break scene consistency when the viewpoint changes. 5

The system builds a 4D‑grounded representation using static‑pixel segmentation and 4D reconstruction, and trains with reconstructed multiview dynamic data. Reported results show stronger 4D consistency, tighter camera control, and better visual quality than state‑of‑the‑art baselines across varied camera trajectories. 5
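The geometric core of "reshooting" is straightforward to sketch: re-render each frame's reconstructed 3D points through a new camera pose. The pinhole projection below is a generic illustration, not Vista4D's actual pipeline; every name and shape is an assumption.

```python
# Generic sketch: render a 4D point cloud (3D points per frame) from a new
# camera path with a pinhole model. Not Vista4D's pipeline; shapes assumed.
import numpy as np

def project(points, R, t, f=500.0, cx=320.0, cy=240.0):
    """Project Nx3 world points to pixels for camera pose (R, t)."""
    cam = points @ R.T + t              # world -> camera coordinates
    z = cam[:, 2:3].clip(min=1e-6)      # guard against points behind camera
    return f * cam[:, :2] / z + np.array([cx, cy])  # perspective + center

# cloud[i]: reconstructed points for frame i; new_path[i]: desired pose.
# Re-rendering each frame keeps geometry consistent with what was seen.
cloud = [np.random.randn(1000, 3) + [0.0, 0.0, 5.0] for _ in range(8)]
new_path = [(np.eye(3), np.array([0.1 * i, 0.0, 0.0])) for i in range(8)]
frames = [project(p, R, t) for p, (R, t) in zip(cloud, new_path)]
print(frames[0].shape)                  # (1000, 2) pixel coordinates
```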

It lands alongside other 3D/4D advances — for example, 3D Scene Prompting constructs a static 3D memory for scene‑consistent, camera‑controllable generation — pointing to a toolkit for controllable, geometry‑aware video synthesis. 6

When prompts override vision: benchmark and DPO method to curb LVLM hallucinations

Large vision‑language models sometimes trust the text prompt more than the image, producing confident but incorrect details; this paper introduces HalluScope to measure that effect and HalluVL‑DPO, a preference‑optimization recipe that steers models toward visually grounded answers. 7

The authors report that prompts and textual priors are major drivers of hallucination and show that their fine‑tuned models mitigate these failures while maintaining or improving scores on other hallucination and vision evaluations; they plan to release the benchmark, preference dataset, and code. 7
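The optimization itself follows the standard DPO recipe, here applied to grounded-versus-hallucinated answer pairs. A minimal sketch, assuming sequence log-probabilities are precomputed; HalluVL-DPO's exact pairing and hyperparameters may differ.

```python
# Standard DPO loss on preference pairs (PyTorch). Pairing "visually
# grounded" (w) vs "hallucinated" (l) answers is assumed from the paper's goal.
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Prefer answer w over l, measured relative to a frozen reference model."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -F.logsigmoid(beta * margin).mean()

# Toy batch of 4 pairs; real log-probs come from the policy and reference.
policy_w, policy_l = torch.randn(4), torch.randn(4)
ref_w, ref_l = torch.randn(4), torch.randn(4)
print(dpo_loss(policy_w, policy_l, ref_w, ref_l))
```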

Related techniques also help: AFTER — an activation‑editing method — reports up to a 16.3% hallucination reduction on the AMBER benchmark with minimal overhead, indicating both training‑time and inference‑time paths to more trustworthy multimodal outputs. 8
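Activation editing works at inference time rather than training time. The sketch below shows the general pattern with a PyTorch forward hook that nudges one layer's hidden states along a steering direction; AFTER's actual edit rule is not reproduced here, and `alpha` and the direction are placeholders.

```python
# Generic activation editing via a forward hook: shift one layer's output
# along a steering direction at inference. Illustrative only, not AFTER.
import torch

def add_steering(layer, direction, alpha=4.0):
    """Register a hook that shifts `layer`'s output along `direction`."""
    direction = direction / direction.norm()
    def hook(module, inputs, output):
        return output + alpha * direction   # returned value replaces output
    return layer.register_forward_hook(hook)

layer = torch.nn.Linear(64, 64)              # stand-in for a transformer block
handle = add_steering(layer, torch.randn(64))
shifted = layer(torch.randn(2, 64))          # outputs now include the shift
handle.remove()                              # detach to restore the original
```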

Why It Matters

Lower‑cost, long‑context models like V4 make million‑token prompts — entire codebases, multi‑day transcripts — economically plausible, expanding what small teams can automate without per‑query sticker shock. At the same time, research on temporally extended MoEs shows how to serve sparse models more efficiently at scale.

On the perception side, 4D video reshooting and targeted hallucination fixes move AI from “looks good in a short clip” to “stays consistent and factual across views and instructions,” which is essential for creative tools, analytics, and safety‑critical use.

This Week to Try

  1. DeepSeek V4 API quickstart: Use the sample code and pricing rundown to test V4‑Pro vs V4‑Flash on your own prompts. See the walkthrough with code. https://dev.to/owen_fox/deepseek-v4-released-open-source-16t-moe-1m-context-apache-20-and-its-already-on-the-api-14d6
  2. Explore Vista4D results: Skim the paper’s figures to see how 4D reshooting preserves geometry when the camera path changes. https://arxiv.org/abs/2604.21915

