The core idea behind the model is to reduce resource consumption when processing long texts. The system uses a module called the lightning indexer to rank the most relevant segments of the context, then applies a fine-grained token selection mechanism that picks individual tokens from those segments and loads only them into the attention window, rather than attending over the entire context.
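The two-stage flow described above can be sketched roughly as follows. This is an illustration only: the article does not detail the indexer's internals, so the cheap dot-product scoring, the function names, and the `top_k` parameter are all assumptions, not DeepSeek's implementation.

```python
import math

def indexer_scores(query, keys):
    # Stand-in for the lightning indexer: a cheap relevance score per
    # cached token (here a plain dot product; the real module is learned).
    return [sum(q * k for q, k in zip(query, key)) for key in keys]

def sparse_attention(query, keys, values, top_k):
    # Stage 1: the indexer ranks every token in the long context.
    scores = indexer_scores(query, keys)
    # Stage 2: fine-grained token selection keeps only the top_k tokens.
    top = sorted(range(len(keys)), key=scores.__getitem__, reverse=True)[:top_k]
    # Stage 3: ordinary softmax attention, but only over the selection,
    # so the expensive step scales with top_k, not the full context length.
    scale = math.sqrt(len(query))
    logits = [sum(q * k for q, k in zip(query, keys[i])) / scale for i in top]
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]
    z = sum(weights)
    weights = [w / z for w in weights]
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, top))
            for d in range(dim)]

# Toy usage: four cached tokens, attend over only the two most relevant.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [3.0, 0.0]]
output = sparse_attention(query, keys, values, top_k=2)
```

The cost saving comes from stage 3: full attention is quadratic in context length, while here the heavy computation touches only the `top_k` selected tokens.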
According to preliminary results, the technique could cut the cost of API calls by up to 50% in long-context scenarios, a significant gain in the efficiency of running large language models. The company has stressed, however, that additional independent testing is needed to confirm these figures.
DeepSeek stirred debate earlier this year with the launch of its R1 model, which drew attention for its lower cost compared with its American competitors. While V3.2-exp may not generate the same buzz, it represents a practical step toward making AI operations more efficient and less expensive.
Analysts believe the technology could help major companies, especially in the United States, reduce their AI operating expenses while maintaining the same performance levels, opening the door to a new phase of competition between Chinese and Western AI technology.