China’s tech firms are innovating in the age of AI

May 21, 2025

Larry Zhou, Fellow and Chief Network Architect at AT&T, discusses the progress made by DeepSeek, areas in which the company can improve its model and the growing open source movement in AI

US-China Technology Frontier Series – This Q&A is the fourth in a series of articles developed by the AI+Web3 Research Center of CKGSB alongside industry experts on the US-China technology frontiers. The series aims to introduce the most cutting-edge theories and practices in digital technology and internationalization from China and abroad, broaden students' thinking, discuss new business opportunities brought about by the development of science and technology, and identify new opportunities for the technological development and internationalization of Chinese enterprises. You can find the second here, and the third here.

Interviewer: Professor Sun Baohong is currently Dean's Distinguished Chair Professor of Marketing and Head of the AI+Web3 Research Center at CKGSB.

Interviewee: Larry Zhou is a Fellow and Chief Network Architect at AT&T. In his forward-looking research, Larry focuses on a number of topics, including Artificial Intelligence, the Internet of Things, Blockchain and Web3. Larry has demonstrated exceptional technical and architectural vision in the telecom industry as a true innovator and industry disruptor.

Q: How do you view the relationship between cognition and worldview, as well as between knowledge and wisdom?

Only by truly understanding the way the world works can we gain the right cognition to make informed decisions, so worldview determines the reach of cognition. For knowledge and wisdom, I would say that knowledge is information and facts acquired through study, experience or research, while wisdom is the ability to understand, apply, analyze and reason on the basis of that acquired knowledge. At a certain point in the accumulation of knowledge, the phenomenon of “wisdom emergence” occurs.

Q: With this in mind, where does AI fit into today’s world?

AI already has the ability to surpass human beings in many areas. For example, GPT-4 is estimated to have around 1.76 trillion parameters, and OpenAI's latest o1 and o3 models can already think deeply about complex problems through Chain of Thought (CoT) reasoning. In some specific areas, they even surpass human experts.
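To make CoT concrete, here is a minimal sketch of the difference between a direct prompt and a CoT prompt; the question and prompt wording are illustrative assumptions, not tied to any specific OpenAI or DeepSeek API.

```python
# Chain-of-Thought (CoT) prompting in its simplest form: instead of asking
# for the answer directly, the prompt elicits intermediate reasoning steps
# before the final answer. Purely illustrative; no model API is called.
question = "A train travels 120 km in 2 hours. What is its average speed?"

direct_prompt = f"Q: {question}\nA:"
cot_prompt = f"Q: {question}\nA: Let's think step by step."

# With the CoT prompt, a capable model typically writes out the steps:
#   "Speed = distance / time = 120 km / 2 h = 60 km/h. The answer is 60 km/h."
print(direct_prompt)
print(cot_prompt)
```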

AI possesses superhuman skills, such as medical imaging analysis, plant recognition and face recognition, and also possesses superhuman knowledge, being capable of generating text, creating music, painting, programming and much more. AI's abilities are rapidly approaching or even surpassing those of human beings, especially when it comes to understanding language and logical reasoning.

Q: It is widely agreed that DeepSeek has pushed forward AI progress in these areas. What are the DeepSeek model's core strengths?

DeepSeek is an excellent model, and DeepSeek-R1 performs particularly well in inference optimization. It brings important innovations to the training and inference process by reducing costs and improving efficiency, which not only makes AI computing more cost-effective but also represents a technological breakthrough for China in the field of AI.

Q: How does DeepSeek’s open source strategy affect the AI industry?

I've always thought that DeepSeek's open source strategy would have a profound impact on the AI industry as a whole. Its open-source release will push closed-source models to become either open source or free, making AI accessible to the masses rather than an exclusive tool for big companies or the wealthy. Over the next six months, we'll see a wide range of AI models emerge across thousands of industries, a true flourishing.

Q: What are the current areas where DeepSeek can improve?

While DeepSeek has a lot going for it, there are still areas where it could improve. It is still a text-only model, while the latest developments in AI are multimodal models, which align more closely with how humans think.

DeepSeek-R1 is distilled from OpenAI, and although it inherits OpenAI's reasoning approach and improves on accuracy, it also brings an issue: it generates too many CoT tokens, which wastes computing power and increases response latency, making it unsuitable for applications requiring high real-time performance. It also performs poorly in multi-turn conversations with long contexts, though DeepSeek may be addressing this with NSA (Native Sparse Attention) improvements.

Q: What are the effects of too many CoT tokens?

CoT reasoning is a way of letting AI think deeply, and while generating these intermediate tokens can improve reasoning accuracy, DeepSeek-R1's computational consumption has risen significantly due to overgeneration of CoT tokens. Even for many simple problems that do not require complex reasoning, a large number of tokens are generated, which not only increases computing consumption but also slows response times, and this can become a bottleneck in real-time applications (e.g., voice assistants).
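As a rough illustration of this overhead, here is a toy sketch that counts tokens in a terse answer versus a verbose reasoning trace and converts them to latency at an assumed decode speed; the texts, the tokenizer choice and the 40 tokens-per-second rate are illustrative assumptions, not measured DeepSeek-R1 figures.

```python
# Decoding time grows roughly linearly with output tokens, so a long CoT
# trace multiplies response latency even when the final answer is short.
import tiktoken  # generic GPT-style tokenizer, used here only for counting

enc = tiktoken.get_encoding("cl100k_base")

direct = "The capital of France is Paris."
verbose_cot = (
    "Let me think about this step by step. The question asks about the "
    "capital of France. France is a country in Western Europe. Its capital "
    "and largest city is Paris. Let me double-check that no other city "
    "serves as the capital... Confirmed. The answer is Paris."
)

DECODE_TOKENS_PER_SEC = 40  # assumed decode speed for illustration

for label, text in [("direct", direct), ("verbose CoT", verbose_cot)]:
    n = len(enc.encode(text))
    print(f"{label:12s}: {n:3d} tokens, ~{n / DECODE_TOKENS_PER_SEC:.2f}s to decode")
```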

Q: How does DeepSeek optimize for long text and multi-round conversations?

Long texts and multi-round conversations have always been a challenge for large models, and DeepSeek currently falls short in this area. It is worth noting, however, that they proposed the NSA (Native Sparse Attention) mechanism, a more efficient way of handling longer or multi-round conversations. This may optimize DeepSeek's performance in such scenarios in future releases, making it better suited to continuous conversations and large-scale text processing.
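NSA's internals are DeepSeek's own, but the core idea behind sparse attention can be sketched generically: each query attends to only a subset of keys (here, a causal sliding window) instead of the full sequence, cutting attention cost on long contexts. The following toy sketch is a generic illustration of that idea, not DeepSeek's actual NSA implementation.

```python
# Sliding-window sparse attention: query i attends only to keys in
# [i - window + 1, i], so cost scales with the window size rather than
# the full sequence length. Toy single-head version for illustration.
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window=64):
    n, d = q.shape                        # (seq_len, head_dim)
    scores = q @ k.T / d ** 0.5           # full (n, n) scores; a real sparse
                                          # kernel would skip masked entries
    idx = torch.arange(n)
    dist = idx[:, None] - idx[None, :]    # dist[i, j] = i - j
    mask = (dist < 0) | (dist >= window)  # block future and far-past keys
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q, k, v = (torch.randn(256, 32) for _ in range(3))
print(sliding_window_attention(q, k, v).shape)  # torch.Size([256, 32])
```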

Q: Outside of DeepSeek, which models are you most optimistic about in China’s current open source AI ecosystem? Why?

I am particularly optimistic about three open source models: Qwen, OpenBMB's MiniCPM-V and stepfun-ai's Step models. They are competitive in technological innovation, open source ecosystem, reasoning ability and real-world deployment scenarios. Qwen, for example, is an open source LLM launched by Alibaba, and its technical advantages are mainly reflected in the following ways:

  • Strong multi-task generalization: Qwen performs well in code generation, multi-language translation, text comprehension and other tasks, and is suitable for a wide variety of application scenarios.
  • Support for large-scale data training: Alibaba Cloud's powerful computing infrastructure gives Qwen rich pre-training data and broad knowledge coverage.
  • Open-source community innovation: Qwen is highly extensible, and developers can fine-tune it to build industry-customized AI (see the sketch after this list).
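As a concrete example of that extensibility, here is a minimal sketch of loading an open-source Qwen checkpoint with Hugging Face transformers as a starting point for inference or fine-tuning; the model ID and generation settings are illustrative assumptions, not official guidance.

```python
# Load an open-source Qwen checkpoint and run a quick generation.
# "Qwen/Qwen2.5-7B-Instruct" is an assumed Hub model ID; swap in whichever
# Qwen checkpoint and size fit your hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "Briefly explain why open source models lower the barrier to AI."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

From this starting point, standard fine-tuning workflows (for example, parameter-efficient methods such as LoRA) can adapt the model to industry-specific data.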

Q: Why has there been a shift from LLMs to multimodal AI modeling?

Human thinking and cognition are multidimensional, relying not only on text but also on visual, audio, video and other stimuli. As a result, there has been a shift from text-only large models to multimodal models, because multimodal AI can come closer to replicating the way humans understand the world.
I am particularly interested in MiniCPM-V, Step-Video and Step-Audio because they are the leading multimodal models in China’s open source ecosystem, and they have great potential for innovation in the direction of visual understanding, video processing and audio generation.
The open-sourcing of these models also means that China’s AI is exploring and making breakthroughs in the multimodal field, allowing more developers to participate.

Q: What are the advantages of MiniCPM-V over traditional large vision models?

MiniCPM-V is a visual extension of MiniCPM, with core advantages over traditional large vision models:

  • Lightweight design: Compared to bulky Vision Transformer models, MiniCPM-V is better suited to devices with limited computing power and can run on edge devices.
  • Multimodal capability: It not only understands images, but also combines them with text for multimodal reasoning, such as visual question answering and image-text generation.
  • Efficient training: MiniCPM-V adopts an optimized vision transformer structure that maintains high accuracy with greater computational efficiency, making it suitable for practical deployment scenarios.

Q: What are Step-Video’s innovations in video AI?

Step-Video is an AI model designed for video understanding and generation, and its main innovations are:

  • Cross-frame understanding: While traditional vision models tend to deal with single images, Step-Video is able to analyze multiple consecutive frames to understand the dynamic information in a video.
  • Video subtitle generation and description: It can automatically generate video subtitles and scene descriptions, and even produce plot analysis, which is helpful for automatic editing, film and television production, educational video analysis and other scenarios.
  • Generative capabilities: It not only understands video, but can also be used for Text-to-Video tasks, which has great potential in the field of content creation.

Q: How can Step-Audio improve AI’s ability to understand and generate audio?

Step-Audio is an audio-focused multimodal model whose key capabilities include:

  • Audio understanding: It can analyze speech emotion, music style, and environmental sound information, and is widely used in intelligent customer service, emotion analysis and other fields.
  • Speech synthesis: It supports high-quality Text-to-Speech (TTS) tasks to synthesize more natural and emotional speech for applications such as broadcasting and virtual assistants.
  • Music generation: Step-Audio can also generate music based on text or melody fragments for AI composition, game sound generation and more.

Q: What are the future application scenarios for MiniCPM-V, Step-Video and Step-Audio?

These three multimodal models have great potential for application in different fields:

  • MiniCPM-V: Suitable for image search, image Q&A, medical image analysis, AI design assistance and so on.
  • Step-Video: It can be used for intelligent editing, assisted film and TV production, automatic subtitle generation and video summarization.
  • Step-Audio: It has a wide range of applications including virtual anchors, AI voice assistants, music creation and sentiment analysis.

The fact that they are open source not only allows developers to use them freely, but also promotes the use of multimodal AI in practical applications, making AI smarter and bringing it closer to the way humans perceive things.

Q: What are your thoughts on the future of AI?

First, there will be a proliferation of personalized AI assistants, meaning that everyone will have their own AI to help manage work, study and life. Second, the cost of AI training and inference will fall, making AI ubiquitous and no longer a technology exclusive to large companies.
And third, we should guide the development of AI with human values and ideals, so that it creates a better life. AI should not dominate humanity; rather, it should serve to better it. The future is full of challenges and opportunities, and AI will revolutionize the world, but ultimate control should remain in human hands. We want to be leaders of the tech wave, not be dominated by AI.
