About Sand.ai: Pioneering the Future of Artificial Intelligence

Our Vision: Shaping the Next Frontier of AI

At Sand.ai, we don't just embrace the future of AI—we shape it. As technological pioneers, we're driven by an unwavering commitment to push beyond conventional boundaries, transforming possibilities into realities.

Unlocking Breakthrough AI Solutions

Our vision transcends traditional AI development. We're on a bold mission to unlock the next frontier of artificial intelligence, channeling breakthrough research into solutions that challenge the status quo and redefine what's possible, particularly in image-to-video transformation and AI video extension.

Innovating with Magi

This spirit of innovation comes to life through Magi, our groundbreaking AI video generation model. While others follow conventional paths, Magi charts new territory with its revolutionary technology. By fusing autoregressive modeling with diffusion technology, we've created something extraordinary—a system that brings real-time interaction and dynamic creativity to AI video generation.

Architecting the Future

The era of static, limited AI tools is over. Through relentless innovation and visionary thinking, Sand.ai is crafting the next chapter in artificial intelligence. We're not just developing technology; we're architecting the future.

Creating Tomorrow Today

At Sand.ai, the future is not something we wait for—it's something we create.

Our Mission: Advancing AI for All

Our mission is to advance AI to benefit everyone.

Our Team: A Powerhouse of Talent

Lean and Dynamic Team Structure

  • Lean team structure—fewer than 30 members across product, marketing, engineering, and research teams.
  • High concentration of top young talent—core team average age under 30.

Leadership in Vision Model Research

Cao Yue, CEO of Sand.ai, previously headed the Vision Model Research Center at the Beijing Academy of Artificial Intelligence, where his research focused on fundamental vision models and multimodal large models. Over the past five years, he has extensively explored large vision models, proposing works such as Swin Transformer and Video Swin in network architecture design, and SimMIM and EVA in pre-training methods. His work has received widespread attention in the field, with seven papers listed among PaperDigest's most influential papers and more than 50,000 Google Scholar citations in total. He has broken the dominance of North American tech giants such as Google and OpenAI in large vision models, achieving breakthrough progress on key visual perception tasks (such as ImageNet classification, COCO detection, and ADE20K segmentation). His work has been widely applied in products such as Microsoft Office 365, Azure Cognitive Services, TikTok, and Kuaishou.

Revolutionizing Vision with Swin Transformer

His co-first-authored paper, Swin Transformer, introduced the first general-purpose Transformer backbone network for visual recognition. It significantly outperformed the best convolutional neural networks across mainstream vision tasks (such as image classification, object detection, and semantic segmentation). This work challenged the long-held belief that convolutional neural networks were essential for vision, ending their 30-plus-year dominance in mainstream vision tasks. The paper won the Marr Prize (Best Paper Award) at ICCV, one of the highest honors in computer vision, and ranks among the top five most-cited AI papers of the past five years, accumulating over 30,000 Google Scholar citations within just over a year of publication.