The first model, MAI-Voice-1, is designed to generate natural-sounding speech, while the second, MAI-1-preview, is a text foundation model that was built and trained entirely in-house.
According to the company’s statement, MAI-Voice-1 can generate a full minute of audio in less than one second on a single GPU.
The model is already used in some Copilot services, such as Copilot Daily, which provides a daily audio news summary, and to produce podcast-like discussions that explain topics. Users can also try it in Copilot Labs, where they can adjust its tone and speaking style.
As for the text model, MAI-1-preview, it was trained on about 15,000 NVIDIA H100 GPUs and is designed to follow text instructions and provide helpful responses to everyday queries.
Microsoft emphasizes that this model offers a glimpse of what the company will deliver in the future within the Copilot ecosystem. The company has already begun testing it on LMArena, a platform for evaluating AI model performance, and plans to integrate it gradually into some Copilot services in the coming weeks.
In a press statement, Mustafa Suleyman, Microsoft’s head of AI, said that the main goal of developing these models is to deliver highly efficient experiences for individual users rather than to serve enterprises and corporations.
Suleyman explained that the company seeks to maximize the value of its consumer and advertising data to build practical, reliable models that can serve as intelligent companions.
This step comes at a time when Copilot services still rely heavily on technologies from OpenAI, a company in which Microsoft has invested billions of dollars. However, developing its own models reflects Microsoft’s desire to become an independent competitor in the long term, even if that requires years of intensive investment.
Suleyman confirmed that Microsoft has “an ambitious five-year roadmap,” with ongoing quarterly investments.
Microsoft believes that combining a set of specialized models to serve diverse needs and use cases will deliver tremendous value to users and pave the way for a new phase of competition in the global AI race.