(Bloomberg) -- Microsoft Corp.’s AI research team accidentally exposed a large cache of private data on the software development platform GitHub, according to new research from a cybersecurity firm.
A team at the cloud security company Wiz found that a misconfigured link had exposed cloud-hosted data on the AI training platform. The data was leaked by Microsoft’s research team while publishing open-source training data on GitHub, according to Wiz.
Users of the repository were urged to download AI models from a cloud storage URL. But the link was misconfigured to grant permissions on the entire storage account, and it gave users full control rather than read-only access, meaning they could delete and overwrite existing files, according to a Wiz blog post. The exposed data included Microsoft employees’ personal computer backups, which contained passwords to Microsoft services, secret keys and more than 30,000 internal Microsoft Teams messages from 359 Microsoft employees, according to Wiz.
Open data sharing is a key component of AI training, but sharing larger amounts of data exposes companies to greater risk if the data is shared incorrectly, according to Wiz’s researchers. Wiz shared the data in June with Microsoft, which moved quickly to remove the exposed data, said Ami Luttwak, chief technology officer and co-founder of Wiz, who added that the incident “could have been worse.”
Asked for comment, a Microsoft spokesperson said, “We have confirmed that no customer data was exposed, and no other internal services were put at risk.”
In a blog post published Monday, Microsoft said it investigated and remediated an incident involving a Microsoft employee who shared a URL in a public GitHub repository to open-source AI learning models. Microsoft said the data exposed in the storage account included backups of two former employees’ workstation profiles and internal Microsoft Teams messages of these two employees with their colleagues.
The data cache was found by Wiz’s research team, which was scanning the internet for misconfigured storage containers as part of its ongoing work on accidental exposure of cloud-hosted data, according to the blog.
©2023 Bloomberg L.P.