On 13 January at approximately 11am, the UpGuard Data Leaks detection engine identified a GitHub repository with potentially sensitive information that had been uploaded half an hour earlier. Shortly after noon an analyst began reviewing the contents of the repository. After assessing the contents to establish the scope of the data, its degree of sensitivity, and the identity of the owner, the analyst notified AWS Security at 1:18pm. By 4pm, the repository was no longer publicly accessible, and at 4:45pm AWS Security replied to the initial notification email saying that they had taken action.
System Data and Credentials
When downloaded from GitHub as a compressed .zip file, the storage size of the repository totalled 954 MB. The repository was structured as general storage rather than application code, with many files in the top-level directory and no clear convention for the sub-directories. Consistent with the engineer’s role, there were many AWS resource templates and log files, some of which included enough mentions of hostnames to identify likely AWS customers being assisted by the engineer. Timestamps in the logs indicate they were generated throughout the second half of 2019.
Of greater concern, however, were the many credentials found in the repository. Several documents contained access keys for various cloud services. There were multiple AWS key pairs, including one in a file named “rootkey.csv,” a name suggesting it provided root access to the user’s AWS account. Other files contained collections of auth tokens and API keys for third-party providers. One such file for an insurance company included keys for messaging and email providers. The risk from committing these credentials would be mitigated over time by GitHub’s token scanning feature, which identifies tokens matching certain patterns so the issuer can revoke them, but how quickly they are revoked is unknown. What we do know is that third parties can detect such credentials on GitHub within minutes.
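To illustrate how pattern-based detection of this kind can work, here is a minimal sketch in Python. It is not GitHub's implementation, which is not described in this report. It assumes only the widely documented format of AWS access key IDs (20 uppercase alphanumeric characters beginning with "AKIA" for long-term keys or "ASIA" for temporary ones). The function name `find_aws_key_ids` and the sample CSV content are hypothetical, and the key shown is the placeholder value used in AWS documentation.

```python
import re

# Assumption: AWS access key IDs are 20 characters long, start with
# "AKIA" (long-term) or "ASIA" (temporary), and use only A-Z and 0-9.
AWS_ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def find_aws_key_ids(text: str) -> list[str]:
    """Return every substring of text that looks like an AWS access key ID."""
    return AWS_ACCESS_KEY_ID.findall(text)


# Hypothetical file content; the key is the AWS documentation placeholder.
sample = "User name,Access key ID\nexample,AKIAIOSFODNN7EXAMPLE\n"
print(find_aws_key_ids(sample))
```

Because the prefix and length are fixed, a match can be attributed to AWS with reasonable confidence, which is what makes automated reporting to the issuer possible. Anyone else scanning public commits can apply the same pattern just as easily.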
Other credential types that would not be revoked by token scanning included private keys and passwords. Unlike AWS key pairs or other credentials subject to GitHub token scanning, these cannot be deterministically mapped to an issuer for automatic revocation. While some of the private keys were clearly labeled as “mock” or “test,” others were not, and included words like “kube,” “admin,” and “cloud” that could indicate association with more privileged systems. The passwords were associated with databases hosted in AWS and mail servers. UpGuard never attempts to use credentials it finds, even when they are stored on the public internet, and therefore cannot determine what data these credentials could have been used to access.
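The distinction can be sketched in code. A private key in PEM form is easy to recognize by its header line, but nothing in the key says who issued it, so the most a scanner can do is rank it for human review. The sketch below is illustrative only and is not UpGuard's method. It assumes the labels described above appear in the file name, and the function name `triage_private_key` and its return values are hypothetical.

```python
import re
from typing import Optional

# Matches PEM headers such as "-----BEGIN RSA PRIVATE KEY-----",
# "-----BEGIN OPENSSH PRIVATE KEY-----", or "-----BEGIN PRIVATE KEY-----".
PEM_PRIVATE_KEY = re.compile(r"-----BEGIN (?:[A-Z0-9]+ )*PRIVATE KEY-----")

# Assumption: these labels appear in file names, as a rough risk signal.
LOW_RISK_WORDS = ("mock", "test")
HIGH_RISK_WORDS = ("kube", "admin", "cloud")


def triage_private_key(filename: str, content: str) -> Optional[str]:
    """Return a review priority for a file, or None if it holds no PEM private key."""
    if not PEM_PRIVATE_KEY.search(content):
        return None
    name = filename.lower()
    # Check the high-risk words first so "test-admin.pem" is not dismissed.
    if any(word in name for word in HIGH_RISK_WORDS):
        return "review-first"
    if any(word in name for word in LOW_RISK_WORDS):
        return "likely-placeholder"
    return "unknown"


print(triage_private_key("kube-admin.pem", "-----BEGIN RSA PRIVATE KEY-----\n"))
```

The output is only a priority for an analyst. Unlike the AWS key pattern, a match here identifies no issuer to notify, so revocation depends on a person tracing the key to the system that trusts it.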
In addition to data related to computer systems like credentials, logs, and code, the repo also contained assorted documents that established the identity of the owner and their relationship to AWS. These documents included bank statements, correspondence with AWS customers, and identity documents including a driver’s license. Multiple documents included the owner’s full name. A LinkedIn profile matching the exact full name identified one person who listed AWS as their employer in a role that matched the kinds of data found in the repository. Other documents in the repository included training for AWS personnel and documents marked as “Amazon Confidential.” Based on this evidence, UpGuard is confident the data originated from an AWS engineer.
Amazon Web Services is the largest provider of public cloud services, laying claim to about half the market share. In 2019, a former Amazon employee allegedly stole over a hundred million credit applications from Capital One, illustrating the scale of potential data loss associated with insider threats at such a large and central data processor. In this case, there is no evidence that the user acted maliciously or that any personal data for end users was affected, in part because the exposure was detected by UpGuard and remediated by AWS so quickly. Rather, this case illustrates the value of rapid data leaks detection to prevent small accidents from becoming larger incidents.