Analyzing corporate automated bug reporters in the Linux Kernel
Much of our world relies on open source projects - we frequently think of the below XKCD comic whenever we import a library for our own development projects. We may not know the developer of the project personally, but are so thankful that this individual took the time, care, and attention to build the software – and we’re equally glad that we did not have to write it.
The Linux kernel is arguably the crowning achievement of the open source movement - a longstanding, well-maintained open source project that is the foundation of Google’s Android and NASA’s satellite software. “No agendas. No politics. Just stuff developers need to build and maintain projects” is at the heart of the Linux Foundation’s mission.
In April 2021, LWN.net’s 5.12 development statistics caught our attention: five of the top ten bug reporters (presumably for bugs that got fixed) were automated systems - Kernel Test Robot, Syzbot, Abaci Robot, HULK Robot, and TOTE Robot. What are these systems? Are they also open source? How cool would it be if open source security tooling were helping find bugs in one of the world’s most important open source systems?
Unfortunately, we’ve found that many of these systems:
- aren’t open-sourced or even well reported on;
- are managed by corporate or defense industrial interests; and
- may have concerning security implications moving forward.
We’ve done a deep dive below into each of these automated systems (as well as another automated contributor not mentioned above – Coverity): who owns them, whether they’re open source, and what this may mean for the future of open source security. The TL;DR is in this handy table:
Why is this a problem?
Almost all of the companies in the above table sell software based on Linux, so it makes sense that they contribute so heavily to the kernel. However, we have three main concerns:
- Keeping proprietary vendorship, agendas, and politics out of Linux’s open source community is impossible when software recommending changes to the Linux kernel is proprietary.
- Who watches the watchmen: to quote Linus’s law for open source, “given enough eyeballs, all bugs are shallow”. If the bug-finding tools for open source software are proprietary and only used by a few developers, how can the community check for similar bugs, reproduce findings, or learn from past mistakes? This is another example of Cathedral-style proprietary software development trying to creep into the valued bazaar approach of Linux kernel development.
- Corporate competition has an inverse effect on community security. Companies will usually fix bugs in their proprietary build first before contributing directly to the kernel. As the internet grows more closed and regional, large conglomerates may feel inclined to only fix their own proprietary versions, and not push security updates to the wider ecosystem.
After analyzing all seven automated systems (see below for the in-depth analysis), we see some clear cross-cutting issues:
1. Closed source / Little verification through community
Four of the seven automated systems covered in this post (Abaci, HULK, TOTE, and Coverity) are not open sourced, which plays directly into our second concern above: much of the community will not be able to check for similar bugs or reproduce findings from these tools. On top of this, three systems have only a small percentage of non-employees using the tools to find bugs, suggesting that these tools are fairly closely held. The exception is Coverity: while the tool is not open sourced, any open source community member may use it for free.
2. Overloaded maintainers
The email/commit ratio for some of these tools is very high (10093 LKML emails to 137 patches!), and some of the commits address almost irrelevant “noisy” bugs. This type of noise eats up maintainer time, as reviewing a change takes much longer than making the minor change itself. Given how overstretched maintainers are, this is a time sink that pulls attention away from more important security or maintenance issues within the kernel.
3. Corporate interests may outweigh open source interests
The open source community, especially within the Linux kernel, relies on trust relationships. However, some of the above corporate driven automated systems are overloading maintainers with low priority bugs from closed-source software – which is not an ideal way of maintaining that trust relationship. Most concerning of the seven is Huawei: the company has been linked to spying on behalf of the Chinese government, its HULK robot is closed-source and contributes to large numbers of “noisy” bugs, and one of its employees was accused of placing vulnerable code in the Linux kernel in 2020.
Some Potential Solutions / Conclusion
Although the automated systems are concerning, they do contribute necessary work to the kernel. So what can be done to help mitigate some of the harm they’re doing? We have four recommendations:
1: More collaboration between the open source community and security researchers.
The social dynamics behind kernel development have been fairly resistant and frustrating for outside researchers looking to help. Admittedly, the security community has also been frustrating and occasionally disrespectful of maintainers’ time and trust. We would love to know more ways that the security community can plug in its research with as light a touch on maintainers as possible. However, this light-touch approach must also fit security researchers’ workflow and be as transparent to both communities as possible. Creating a more robust relationship between individual researchers and maintainers would allow for more in-depth, sustainable research into issues plaguing the kernel (e.g. hypocrite commits, small inconsequential changes (aka KPI-grabbing bugs), and flooded email lists) that also takes maintainer incentives and priorities into account.
2: Pressure companies to open source internal Linux kernel bug-reporting tools.
Companies that are benefiting enough from open source to be searching for bugs within the Linux kernel can also benefit from open sourcing their tooling. Alibaba and Tsinghua University in particular are well known within open source communities within China and internationally - allow security researchers and open source devs to make your tooling even better!
3: Conduct regular file comparison between proprietary releases and the kernel.
“Diff”-ing is a common operation in version control - what makes this release different from the last? At the very least, if companies refuse to open source their internal tooling, conducting file comparisons between corporate versions of Linux and the upstream kernel may be a good way of finding bugs that have not yet been reported openly.
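As a rough illustration of this kind of file comparison, the following Python sketch walks two source trees and reports files that exist in only one tree or whose contents differ. The function name `compare_trees`, the directory arguments, and the `*.c` filter are assumptions made for this example; a real comparison would also have to account for backported patches and vendor-specific drivers.

```python
import difflib
from pathlib import Path


def compare_trees(upstream, vendor, pattern="*.c"):
    """Yield (relative_path, status, diff_lines) for files that differ.

    `upstream` and `vendor` are the roots of two checked-out source trees
    (hypothetical paths supplied by the caller).
    """
    upstream, vendor = Path(upstream), Path(vendor)
    up_files = {p.relative_to(upstream) for p in upstream.rglob(pattern) if p.is_file()}
    ven_files = {p.relative_to(vendor) for p in vendor.rglob(pattern) if p.is_file()}

    for rel in sorted(up_files | ven_files):
        if rel not in ven_files:
            yield rel, "upstream-only", []
        elif rel not in up_files:
            yield rel, "vendor-only", []
        else:
            old = (upstream / rel).read_text(errors="replace").splitlines()
            new = (vendor / rel).read_text(errors="replace").splitlines()
            if old != new:
                # A unified diff shows exactly what the vendor tree changed.
                diff = list(difflib.unified_diff(
                    old, new, f"upstream/{rel}", f"vendor/{rel}", lineterm=""))
                yield rel, "modified", diff
```

Calling `list(compare_trees("linux", "vendor-linux"))` (both directory names are placeholders) would return one entry per differing file. The “modified” entries are the interesting ones, since they may contain fixes that were never sent upstream.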
4: Encourage corporate donations to the Linux Foundation for maintainers specifically focused in their “community”.
Because maintainers are overextended, one possible solution is for the Linux Foundation to solicit donations from corporate entities in order to fund additional maintainers. The work these individuals do for the kernel cannot be overstated, and this key job must be supported.
We hope to be doing more digging into Linux kernel security soon, so stay tuned for more blog posts!
The Bot Lineup
Syzbot - Google
Open Source: Yes!
Pros: hugely beneficial for research!
Cons: overloading devs with bug reports
Syzkaller/Syzbot is a public, coverage-guided system call fuzzer that mutates grammar-based inputs in an attempt to trigger crashes within the Linux kernel. It was initially developed at Google and continues to be a mainstay of kernel development. The Syzbot dashboard tracks fixed, unfixed, and invalid bugs, offering a helpful view into the false positive rate.
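To make “coverage-guided” and “grammar-based” concrete, here is a toy Python sketch of the general technique. It is not Syzkaller’s code: the `GRAMMAR` table, the `toy_target` function standing in for the kernel, and the planted bug are all hypothetical. The loop keeps any mutated input that reaches new coverage and uses it as a starting point for further mutations.

```python
import random

# Hypothetical "grammar": each call name maps to the argument values it accepts.
GRAMMAR = {
    "open": [0, 1, 2],
    "write": [0, 16, 4096],
    "close": [0],
}


class Crash(Exception):
    """Raised by the toy target when the planted bug is triggered."""


def toy_target(program):
    """Stand-in for the kernel: run a list of (call, arg) pairs and
    return the set of (call, state) pairs that were exercised."""
    coverage = set()
    state = "new"
    for name, arg in program:
        coverage.add((name, state))
        if name == "open":
            state = "open"
        elif name == "close" and state == "open":
            state = "closed"
        elif name == "write" and state == "closed" and arg == 4096:
            raise Crash(program)  # planted bug: large write after close
    return coverage


def random_call(rng):
    name = rng.choice(sorted(GRAMMAR))
    return (name, rng.choice(GRAMMAR[name]))


def mutate(program, rng):
    """Insert a call, change an argument, or delete a call, staying within the grammar."""
    program = list(program)
    choice = rng.randrange(3)
    if choice == 0 or not program:
        program.insert(rng.randrange(len(program) + 1), random_call(rng))
    elif choice == 1:
        i = rng.randrange(len(program))
        name, _ = program[i]
        program[i] = (name, rng.choice(GRAMMAR[name]))
    else:
        del program[rng.randrange(len(program))]
    return program


def fuzz(iterations=20000, seed=0):
    """Return the first crashing program found, or None."""
    rng = random.Random(seed)
    corpus = [[]]   # inputs that reached new coverage
    seen = set()    # all coverage observed so far
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        try:
            coverage = toy_target(candidate)
        except Crash:
            return candidate
        if not coverage <= seen:  # new coverage: keep this input
            seen |= coverage
            corpus.append(candidate)
    return None


if __name__ == "__main__":
    print("crashing program:", fuzz())
```

The coverage feedback is what lets the loop build up the open, close, write sequence step by step rather than having to guess it all at once.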
The open nature of Syzkaller has enabled numerous ongoing research projects into further finding, exploiting, and understanding large-scale code flaw databases. Brad Spengler (@spendergrsec), from GRSecurity, points out that “In the past 10 years, the adoption of syzkaller has had possibly the largest impact on upstream kernel development”. However, he goes on to point out that the number of bugs found is swamping the Linux Kernel development team, and that automated exploitation efforts have been effective in taking advantage of the results of Syzkaller.
Linux Kernel Test Robot - Intel
Open Source: Yes!
Pros: Testing is important
Cons: Email overload, not that many patches.
This project from Intel is open sourced and well documented (but confusingly called “0-day”, which is a term that infosec communities use for *undetected* vulnerabilities). It is described on their project page as “an automated Linux kernel test service… [that] performs build, boot, functional, performance, and power tests whenever it detects changes”.
Based on our research, the “0-day” robot has authored 137 patches but sent 10093 emails to the Linux Kernel Mailing List. As seen below, these patches are often “cleanups” and are spread throughout the entire code base. This activity is important for kernel stability - as testing is often an exponentially difficult task.
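For a sense of scale, the ratio implied by these two figures can be worked out directly. This is a trivial Python check that uses only the numbers quoted in the paragraph above:

```python
emails_sent = 10093      # emails to the Linux Kernel Mailing List
patches_authored = 137   # patches authored by the robot

ratio = emails_sent / patches_authored
print(f"{ratio:.1f} emails per patch")  # prints "73.7 emails per patch"
```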
Abaci Robot - Alibaba
Open Source: No!
Pros: Doing a wide variety of cleanup!
Cons: Potentially originated from China's defense industrial base
Abaci Robot <[email protected]> does not have much public information available other than maintainer shifts. It appears to have been first built by Heyuan Shi. As seen in the screenshot below, this particular automated system has committed to large files (seen as pink nodes). Based on the star pattern below, Abaci’s bug reports likely involve cleanup or security work that touches many code areas.
Little has been written publicly about Abaci, but a 2019 paper that Heyuan Shi wrote with Alibaba and Tsinghua Wingtecher Lab researchers suggests that Abaci may be based on Syzkaller (mentioned above). The paper itself was written when a majority of the authors were at the Beijing National Research Center for Information Science and Technology (BNRist) - one of six national research and development centers approved by the Ministry of Science and Technology in November 2017. The goal of the BNRist is to “serve the country’s “Belt and Road” global strategy, network information security, social and economic transformation, and other major needs”.
HULK Robot - Huawei
Open Source: No!
Pros: Huge number of patches over time.
Cons: Huawei is a sanctioned entity linked to Chinese espionage
The most concerning robot in this list is HULK Robot (or Huawei Unified Linux Kernel Robot). According to interviews with Wei Yongjun, a principal engineer at Huawei Cloud and HULK Robot’s creator, Huawei “has spent a lot of time and energy in the maintenance of the kernel because of the development of the openEuler distribution”. Huawei has notably focused more energy on openEuler and its mobile OS, Harmony OS, since being placed on the U.S. Entity List for threatening U.S. national security, which prevents U.S. companies (ranging from chip manufacturers to Google) from selling their services to the company. Huawei has been accused of industrial espionage against T-Mobile, was linked to a 2012 cyber-espionage campaign in Australia, and violated US sanctions by selling telecommunications equipment to Iran.
HULK Robot’s reporting credits were three times those of syzbot in 2021, dwarfing other reporting contributions. The bot itself posts under Yue Hai Bing ([email protected]) and has 1689 patches in our data set. While bug fixes in the Linux kernel are great, this is not without concerns. For one, some of these changes are not worth the effort: some Huawei engineers using HULK have been accused of creating numerous “KPI-grabbing” changes in 2021 – changes that take more time for kernel maintainers to review than for an engineer to commit.
Also, Huawei doesn’t have a spotless record in the Linux kernel: a Huawei engineer has already been accused of placing vulnerable code in the Linux kernel in 2020 (not associated with HULK). On top of that, Huawei has a history of following Chinese government interests - a government that wants vulnerabilities in open source software to be reported internally prior to any external action. Presuming that Huawei first runs HULK bot and other tools on its internal openEuler system before patching the original open-source Linux kernel, it is possible that not all vulnerabilities found in the Linux kernel by Huawei are being reported to the Linux community.
TOTE Robot - Tsinghua University
Open Source: No!
Pros: Maintained by a part of Tsinghua really into sharing with the wider community!
Cons: Tsinghua has plenty of sister departments that aren’t nearly as into sharing…
TOTE Robot is credited less often than many of the others (39 commits in the data set reference it) and focuses on uninitialized variables, NULL pointer dereferences, and incorrect return codes from functions. The low count may be because many of the bugs it finds overlap with other automated source code analysis methods, but the ones it finds that tools such as Coverity do not are still interesting. While it is not apparent what technology is behind the TOTE Robot (the researchers involved do both static and dynamic application security testing), it is likely a static tool.
The primary authors committing to the Linux kernel that mention the TOTE Robot (seen above) are Bai Jiaju and Tuo Li, two researchers at the Tsinghua University Operating Systems Lab (Tsinghua OSLab). OSLab focuses on the security and reliability research of system software, and its specific research directions include defect detection, vulnerability mining, and kernel program analysis. In the last three years, lab faculty members have published at USENIX conferences, and contribute widely to the international developer community in both Chinese and English. However, OSLab’s sister institute under Tsinghua’s Computer Science Department has conducted Linux operating system research likely for more offensive purposes - the institute receives 863 key project funding (which funds Chinese security initiatives) and is headed by faculty formerly from the National University of Defense Technology, a key military academic institution.
Coverity - Synopsys
Open Source: No!
Pros: OG source code analyser. Used by a large number of community members
Cons: Less likely to find race conditions or logic vulnerabilities. Funded originally by DHS.
Coverity is a C source code analyzer (SAST) owned by Synopsys that has been used for many years (since 2006 - funded by the US Department of Homeland Security) to do code cleanups and security analysis on large open source projects, including the Linux Kernel. The tool focuses on dead code detection, buffer overflows, integer overflows, and other common C programming mistakes. However, as a SAST tool, it is less likely to find race conditions or logic vulnerabilities that a fuzzer like Syzkaller might find. The screenshot below shows the number of people involved with Coverity-inspired fixes across the codebase. In our data set, 3075 patches mention Coverity.
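Counts such as “patches that mention Coverity” can be approximated by searching commit messages for tool names. The sketch below shows one simple way such a count could be produced. The patterns, the sample messages, and the placeholder email address are hypothetical, and this is not a description of the exact method behind the numbers in this post.

```python
import re
from collections import Counter

# Hypothetical patterns; word boundaries keep "mismatch" from counting as Smatch.
TOOL_PATTERNS = {
    "syzbot": r"syzbot|syzkaller",
    "kernel test robot": r"kernel test robot",
    "Abaci": r"abaci",
    "HULK": r"hulk robot",
    "TOTE": r"tote robot",
    "Coverity": r"coverity",
    "Smatch": r"smatch",
}
COMPILED = {
    tool: re.compile(rf"\b(?:{pattern})\b", re.IGNORECASE)
    for tool, pattern in TOOL_PATTERNS.items()
}


def count_mentions(messages):
    """Count how many commit messages mention each tool at least once."""
    counts = Counter()
    for message in messages:
        for tool, regex in COMPILED.items():
            if regex.search(message):
                counts[tool] += 1
    return counts


# Made-up commit messages, for illustration only.
SAMPLE_MESSAGES = [
    "net: fix leak\n\nReported-by: Hulk Robot <bot@example.com>\n",
    'usb: remove dead code\n\nAddresses-Coverity: ("Logically dead code")\n',
    "mm: fix type mismatch in comment\n",
    "fs: fix uninit var\n\nReported-by: syzbot <bot@example.com>\nAlso flagged by Coverity.\n",
]

if __name__ == "__main__":
    for tool, n in count_mentions(SAMPLE_MESSAGES).most_common():
        print(f"{tool}: {n}")
```

Matching on free text is deliberately loose: it catches both formal tags and passing mentions, so it measures how often a tool is mentioned, not how many bugs it found.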
Smatch - Oracle
Open Source: Yes!
Pros: Static analysis that does really well on one-off errors
Cons: Due to high FP rate, recommended to focus on new warnings from newly added code
We didn’t mention this in the original table, but Smatch is a static analysis toolkit written by Dan Carpenter of Oracle (note that Dan is the 4th most active bug reporter in 5.12, behind kernel test robot, Syzbot, and Abaci). Smatch works by tracking values within variables using a flow analysis engine and is designed to be highly extensible by security researchers. Smatch has been around since 2009 and in its first 6 years was responsible for patching 6000 security vulnerabilities. While we were unable to find many commits in our dataset that mention it, it is possible that there are more, as Dan Carpenter is also a highly tagged reporter in Linux kernel bug fixes.
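The idea of tracking the possible values of a variable along a code path can be illustrated with a toy interval analysis. This is a simplified sketch of the general technique, not Smatch’s implementation: the operation names (`assign`, `assume_le`, `index`) and the tuple encoding of a code path are invented for this example.

```python
def analyze(ops):
    """Track a [low, high] range per variable and warn on risky array indexing.

    Each op is a tuple (hypothetical encoding of one straight-line code path):
      ("assign", var, low, high)  var may hold any value in [low, high]
      ("assume_le", var, bound)   on this path we know var <= bound
      ("index", size, var)        the code accesses array[var], array has `size` slots
    """
    ranges = {}
    warnings = []
    for step, op in enumerate(ops, start=1):
        kind = op[0]
        if kind == "assign":
            _, var, low, high = op
            ranges[var] = (low, high)
        elif kind == "assume_le":
            _, var, bound = op
            low, high = ranges[var]
            ranges[var] = (low, min(high, bound))
        elif kind == "index":
            _, size, var = op
            low, high = ranges[var]
            if low < 0 or high >= size:
                warnings.append(
                    f"step {step}: '{var}' in [{low}, {high}] "
                    f"may fall outside an array of size {size}"
                )
    return warnings


# Models: if (i > 8) return; buf[i] = 0;   with buf[8]  (off by one)
BUGGY = [("assign", "i", 0, 255), ("assume_le", "i", 8), ("index", 8, "i")]
# Models: if (i >= 8) return; buf[i] = 0;  with buf[8]
FIXED = [("assign", "i", 0, 255), ("assume_le", "i", 7), ("index", 8, "i")]

if __name__ == "__main__":
    print(analyze(BUGGY))
    print(analyze(FIXED))
```

Because the analysis over-approximates what values are possible, it can warn about paths that never occur in practice. That is one reason static tools tend to produce false positives.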
About Margin Research
Margin Research is a boutique security research firm located in NYC. Our team has extensive experience in the fields of automated program analysis, vulnerability discovery, behavioral analytics, and inauthentic behavior discovery. This breadth of knowledge and experience is constantly expanding and builds a foundation upon which we are able to tackle the industry’s toughest challenges.