In June, I was personally contacted by a security researcher who had discovered a vulnerability in one component of the Sqreen PHP agent. This vulnerability would allow a bad actor with network access to the Sqreen PHP daemon to execute injected code.
No data, accounts, or other sensitive information belonging to vulnerable customers was compromised or accessed through this vulnerability. The confidentiality and integrity of the Sqreen backend, servers, and data were not affected either.
The issue does not affect PHP agent versions 1.16.0 and later (released on or after April 2020): it had already been corrected, before we were aware of the problem, as part of a separate and unrelated engineering improvement. No other agents are affected.
This vulnerability led us to request two CVEs to reference the issues:
- CVE-2020-25489 for the PyMiniRacer heap overflow;
- CVE-2020-25490 for the lack of signature verification.
We have now notified all of our impacted customers, and we want to take this opportunity to share what happened and describe how we remediated it. We know it might not be standard practice to be this open, but we believe in transparency at Sqreen, and we hope these learnings will be helpful to others as well.
Let’s start by taking a look at how the PHP agent works, and at the security mechanisms that are in place.
How does the Sqreen PHP agent work?
The PHP agent has two main components: a PHP extension, which performs all the dynamic instrumentation of the PHP processes, and a PHP daemon, which holds the session state with our API and receives information from the extensions.
Since PHP production environments often run several different applications on a single machine (using FPM pools), we needed the daemon to be stateless, with its configuration driven by the FPM pools. Hence, the daemon listens for connections coming from the pools and gets its API token and other configuration directives from them.
The daemon then uses this token to fetch its Sqreen configuration from our backend. This includes the latest RASP protections, the In-App WAF configuration and custom rules, allowlists and denylists for IP addresses and users, and so on.
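To make the stateless design more concrete, here is a minimal sketch, in Go, of how a daemon might register the pools that connect to it. The `PoolConfig` message, its field names, and the choice to key sessions by API token are all hypothetical; Sqreen's actual wire format isn't described in this post.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// PoolConfig is a hypothetical shape for the message an extension (one per
// FPM pool) sends to the daemon when it connects. Field names are
// illustrative only.
type PoolConfig struct {
	APIToken   string `json:"api_token"`
	BackendURL string `json:"backend_url"`
	AppName    string `json:"app_name"`
}

// Daemon holds no configuration of its own: all of its state is derived
// from what the pools send, keyed by API token (an assumption of this sketch).
type Daemon struct {
	mu       sync.Mutex
	sessions map[string]PoolConfig
}

func NewDaemon() *Daemon {
	return &Daemon{sessions: make(map[string]PoolConfig)}
}

// HandleHello decodes a pool's configuration message and registers (or
// refreshes) the session for that token. It returns the number of distinct
// tokens the daemon is currently serving.
func (d *Daemon) HandleHello(raw []byte) (int, error) {
	var cfg PoolConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return 0, fmt.Errorf("bad pool message: %w", err)
	}
	if cfg.APIToken == "" {
		return 0, fmt.Errorf("pool message has no api_token")
	}
	d.mu.Lock()
	defer d.mu.Unlock()
	d.sessions[cfg.APIToken] = cfg
	return len(d.sessions), nil
}

func main() {
	d := NewDaemon()
	// "https://backend.example" is a placeholder, not a real Sqreen host.
	n, _ := d.HandleHello([]byte(`{"api_token":"tok-a","backend_url":"https://backend.example","app_name":"shop"}`))
	fmt.Println(n)
	n, _ = d.HandleHello([]byte(`{"api_token":"tok-b","backend_url":"https://backend.example","app_name":"blog"}`))
	fmt.Println(n)
}
```

Because `backend_url` arrives from the pool rather than from the daemon's own configuration, whoever can talk to the daemon also chooses where it fetches its configuration from, which is the first gap discussed in the next section.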
Since RASP protections are active code running in a virtual machine, we sign them offline with a private key. The agent checks the signature with the corresponding public key and only runs a protection if its signature is valid. This helps us ensure that only code we signed enters the virtual machine, so we can be confident that no malicious RASP code will ever block legitimate requests or try to abuse the virtual machine.
All six of our agents (soon to be seven!) follow the same architecture, though only the PHP agent has a daemon.
How did this vulnerability come about?
Now, let’s look at how this design, sound on paper, had a few implementation gaps that, while not necessarily harmful on their own, could cause harm when leveraged together.
- First, since the configuration is pushed from the extension to the daemon, the extension can define the backend URL. Hence, an attacker could point the daemon at a server of their own.
- Second, rule signature verification wasn’t always properly enforced in the daemon. Hence, the attacker could inject their own rules and have them executed in the virtual machine.
- Last but not least, part of our virtual machine had a flaw that allowed an attacker who controls the data going into and out of it to execute arbitrary code.
Chained together by a skilled attacker, these three gaps would allow code execution in the Sqreen daemon process. The daemon doesn’t run as a privileged process, but it still has some access to the app’s traffic.
As a former offensive security researcher, I have always approached building Sqreen knowing that things can fail and that security has to be built into every layer of the product. This is what we are committed to doing, what we recommend to our customers, and what we will keep doing. As our team grows and we ship more development at a faster pace, the complexity of what we ship grows too. Complexity is the enemy of security, so additional security mechanisms are needed. The rest of this blog post describes them.
How did we fix it?
As mentioned earlier, we had already addressed some of these gaps as part of earlier engineering efforts, while a few issues still needed attention when the vulnerability was reported to us. Let me walk you through what we did to address each of these gaps, both before and after we learned of the vulnerability.
- Although the first gap isn’t a security issue by itself, it can be a gateway to other issues, so we decided to restrict this functionality to trusted Sqreen hosts only. As a defense-in-depth measure, the daemon may now only connect to domains ending with sqreen.com or sqreen.io. This still lets our customers use custom domains (e.g. custom-tenant-1-us.tenants.sqreen.com). This fixes the first gap we identified.
- The main security mechanism that failed here was rule signature verification. This one had already been remediated before the vulnerability was reported, as part of a broader improvement of the PHP agent. However, we wanted to be certain that this issue wouldn’t appear again, so we performed an end-to-end check of each of our agents to confirm that they enforce this cryptographic signature verification. Since we are also separately automating the testing of Sqreen agents, these checks will soon run automatically on every code change as well.
- To tackle the third gap, we fixed the flaw in our virtual machine bridge. To make JS <> native conversion safe, we now use a much more constrained, static, JSON-based format to exchange data between the virtual machine and the outside world.
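The domain restriction from the first fix can be sketched as a hostname check. This is one careful way to implement it, not necessarily Sqreen’s code: `backendAllowed` is a hypothetical name, and the sketch compares the parsed hostname against each allowed domain on a label boundary, because a naive string-suffix test on the raw URL would also accept lookalikes such as `evilsqreen.com`.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// allowedDomains mirrors the restriction described above: only hosts under
// sqreen.com or sqreen.io are accepted.
var allowedDomains = []string{"sqreen.com", "sqreen.io"}

// backendAllowed reports whether rawURL points at a trusted host. It checks
// the parsed hostname (so userinfo tricks like "sqreen.com@attacker.net" do
// not fool it) and requires an exact match or a "."-separated suffix, so
// "evilsqreen.com" and "sqreen.com.attacker.net" are rejected.
func backendAllowed(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	host := strings.ToLower(u.Hostname())
	if host == "" {
		return false
	}
	for _, d := range allowedDomains {
		if host == d || strings.HasSuffix(host, "."+d) {
			return true
		}
	}
	return false
}

func main() {
	for _, s := range []string{
		"https://custom-tenant-1-us.tenants.sqreen.com",
		"https://evilsqreen.com",
		"https://sqreen.com.attacker.net",
		"https://api.sqreen.com@attacker.net/",
	} {
		fmt.Println(s, backendAllowed(s))
	}
}
```

Only the first URL in the demo is accepted; the other three are common ways to smuggle an attacker-controlled host past a string-based check.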
Together, these security mechanisms close each of the issues that were reported.
How did we communicate with our customers?
Once we had fixed the vulnerability in depth, I contacted our customers to make sure they had full visibility and could update their agent version if needed. We focused on giving them all the information as quickly and transparently as we could.
The next steps have been writing this blog post, applying for CVE numbers, and working with the security researcher who responsibly disclosed this vulnerability so that he can share the extensive research he did to successfully exploit it.
What are we changing in the long term?
These modifications addressed the identified vulnerability. The next step is to make sure our agents remain secure in the future, since they are rapidly evolving software.
This experience gave us the opportunity to formally map the attack surface of our agents and build a threat analysis around them. We’ll deliberately investigate and improve the areas where we find opportunities to do so, and will push to reach the state of the art and beyond.
The other game-changing factor in this discovery was the security researcher’s offensive point of view. We will keep contracting regularly with security researchers who have a proven track record in low-level security investigation.
But we also need more continuous offensive scrutiny of our agents, as they are at the core of our product. To that end, we’re proud to be opening a public bug bounty, whose scope will start with the agents and steadily expand to cover other components of the Sqreen platform. Stay tuned for more information as we roll it out!