Earlier today, we released a number of very cool features. You can read more about the major items in our blog post about the launch. In this post, I want to shine some light on one feature in particular: the In-App WAF. I’ll share how we built it, what the process looked like, some of the tradeoffs we made, and more. Long read ahead!
What is an In-App WAF?
Our goal at Sqreen is to make security more accessible, and that means giving security teams, operations teams, and developers more visibility and better protection. In-App WAF is our latest addition to the protection side. In-App WAF is a web application firewall (WAF) that sits inside your application, rather than at the network level.
Historically, a WAF was a box put in-between your application server and your users. This box intercepts any HTTP connection, parses it, and runs patterns or simple logic to determine if it’s malicious. Over the years, that box has moved to software, but still often requires a dedicated server.
Although WAFs have been a workhorse of the web application security world for a few decades, they have quite a few downsides that limit their deployment in modern environments:
- They trigger many false positives;
- They require manual tuning to be relevant to your stack;
- They often introduce a significant performance overhead;
- They have limited operating options besides either blocking traffic or letting it through, with most requiring a one-size-fits-all configuration.
These downsides are not a result of the WAF approach to security (pattern matching traffic against a ruleset); they are primarily rooted in the disconnect between the WAF’s context (HTTP requests, patterns) and the application. This disconnect can be bridged to some extent by extensive trials and configuration, but at some point the work becomes prohibitive (for instance, having to configure rules per path, but with regexes instead of your application’s flow).
When sitting down and trying to imagine what a Sqreen WAF would look like, we realized we had a golden opportunity to improve upon WAF’s standard implementation. Specifically, we realized we could leverage Sqreen’s position inside the application to directly address WAF’s downsides while maintaining the spirit of protection that WAFs have always had.
Sqreen’s In-App WAF lives inside your application. That means that it can:
- Rely on the fallback of having Sqreen’s RASP protection in place to be a bit more conservative in the default rule tuning. If a pattern is incorrectly detected despite the conservative approach, we can disable this pattern only on very specific routes for “free”;
- Automagically configure what patterns it deploys to be tailored to your stack, as we’re deeply embedded inside it and know the context of your environment;
- Start with the performance advantage of piggy-backing on your application decoding the HTTP request, which lets us deliver roughly 2ms of overhead on default patterns (less than half of the industry standard as measured by Loggly).
Because of where it sits, Sqreen’s In-App WAF gets a significant boost in value and usefulness right out of the door, all for very little work on our users’ end.
As one of the engineers responsible for creating the In-App WAF (specifically, the core component that we now share across the agents that power the feature), I wanted to share a technical look at some of what went into building it.
What a good In-App WAF needs to do
The first step to creating a complex new feature is to plan ahead, so that you know roughly what each component needs to achieve. It’s also a good moment to air out any engineering concerns there may be so you don’t waste time later on. In this post, I’m going to focus on the part of the In-App WAF running in our agents, i.e. the component that needs to run the WAF logic.
- We have limited performance breathing room on our slowest runtimes, and the additional work was expected to have a prohibitive impact;
- We had significant concerns about ReDOS, as we need to run semi-arbitrary (custom) regexes on untrusted input. As we were working on the feature, Cloudflare experienced this very issue, which we took as validation of our concerns;
- There are compatibility issues between the regex engines available in all of our languages. We want our users to be able to import/write their own custom rules without having to be concerned about their specific application’s technology.
The main alternative to V8 at this point was to bite the bullet and start moving towards a compiled, high-performance dynamic library loaded by our agents to offload compute-heavy work. We’d been considering doing that for a while, but up until recently, the need never justified the engineering work required.
As we narrowed in on this option, we enumerated what such a library would need to have:
- Absolute stability;
- Portable, fast, ReDOS resistant regex engine;
- Ease of install;
- Predictable, high performance;
- Easy compatibility across systems and languages;
- Easy extensibility.
Unfortunately, those are simply the ground rules to even consider going after this approach! On top of that, the In-App WAF we wanted to build needed to do much more. For instance, it needed to:
- Be compatible with CRS. We don’t use the CRS format directly, but wanted to be able to easily import CRS rules for better familiarity and easier portability;
- Have very flexible control flow primitives. We wanted to be able to build advanced logic based on the WAF primitive;
- Be easily extendable. The more we could use the WAF primitives for other WAF-y purposes, the faster those would be and the less code we would have to ship;
- Be timing aware. Every call to the In-App WAF should be able to specify a maximum time duration that the processing is allowed to take, and the library should abort as close to this limit as possible. This makes sure any DoS attempt is stopped before things get out of hand.
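To make the timing requirement concrete, here is a minimal sketch of a budget-aware evaluation loop. It is an illustration under assumptions, not libSqreen's actual code: `Verdict`, `Rule`, and `run_with_budget` are hypothetical names, and rules are modeled as plain callables rather than compiled regexes.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <vector>

// Hypothetical types for illustration; not libSqreen's actual API.
enum class Verdict { NoMatch, Match, TimedOut };

// A rule is any predicate over the input (a regex match, a libinjection
// check, ...). Plain callables keep the sketch dependency-free.
using Rule = std::function<bool(const std::string&)>;

// Evaluate the rules in order, checking the deadline before each one so the
// call returns close to the budget instead of after the whole ruleset.
Verdict run_with_budget(const std::vector<Rule>& rules,
                        const std::string& input,
                        std::chrono::microseconds budget) {
    using Clock = std::chrono::steady_clock;  // monotonic clock
    const Clock::time_point deadline = Clock::now() + budget;

    for (const Rule& rule : rules) {
        if (Clock::now() >= deadline) {
            return Verdict::TimedOut;  // abort; the caller decides what to do
        }
        if (rule(input)) {
            return Verdict::Match;
        }
    }
    return Verdict::NoMatch;
}
```

Checking the clock between rules only bounds the overshoot to the cost of a single rule, which is one reason a regex engine with predictable running time matters for this design.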
With this roadmap clarified, we made the call and went on to actually build the thing.
Building libSqreen.so (carefully)
We decided to build our own library, which we called libSqreen, so we needed to answer some key questions that would drastically shape the end result. The first question was which regex engine to use. The two main contenders were Intel’s Hyperscan and Google’s RE2: both are automata-based engines that avoid backtracking, which makes them resilient to ReDOS, and both are strong on the performance front.
We put both engines through a series of tests. At the end of the day, Hyperscan was a bit faster, but harder to initialize, significantly larger on disk (> 5MB), and we had doubts about its future compatibility (regarding support for ARM servers). As such, we decided to go with RE2.
With our regex engine chosen, the next question was which language to pick to write the library. We bounced around Go, Rust, and C++, and after some back and forth, we decided to be conservative with our language choice, and went with C++.
For Sqreen’s Agent team, the #1 rule is simple: Never. Crash. A. Customer. By running at the core of our customers’ applications, we’re fully aware of our responsibility, and avoiding crashes is our top priority. With that in mind, the decision to go with a language that isn’t memory safe was a serious concern. C++ pitfalls are well known, and we tried to avoid them by sticking to a limited subset of C++14, a very conservative design, and battle-tested libraries.
By combining a robust test suite, tooling, and very aggressive pre-release testing, we managed to build confidence in the library.
Now that we had chosen the language and the regex engine, it was time to simply do everything else.
The last big architectural requirement was that the library (and the In-App WAF in general) had to be thread-safe and really fast. This is usually a conflicting set of needs, as high performance goals often push you to get cute and take risks. We circumvented this class of problems by instantiating the WAF rules in an immutable monolithic object and leaving the ownership work to std::shared_ptr.
This let us get away with a very small critical section (we only hold the mutex for a few operations as we read/edit the global WAF object store) and let us avoid the need to hold back rule updates while the In-App WAF is running.
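The pattern described above can be sketched as follows. This is a simplified illustration, not the library's real code: `Ruleset` and `RulesetStore` are hypothetical names, and the ruleset is reduced to a list of pattern strings.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical ruleset type: built once, never mutated afterwards.
struct Ruleset {
    std::vector<std::string> patterns;
};

class RulesetStore {
public:
    // Publish a new ruleset. Requests already running keep their old snapshot.
    void update(std::vector<std::string> patterns) {
        // Build the new object outside the lock to keep the critical section small.
        std::shared_ptr<const Ruleset> next =
            std::make_shared<Ruleset>(Ruleset{std::move(patterns)});
        {
            std::lock_guard<std::mutex> lock(mutex_);
            current_.swap(next);
        }
        // `next` now holds the previous ruleset; it is released here, outside
        // the lock, and destroyed once the last in-flight request drops it.
    }

    // Grab a snapshot. Only the pointer copy happens under the mutex.
    // Returns an empty pointer if no ruleset has been published yet.
    std::shared_ptr<const Ruleset> acquire() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return current_;
    }

private:
    mutable std::mutex mutex_;
    std::shared_ptr<const Ruleset> current_;
};
```

Because a snapshot is immutable and reference-counted, a request that acquired it before an update can keep reading it without any further locking.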
At the end of the day, we have a couple of very simple C APIs that we use to initialize, use, and destroy an In-App WAF ruleset, plus a couple of utils to transform the user parameters. We decided to expose a C API instead of the C++ we use internally as a way to simplify bindings with all of our agents’ technologies. C++ can sometimes result in quite weird name mangling, and we didn’t want compatibility issues if we could help it.
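As an illustration of what such a C surface over a C++ core can look like, here is a minimal sketch. All names (`iawaf_handle`, `iawaf_init`, `iawaf_run`, `iawaf_destroy`) and the return codes are assumptions made for this example, and substring search stands in for real rule evaluation.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

extern "C" {
// Opaque handle: callers only ever hold a pointer to it.
typedef struct iawaf_handle iawaf_handle;

// Hypothetical API. extern "C" disables C++ name mangling for these symbols.
iawaf_handle* iawaf_init(const char* const* needles, size_t count);
int iawaf_run(const iawaf_handle* handle, const char* input);
void iawaf_destroy(iawaf_handle* handle);
}

// The C++ definition stays private to the library.
struct iawaf_handle {
    std::vector<std::string> needles;
};

extern "C" iawaf_handle* iawaf_init(const char* const* needles, size_t count) {
    if (needles == nullptr && count != 0) return nullptr;
    try {
        std::unique_ptr<iawaf_handle> handle(new iawaf_handle());
        for (size_t i = 0; i < count; ++i) {
            if (needles[i] == nullptr) return nullptr;
            handle->needles.emplace_back(needles[i]);
        }
        return handle.release();
    } catch (...) {
        return nullptr;  // never let an exception cross the C boundary
    }
}

// Returns 1 on match, 0 on no match, -1 on invalid arguments.
extern "C" int iawaf_run(const iawaf_handle* handle, const char* input) {
    if (handle == nullptr || input == nullptr) return -1;
    for (const std::string& needle : handle->needles) {
        if (std::strstr(input, needle.c_str()) != nullptr) return 1;
    }
    return 0;
}

extern "C" void iawaf_destroy(iawaf_handle* handle) {
    delete handle;  // deleting a null pointer is a no-op
}
```

Keeping the handle opaque means each agent binding only needs to pass pointers and plain C types, whatever its foreign-function mechanism is.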
The major challenges we ran into
When it comes to building a complex feature like In-App WAF, there will always be challenges. One of the early challenges with shipping C++ code is that the C++ standard library isn’t always available on our users’ systems, and making our installation process accessible is a key focus for Sqreen. We solved this issue by statically linking with the standard library: basically copying everything we need inside the library that we ship so it’s fully self-contained.
A second challenge that we ran into may sound easy, but it’s surprisingly deep: how do you analyze a request when the definition of a “request” changes from one agent technology to the other?
The In-App WAF is, by design, isolated from the runtime context of the application. As such, the In-App WAF and the agent need a way to request and send the needed data, respectively. The requested data could be more than just strings (e.g. maps, arrays, numbers), which added complexity. Solving those two problems was something we had to do separately.
When our backend sends the In-App WAF rules that need to be run to the agent, it also sends a precomputed list of data sources that the In-App WAF will need. Those data sources are referenced using a syntax standardized across our other security modules. This syntax is a bit different from agent to agent, but enables the same feature, so the backend can easily compile the accessors in each agent’s syntax. Thus, when calling the In-App WAF, the agent already knows which data it needs to send, and already has the infrastructure it needs to pre-compute it.
When it comes to transferring this data through the C interface, we decided to always give ownership of the memory to libSqreen: the agents create an “In-App WAF” data structure and then call various utils to make it represent the requested parameters. They then call the “runIAWAF” API with this structure to be processed. This approach helps us make sure everything is standardized across the agents and that we can concentrate our audits and test vectors on a smaller target.
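The ownership model can be illustrated with a small sketch. The real interface is a C API; the version below uses C++ `std::unique_ptr` only to make the ownership transfer explicit, and every name in it (`Arg`, `make_string`, `add_entry`, `run_and_consume`) is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree-shaped parameter: a string, a number, an array, or a map.
struct Arg {
    enum class Kind { String, Number, Array, Map };
    explicit Arg(Kind k) : kind(k), number(0) {}
    Kind kind;
    std::string text;     // used when kind == String
    std::int64_t number;  // used when kind == Number
    // Children of an Array or Map; keys are left empty for arrays.
    std::vector<std::pair<std::string, std::unique_ptr<Arg>>> children;
};
using ArgPtr = std::unique_ptr<Arg>;

ArgPtr make_string(std::string s) {
    ArgPtr arg(new Arg(Arg::Kind::String));
    arg->text = std::move(s);
    return arg;
}

ArgPtr make_number(std::int64_t n) {
    ArgPtr arg(new Arg(Arg::Kind::Number));
    arg->number = n;
    return arg;
}

ArgPtr make_container(Arg::Kind kind) { return ArgPtr(new Arg(kind)); }

// Moves `value` into the container. The value is consumed even on failure,
// so the caller never has to free anything it has handed over.
bool add_entry(Arg& container, std::string key, ArgPtr value) {
    const bool is_container =
        container.kind == Arg::Kind::Array || container.kind == Arg::Kind::Map;
    if (!is_container || !value) return false;
    container.children.emplace_back(std::move(key), std::move(value));
    return true;
}

std::size_t count_strings(const Arg& node) {
    if (node.kind == Arg::Kind::String) return 1;
    std::size_t total = 0;
    for (const auto& child : node.children) total += count_strings(*child.second);
    return total;
}

// Stand-in for the run call: takes ownership of the whole tree, reports how
// many string leaves would be scanned, and frees the tree on return.
std::size_t run_and_consume(ArgPtr root) {
    return root ? count_strings(*root) : 0;
}
```

With a single owner for the whole tree, there is exactly one place where the memory is released, which keeps the audit surface small regardless of which agent built the structure.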
Finally, when qualifying the In-App WAF, we discovered an area of concern we hadn’t really anticipated. Each regex the In-App WAF needs to run requires creating an RE2 object and storing it in the rule object. Unfortunately, those objects are quite memory-inefficient and in aggregate can take multiple megabytes of RAM per WAF instance. Although this is a limited concern by itself, some of our customers have a lot of runtime instances, and thus would see this overhead multiplied up to concerning levels. We managed to mitigate this issue by sharing those objects across multiple instances and replacing some regexes with other operators.
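One way to share compiled patterns is a cache keyed by the pattern string, sketched below. This is an assumption about how such sharing could work, not a description of libSqreen's internals. `PatternCache` is a hypothetical name, and `std::regex` is used purely as a stand-in for an RE2 object so the example needs no external dependency (unlike RE2, `std::regex` is not designed to resist ReDOS).

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <regex>
#include <string>

// Hands out one shared compiled object per distinct pattern string.
class PatternCache {
public:
    std::shared_ptr<const std::regex> get(const std::string& pattern) {
        std::lock_guard<std::mutex> lock(mutex_);
        // weak_ptr: the cache itself never keeps a compiled pattern alive.
        std::weak_ptr<const std::regex>& slot = cache_[pattern];
        std::shared_ptr<const std::regex> compiled = slot.lock();
        if (!compiled) {
            // First user, or all previous users are gone: compile once.
            compiled = std::make_shared<std::regex>(pattern);
            slot = compiled;
        }
        return compiled;
    }

private:
    std::mutex mutex_;
    // Expired entries are not pruned in this sketch.
    std::map<std::string, std::weak_ptr<const std::regex>> cache_;
};
```

Two rulesets that contain the same pattern then hold the same compiled object, so the memory cost is paid once per distinct pattern rather than once per instance.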
In-App WAF today and tomorrow
The In-App WAF we released today applies the principles of WAFs to an in-application environment to address some of the classic limitations of WAFs while maintaining the benefits of the approach. It has full support for our control flow and supports a handful of operators, the most important being the ability to run regex or libinjection on input. In the next few weeks, we’re planning to add more with the ultimate goal of supporting everything in CRS.
The real next step though is to break up the monolithic processing we inherited from traditional WAFs and run patterns much more opportunistically. After all, why pay a latency penalty to run SQL rules when the route doesn’t perform any SQL requests? We can run the SQL patterns only when you do your first SQL query, or even asynchronously afterward, making it virtually free latency-wise while still giving you visibility. We have more things in mind (such as making the In-App WAF and the Sqreen RASP protection communicate) but that is a tale for another post. 🙂
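To give a rough idea of what running patterns opportunistically could mean, here is a speculative sketch. It is not a description of a shipped design: `LazyScanner`, `on_event`, and the group names are invented for the example, and patterns are plain callables.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Pattern = std::function<bool(const std::string&)>;

// Per-request scanner that defers each pattern group until the request
// actually performs the matching kind of operation (e.g. "sql").
class LazyScanner {
public:
    LazyScanner(std::map<std::string, std::vector<Pattern>> groups, std::string input)
        : groups_(std::move(groups)), input_(std::move(input)), patterns_run_(0) {}

    // Called by (hypothetical) instrumentation when the request triggers an
    // operation of the given kind. Each group runs at most once per request;
    // later calls return the cached result.
    bool on_event(const std::string& kind) {
        auto cached = results_.find(kind);
        if (cached != results_.end()) return cached->second;

        bool matched = false;
        auto group = groups_.find(kind);
        if (group != groups_.end()) {
            for (const Pattern& pattern : group->second) {
                ++patterns_run_;
                if (pattern(input_)) { matched = true; break; }
            }
        }
        results_[kind] = matched;
        return matched;
    }

    std::size_t patterns_run() const { return patterns_run_; }

private:
    std::map<std::string, std::vector<Pattern>> groups_;
    std::string input_;
    std::map<std::string, bool> results_;
    std::size_t patterns_run_;
};
```

A request that never issues a SQL query never calls `on_event("sql")`, so it pays nothing for the SQL patterns.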
Notes
- On Hyperscan being harder to initialize: it required us to either precompute a graph or wait more than one second at initialization.
- On pre-release testing of the library: of all the fuzzing we ran on it, the only issue we found was a UTF-8 truncation issue that didn’t affect core functionality, but caused issues with reporting.