If you’re building an application that accepts user or third-party input, a crucial consideration for security is ensuring that you properly validate that input. You want to make sure that any data that enters your application is valid and secure. This data can be anything out of your control, such as comments submitted by users on your website. Input can also come from third-party web services that your application requests (like XML documents).
Malformed data or carefully crafted input can be used by malicious actors to hijack your application and inject unwanted commands (such as shell or SQL injections), or simply to make it crash. Input data is a major attack vector, and most of the vulnerabilities listed by the Open Web Application Security Project (OWASP) in their Top 10 list could be prevented by properly validating data before it is processed.
Failures or omissions in data validation can lead to data corruption or a security vulnerability. Data validation checks that data are valid, sensible, reasonable, and secure before they are processed. – OWASP
Validating input data in Python can be achieved in multiple ways. You can perform type checking, or check for valid/invalid values. The Python ecosystem provides several libraries to help with data validation that we will cover in this article. But first we will start with a straightforward example.
An example of improper validation
When we develop code, we often limit our thought process to writing for the happy path. We start with clarifying what the expected inputs and expected outputs are, and develop for those. We test them with unittest and move on to the next piece of code.
Unfortunately, we don’t always get the expected inputs, whether by accident or by malice. Even the simplest piece of code can crash when handling unexpected input data. For the sake of this example, let’s say you want to provide a web service written in Python to compute the true division of two numbers.
Your code may look something like:
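A minimal sketch of such a function follows. The name `divide` and the parameter `number_1` are assumptions made for illustration; a real service would take both values from the JSON body of the request.

```python
def divide(number_1, number_2):
    """Return the true division of number_1 by number_2, without any validation."""
    return number_1 / number_2


print(divide(1, 2))      # 0.5
print(divide(7.5, 2.5))  # 3.0
```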
This service is designed for handling numbers, so it works as expected with int and float arguments. But even though int and float types work as expected, some values of those types won’t. One well-known value, zero, is illegal as the second operand of our function and will crash our application:
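Assuming a hypothetical `divide` helper that performs the division with no checks, a divisor of zero is enough to trigger the failure:

```python
def divide(number_1, number_2):
    """Unvalidated true division (hypothetical helper name)."""
    return number_1 / number_2


try:
    divide(1, 0)
except ZeroDivisionError as error:
    # In a real service nothing would catch this, and the request would fail.
    print(f"{type(error).__name__}: {error}")
```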
Our function is raising an unexpected exception. This is easy to fix by checking for this specific value and raising an exception that the caller knows how to handle:
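One possible fix is sketched here, under the assumption that callers are prepared to handle `ValueError`. The choice of exception type and the name `divide` are ours, not a requirement:

```python
def divide(number_1, number_2):
    """True division that rejects a zero divisor with a documented exception."""
    if number_2 == 0:
        raise ValueError("number_2 must not be zero")
    return number_1 / number_2


try:
    divide(1, 0)
except ValueError as error:
    # The caller can now turn this into a clean error response.
    print(f"rejected: {error}")
```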
However, this is not the only invalid input. Python is dynamically typed, so our function can be called with arguments of any type, and type errors only surface at runtime. As our input comes from the JSON body of a request, it is easy for a malicious user to change the type of number_2 to any valid JSON type they choose. So, what about calling our function with non-numbers?
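As a sketch, assume a request body in which `number_2` is JSON `null`, which Python decodes to `None` (the `divide` helper and the `number_1` key are hypothetical names):

```python
import json


def divide(number_1, number_2):
    """True division that only guards against a zero divisor."""
    if number_2 == 0:
        raise ValueError("number_2 must not be zero")
    return number_1 / number_2


# A request body where number_2 is null instead of a number.
body = json.loads('{"number_1": 1.0, "number_2": null}')

try:
    divide(body["number_1"], body["number_2"])
except TypeError as error:
    # None passes the zero check and then fails inside the division itself.
    print(f"{type(error).__name__}: {error}")
```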
The service fails by raising another unexpected exception because the float type does not know about None. We can try to fix our function by checking for input types:
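A sketch of such type checks, again with hypothetical names. Because `bool` is a subclass of `int` in Python, JSON `true` and `false` are rejected explicitly here; whether to accept them is a design choice.

```python
def divide(number_1, number_2):
    """True division that only accepts int or float arguments."""
    for name, value in (("number_1", number_1), ("number_2", number_2)):
        # bool is a subclass of int, so exclude it before the numeric check.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{name} must be an int or a float")
    if number_2 == 0:
        raise ValueError("number_2 must not be zero")
    return number_1 / number_2


for bad_value in (None, "2", [1], True):
    try:
        divide(1, bad_value)
    except ValueError as error:
        print(f"{bad_value!r} rejected: {error}")
```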
This is an improvement, but it is not sufficient. We can still push the arguments to the limits of their types in order to trigger a crash:
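One such boundary case is given here as an assumed illustration. Python integers have arbitrary precision, but the result of true division is a `float`, so a sufficiently large integer passes the type checks and then overflows (the `divide` helper is a hypothetical name):

```python
def divide(number_1, number_2):
    """True division with type checks but no limits on the values."""
    for value in (number_1, number_2):
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError("arguments must be int or float")
    if number_2 == 0:
        raise ValueError("number_2 must not be zero")
    return number_1 / number_2


try:
    # 10**400 is a valid int, yet the quotient does not fit in a float.
    divide(10**400, 1)
except OverflowError as error:
    print(f"{type(error).__name__}: {error}")
```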
To properly validate this service, we should also limit the values we accept.
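A sketch of a version that also bounds the values. The limit `MAX_ABS_VALUE` and the helper `check_number` are hypothetical; the right bounds depend on what your service needs to support.

```python
MAX_ABS_VALUE = 1e12  # hypothetical limit, chosen only for illustration


def check_number(name, value):
    """Raise ValueError unless value is a finite int or float within bounds."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"{name} must be an int or a float")
    # Written as "not <=" so that NaN, which fails every comparison, is rejected too.
    if not abs(value) <= MAX_ABS_VALUE:
        raise ValueError(f"{name} must be within +/-{MAX_ABS_VALUE}")


def divide(number_1, number_2):
    """True division with type, range, and zero-divisor checks."""
    check_number("number_1", number_1)
    check_number("number_2", number_2)
    if number_2 == 0:
        raise ValueError("number_2 must not be zero")
    return number_1 / number_2


print(divide(9, 3))  # 3.0
```

Every rejected input now surfaces as the same `ValueError`, so the caller has a single exception type to translate into an error response.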
Compare this final version to the original we started with. The function we end up with bears little resemblance to the first version and is more complex, but that complexity is required to expose our service to users in a safe way.
With more complicated services, it’s challenging to think through every improper input scenario. In practice, there are better solutions for validating input data than we’ve shown here. You can, for example, use Python type annotations and a library to check the types before using the data.
Performing validation with types and Pydantic
Let’s take another example of a Flask web service returning articles stored in a MongoDB database. All articles are stored in the same collection, but some articles are drafts and we do not want to publish them for now.
The initial version of our web service looks like:
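The sketch below shows the shape of such a handler without requiring Flask or MongoDB to run. `get_articles`, `InMemoryCollection` and the sample documents are hypothetical; in the real service the body would come from Flask's `request.get_json()` and the lookup would be a PyMongo `find()` call.

```python
class InMemoryCollection:
    """Hypothetical stand-in for a MongoDB collection (equality matching only)."""

    def __init__(self, documents):
        self._documents = documents

    def find(self, query):
        return [
            document
            for document in self._documents
            if all(document.get(key) == value for key, value in query.items())
        ]


articles = InMemoryCollection([
    {"title": "Hello", "category": "news"},
    {"title": "Work in progress", "category": "drafts"},
])


def get_articles(body):
    """Handler logic: returns a (payload, HTTP status) pair."""
    category = body["category"]
    # Only the value is checked; the type of category is never verified.
    if category == "drafts":
        return {"error": "forbidden"}, 403
    return {"articles": articles.find({"category": category})}, 200


print(get_articles({"category": "news"}))
print(get_articles({"category": "drafts"}))
```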
This endpoint forbids access to the drafts category but forgets to validate the JSON request: it does not check for the input type but only for its value. What this means is that it can be tricked by a malicious actor with a JSON body containing a NoSQL injection. The following request will return all our articles including our drafts.
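As an illustration, consider a body whose `category` is an object rather than a string. The exact operator is our assumption; any MongoDB query operator that matches every category would do.

```python
import json

raw_body = '{"category": {"$exists": true}}'
category = json.loads(raw_body)["category"]

# The value check compares a dict with a string, so it never equals "drafts".
print(category == "drafts")  # False

# The handler would then build this filter. MongoDB reads "$exists" as a query
# operator, so the filter matches every document that has a category field.
mongo_filter = {"category": category}
print(mongo_filter)
```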
Starting with Python 3.5, it is possible to annotate our code with types and use external libraries to check them. This is what Pydantic does.
First, let’s specify a new type for our filter using Pydantic:
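A minimal sketch, assuming Pydantic is installed; the class name `ArticleFilter` is hypothetical:

```python
from pydantic import BaseModel, ValidationError


class ArticleFilter(BaseModel):
    """Shape of the JSON body accepted by the articles endpoint."""

    category: str


# A plain string is accepted.
print(ArticleFilter(category="news").category)

# An object in place of the string is rejected before any query is built.
try:
    ArticleFilter(category={"$exists": True})
except ValidationError as error:
    print(f"rejected with {len(error.errors())} error(s)")
```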
This class will be used by Pydantic to validate our input and make sure category is a string, but it can be used to do a lot more. For example, it can be used for checking for invalid values. Our final version then looks like:
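A sketch of what such a version might look like. The names `ArticleFilter`, `reject_drafts` and `get_articles` are hypothetical, `collection` is assumed to be any object with a PyMongo-style `find()` method, and the import fallback is only there so the same code runs on Pydantic v1 and v2.

```python
from pydantic import BaseModel, ValidationError

try:
    from pydantic import field_validator  # Pydantic v2
except ImportError:
    from pydantic import validator as field_validator  # Pydantic v1 fallback


class ArticleFilter(BaseModel):
    """Validated request body: category must be a string other than 'drafts'."""

    category: str

    @field_validator("category")
    @classmethod
    def reject_drafts(cls, value):
        if value == "drafts":
            raise ValueError("the drafts category is not published")
        return value


def get_articles(body, collection):
    """Validate the decoded JSON body, then query with a plain string."""
    if not isinstance(body, dict):
        return {"error": "invalid filter"}, 400
    try:
        article_filter = ArticleFilter(**body)
    except ValidationError:
        return {"error": "invalid filter"}, 400
    found = collection.find({"category": article_filter.category})
    return {"articles": list(found)}, 200
```

Both the type check and the value check now live in one declarative model, so the handler only ever passes a validated string to the database.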
With these changes, we are now protected against this NoSQL injection, and only non-draft articles are published, thanks to proper input validation.
There are several other Python libraries that can help you validate your input data:
- Django provides validation through Forms and Models; be sure to use them.
- Pydantic also supports dataclasses starting from Python 3.7.
- Cerberus is a clean and nice library that is agnostic about input and validation formats. Give it two dicts, a schema and a document, and it will tell you whether the document is valid and list any errors.
Properly validating your inputs can go a long way towards improving the security of your code and preventing third parties from accidentally or intentionally making your code do something it wasn’t supposed to. In this article, we took a look at some ways to validate inputs in Python. If you’d like to read more about developer security best practices, check out our post on protecting against timing attacks.