XML External Entity (XXE), explained

Web application security has drawn a lot of attention in recent years. Attackers’ skills and tooling have improved over time, so it’s important for the defenders of an application to strengthen its protections and increase their visibility. Part of doing this is staying informed about common vulnerabilities. OWASP periodically publishes a list of the top 10 web application security risks, and the XML External Entity vulnerability, aka XXE, has featured on it. So, in this blog, I’ll explain what XXE is and how you can protect your application from this risk. But before getting to the vulnerability, let’s catch up with the basics.

Understanding XML External Entity

What is XML?

When you have a web application, there’s an exchange of data. One of the languages that can be used for this exchange of data is Extensible Markup Language (XML). XML focuses on storing and organizing data rather than focusing on how to display data on the application.

Data in XML is stored in tags and usually in a parent-child structure. Look at the following example:

<message>
    <to>Bob</to>
    <from>Alice</from>
    <text>Hi Bob</text>
</message>

Here, the message tag is the parent tag, which encloses the data. The message tag has to, from, and text as its child tags, where the actual values are stored. This is how a typical XML markup looks.
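As a rough illustration of that parent-child structure, here’s a minimal Python sketch that parses a message document with the standard library’s xml.etree.ElementTree and reads its child tags. The sender and recipient values, and the helper name child_values, are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical message document; the names are illustrative only.
MESSAGE_XML = """<message>
    <to>Bob</to>
    <from>Alice</from>
    <text>Hi Bob</text>
</message>"""


def child_values(xml_text):
    """Return the root tag and a {child tag: text} mapping of its direct children."""
    root = ET.fromstring(xml_text)
    return root.tag, {child.tag: child.text for child in root}


parent, children = child_values(MESSAGE_XML)
print(parent)    # message
print(children)  # {'to': 'Bob', 'from': 'Alice', 'text': 'Hi Bob'}
```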

This works fine when you’re exchanging fixed data. But consider a situation where you want to exchange dynamic data. For example, say your application has to fetch the user’s operating system details from their system. You’ll have to get this information from the user’s remote system. In such cases, you can use XML External Entities.

What is an XML External Entity?

XML has a feature called Document Type Definition (DTD), which describes the structure of an XML document. Among the things a DTD can declare are entities: named values that the parser substitutes wherever the document references them as &name;. Entities are declared inside the DOCTYPE header and work much like macro definitions. Their value can be written inline or loaded from an external source.

Entities can be defined internally, where the value is assigned in the same document. A typical XML entity structure looks like this:

<!DOCTYPE foo [
    <!ELEMENT foo ANY>
    <!ENTITY bar "World">
]>
<foo>Hello &bar;</foo>

The output of <foo>Hello &bar;</foo> would be Hello World because the entity bar is defined as the string “World”.
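To see that substitution happen, here’s a small Python sketch using the standard library’s xml.etree.ElementTree, which expands internal entities declared in the document’s own DTD. It’s only a demonstration of the behavior described above, not a recommendation to parse untrusted DTDs.

```python
import xml.etree.ElementTree as ET

# Same document as above: "bar" is an internal entity whose value is inline.
INTERNAL_ENTITY_XML = """<!DOCTYPE foo [
    <!ELEMENT foo ANY>
    <!ENTITY bar "World">
]>
<foo>Hello &bar;</foo>"""

root = ET.fromstring(INTERNAL_ENTITY_XML)
print(root.text)  # Hello World
```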

In another case, entities can be used to get information from an external source. This is done with the SYSTEM keyword. Let’s say you have a file named user.txt somewhere on a system that stores the username (for example, Bob).

<!DOCTYPE foo [
    <!ELEMENT foo ANY>
    <!ENTITY name SYSTEM "file:///home/Desktop/user.txt">
]>
<foo>Hello &name;</foo>

This code would fetch the data from the user.txt file and use that value. So the output would be Hello Bob. Though this feature comes in handy in a lot of cases, it also introduces vulnerabilities if not handled securely.
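The difference between the two kinds of entity is visible in the declaration itself. The sketch below uses Python’s xml.parsers.expat to list what a DTD declares without fetching anything: an internal entity carries an inline value, while an external one carries only a system identifier. The helper name list_entities and the trick of stopping at the end of the DOCTYPE are my own choices for this example.

```python
from xml.parsers import expat


class _DoctypeDone(Exception):
    """Internal signal used to stop parsing once the DTD has been read."""


def list_entities(xml_text):
    """Return (name, inline value, system identifier) for each declared entity.

    Nothing is resolved or fetched; the parser only reports the declarations.
    """
    found = []

    def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
        found.append((name, value, system_id))

    def on_doctype_end():
        # Entities are only declared inside the DOCTYPE, so stop here.
        raise _DoctypeDone

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity
    parser.EndDoctypeDeclHandler = on_doctype_end
    try:
        parser.Parse(xml_text, True)
    except _DoctypeDone:
        pass
    return found


INTERNAL = '<!DOCTYPE foo [<!ENTITY bar "World">]><foo>Hello &bar;</foo>'
EXTERNAL = ('<!DOCTYPE foo [<!ENTITY name SYSTEM "file:///home/Desktop/user.txt">]>'
            '<foo>Hello &name;</foo>')

print(list_entities(INTERNAL))  # [('bar', 'World', None)]
print(list_entities(EXTERNAL))  # [('name', None, 'file:///home/Desktop/user.txt')]
```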

What are XXE vulnerabilities?

XXE vulnerabilities exist in web applications that parse XML supplied by the client. We’ve seen a few examples of XML structure and how data is stored. If a web application accepts XML data, an attacker can intercept the request and manipulate it. Attackers can inject malicious XML, similar to SQL injection or command injection, to obtain the results they want.

Let’s understand how it works with the help of an example. Suppose there’s a web application that gives you movie details.

The XML code to transmit this data could look something like this:

<movie>
    <title>Iron Man</title>
    <star>Robert Downey Jr.</star>
    <year>2008</year>
</movie>

Considering that XML is used for data transfer and not for displaying it, I can get this data from XML code and display it in a custom way. For example, the output of the web page could be as follows:

Iron Man starring Robert Downey Jr. was released in 2008.
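Here’s a minimal Python sketch of how an application might turn that XML into the sentence above. The movie wrapper tag, the year tag, and the function name render_movie are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Assumed document shape: a <movie> wrapper with title, star, and year children.
MOVIE_XML = """<movie>
    <title>Iron Man</title>
    <star>Robert Downey Jr.</star>
    <year>2008</year>
</movie>"""


def render_movie(xml_text):
    """Pull the three fields out of the XML and format them for display."""
    movie = ET.fromstring(xml_text)
    title = movie.findtext("title")
    star = movie.findtext("star")
    year = movie.findtext("year")
    return f"{title} starring {star} was released in {year}."


print(render_movie(MOVIE_XML))
# Iron Man starring Robert Downey Jr. was released in 2008.
```

Whatever text ends up in the year element is echoed back to the user, which is exactly what an injected entity takes advantage of.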

This is the ideal case where the web application is working as expected. But there’s a chance that an attacker can intercept the request and modify it.

XXE injection

Let’s consider the same example of a movie application. The attacker can capture data being sent to the server and inject malicious code in it. And using XML External Entities, they can also get sensitive information.

Let’s say a hacker has learned that the web server is running on a Linux operating system during the information gathering phase. And using a monitoring tool, they’ve learned that the web application uses XML to exchange data.

We know that Linux stores user password hashes in the /etc/shadow file. So, if the parser runs with enough privileges to read that file, the hacker can modify the request and fetch its contents using the XML External Entity feature. The malicious request would look something like this:

<!DOCTYPE foo [
<!ENTITY inject SYSTEM "file:///etc/shadow" >]>
<movie>
    <title>Iron Man</title>
    <star>Robert Downey Jr.</star>
    <year>&inject;</year>
</movie>

When this request is sent to the server, it’ll fetch the movie and star details as usual. But instead of the release year, it would fetch the contents of the /etc/shadow file.

The output would look something like this:

Iron Man starring Robert Downey Jr. was released in (contents of the /etc/shadow file)

Now that’s a very sensitive piece of information. If the hacker gets it, they can try to crack the password hashes offline and take over the accounts.

Impact of XXE vulnerabilities

XXE vulnerabilities are very dangerous because they give the hacker access to the backend system. What exactly a hacker can do by exploiting this vulnerability varies based on use case and configuration. Let’s look at a few scenarios.

Unauthorized access

Hackers can use XXE injections to get access to system files that can hold sensitive information such as passwords, application details, security configuration, etc. And once the hacker gets access to the server, they can also change the configuration. For example, they can disable the firewall on the server, which would help them launch other attacks.

Malicious file upload

If the server allows a file upload feature, the hacker can upload a malicious file on the server. This file can then be used as a backdoor or a keylogger that would allow the hacker to create more damage to the web application.

Denial-of-Service (DoS) attack

A hacker can exhaust the server’s memory and CPU by abusing entity expansion, causing a DoS attack. Each entity in the following DTD expands to ten copies of the previous one:

<!DOCTYPE lolz [
<!ENTITY lol "lol">
<!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
<!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
...
<!ENTITY lol1000 "&lol999;&lol999;&lol999;&lol999;&lol999;&lol999;&lol999;&lol999;&lol999;&lol999;">
]>
<lolz>&lol1000;</lolz>

Such a request would take a huge amount of the parser’s memory and processing time to expand &lol1000;, because the expansion goes all the way from lol999 down to lol, multiplying the output by ten at each level. This type of DoS attack is also known as the billion laughs attack.
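To get a feel for how fast that growth is, here’s a small Python sketch. It computes the expanded size for a chain of entities and checks the formula against a deliberately tiny three-level document parsed with xml.etree.ElementTree. The function names are my own, and the generated entities are named lol1, lol2, and so on for simplicity.

```python
import xml.etree.ElementTree as ET


def expanded_length(levels, fanout=10, seed="lol"):
    """Characters produced by expanding the top entity of a `levels`-deep chain."""
    return len(seed) * fanout ** (levels - 1)


def build_nested_entities(levels, fanout=10, seed="lol"):
    """Build a document with the same shape as the payload (lol1 .. lolN)."""
    decls = [f'<!ENTITY lol1 "{seed}">']
    for n in range(2, levels + 1):
        refs = f"&lol{n - 1};" * fanout
        decls.append(f'<!ENTITY lol{n} "{refs}">')
    dtd = "\n".join(decls)
    return f"<!DOCTYPE lolz [\n{dtd}\n]>\n<lolz>&lol{levels};</lolz>"


# Safe to parse: three levels expand to only 300 characters.
small = ET.fromstring(build_nested_entities(3))
print(len(small.text), expanded_length(3))  # 300 300

# Ten levels is the classic "billion laughs": 3 * 10**9 characters.
print(expanded_length(10))  # 3000000000
```

Only the three-level document is actually parsed here. Anything much deeper should be left to the arithmetic.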

Now that you’ve understood all the aspects of a typical XXE vulnerability, let’s see how you can prevent them.

Preventing XXE vulnerabilities

As a person responsible for the security of the web application, you have to take measures to prevent vulnerabilities. Creating a foolproof defense is specific to each use case. But here are some of the general approaches:

  • The main entry point for XXE injections is DTDs. The easiest way to prevent XXE vulnerability is by disabling DTDs for your application.
  • Disabling DTDs might not be suitable for all scenarios. If you can’t disable them, use input validation to reject malicious payloads.
  • Disable the default features of XML and its parser that you don’t need, such as XInclude.
  • Disable the resolution of external entities.
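As one possible way to apply the first approach in Python, the sketch below refuses any document that carries a DOCTYPE before handing it to the regular parser. It relies on the StartDoctypeDeclHandler hook of the standard library’s xml.parsers.expat. The names parse_untrusted and DoctypeForbidden are hypothetical, and the right settings for your own stack depend on the parser you use.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML carries a DOCTYPE declaration."""


def parse_untrusted(xml_text):
    """Parse XML only if it contains no DOCTYPE (and therefore no entities)."""

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    # First pass: a bare expat parser whose only job is to spot a DOCTYPE.
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = on_doctype
    checker.Parse(xml_text, True)

    # Second pass: the document is DTD-free, so parse it normally.
    return ET.fromstring(xml_text)


clean = parse_untrusted("<movie><title>Iron Man</title></movie>")
print(clean.findtext("title"))  # Iron Man

payload = ('<!DOCTYPE foo [<!ENTITY inject SYSTEM "file:///etc/shadow">]>'
           "<movie><year>&inject;</year></movie>")
try:
    parse_untrusted(payload)
    rejected = False
except DoctypeForbidden:
    rejected = True
print(rejected)  # True
```

Parsing twice keeps the sketch short. Third-party packages such as defusedxml wrap similar checks into drop-in parsers, which may be a better fit for production code.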

You can find an exhaustive list of prevention methods in the OWASP XML External Entity Prevention Cheat Sheet.

Wrapping it up

If you have a web application that uses XML, then it’s important for you to secure it. Considering what damage XXE vulnerabilities can do to your server, you can’t take a chance. The prevention cheat sheet above will help you set up a basic line of defense.

However, security is strongest when you take a defense-in-depth approach. You’ll want to identify more ways an attack can happen and the weak points of your app, both through point-in-time snapshots like pentests and by gaining more visibility on an ongoing basis with ASM tools like Sqreen.

This post was written by Omkar Hiremath. Omkar uses his BE in computer science to share theoretical and demo-based learning on various areas of technology, like ethical hacking, Python, blockchain, and Hadoop.
