This is a transcript of the A05 - Security misconfiguration & XML entities training session.
Michael Meyers: [00:00:00] I guess we're going to jump into the next topic with you, Fabian, the Security Misconfiguration and XML entities.
Fabian Franz: Yeah. I'm taking the XML entities. And I think it might be a good idea to put that in now.
Michael Meyers: Take it away.
Fabian Franz: Sure. I'm sharing my screen.
Next few are visible. Great. Okay. So, XML external entities. For those who have been in the security training before, it was always a little bit of a frustrating topic, because I was like, "Hey, this security thing exists, but it has been mitigated everywhere, and by now it's not a big problem anymore."
And I was telling you some best practices on how to mitigate it, but it was really frustrating because there was really no danger felt from this thing, and that felt wrong because it's still in the Top 10. [00:01:00] It was a huge deal in 2014 or so, but it can still be a problem today, especially if you are using, for example, PHP. And today I'm very proud and glad to be able to show you an example of how some simple mistakes, which can happen as part of not adhering to some code standards or just misunderstanding something, can lead to the problem suddenly appearing again. So, to start: what we have here is, first of all, the harmless XML. Let's understand a little bit of how XML works. With XML, you have this header, then you have the doctype, and this doctype, which is like a DTD, can contain entities. And this entity "harmless" is "completely harmless".
And the result is "This result is &harmless;". So if you know HTML, you [00:02:00] have noticed things like &nbsp;, the non-breaking space, or &gt;, greater than, or whatever, and that is already an entity. In XML, by specifying the DTD, we can define as many entities as we want. So yeah, this is essentially a reference to something else that is defined here.
And the advantage obviously is reuse. We just need to write "completely harmless" once; we don't need to do it so many times. So how can we load this? Let's just go to the next example. We just call simplexml_load_file on harmless.xml, and we're printing it out. So let's do that. And as expected, the result is "This result is completely harmless".
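A minimal PHP sketch of the harmless case described above. The exact document content is an assumption reconstructed from the spoken description, and the XML is loaded from an inline string rather than from the harmless.xml file used in the demo, so that the example is self-contained.

```php
<?php
// Internal entity declared in the DTD; nothing outside the document is referenced.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doc [
  <!ENTITY harmless "completely harmless">
]>
<doc>This result is &harmless;</doc>
XML;

// No parser flags are passed: this is the default behaviour.
$doc = simplexml_load_string($xml);

// Casting the element to a string yields its text with the internal entity resolved.
$text = (string) $doc;
echo $text, PHP_EOL;
```

The entity is defined once in the DTD and can be referenced as often as needed in the document body, which is the reuse advantage mentioned above.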
So now, how could some harmful XML look? That's the first thing you need to understand. And [00:03:00] the first thing that comes to mind is to just use an entity again, use a SYSTEM command, and then reference a file. The huge problem is that if you are referencing a file, the XML parser will say, "Hey, if I put in the content of /etc/passwd here, it's not valid XML."
It's not even valid text. So this will fail to do our exploit. And I'm going to go into that in more detail, but it just fails. But fortunately, or unfortunately, PHP has a nice way of making this work. And this was actually almost exploited at Facebook: a security researcher found it before any malicious hackers did, it's assumed, and got a nice bounty for it.
But essentially, [00:04:00] this "harmless" entity is definitely not as harmless anymore, because it's referencing some different system. And this is using PHP's stream wrappers, essentially the streams where, like in Drupal, we have public://, we have private://, but there's also php://, and php:// has a filter function built in, so it can easily read the resource /etc/passwd and automatically convert it to Base64.
The point, however, is that to really make this harmful XML work, we need to actually do something very special. We need to pass this very special parameter, LIBXML_NOENT. And this is what I was saying: by now, if you're just doing this, [00:05:00] like if we remove this, like we had before, and we are just outputting the XML, there is no Base64-encoded content. Yeah, there's nothing here. But as soon as we allow this LIBXML_NOENT, there's our exploit. And as we can see, we call the XML harmful.php again, and here's our /etc/passwd. Okay. So someone needs to really go out of their way to shoot themselves in the foot.
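The difference between the default flags and LIBXML_NOENT can be sketched as follows. To keep the demonstration harmless, a temporary file with made-up content stands in for /etc/passwd; that file, its content, and the document structure are assumptions for illustration, and the behaviour described assumes a reasonably current PHP and libxml2 (2.9.0 or later).

```php
<?php
// Stand-in for a sensitive file such as /etc/passwd (assumption for a safe demo).
$secretFile = tempnam(sys_get_temp_dir(), 'xxe');
file_put_contents($secretFile, "top-secret\n");

// External entity that reads the file through PHP's php://filter stream wrapper
// and Base64-encodes it, so that the content is always valid XML text.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doc [
  <!ENTITY harmless SYSTEM "php://filter/read=convert.base64-encode/resource={$secretFile}">
]>
<doc>This result is &harmless;</doc>
XML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them

// Default flags: the external entity is not substituted.
$safe = simplexml_load_string($xml);
$safeText = $safe === false ? '' : (string) $safe;

// LIBXML_NOENT *enables* entity substitution, including external entities.
$unsafe = simplexml_load_string($xml, SimpleXMLElement::class, LIBXML_NOENT);
$unsafeText = $unsafe === false ? '' : (string) $unsafe;

echo 'default:      ', $safeText, PHP_EOL;
echo 'LIBXML_NOENT: ', $unsafeText, PHP_EOL;

unlink($secretFile);
```

Only the second call is expected to include the Base64-encoded file content, which is why the flag has to be added deliberately for the exploit to work.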
That's because of the history of the issue. There are also valid reasons why this might be needed. So, for example, if you're using a SOAP client or you are using some very special XML from an external system which is completely trusted, [00:06:00] then you may definitely want to allow external entities, so there's a valid use case for that.
So some code could still have that. And this is essentially our scenario. I've created a little library, and this mylib.php is really simple. We have an init function, and because those people had been really security conscious, they just disable the entity loader by default until libxml2 has this fixed.
So there's even a nice comment about it. Here we are loading with LIBXML_NOENT, and whenever someone in the old code, let's assume this is from another project, needed to get some entity from a trusted system, they would just enable the entity loader beforehand, allow the loading of the entity, and disable it afterwards.
Again, [00:07:00] always secure, nice and extra code. But now this whole library is given to another developer, who's also developing something, and they want to use our library. And so they'll be using the library, and they need to load the XML. They're loading the harmless XML and printing it out again. We can test this and we see.
Okay. We can't even load; that's fine. So the loading fails. So we remove the init, because we're a nice developer, and hey, it works. Okay. So another developer might have just thought, "Hey, this init [00:08:00] function does not do much. Let's just remove it, and my code works. I have deadlines to keep as well." That's not a safe practice, but it can happen in practice.
Or the developer could just forget. Like myself, when I was writing the sample code, I was like, "Hey, this one is just an example." I could just not init the library, because nothing forces me to. So again, it might be a little bit more of an insecure design: if you have a library whose init is not mandatory, like in a class constructor or something, people could just forget.
And if there's no notice of, "Hey, you haven't inited this library yet, it's not safe to use it," things like that can happen. So again, the thing is, however, we have the harmful XML as well. And again, we would need to quickly remove [00:09:00] the init code
and then it would fail. Yep. Here we have the init code. It can make it fail, and all this garbage is again the /etc/passwd. Okay. So let's say our developer isn't so junior or negligent as to just say, "Hey, I'm just going to remove the init." No, they are like, "I need my code to work. Let's figure out why this library, which should supposedly be so great and which I've been told to use,
is now broken." So we go into mylib again and look at the code, and now we find, "Hey, it's probably this line that keeps it from working, but here it says 'till [00:10:00] libxml2 has this fixed'." But if we now look at the PHP documentation for disabling the entity loader, what we would actually be seeing is that this has been disabled. So PHP has this little note here: as of libxml 2.9.0, entity substitution is disabled by default, so there's no need to disable the loading of external entities, unless there's a need to resolve internal entity references.
Okay, so it's disabled and we can say, "Let's just remove it." And I have already prepared that. And here, I want to show something else, which is really important in terms of that. Let's take a look at the log of import-lib-fixed. And as you can see, it's only the import of mylib from project foo. It has happened to me many times, [00:11:00] on various projects, not necessarily at Tag1, but especially also at clients before,
that people have presented me PRs where there's a sort of completely innocent "import of mylib", but the code has been changed. What I would like to see is at least this, because now I at least see that here's the import of mylib as unchanged, and then a change that has enabled this loading. But still, if you look at this code here and check this, I might still not spot this as a security problem. I might, but others might not, and it can get pushed through.
And if we, again, look at the whole diff of that thing, and if someone isn't looking at the individual commits when reviewing this code, again, this could just slip through. But [00:12:00] again, how many developers... Let's check out main. Everything is fixed. And so now our examples are working: it's harmless, it's harmless, and it's a problem.
So what I did here is I was loading this payload just from a gist where I've uploaded it. And obviously there are several ways to solve this problem. The easiest way probably would have been for the developer to just not load the file, but instead just load a string with simplexml_load_string.
The other is to obviously not have such dangerous functions in the library, like never use this LIBXML_NOENT. And what my critique has been is that this is just named [00:13:00] really badly, because if it says NOENT, I assume it disables something. I would never come to the conclusion that this is enabling something.
So yeah, this is very, very misleading. There's a huge security notice on the PHP website, but if you didn't look at it and just assumed this is disabling internal or external entities from being loaded, then you will think this code is valid, but it's not. A way to solve that whole thing
and to get much more control over it was added newly to PHP 7 and is mandatory in PHP 8, and that is to set an external entity loader. That means we can decide what happens if it gets such a request. So again, here I'm loading my bad example, and we replace even the [00:14:00] file loading with our great example, and now it's completely harmless. But if I replace this with the harmful.xml, sorry.
Oh, yeah, "the entity harmless is not defined". So I would need to do a little bit more to get this to work now. But for example, if I just made it do this for public, which is probably what I want, then it should just maybe work, or not. Anyway, you can play around with such a loader, and, oh, it's not public, it's the other one, system, obviously. So we left the most at risk, and there we are. Yeah, there's our payload again. So we've now replaced the entity loader, going through the default entity loader, but we can clearly see that [00:15:00] here's this php:// coming in, and I could prevent it from ever reading something that has php:// or file:// or anything
that's not trusted. I could validate all the URLs going through it, check that they're coming from trusted systems, and now have full control, even though for some valid use cases I need XML entity loading. So for the library, a good design would probably be to implement a class, have the constructor do the init and set this external entity loader,
then have a whitelist of things that are enabled, and then ensure that someone is using it. Or, a much easier way: just have two functions. One function for loading something with external entities, which is clearly named as such, and one function which is harmless, with which you can't shoot yourself in the foot.
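The design described above (a constructor that installs the entity loader, an allow-list, and two clearly named loading methods) could look roughly like this. The class name, method names, and prefix-based allow-list are hypothetical choices for illustration, not the code shown in the session; only libxml_set_external_entity_loader, simplexml_load_string, and LIBXML_NOENT are actual PHP APIs.

```php
<?php
/**
 * Hypothetical wrapper: setup happens in the constructor, so the library
 * cannot be used without the entity loader being installed.
 */
final class SafeXmlLoader
{
    /** @var string[] URL prefixes that external entities may be loaded from */
    private array $allowedPrefixes;

    public function __construct(array $allowedPrefixes = [])
    {
        $this->allowedPrefixes = $allowedPrefixes;

        // Every external entity request now passes through this callback.
        libxml_set_external_entity_loader(
            function ($publicId, $systemId, $context) {
                $systemId = (string) $systemId;
                // Returning the string lets PHP open it; null means "not loadable".
                return $this->isAllowed($systemId) ? $systemId : null;
            }
        );
    }

    private function isAllowed(string $systemId): bool
    {
        foreach ($this->allowedPrefixes as $prefix) {
            if (strncmp($systemId, $prefix, strlen($prefix)) === 0) {
                return true;
            }
        }
        return false; // deny by default, including php:// and file://
    }

    /** Harmless variant: external entities are never substituted. */
    public function loadUntrusted(string $xml)
    {
        return simplexml_load_string($xml);
    }

    /** Dangerous variant, named as such: only for XML from trusted systems. */
    public function loadTrustedWithExternalEntities(string $xml)
    {
        return simplexml_load_string($xml, SimpleXMLElement::class, LIBXML_NOENT);
    }
}
```

A caller would construct it with something like new SafeXmlLoader(['https://trusted.example/']), where the URL is a placeholder. Note that the entity loader is a process-wide setting, so the last constructed instance determines the policy.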
And that would be it for external entities. [00:16:00]
That's why I say: never ever put this LIBXML_NOENT inside a helper function or something like that. An example of how it can also happen in Drupal: several input filters in contrib and even core are implemented by loading strings with, like, simplexml_load_file or simplexml_load_string.
And if some of them were using a vulnerable XML version, or if they have explicitly set this NOENT because they are completely confused about what it actually means, and they want to make it more secure and just mix it up, which has even happened in some contrib code at some point,
then this input filter would mean that an editor would just need to put an XML document inside of the WYSIWYG. If they have Full HTML, they could take over a site; essentially we would have another remote exploit in core, for example. You could [00:17:00] then download settings.php, because the editor, who doesn't have many permissions, would be editing,
the XML would then show the Drupal settings.php, and then they have the credentials for the database and everything else.
Unknown: What about if we were using... I would assume this vulnerability would also potentially affect any HTML templating as well. If we're doing, like, HTML-to-PDF generation, do you think that would also be a vector here?
Fabian Franz: Yeah. As I said, you need to shoot yourself in the foot by enabling the external entity loading right now. So it's a little bit more unlikely, unless you use some helper function or library that is completely broken. But yes, it can happen if you load the string and it's part of that,
and that node happens to be valid XML and HTML, and it just gets... yeah, in theory. If someone is hiding an XML [00:18:00] document somewhere, and especially if you're extracting that and then putting it back into a string and then transforming it back into a DOM document, it could trigger those exploits. So just be very, very careful to never use LIBXML_NOENT.
And in general, be careful about what data you have: validate data. For example, if you validate that it's valid XHTML, then most XML documents would fail that validation. Same thing. Pretty good. Yep. Thanks.
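One simple form of input validation, sketched here under stated assumptions: this is not the XHTML validation mentioned above but a commonly used stand-in that rejects any document declaring a DOCTYPE, since entity declarations can only appear inside one. The function name is hypothetical, and the check would also reject legitimate documents that carry a DOCTYPE, so it fits best for fragments such as filtered user input.

```php
<?php
/**
 * Hypothetical validator: parse with default flags (no entity substitution)
 * and refuse any document that contains a DOCTYPE declaration.
 */
function load_xml_without_doctype(string $xml): ?DOMDocument
{
    if ($xml === '') {
        return null; // DOMDocument::loadXML() does not accept empty input
    }

    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadXML($xml); // no LIBXML_NOENT here
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null; // not well-formed
    }

    foreach ($dom->childNodes as $child) {
        if ($child->nodeType === XML_DOCUMENT_TYPE_NODE) {
            return null; // a DTD is where entities get declared
        }
    }

    return $dom;
}
```

Rejecting outright, rather than trying to strip the DOCTYPE, keeps the check easy to reason about.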
Janez Urevc: Yes. So the topic that Fabian just covered is part of the security misconfiguration category. There are other things that we can do wrong when it comes to configuration. [00:19:00] Obviously, this is a multi-layered topic. Configuration can be about the application configuration.
It can also be about, you know, PHP language configuration, but it also relates to infrastructure and the operating system. So we have to think about all those layers. Some recommendations or best practices that I would mention here: first, use the approach of deny by default and then selectively grant access.
Think about your features and services and consider if you really need everything that you think you need, [00:20:00] and disable things that you can live without, because every single service that runs in your system, or every single port that is open, or, if we're in the context of Drupal, every single module that we install or page that we create, adds another possible attack vector, theoretically.
So if you have less things and more simple system it's less likely that something will happen. It's also easier to reason about the system and to understand it and to know what is going on. So always try to keep things stupid and simple.
In Drupal-specific projects, but also more in general, there are things that we can do. First, we [00:21:00] can follow and learn about the best practices for the system that we're using. Then, one thing that we can do is make sure that no folders are exposed to the world that are not intended to be exposed.
An example of those would be the .git folder. If you expose the .git folder, you potentially expose your entire code base to the public, which is definitely a bad idea. Another thing that is Drupal related, but could also apply to other frameworks, is the configuration files.
If you put your configuration files in the docroot, and I'm sure that before, that was the case, [00:22:00] and people might still be doing it, I'm not sure, Drupal put configuration files into a public directory. There might have been an .htaccess that was preventing listing those, but you can easily, you know, accidentally remove the .htaccess. So the best practice in Drupal with these is to move your configuration files outside of the docroot.
And this is exactly what a default Composer project will do, and we should use that instead of trying to figure out our own way of doing things. The same goes for private files or the vendor directory: everything that doesn't need to be visible to the public should be outside of the docroot.
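The rule above can be checked mechanically. The following is a minimal sketch; the function name and all paths are made up for illustration, and the comparison is a plain string prefix check that does not resolve symlinks.

```php
<?php
/**
 * Returns the subset of $paths that lives inside $docroot.
 * Assumption: all paths are absolute and already normalized.
 */
function paths_inside_docroot(string $docroot, array $paths): array
{
    $root = rtrim($docroot, '/') . '/';

    return array_values(array_filter(
        $paths,
        // Append a slash so that "/var/www/web-old" does not match "/var/www/web".
        fn (string $path): bool => strncmp(rtrim($path, '/') . '/', $root, strlen($root)) === 0
    ));
}

// Hypothetical project layout.
$exposed = paths_inside_docroot('/var/www/project/web', [
    '/var/www/project/web/.git',
    '/var/www/project/config/sync',
    '/var/www/project/vendor',
    '/var/www/project/web/sites/default/files/private',
]);

print_r($exposed);
```

With this layout, the .git folder and the private files directory are reported as living inside the docroot, while the configuration and vendor directories are not.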
Another thing that can happen is that you have unnecessary modules enabled. Maybe you've enabled some modules that help to [00:23:00] do some one-off tasks, but then you forgot to disable them. Maybe that module has some page that exposes some information that you don't intend to expose, or something like that.
So that's something to be very careful about. Then we have super-power users. In Drupal, the example would be the user with user ID 1. But, you know, there are other things that we can think about, for example the root user on Linux operating systems. The best thing to do is just to disable them. We never use the UID 1 account on the projects that we work on.
It's always disabled, and then we [00:24:00] use actual accounts for each user that needs access. Even if we're talking about administrators, they have defined permission sets instead of having access to everything like that superuser has. For the root user on Linux, the example would be to deny access through SSH for that user.
In most modern distros this is already done, but it's something to think about. Then, other times there would be issues with file permissions on the server, like having PHP files writable, settings.php writable, all of those things. And I would say the [00:25:00] best way to avoid those kinds of issues is to automate deploys and never manually change things. We've all been there where we had a problem and, you know, in an environment where automated deploys are not strictly enforced, somebody went in, logged onto the server, and started messing around, trying to fix things or trying to figure out what's going on,
and along the way changed permissions on some files or something like that, or manually changed a file, and forgot to revert it. If we have automated deploys, we can have steps in the deployment that make sure that all these things are set correctly. And if we enforce these automated deploys, [00:26:00] there's no way that somebody can go in and manually change things.
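A deployment step of the kind described above might include a permission check along these lines. This is a sketch under assumptions: the function name is hypothetical, and it only flags files that are writable by group or others, which is meaningful when the files are owned by a deploy user that is different from the web server user.

```php
<?php
/**
 * Hypothetical deployment check: returns the entries of $files that exist
 * and are writable by group or others according to their permission bits.
 */
function files_with_loose_permissions(array $files): array
{
    $loose = [];
    foreach ($files as $file) {
        if (!is_file($file)) {
            continue; // missing files are a separate concern
        }
        clearstatcache(true, $file); // avoid stale cached permission data
        $mode = fileperms($file);
        if ($mode !== false && ($mode & 0022) !== 0) {
            $loose[] = $file;
        }
    }
    return $loose;
}
```

A deploy script could call this with the path to settings.php and other PHP files and abort with a non-zero exit code if the returned list is not empty.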
So, definitely a best practice. And also think about the configuration of your web server. Think about security headers. Are the security headers set correctly and sent correctly? We're talking about things like the HSTS header, CORS headers, things like that.
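Whether the expected security headers are actually sent can be verified with a small check like the following sketch. The function name, the list of expected headers, and the sample response are assumptions for illustration; which headers a given site needs depends on the project.

```php
<?php
/**
 * Returns the names of expected headers that are missing from a response.
 * $responseHeaders maps header names to values; names are compared
 * case-insensitively, as HTTP header names are not case-sensitive.
 */
function missing_security_headers(array $responseHeaders, array $expected): array
{
    $present = array_map('strtolower', array_keys($responseHeaders));

    return array_values(array_filter(
        $expected,
        fn (string $name): bool => !in_array(strtolower($name), $present, true)
    ));
}

// Hypothetical expectation list and response.
$expected = ['Strict-Transport-Security', 'Content-Security-Policy', 'X-Content-Type-Options'];
$response = [
    'content-type' => 'text/html; charset=UTF-8',
    'strict-transport-security' => 'max-age=31536000; includeSubDomains',
];

$missing = missing_security_headers($response, $expected);
print_r($missing);
```

In this example the HSTS header is present, so only the other two expected headers are reported as missing.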
Do you have any other comments about this topic, Fabian?
Fabian Franz: No, I'm good. Yeah, again, it's important to just follow the best practices. I think that's the only thing you really need to say about security misconfiguration. But again, even my example of XML external entities, the practical example, was kind of a [00:27:00] security misconfiguration.
One thing, by the way, that could have been much more catastrophic: let's assume this library finds its way upstream again, with this update of removing this because it's no longer needed, and no one is reviewing it. And now it ends up in a completely different production application that would have been very secure before, and suddenly this application gets vulnerable due to this library
having changed to be something insecure. And I think that's an important part of how to think about security misconfiguration: some misconfiguration somewhere, or using a library, or enabling something that shouldn't be there, can open huge holes in completely unrelated systems.
That's probably also what happened with the Log4j thing, if you are aware of that vulnerability that we recently had, where whenever something was logging [00:28:00] something, it could lead to an exploit. That means it's about some server sitting at the end of the room that no one even knows about anymore,
that is, for some reason, also going through log files. And that's a huge problem, because it might be misconfigured, because it has the old Log4j version, for example. And then suddenly you get an exploit, even though your whole system is completely firewalled, et cetera.
So, whatever is processing data needs to be securely configured. I think that's kind of the takeaway here.
Janez Urevc: Any comments or questions?
Michael Meyers: Awesome. Thanks guys. Uh, Fabian next up is server-side request forgery.
Fabian Franz: Sure. [00:29:00] One moment. I'm going to... oh goodness. Can you open that on the screen already?
Yeah. Um, I always like to think of packages as liabilities. So every new dependency is also adding a liability and then it's much easier to justify things like that. For example.
Yes. There are certain cases where custom code, like if you write five lines of custom code, is better than just adding another Drupal module or library or whatever; that is indeed better. But if you have very complex needs or they are security critical, then it's often better to just use established libraries.
I think one important thing, however, is to understand the distinction between the concepts and the code of external libraries you're using, [00:30:00] and also the ways you are using them. So sometimes modules or libraries are just added, and that's not always a good idea. We see the extreme of that in the npm and Node ecosystems, where there's sometimes, like, a one-liner shared library that everyone is importing, et cetera. It leads to these huge dependency trees, and an ecosystem where many libraries each have their own maintainers.
One problem, you know, of not having a framework that's more consistent in everything, like Symfony or Drupal or whatever, is that you need to trust a lot more people. With Drupal, you need to trust Drupal and mostly Symfony, and Drupal even takes a little bit of care with Symfony and other dependencies, for example Archive_Tar, that they remain secure.
But if you are just using [00:31:00] whatever library someone has, it can be of limited quality or mistake heavy. For example, there's the problem that maintainers go away or don't want to maintain old versions anymore. Archive_Tar, by the way, is a good example. It's used by many large frameworks.
It's on PEAR, like it's a PHP core thing, but they had a security incident and at first handled it a little bit poorly, I would say. So it always depends on how good the backing is behind those things. And that's one of the advantages of larger frameworks, larger organizations that are behind something, larger projects that are well backed and have strong developers behind them,
because they are often, yeah, a little bit more professional then, or you need to do a little bit less due diligence than if you have those other [00:32:00] things.