Showing posts with label good habits. Show all posts

20070804

Finally, somebody sees it my way


Okay, Pravir had the idea all on his own. But this is the soapbox I almost constantly stand on when it comes to security.

If you don't want to read the article, the gist is that if you're depending on input validation to fix your semantic flaws, you're missing a great deal of the application where the data bounces around and could potentially get re-broken. And you potentially end up rejecting characters that are perfectly legitimate in some context, just not this one.

Now, I know Pravir is running the CTF competition at DEFCON, which makes me wonder why he had this post yesterday. Either he has a bunch of them queued and set to go, or he's listened to one too many speakers say that something is a problem and the way to fix it is to do better input validation.

20070802

Adding a Request Token for RemoteField


I've been writing a 15-minute expense tracker in Grails - it was done in 8 minutes, but I've been spending little snippets of time improving it since that initial 8 minutes. After some of the discussions at BlackHat, I decided I'd try to start making some posts on how to do some things more safely in Grails.
The RemoteField tag in Grails is uber-easy to use; you give it something like the following:

<g:remoteField name="description"
value="${expense?.description}"
action="updateDescription"
id="${expense?.id}"
update="resultdiv"/>


The problem is that, with the default documentation, there are two issues here: key exposure, so you have to check the user's permission to edit the specified object, and XSRF. This isn't a problem with the tag; it's a problem with the easy way of doing things in Grails - and it's identical in Rails. An ordinary CRUD call in [Gr|R]ails makes a URL like http://host/<app>/<controller>/<action>/<id>, and the ID is available as params.id. And that ID is (under normal scaffolding) the primary key of the domain object in question. So an attacker just needs to put an iframe on their site that makes the user request something like http://victim/app/expense/delete/382, and if there's no permission checking, it goes bye-bye.

Adding a level of indirection to keep the PK in the session and then meaningless ID's on the URL won't solve this because an attacker can just make a user delete all their own entries, etc. So you have to use a token that proves the user visited the page first, and making the user solve a new CAPTCHA for every keyup is silly, right?
So generally, what you see is a hidden form element with a long random token that is also stored on the session. When the form goes back, the two tokens are compared to (somewhat) guarantee that the user was on the correct "setup" page before coming to the submission form.
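In case it helps, here's a minimal sketch of that token dance in plain Java. The class and method names are mine, and the Map stands in for whatever session object your framework hands you:

```java
import java.security.SecureRandom;
import java.util.Map;

// Hypothetical helper: issue a random token into the session when rendering
// the page, then check it when the submission comes back.
public class RequestToken {
    private static final SecureRandom RNG = new SecureRandom();

    // Generate a fresh 128-bit token, store it in the session, and return it
    // so the page can embed it in a hidden form field.
    public static String issue(Map<String, Object> session) {
        byte[] raw = new byte[16];
        RNG.nextBytes(raw);
        StringBuilder hex = new StringBuilder();
        for (byte b : raw) hex.append(String.format("%02x", b));
        String token = hex.toString();
        session.put("token", token);
        return token;
    }

    // On submission, the token that comes back must match the one we issued.
    public static boolean check(Map<String, Object> session, String submitted) {
        Object expected = session.get("token");
        return expected != null && expected.equals(submitted);
    }
}
```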

I started down a handful of paths, and the following were less than elegant:

  • Altering the remoteField itself was less than ideal because you would actually have to add several attributes to the tag to determine how you wanted to deal with the token - do you want to use the same token for all elements on the page? Do you want to use a different token for each element? What do you want to name it? Those decisions have to be carried across to the action that gets called by the remoteField as well.
  • Actually replacing the id attribute with the session token worked, but only for one field on one form. If you wanted to do this in several places, because you've abstracted the id out, you'd have to make changes to a lot of existing scaffolding to get it right.
  • One thing I didn't try was keeping the id's from a list() closure in session, keyed by new tokens generated for each one. This would take quite a bit of work, but would also remove some key exposure at the same time - as long as you don't use the tokens in the session to order by. This would also allow you to add token checking as part of a beforeInterceptor because then the keys are the token ID's.
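To make that last (untried) idea concrete, here's a rough Java sketch of the indirection - names are made up, and the Map would live on the session. The page only ever sees tokens; the controller resolves them back to primary keys:

```java
import java.security.SecureRandom;
import java.util.Map;

// Hypothetical sketch: when list() renders, each domain id is keyed by a
// fresh random token kept in the session. A forged or stale token simply
// fails to resolve, so this doubles as the beforeInterceptor check.
public class IdIndirection {
    private static final SecureRandom RNG = new SecureRandom();

    // Returns the token to render on the page in place of the raw id.
    public static String hide(Map<String, Long> sessionMap, long primaryKey) {
        String token = Long.toHexString(RNG.nextLong());
        sessionMap.put(token, primaryKey);
        return token;
    }

    // Resolves a submitted token back to a primary key; null means forged.
    public static Long resolve(Map<String, Long> sessionMap, String token) {
        return sessionMap.get(token);
    }
}
```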
So I ended up with two methods that ultimately worked pretty well.

The first method is to use the paramName attribute of the remoteField tag. If you don't specify the paramName, the value that gets passed back will be in the request parameter "value", so the post parameter looks like "value=The+new+value". When I first put the tag together, I had no compelling reason to use anything other than "value" as the name, so I used the paramName to hold the token:

<g:remoteField name="description"
value="${expense?.description}"
id="${expense?.id}"
update="statusdiv"
paramName="${token}"/>


Then in the controller:

if (! params[session.token]) {
    render('Bad token')
} else {
    ...
}


The second method I tried was to append the token as another request parameter in the id attribute, like so:
id="${expense?.id + '?token=' + token}"

This also works, although it muddles up the tag a bit, but it makes more sense in the controller - just compare session.token to params.token.

Unfortunately, it was still up to me to generate the token, remember to put it in the form fields, and to remember to check it on the way back in. Not perfect, but security always comes at somebody's cost. In this case, it was mine.

20070428

False Positives vs. Non-Exploitables


Another professional and I sat down to coffee a few weeks ago, and we got around to a conversation we seem to have often. I feel almost ashamed to publish a post on it when there's no "other side of the argument" here to defend itself. So please bear with me as I try to do both arguments justice (and tell you why my side is right).

Consumers of security tools have long asked that the tool reduce the number of false positives. And this is certainly understandable - it's very difficult to know what to fix when the list of things that are broken includes so many things that aren't broken. All areas of security try to sell you on their perfect false positive to false negative ratio. Theoretically, the two are on separate paths that at some point intersect - as the sensitivity of the tool is increased, false positives go up and false negatives go down. As the sensitivity is relaxed, false positives go down and false negatives go up. The theoretical ideal is the point where the two are equal - the lowest sum total of the two.

Tool vendors have to make their customers happy. And false positive to false negative ratio is one metric that all vendors use to prove the value of their tool. None of the tools out there use the same taxonomy, result format, reporting, etc., so the only apples-to-apples comparison we have is false positive to false negative ratio.

Now, the tool vendors work very hard to get the false positives to decrease, and depending on the type of tool, they do it in varying ways. Application firewalls may use anomaly detection rather than flagging everything with known-bad data. Static analysis tools build a call tree to see if the data is sanitized anywhere in the process, and if so, it's not reported. Dynamic analysis tools don't report "evidence of"; they report real exploits.

This is both good and bad. The good side is that because false positives are low, when something is reported, you can be pretty sure it needs to be fixed. When a static analyzer finds an injection flaw, you know that the data goes soup to nuts without being properly checked or encoded. When a pen-testing tool finds something, you know it's not one of those pesky points where you got back an angle bracket but just couldn't get a real injection attack to work.

However, I think that false positives have been confused with non-exploitables. When pen-testing, consider those pesky points where you get back a real angle bracket but can't get a real injection to work: although there's no injection, you have sufficient proof that encoding is not taking place. And in static analysis, if you see a place where data goes to a "presentation layer" (database call, screen, wire, LDAP server, command shell, etc.) without being properly encoded, you have a real flaw, even though it's not exploitable.

There are two primary reasons that not reporting non-exploitables is dangerous:

  1. Although the point is not exploitable today, there's a false safety net keeping it from being exploitable - a string length restriction that could change tomorrow, or an input policy that happens to be checked today, but when the same input is accepted from another source tomorrow, the same checks don't take place.
  2. When asked to fix problems, developers must take the results of the test to make their decisions on what to fix. If output is not encoded in two places, and only one is reported, how will the developer learn what to fix by output encoding? They'll learn to fix the ones the tool reports. So how do developers learn to code correctly? They don't - they learn to code the way they always do, then count on the tool to correct them later. Even if it's not exploitable, code that doesn't output-filter should be flagged (possibly rated differently in severity).
So for the security practitioners, do your developers a favor and document those "uncomfortable places" where you couldn't get a really good exploit to work, but you know there's something not behaving properly. When you're doing a manual code review, flag all of those places where data goes to a presentation layer without actual encoding.

And developers, do yourselves a huge favor and when a report comes back with flaws, demand that whoever made the report (your tool, your security team, your peer reviewer), explain to you why it's a problem. Don't simply fix that issue. Discover how you fixed it, and how you can apply the same fix to other locations that weren't reported to you. And more importantly, make that fix a part of your programming habits, so that you just write it that way in the future.

20070319

A Really Best Good Practice

It's been a few years since I've written a lot of code. Certainly I've written individual scripts to perform particular hacks, and I've cobbled together little systems here and there over the past few years, but not writing code day in and day out, I had forgotten the value of something very important.
Last week, I was re-introduced to manual code review - and on paper at that. I used to have a co-worker who always insisted on printing out code, complete with line numbers. He'd have a highlighter and a pencil handy. I never really understood what the big deal was. But last week, being re-introduced to it, I saw value in it like I had never seen.
There's a great deal of value in getting your code away from the computer to analyze it. When you review it on the computer, you have the ability to fix it right away, rather than seeing pieces in context and noticing better architectural changes that you might not have otherwise observed. Also, getting the code on paper gives you the opportunity to really focus on a section of code, without being distracted by other sections, or (heaven forbid) email, IM, the newsreader, etc.
And there's even greater value in getting your code in front of your teammates. Even if you're the sharpest person on your team (I am not), you can learn from the less sharp on your team, and you benefit those on your team who aren't familiar with particular algorithms or techniques.
So if you don't do it, begin doing team code reviews. On paper. With a highlighter and a pencil. And if you can't do it on the entire codebase, do it particularly on the most critical sections of code (which might not be the most interesting parts). You might actually find some things a static analysis or a pen test might not find. And you just might learn some things that help you write better code the first time.

20070307

Good Habits Part II-c: Output Filtering SQL LIKE clauses

As I had said at the beginning of the "good habits" series, I wanted to cover more than the usual suspects. So here's one that seems to frequently get missed....

When creating Prepared Statements or Parameterized Queries, everybody knows how to use placeholders for literals where the entire literal is being replaced. For example, rather than:

stmt.prepare("SELECT lastname, firstname FROM person WHERE lastname = '" + strLastName + "'");

It's better to do something like:

stmt = conn.prepareStatement("SELECT lastname, firstname FROM person WHERE lastname = ?");

But what about a like clause?

WHERE lastname LIKE '" + strLastName + "%'"

Well, you can actually include the literal as a parameter and concatenate the trailing wildcard:

WHERE lastname LIKE ? + '%'

Now I just wish I knew how to deal with underbars - a single character wildcard - as part of the literal, while still giving the application the safety of the prepared statement.
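For what it's worth, the standard answer I'm aware of is the LIKE ... ESCAPE clause: escape %, _, and the escape character itself inside the literal in application code, then bind the escaped value as a parameter. Here's a hedged Java sketch - the class and method names are mine, and you should check your database's ESCAPE support:

```java
// Sketch of a LIKE-safe helper. The idea: declare an escape character in the
// SQL with ESCAPE, then neutralize the LIKE wildcards - and the escape
// character itself - inside the bound literal.
public class LikeEscaper {
    public static String escapeLike(String s) {
        return s.replace("\\", "\\\\")   // escape the escape character first
                .replace("%", "\\%")     // the multi-character wildcard
                .replace("_", "\\_");    // the underbar single-character wildcard
    }

    /* Usage against a JDBC connection (sketch):

       PreparedStatement stmt = conn.prepareStatement(
           "SELECT lastname, firstname FROM person " +
           "WHERE lastname LIKE ? ESCAPE '\\'");
       stmt.setString(1, escapeLike(strLastName) + "%");
    */
}
```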

20070306

Good Habits Part II-b: Output Filtering LDAP

I did some research before this one, and it's sad that there seems to be so little support in built-in libraries for dealing with output filtering (er - rather "parameterized LDAP queries").

LDAP searches are every bit as susceptible to injection flaws as any other presentation engine. Fortunately, Java provides a parameterized search method. Unfortunately, I can't seem to find an equivalent in other API's.

In Java, rather than simply building the query by concatenating user input, a la:
qry = "(cn=" + strName + ")";
where strName is injectable (the user can include * to return all names - a bad thing - or use parentheses to close the query, or include other attributes in the search), a better approach is to do something like the following:
qry = "(cn={0})";
result = ctx.search(root, qry, new String[] {strName}, ctls);
With that, strName is properly escaped before being substituted into the query.

I looked for how to do this in Perl's Net::LDAP, the OpenLDAP C libraries, PHP, and Ruby, and they all seem to require the developer to do the encoding. The search filter RFC only seems to indicate that *, ), (, \, and NUL need to be encoded, using a backslash followed by the hex of the ASCII value of the character, which would seem to be enough. However, you need to be very careful of what character encoding you're expecting your input in, and any escapes that might be taking place there.

So in Java, do yourself a favor and use the search() that takes an Object[] as one of the parameters. In other languages, be very careful about output filtering.
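If you're stuck in one of those languages without library support, a manual escaper along the lines of the RFC is straightforward. Here's a sketch in Java (the class and method names are mine) just to show the shape of it:

```java
// Escaping for values embedded in an LDAP search filter, per the search
// filter RFC: '*', '(', ')', '\' and NUL become backslash + two hex digits.
public class LdapFilterEscape {
    public static String escapeFilterValue(String value) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '*':  out.append("\\2a"); break;
                case '(':  out.append("\\28"); break;
                case ')':  out.append("\\29"); break;
                case '\\': out.append("\\5c"); break;
                case '\0': out.append("\\00"); break;
                default:   out.append(c);      // everything else passes through
            }
        }
        return out.toString();
    }
}
```

Remember the caveat above: this assumes you've already settled what character encoding the input arrives in.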

20070226

More on Output Filtering Command Calls


The C example I gave was incomplete, by the way - that includes no way to do IPC. It simply calls the downstream process but doesn't read the results or anything. If you add pipe(2) and the necessary read/write loops on each of the processes, it can get quite un-pretty very quickly with all the necessary error checking. Certainly not impossible - that's why the calls are there, but unless you're really comfortable writing that sort of stuff, there's some more motivation to try to avoid command calls. You've been warned.

20070224

Good Habits Part II-a: Safe Command Execution

If you've been even moderately exposed to application security, particularly web application security, you should see the title and immediately think I've lost my mind. Since there are so very many examples of how to do the more common output-filtering tasks (HTML encoding, SQL prepared statements, etc.), I thought I'd try to do a couple of twists on each of them. So the examples I give will not be typical.

That being said, the absolute first recommendation with command execution is: don't do it! At all, if you can possibly avoid it. Do anything you can to not have to do command execution.

Now that that's clear, there is a safe (well, safer) way to deal with command execution than trying to handle shell metacharacters yourself. Almost all environments and frameworks have a form of a multi-argument command execution function. In *nix environments, they're all based on the execve(2) system call and its descendants (execl(3), execlp(3), execle(3), execv(3), execvp(3), and execvP(3)). Here's how you deal with this in some of the more common environments:

C/C++
execve(2) and its descendants replace the current process image with the command specified. So if you only want the command executed and need to do something afterwards, you must fork(2) first. Dealing with input and output is left as an exercise for the reader. In P-code, it looks something like this:

pid = fork();
if (pid > 0) {
    /* This is the parent process.
       Wait for junior to finish up. */
    wpid = waitpid(pid, &status, 0);
} else if (pid == 0) {
    /* I'm the child process.
       Call my nasty command line. */
    execv(cmd, args);
    _exit(127); /* only reached if execv fails */
} /* pid < 0 means fork failed - handle it */
Now you probably don't want to spend a lot of time parsing environment and such in a C CGI, but the example is there because it applies to other environments, such as...

Perl
P-code looks exactly the same. But I need to mention that the usual way of doing execution in Perl is to use backticks. DO NOT DO THIS! You can't possibly do enough input validation to know that every metacharacter will be taken care of. And even if you get it right, you're going to make mistakes in the future. Be careful with system(), too - called with a single string, it behaves the same as the backticks and hands the string to the shell (the list form, system($cmd, @args), does avoid the shell). Do it just like you do in C/C++ - fork, exec, wait.

Ruby
The scripting languages closely tied to the system are nice because I don't have to spell out new examples. With Ruby, use the C/C++ means. It's harder, but that's what you get for thinking you need command execution. Fortunately, because of closures, it's a little prettier in Ruby:
fork {
exec(cmd, *args)
}
Process.wait


PHP
Not that I'm lazy (okay, I am) but pcntl functions actually have to be enabled at compile time. I don't use PHP much, so the PHP I have here wasn't compiled with it enabled, and I won't be compiling another anytime soon to enable it. The functions work just like their C parents, so the model will look like the others:
$pid = pcntl_fork();
if ($pid) {
    # In the parent - reap the child
    pcntl_waitpid($pid, $status);
} else {
    pcntl_exec($cmd, $args);
}
With all of the C derivatives, there's a lot more error checking you could and should do. It would behoove you to find out about zombie processes, reaping dead children, and the like. And again - if you insist on doing command execution, you asked for it.

Java
Java's version of "safely" starting a command is a little more elegant (IMO) than the C derivatives. In Java, it's much easier to get STDIN, STDOUT, and STDERR handles for the newly-created process. If you have a pre-1.5 Java, you can use Runtime.exec(String[]), and if you have 1.5 or newer, you can use the new ProcessBuilder, whose two constructors are a List<String> version and a String... varargs version. Runtime.exec(String) just naively tokenizes the string on whitespace - no escaping, no safety - so stay away from it.

When you call Runtime.exec() or ProcessBuilder.start(), you'll be returned a Process object that you can use to wait for the process to complete, and it also has input and output handles to get results from the process, as well as the return code.
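Here's roughly what that looks like with ProcessBuilder. The error handling and the choice to fold STDERR into STDOUT are mine, not gospel - but note that each argument is a separate list element, so no shell ever parses the user-supplied value:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.List;

// Sketch of the ProcessBuilder flow described above.
public class SafeExec {
    public static String run(List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);          // fold STDERR into STDOUT
        Process p = pb.start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) out.append(line).append('\n');
        }
        int rc = p.waitFor();                  // reap the child, get the exit code
        if (rc != 0) throw new IOException("exit code " + rc);
        return out.toString();
    }
}
```

A metacharacter-laden argument like "hello; rm -rf /" comes out the other side as plain text, exactly as you'd hope.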

.NET
Sadly, from the reading I've done on fixing this in .NET, there's not a really good solution. There's a nice OO interface as in Java - System.Diagnostics.Process - but its StartInfo.Arguments property is a single string holding all the arguments. Even with UseShellExecute turned off, Windows hands the target program one flat command line, so argument quoting is still your problem. So you may end up stuck with the Windows API for starting commands to work around that, which is even uglier than the fork, exec, wait set on *nix'es. If anybody has a more elegant solution that includes metacharacter handling on .NET, please let me know.

Conclusion
Now, you still have lots of other things to consider - has the command you want to call been replaced by some other? How does environment play into all of this? And the most important thing to consider is whose effective ID will be running this process. Just like any other form of injection, you still have to consider the rights of the process that starts this off.

And let me reiterate - don't do this at all unless it's absolutely, positively necessary and there is no other means to accomplish what it is you want to do.

20070219

Good Habits Part I: Input Validation


In a previous post, I made mention that many security flaws are really the result of a lack of good programming practices. Since I'm a solutions-oriented type, here's Part 1 of <<not sure how many>> on some of those good habits.

My colleagues will probably have a heart attack upon reading that Input Validation is my first post on good habits, as I've always been very passionate about output filtering. But I've always felt that proper business-rule input validation is critical.

Remember back to your days as a wee tot programming in Turtle Logo on the Apple IIe. Or maybe just remember back to programming in BASIC on the C=64 or TRS-80. (I realize I'm dating myself here.) When you were first learning the basics of programming, one of the most important things you had to do as soon as you took input from a user, from a file, or from thin air, was to verify that the type of data received was consistent with the type of data you intended to operate on. This was absolutely critical to having the application behave as expected.

C and assembler programmers know this very well. If you don't deal with the right type of input in your application, you'll get very odd results. Things like reading random areas of memory, buffer overruns, fencepost errors, and on the rare occasion, you'll give your application the ability to inject new handlers into the Interrupt Vector Table. But somewhere around VB4, and those other 4-GL's, when components were coupled to the code that deals with them, and the design interface allowed but didn't require you to specify the format of incoming data, applications went very quickly from prototype to production, and those simple requirements were left by the wayside. (I think I'm guilty of telling people "Don't enter apostrophes, they mess things up" as well.)

Things didn't improve when we made web applications. And at some point security professionals thought they'd start recommending input validation again. But since the application security field seems to have evolved out of network security, rather than application development, security professionals always made recommendations regarding "malicious characters" (a term that makes me sick to my stomach). So those developers who have effectively been un-trained (or never trained at all) in good input validation habits are now being drawn into something that is almost as bad - blacklist input validation. I say almost as bad because blacklist input validation provides very, very little protection (just ask MySpace), but gives developers a false sense of security.

So what are the good habits? I think the best approach to input validation is to follow this (pretty well agreed-upon) formula:

  1. Whitelist input validation. If the specific item requested or sent by the user isn't in the list, reject the value. And this can happen in more places than you might think. If you have a jump page, you know the pages you expect to send your users to off your site, so enumerate those, and use a level of indirection so that the user can only send, say integers, which index into an array with the list of sites that you can jump to.
  2. Positive regex matching. This is where you specify a regular expression that the field must match. Phone number matching might look (in the US) something like: /\(?\d{3}[)-]?\d{3}-?\d{4}/, and if the data doesn't match that, then reject. Surnames might match /^([a-zA-Z]+[ '-])?[A-Z][A-Za-z]+$/. If I get the time, I'll reconstruct my regex for matching an IP address without having to do arithmetic on each octet.
  3. Negative regex matching. This is "don't allow malicious characters". I hate this philosophy. There are a billion ways that "malicious characters" (did the bytes themselves intend harm?) can be encoded on input. If you're stuck with this, you had better be doing presentation filtering (another good habit) further down the pipe - you are bound to miss something.
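Putting the positive-matching step into code, here's a small Java sketch using the surname regex above - anchoring the pattern and using matches() against the whole field are the important bits (the class and method names are mine):

```java
import java.util.regex.Pattern;

// Positive regex matching: compile the anchored pattern once, and reject
// anything that doesn't match the entire field.
public class InputValidator {
    private static final Pattern SURNAME =
        Pattern.compile("^([a-zA-Z]+[ '-])?[A-Z][A-Za-z]+$");

    public static boolean isValidSurname(String input) {
        // matches() requires the whole input to match, unlike find().
        return input != null && SURNAME.matcher(input).matches();
    }
}
```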


So there you have it. Deal with what you expect, first. And please use libraries where possible like the Apache Commons Validator, which will give you a really consistent interface for validating things. And really, really look for chances to apply a level of indirection. Like I say, there are probably more places that you can do that than you might think. (Primary key fields, jump URL's, any list, etc.)