#1 2010-02-10 03:40:38
- MattF
- Member

- From: South Yorkshire, England
- Registered: 2008-05-06
- Posts: 1,230
Input/output sanitisation question
Probably having another of my blonde moments here, but here I go.
Is there any reason to be so anal about using htmlspecialchars on all output if all input is stripped of, for example, html tags when html is disabled (except within code blocks, obviously), and debatable characters are encoded before insertion into the DB? htmlspecialchars converts seven symbols max? Add parentheses to that list to counter JS, and surely, if they are all accounted for (and cleaned) at the input stage, output sanitisation could be reduced somewhat? Anything which then does contain html tags on output could be taken as intentionally being html? Am I missing something blindingly obvious again?
Cheers.
Screw the chavs and God save the Queen!
#2 2010-02-10 04:18:38
- Smartys
- Former Developer
- Registered: 2008-04-27
- Posts: 3,117
Re: Input/output sanitisation question
Correct, but filtering on input tends to be a bad idea: you usually want to preserve exactly what the user inputs (since otherwise it's easy for them to get confused when editing, etc).
Last edited by Smartys (2010-02-10 04:18:45)
#3 2010-02-10 04:46:43
- ridgerunner
- Developer

- Registered: 2008-06-24
- Posts: 179
Re: Input/output sanitisation question
You are right about [pun_]htmlspecialchars() being called a lot (313 times from 37 files as of rev 1359).
However, a quick look at the code reveals that these calls are being made for strings that are typically user (or admin) editable (i.e. board title, forum name, category name, post content, etc.). When presenting these strings back to the user in web form edit text boxes, you don't want to encode these special chars, so the strings need to be stored in their non-sanitized, editable format. The special encoding needs to be done at the last minute when assembling the html markup. As far as I can tell after a cursory glance at the code, this is exactly what is being done. But yes, 313 times does smell a little funny!
To pass w3c validation (an anal, but noble goal), it is essential that all the [<>&] chars be encoded.
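A rough sketch of that last-minute encoding, written in Python rather than PHP, with `html.escape` standing in for htmlspecialchars. The function name `render_post` and the sample strings are made up for illustration and are not taken from the Flux code:

```python
import html

def render_post(title, body):
    """Build markup from raw stored strings, escaping only at output time."""
    return "<h2>{}</h2>\n<p>{}</p>".format(html.escape(title), html.escape(body))

# The database keeps exactly what the user typed (assumed sample data).
stored_title = "Tips & tricks for <b> tags"
stored_body = 'Use "quotes" freely'

# Encoding happens once, while the markup is being assembled.
page = render_post(stored_title, stored_body)
print(page)
```

The stored strings are never modified, so the same raw values can be handed back to an edit form later.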
#4 2010-02-10 04:58:53
- MattF
- Member

- From: South Yorkshire, England
- Registered: 2008-05-06
- Posts: 1,230
Re: Input/output sanitisation question
ridgerunner wrote: You are right about [pun_]htmlspecialchars() being called a lot (313 times from 37 files as of rev 1359).
Must admit, I wasn't referring to Flux specifically, but rather more to how paranoid we just seem to be in general with regards to the sanitisation.
ridgerunner wrote: To pass w3c validation (an anal, but noble goal), it is essential that all the [<>&] chars be encoded.
I'm totally anal in that regard.
The day I reach output parsing perfection (and can keep pure XML mode running without some user's input screwing things up), I'll be happy.
#5 2010-02-10 05:06:17
- Smartys
- Former Developer
- Registered: 2008-04-27
- Posts: 3,117
Re: Input/output sanitisation question
ridgerunner wrote: You are right about [pun_]htmlspecialchars() being called a lot (313 times from 37 files as of rev 1359).
Not really. You need to sanitize output, as you said. Short of using a template system that automatically sanitizes output variables (which would just obfuscate the number of calls, not reduce them), there really isn't a good way to decrease the number of calls. There might be some consolidation possible in certain places, but I think that any performance hit would be insignificant.
MattF wrote: Must admit, I wasn't referring to Flux specifically, but rather more to how paranoid we just seem to be in general with regards to the sanitisation.
Yes, because if you don't sanitize output properly you have a whole slew of problems: not validating is the least of them.
#6 2010-02-10 05:22:17
- MattF
- Member

- From: South Yorkshire, England
- Registered: 2008-05-06
- Posts: 1,230
Re: Input/output sanitisation question
Smartys wrote: Correct, but filtering on input tends to be a bad idea: you usually want to preserve exactly what the user inputs (since otherwise it's easy for them to get confused when editing, etc).
Good point. Would it not be practically possible though to actually work the opposite way on the parsing, and revert the changes when editing or suchlike takes place, (if necessary), so that they do actually see exactly what they entered? For input that was obviously an exploit attempt or such, or plain old disallowed, that could always be dumped completely.
I just started thinking about this one as I'm playing around with a system at the moment and got to wondering about alternative options. It seems we spend more time trying not to forget to neutralise the output before it gets displayed than on any other task, and sanitising on input seems to offer more chance of doing the task in one or two places rather than throughout the code (removing the probability of the occasional oversight allowing an exploit, etc). Plus, output-side parsing always seems to throw spanners in the works when you start trying to centralise it.
Last edited by MattF (2010-02-10 05:26:03)
#7 2010-02-10 05:25:09
- MattF
- Member

- From: South Yorkshire, England
- Registered: 2008-05-06
- Posts: 1,230
Re: Input/output sanitisation question
Smartys wrote: Yes, because if you don't sanitize output properly you have a whole slew of problems: not validating is the least of them.
Ain't that the truth.
#8 2010-02-10 05:29:29
- Smartys
- Former Developer
- Registered: 2008-04-27
- Posts: 3,117
Re: Input/output sanitisation question
MattF wrote: Good point. Would it not be practically possible though to actually work the opposite way on the parsing, and revert the changes when editing or suchlike takes place, (if necessary), so that they do actually see exactly what they entered?
Physically possible? Yes, in most cases. There are some instances I can think of where systems similar to that are a very good idea. However, there are some things to keep in mind:
- The action must be fully reversible. If you remove characters with no way to put them back in, you're stuck.
- You must make sure your sanitization doesn't cause any other problems. For instance, calling htmlspecialchars and then storing data in the database does tend to cause issues with string length: you've just increased the length of your string beyond what it was before. You have to keep that fact in mind (and come up with a good way to pass that information to the user)
- The main gain here is performance. You sanitize your data before it's stored: you don't need to resanitize every time it's displayed. The tradeoff is having to undo the sanitization when it's being edited, etc.
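To make the first two points concrete, here is a small sketch in Python (not PHP), where `html.escape` and `html.unescape` stand in for htmlspecialchars and its inverse. The helper names and the 20-character limit are assumptions for illustration only:

```python
import html

def encode_for_storage(raw):
    """Sanitize once, before the value is written to the database."""
    return html.escape(raw)

def decode_for_editing(stored):
    """Undo the encoding so the user sees exactly what they typed."""
    return html.unescape(stored)

def fits_column(raw, column_limit):
    """Check the length of the encoded string, not the raw one."""
    return len(encode_for_storage(raw)) <= column_limit

raw = 'Tom & "Jerry" <b>'
stored = encode_for_storage(raw)

# The encoded form is longer than what the user typed.
print(len(raw), len(stored))

# Encoding is reversible; stripping characters would not be.
print(decode_for_editing(stored) == raw)
```

A field that passes a length check on the raw input can still overflow the column once encoded, which is the case that has to be reported back to the user somehow.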
#9 2010-02-10 05:56:22
- MattF
- Member

- From: South Yorkshire, England
- Registered: 2008-05-06
- Posts: 1,230
Re: Input/output sanitisation question
Smartys wrote: The action must be fully reversible. If you remove characters with no way to put them back in, you're stuck.
Better to break the input in a reversible manner than drop it then.
Smartys wrote: You must make sure your sanitization doesn't cause any other problems. For instance, calling htmlspecialchars and then storing data in the database does tend to cause issues with string length: you've just increased the length of your string beyond what it was before. You have to keep that fact in mind (and come up with a good way to pass that information to the user)
So make sure that any input character limits and suchlike are derated, or that the respective DB length limit (if there is one) is increased to compensate.
Smartys wrote: The main gain here is performance. You sanitize your data before it's stored: you don't need to resanitize every time it's displayed. The tradeoff is having to undo the sanitization when it's being edited, etc.
That I can live with. Performance is a nice side effect though. Purely and simply limiting the probability of a slip-up being detrimental is my main goal. I've spent the last several days mulling this point over and hadn't found a single real arse biter of a problem with doing it this way. That's why I was becoming paranoid and certain I'd completely overlooked something obvious.
I'm not used to things being this simple and painless in theory.
Cheers.
Last edited by MattF (2010-02-10 05:58:51)
#10 2010-02-10 10:10:28
- Franz
- Lead developer

- From: Germany
- Registered: 2008-05-13
- Posts: 3,755
Re: Input/output sanitisation question
The main problem I see here is that this could turn even actually harmless SQL injections (e.g. because of MySQL's limit to one query) into dangerous ones because it could allow people to insert for example harmful JavaScript code that would get displayed without escaping...
#11 2010-02-10 11:31:49
- Reines
- Lead developer

- From: Scotland
- Registered: 2008-05-11
- Posts: 3,140
Re: Input/output sanitisation question
Along the lines of output sanitization, Facebook just released another PHP extension they have been working on, XHP, which basically extends PHP so that it understands XML/HTML. This means it can then automatically sanitize strings output because it knows what context they are in. It sounds quite interesting, but it seems to have quite a performance hit as you'd probably expect, and obviously is going to require the server to have the extension installed.
The original post: http://www.facebook.com/notes/facebook- … 4003943919
Some testing: http://toys.lerdorf.com/archives/54-A-q … t-XHP.html
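To illustrate what "knowing the context" buys you, here is a tiny Python sketch. It is not XHP and does not use its API; the function `escape_for` and the two context names are invented for this example:

```python
import html
from urllib.parse import quote

def escape_for(context, value):
    """Pick the escaping rule that matches where the value will be placed."""
    if context == "html":
        # Text inside an element or a quoted attribute value.
        return html.escape(value)
    if context == "url":
        # A single component of a URL, such as a query parameter value.
        return quote(value, safe="")
    raise ValueError("unknown context: " + context)

print(escape_for("html", "a<b"))
print(escape_for("url", "a b&c"))
```

The same raw string needs different treatment depending on where it lands, which is why a system that tracks context can do the escaping automatically.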
#12 2010-02-10 13:18:58
- MattF
- Member

- From: South Yorkshire, England
- Registered: 2008-05-06
- Posts: 1,230
Re: Input/output sanitisation question
Franz wrote: The main problem I see here is that this could turn even actually harmless SQL injections (e.g. because of MySQL's limit to one query) into dangerous ones because it could allow people to insert for example harmful JavaScript code that would get displayed without escaping...
Shouldn't be a problem, unless I've overlooked something? Parentheses would be on the list of pre-encoded characters, and pure html would already have been neutralised. That removes the options for the JS code, does it not? (Or is there some other format for active JS code too)?
#13 2010-02-10 13:21:05
- MattF
- Member

- From: South Yorkshire, England
- Registered: 2008-05-06
- Posts: 1,230
Re: Input/output sanitisation question
Reines wrote: Along the lines of output sanitization, Facebook just released another PHP extension they have been working on, XHP, which basically extends PHP so that it understands XML/HTML. This means it can then automatically sanitize strings output because it knows what context they are in. It sounds quite interesting, but it seems to have quite a performance hit as you'd probably expect, and obviously is going to require the server to have the extension installed.
The original post: http://www.facebook.com/notes/facebook- … 4003943919
Some testing: http://toys.lerdorf.com/archives/54-A-q … t-XHP.html
That's another hour of my life getting sidetracked, reading that.
Did anyone have a play with that other thing they were releasing, btw?
#14 2010-02-10 13:58:21
- Paul
- Developer
- From: Wales, UK
- Registered: 2008-04-27
- Posts: 1,623
Re: Input/output sanitisation question
Reines wrote: Along the lines of output sanitization, Facebook just released another PHP extension they have been working on, XHP, which basically extends PHP so that it understands XML/HTML. This means it can then automatically sanitize strings output because it knows what context they are in. It sounds quite interesting, but it seems to have quite a performance hit as you'd probably expect, and obviously is going to require the server to have the extension installed.
The original post: http://www.facebook.com/notes/facebook- … 4003943919
Some testing: http://toys.lerdorf.com/archives/54-A-q … t-XHP.html
Can't you do much the same thing if you are using templating? Before including the template itself you sanitize all the template variables in one go, possibly using a plugin filter for the template class.
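Something along these lines, sketched in Python with the standard `string.Template` class rather than any particular PHP template system. The `Trusted` marker class and the `render` function are hypothetical names used only to show the idea of escaping every variable in one place unless it is explicitly exempted:

```python
import html
from string import Template

class Trusted(str):
    """Marks markup that is meant to be emitted as-is (assumed convention)."""

def render(template_text, variables):
    """Escape every template variable in one go, except Trusted values."""
    safe = {}
    for name, value in variables.items():
        if isinstance(value, Trusted):
            safe[name] = value
        else:
            safe[name] = html.escape(str(value))
    return Template(template_text).substitute(safe)

output = render(
    "<p>$body</p>$footer",
    {"body": "<script>", "footer": Trusted("<hr>")},
)
print(output)
```

The catch is deciding which values deserve the `Trusted` wrapper, since anything wrapped by mistake goes out unescaped.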
The only thing worse than finding a bug is knowing I created it in the first place.
#15 2010-02-10 14:14:09
- Reines
- Lead developer

- From: Scotland
- Registered: 2008-05-11
- Posts: 3,140
Re: Input/output sanitisation question
Yeah, they described it as: "XHP is something between a programmatic UI library and a full templating system."
#16 2010-02-10 14:44:24
- MattF
- Member

- From: South Yorkshire, England
- Registered: 2008-05-06
- Posts: 1,230
Re: Input/output sanitisation question
Paul wrote: Can't you do much the same thing if you are using templating? Before including the template itself you sanitize all the template variables in one go, possibly using a plugin filter for the template class.
Easily done in a template class. That's the method I was going to use initially for sanitising in this project, but it also throws up quite a few subtle problems. How do you tell what's intentional and unintentional output? What should be sanitised and what shouldn't?
That's why I'm going to try this route of input sanitisation and see how capable it might be. The logic seems far easier when on the input rather than the output side. For example, if a user is logged in and an admin, the input side of things knows that when something is created. To access the same logic on the output side though, you end up accessing the DB to find out who created what etc.
#17 2010-02-10 15:54:32
- Franz
- Lead developer

- From: Germany
- Registered: 2008-05-13
- Posts: 3,755
Re: Input/output sanitisation question
lie2815 wrote:The main problem I see here is that this could turn even actually harmless SQL injections (e.g. because of MySQL's limit to one query) into dangerous ones because it could allow people to insert for example harmful JavaScript code that would get displayed without escaping...
MattF wrote: Shouldn't be a problem, unless I've overlooked something? Parentheses would be on the list of pre-encoded characters, and pure html would already have been neutralised. That removes the options for the JS code, does it not? (Or is there some other format for active JS code too)?
Nah, what I meant was if you are able to exploit an SQL injection hole, you might be able to insert something directly via SQL (for example "<script>alert('boo!');</script>") in a field that you know is going to be printed out without being escaped. And then you have a problem... But assuming you don't have those holes, you should be fine with your way...
Well, this does assume that you don't sanitize your SQL injection input
Last edited by Franz (2010-02-10 15:54:54)
#18 2010-02-10 16:14:26
- MattF
- Member

- From: South Yorkshire, England
- Registered: 2008-05-06
- Posts: 1,230
Re: Input/output sanitisation question
Ah, right. I'd totally misunderstood what you meant initially.
Aye, there is always that drawback, though I would be far more worried if I cocked up enough to allow that scenario to arise in the first place.
This is the same project that I'm using the parameterised queries on, so hopefully it should be a nigh on impossible task to see that scenario occur.
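For anyone following along, this is roughly what I mean by parameterised queries, sketched with Python's built-in `sqlite3` module rather than the PHP code from the project. The table layout, the `save_post` helper and the hostile string are all made up for the example:

```python
import sqlite3

def save_post(conn, author, body):
    """Pass values as parameters so they are never parsed as SQL."""
    conn.execute(
        "INSERT INTO posts (author, body) VALUES (?, ?)",
        (author, body),
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (author TEXT, body TEXT)")

# Input that would break a query built by string concatenation.
hostile = "x'); DROP TABLE posts; --"
save_post(conn, "mallory", hostile)

# The string is stored as plain data and the table is still there.
rows = conn.execute("SELECT body FROM posts").fetchall()
print(rows)
```

That only closes the injection route. Anything that reaches the table some other way would still be displayed without escaping under the input-side scheme.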
#20 2010-02-10 16:23:27
- MattF
- Member

- From: South Yorkshire, England
- Registered: 2008-05-06
- Posts: 1,230
Re: Input/output sanitisation question
Must admit, input sanitisation does seem totally unnatural, but each time I run through the list of pros and cons, it always seems to win hands down on the general ease of implementation. Less processing and verification can be done at the opportune moment, so to speak. It really does seem weird though.
#21 2010-02-10 16:40:09
- MattF
- Member

- From: South Yorkshire, England
- Registered: 2008-05-06
- Posts: 1,230
Re: Input/output sanitisation question
Just on a slight aside, this is the template class where I was sanitising the output in one single location. Just in case anyone has a use for it or feels like having a play.
(It probably still has some bugs in it yet, as it is still a bairn).
http://outgoing.bauchan.org/unix/template.txt
The $extpls var is just an array containing file names whose content doesn't get sanitised with htmlspecialchars.
Last edited by MattF (2010-02-10 16:41:47)
#22 2010-02-10 16:41:45
- Reines
- Lead developer

- From: Scotland
- Registered: 2008-05-11
- Posts: 3,140
Re: Input/output sanitisation question
I didn't have a proper look, but FYI you can't return false from a constructor.
#23 2010-02-10 16:43:10
- MattF
- Member

- From: South Yorkshire, England
- Registered: 2008-05-06
- Posts: 1,230
Re: Input/output sanitisation question
Reines wrote: I didn't have a proper look, but FYI you can't return false from a constructor.
Sorted.
Force of habit moment there.
