When do I need to escape an HTML string?
The purpose of escaping HTML entities
Any decent static analysis scanner would not flag up a vulnerability if you were storing raw HTML in your database — after all, it’s only a string.
String sequences only become dangerous when passed through a "sink function".
For example, a string containing a script tag, such as <script>alert('xss')</script>, is completely safe to store in your database. In fact so is Robert'); DROP TABLE Students;-- .
The latter is only dangerous when it is passed from a string variable and concatenated to a hard coded query in the application because the whole string is passed to the database server and it does not know the difference between the query and the data. This is why you should use parameterised queries to pass data to your database — then the DB itself knows what to interpret as strongly typed data, and what to interpret as the query construct itself.
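As a minimal sketch of the parameterised approach, the following uses Python's built-in sqlite3 module. The in-memory database and the students table are assumptions made for illustration, not part of the original question.

```python
import sqlite3

# In-memory database with a hypothetical "students" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

name = "Robert'); DROP TABLE Students;--"

# The "?" placeholder passes the value separately from the SQL text,
# so the database treats it purely as data, never as query syntax.
conn.execute("INSERT INTO students (name) VALUES (?)", (name,))

stored = conn.execute("SELECT name FROM students").fetchone()[0]
print(stored == name)  # True: the string was stored unchanged
```

The table still exists afterwards, and the stored value is byte-for-byte what the user typed.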
When outputting to the browser, this is where you need to encode as appropriate. Usually you will be outputting to HTML, so you need to HTML encode (& becomes &amp; as you say). If you’re outputting to JSON or JavaScript (don’t though, it’s a minefield) then you should output the text with hex escape sequences instead (for example, & becomes \x26 in a JavaScript string, or \u0026 in JSON).
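A short sketch of encoding at output time, using html.escape from the Python standard library. The render_comment helper is a hypothetical name chosen for this example.

```python
import html


def render_comment(raw_text: str) -> str:
    """Build an HTML fragment, encoding the untrusted text at output time."""
    # html.escape converts &, <, > and (by default) both quote characters.
    return "<p>" + html.escape(raw_text) + "</p>"


print(render_comment("Tom & Jerry <script>alert(1)</script>"))
# <p>Tom &amp; Jerry &lt;script&gt;alert(1)&lt;/script&gt;</p>
```

The stored value stays raw; only the copy sent to the browser is encoded.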
If you are going to persist those characters, ensure that you use bind variables (or prepared statements) to persist them.
You should also then add HTML Encoding when displaying the information on your User Interface.
If you can avoid storing such characters for the reasons you mentioned (and others), that is best. However, if your business model forces you to store them, you still have an option: whatever programming language you are using, you can rely on parameterized queries. That way you prevent SQL injection.
However, now I have an issue: I really need to store string input that contains these characters. It is a business requirement.
Is it a bad idea to escape HTML before inserting into a database instead of upon output?
You will also restrict yourself if you perform the escaping before inserting into your database. Let’s say you later decide to output not HTML but JSON, plain text, etc.
If you have stored escaped HTML in your database, you would first have to unescape the stored value, just to re-escape it for a different format.
Also see the OWASP article on XSS prevention.
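The round trip described above can be sketched with the Python standard library. The sample string and the "name" key are illustrative assumptions.

```python
import html
import json

raw = 'Fish & Chips <b>"special"</b>'

# Stored raw: each output format applies its own encoding directly.
as_html = html.escape(raw)
as_json = json.dumps({"name": raw})

# Stored HTML-escaped: any other format must undo the HTML encoding first,
# otherwise entities such as &amp; leak into the JSON payload.
stored_escaped = html.escape(raw)
leaky_json = json.dumps({"name": stored_escaped})
fixed_json = json.dumps({"name": html.unescape(stored_escaped)})

print(fixed_json == as_json)  # True, but only after the extra unescape step
print("&amp;" in leaky_json)  # True: HTML entities ended up in the JSON
```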
Yes, because at some stage you’ll want access to the original input as entered. This is because:
- You never know how you will want to display it: in JSON, in HTML, as an SMS?
- You may need to show it back to the user as is.
I do see your point about never wanting HTML entered. What are you using to strip HTML tags? If it is a regex, then look out for confused users who might type something like this.
They’ll only get the 3 if it is a regex.
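The answer's original sample input did not survive, so the following is only an illustrative sketch with a hypothetical input and a deliberately naive pattern. It shows how a regex tag stripper can silently eat legitimate text that merely contains angle brackets.

```python
import re

# Naive tag stripper: removes everything from "<" up to the next ">".
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags_naive(text: str) -> str:
    return TAG_RE.sub("", text)


user_input = "1 < 2 and 3 > 2"  # hypothetical input: a comparison, not markup
print(repr(strip_tags_naive(user_input)))  # '1  2'
```

Encoding on output instead of stripping on input would have preserved the whole sentence.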
I usually store both versions of the text. The escaped/formatted text is used when a normal page request is made to avoid the overhead of escaping/formatting every time. The original/raw text is used when a user needs to edit an existing entry, and the escaping/formatting only occurs when the text is created or changed. This strategy works great unless you have tight storage space constraints, since you will be duplicating data.
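One way this two-column strategy might look, as a sketch only. The posts table, its column names, and the helper functions are assumptions for illustration.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, raw TEXT, escaped TEXT)"
)


def save_post(post_id: int, raw: str) -> None:
    """Escape once, at write time, and store both versions side by side."""
    conn.execute(
        "INSERT OR REPLACE INTO posts (id, raw, escaped) VALUES (?, ?, ?)",
        (post_id, raw, html.escape(raw)),
    )


def text_for_page(post_id: int) -> str:
    """Pre-escaped text for normal page requests (no per-read escaping)."""
    row = conn.execute(
        "SELECT escaped FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    return row[0]


def text_for_edit(post_id: int) -> str:
    """Original text, shown when the author edits the entry."""
    row = conn.execute(
        "SELECT raw FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    return row[0]


save_post(1, "a < b")
print(text_for_page(1))  # a &lt; b
print(text_for_edit(1))  # a < b
```

Both columns are written in the same statement, so the escaped copy cannot drift from the raw one.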
Do I need to «clean up» after HTML Escape a Javascript string?
A1: No, that would be if the function also called document.body.appendChild(p);
A2: No, as you can probably guess from A1. After the function returns, its local variables are discarded, the p becomes unreachable, and will eventually be garbage-collected.
escape() (JavaScript): Escaped characters in string literals can be expanded by replacing the \x with %, then using the decodeURIComponent() function. Syntax: escape(str)
Java escape HTML
StringUtils.replaceEach(str, new String[]{"&", "\"", "<", ">"}, new String[]{"&amp;", "&quot;", "&lt;", "&gt;"})
If it’s for Android, use TextUtils.htmlEncode(String) instead.
This looks very good to me:
By asking for XML, you will get XHTML, which is good HTML.
When to escape user input
I wonder when the best time to escape user input is. Two options come to mind:
1) The user sends data to the server, we escape it, and then store it in the database.
2) We store the data as it is and escape it when we send it to the user.
To me, escaping and then saving seems a lot easier. But suppose someone finds a flaw in our website and manages to bypass the escaping: we then have the problem of finding all the data that was stored in the database unescaped. On the other hand, if we store the data as it is and escape it when we send it to the user, then even if someone finds a flaw, all we have to do is fix the bug, because our system already assumes that data in the database is not escaped.
Although the second approach seems easier to recover from, it also seems a lot more prone to error. Suppose we generate HTML on the server and send it to the user, and then decide to switch to sending content via Ajax: it is easy to forget that we need to escape all the data before sending it to the user, or when implementing a new API, or something else.
So which is the preferable way of handling this?
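User input is a string. Escaping is done when you want to insert some characters into some HTML / SQL / whatever code which insists on interpreting some characters as having special functions. For instance, if you have a ‘<’ character in a string that goes into HTML, it has to be written as &lt; so that it is displayed rather than treated as the start of a tag.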
In general, you want to keep strings as strings, and delegate any encoding or escaping to specialized functions which do that well. For instance, for SQL, you use prepared statements. With HTML from a PHP context, you would use htmlspecialchars() .
The point to notice here is that the kind of conversion, encoding or escaping that you need to perform depends on what you are trying to do with the string. If you need to put the string in some HTML, then you’ll use HTML entities (&lt; for ‘<’). If you store an already escaped string, then you are betting that you will use the string only by including it in some HTML.
So you should strive to apply encoding/escaping only upon usage. It is more flexible and makes semantics simpler. Within your database, store the string as a string.
Additionally, for exceptionally high-performance environments where you are certain the data will never be used elsewhere, you might store it in an escaped form. Otherwise, store it in its raw form as suggested in this answer.
+1 deal with escaping at the point you’re injecting — anything else leaves you with broken separation-of-concerns which is very hard to maintain consistently. Prefer methods that automatically get it right because the manual way can easily be forgotten: parameterised queries are preferable to calling mysql_real_escape_string every time and similarly using a templating language that defaults to HTML-escaping everything is in principle preferable to calling htmlspecialchars() every time.
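To illustrate the "escape by default" idea from this comment, here is a toy sketch in Python. The render function and the SafeHtml marker class are hypothetical helpers written for this example; a real project would more likely rely on an established templating engine with auto-escaping turned on.

```python
import html


class SafeHtml(str):
    """Hypothetical marker type for markup that is already safe to emit."""


def render(template: str, **values: object) -> str:
    """Fill a str.format-style template, HTML-escaping every value by default."""
    prepared = {
        key: value if isinstance(value, SafeHtml) else html.escape(str(value))
        for key, value in values.items()
    }
    return template.format(**prepared)


print(render("<p>Hello, {name}!</p>", name="<b>Bob</b>"))
# <p>Hello, &lt;b&gt;Bob&lt;/b&gt;!</p>
```

Because escaping is the default, forgetting to think about it produces safe output; emitting raw markup requires the explicit SafeHtml opt-in.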
I’ve been looking for the basic definition of ‘escaping’ and haven’t found one. So simply put escaping is a technique to ensure a string is interpreted as just a string and not run as code?
"Escaping" is related to interpretation of data. In most text-based languages (e.g. SQL, HTML...), there are literal values which are strings of characters; most of the characters just stand for themselves, but a few trigger special processing. For instance, in a string delimited by double quotes, the " character triggers the end of the string. If you want to include a double quote in the string contents, then that special behaviour must be deactivated. "Escaping" is replacing a character x with a sequence (starting with another special character) that results in a non-special x (e.g. \").
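The double-quote case can be seen concretely with JSON string literals, using Python's json module. The sample text is an arbitrary illustration.

```python
import json

text = 'He said "hi"'

# json.dumps produces a double-quoted literal; each inner " is escaped as \"
# so that it no longer terminates the string.
literal = json.dumps(text)
print(literal)  # "He said \"hi\""

# Parsing the literal gives back the original characters unchanged.
print(json.loads(literal) == text)  # True
```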
EDIT: Luc has pointed out in the comments that I’m unduly slanted towards high-performance solutions. If, in your situation, performance isn’t a concern, then it’s perfectly acceptable (and preferable, in fact) to store the original data alone and transform it on output. This gives you the flexibility to use the data however you need without maintaining multiple versions.
To some degree, it depends. First, the answer is rarely to store the raw data and escape it when you read it back out.
The two common solutions are:
1) Escape the data before storing it.
2) Store two copies of the data, one escaped, and one raw.
In virtually any system the ratio of reads to writes is going to be heavily, heavily canted towards reads. It may be 10:1, but it could be 10,000:1. This is why you want to store the data in an escaped format and only parse it when you’re writing it, not every single time you want to read it.
The benefit of storing both formats is that the original author can modify the content as intended, you can re-process it if you like, you can review the original data. It gives you some additional flexibility at the expense of a little additional complexity.
This is obviously a bit simplistic, as for instance I’m not considering the effects of caching on the read/write ratio, but hopefully it conveys the general concept.