Java string replace and the NUL (NULL, ASCII 0) character?
Testing out someone else's code, I noticed a few JSP pages printing funky non-ASCII characters. Taking a dip into the source, I found this tidbit:
// remove any periods from first name e.g. Mr. John --> Mr John
firstName = firstName.trim().replace('.','\0');
Does replacing a character in a String with a null character even work in Java? I know that '\0' will terminate a C-string. Would this be the culprit behind the funky characters?
"I noticed a few JSP pages printing funky non-ascii characters.": the root cause of this problem lies entirely elsewhere. Google "mojibake".
Does replacing a character in a String with a null character even work in Java? I know that '\0' will terminate a c-string.
That depends on how you define "working". Does it replace all occurrences of the target character with '\0'? Absolutely!
String s = "food".replace('o', '\0');
System.out.println(s.indexOf('\0')); // "1"
System.out.println(s.indexOf('d')); // "3"
System.out.println(s.length()); // "4"
System.out.println(s.hashCode() == 'f'*31*31*31 + 'd'); // "true"
Everything seems to work fine to me! indexOf can find it, it counts as part of the length, and its value for hash code calculation is 0; everything is as specified by the JLS/API.
It DOESN’T work if you expect replacing a character with the null character would somehow remove that character from the string. Of course it doesn’t work like that. A null character is still a character!
String s = Character.toString('\0');
System.out.println(s.length()); // "1"
assert s.charAt(0) == 0;
It also DOESN’T work if you expect the null character to terminate a string. It’s evident from the snippets above, but it’s also clearly specified in JLS (10.9. An Array of Characters is Not a String):
In the Java programming language, unlike C, an array of char is not a String, and neither a String nor an array of char is terminated by '\u0000' (the NUL character).
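A minimal sketch of what the quoted passage means in practice, assuming only the standard library (the class name `NulIsNotATerminator` and the helper `withEmbeddedNul` are made up for illustration): a char array with a NUL in the middle becomes a String of the full length, and the characters after the NUL stay reachable.

```java
public class NulIsNotATerminator {
    // Hypothetical helper: builds a String from a char array that contains NUL.
    static String withEmbeddedNul() {
        char[] chars = {'a', '\0', 'b'};
        return new String(chars);
    }

    public static void main(String[] args) {
        String s = withEmbeddedNul();
        System.out.println(s.length());      // 3
        System.out.println(s.charAt(2));     // b
        System.out.println(s.indexOf('\0')); // 1
    }
}
```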
Would this be the culprit to the funky characters?
Now we’re talking about an entirely different thing, i.e. how the string is rendered on screen. Truth is, even "Hello world!" will look funky if you use a dingbats font. A Unicode string may look funky in one locale but not in another. Even a properly rendered Unicode string containing, say, Chinese characters, may still look funky to someone from, say, Greenland.
That said, the null character will probably look funky regardless; usually it’s not a character that you want to display. But since the null character is not a string terminator, Java is more than capable of handling it one way or another.
Now to address what we assume is the intended effect, i.e. removing all periods from a string: the simplest solution is to use the replace(CharSequence, CharSequence) overload.
System.out.println("A.E.I.O.U".replace(".", "")); // AEIOU
The replaceAll solution is mentioned here too, but that works with regular expressions, which is why you need to escape the dot metacharacter, and it is likely to be slower.
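To make the difference concrete, here is a small sketch comparing the two approaches. The class and method names are hypothetical; only `String.replace` and `String.replaceAll` come from the standard library.

```java
public class RemovePeriods {
    // Literal replacement: "." is treated as a plain character sequence.
    static String withReplace(String s) {
        return s.replace(".", "");
    }

    // Regex replacement: the dot must be escaped to match a literal period.
    static String withReplaceAll(String s) {
        return s.replaceAll("\\.", "");
    }

    public static void main(String[] args) {
        System.out.println(withReplace("Mr. John"));    // Mr John
        System.out.println(withReplaceAll("Mr. John")); // Mr John
        // Unescaped, the dot matches every character here, so nothing is left.
        System.out.println("Mr. John".replaceAll(".", "").isEmpty()); // true
    }
}
```

The last line shows why forgetting the escape is an easy mistake to make with replaceAll.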
null = "" for a string
@Gabe, Sybase treats null and an empty string as null. IBM DB2/UDB treats them as distinct values. Not certain if MS SQL does. I personally know of no programming language, outside of various SQL implementations, that treats them as the same.
Nathan: MS SQL (before v7) treats the empty string as a single space. I’m pretty sure that’s how Sybase works too.
The empty string and null are different. The empty string is a string with no characters; null means there is no string at all.
You can call methods on an empty string but if you try to call a method on null you will get an exception.
public static void main(String[] args)
0
Exception in thread "main" java.lang.NullPointerException
    at Program.main(Program.java:12)
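A minimal sketch of the same contrast, not the original listing: the class name, the variable names, and the try/catch wrapper are assumptions added so the example runs to completion instead of crashing.

```java
public class EmptyVersusNull {
    // Hypothetical helper: returns the length, or -1 if calling length() throws.
    static int lengthOrMinusOne(String s) {
        try {
            return s.length();
        } catch (NullPointerException e) {
            return -1; // s was null, so there was no object to call length() on
        }
    }

    public static void main(String[] args) {
        String empty = "";
        String missing = null;
        System.out.println(lengthOrMinusOne(empty));   // 0
        System.out.println(lengthOrMinusOne(missing)); // -1
    }
}
```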
No, an empty string is not null.
They are most definitely not the same. Your String variable acts as a reference to an object in memory, and if it’s set to null, it’s not pointing to anything. If it’s set to the empty-string value, it’s pointing to that.
In my own coding, I generally set a String to "" instead of to null unless I have a special need for null. There are some libraries like Apache Commons that include helper classes like StringUtils that will collapse a check for null, the empty string, and even just whitespace into one call: StringUtils.isBlank(), StringUtils.isNotBlank(), etc. Pretty handy. Or you can write your own helper methods to do similar pretty easily.
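A hand-written helper along those lines might look like the sketch below. This is not the Apache Commons implementation: the class name `StringChecks` is made up, and it relies on `trim()`, which only strips characters up to U+0020, so its notion of whitespace may differ from the library's.

```java
public class StringChecks {
    // Hypothetical helper: true for null, "", and strings that trim() reduces to "".
    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    static boolean isNotBlank(String s) {
        return !isBlank(s);
    }

    public static void main(String[] args) {
        System.out.println(isBlank(null));    // true
        System.out.println(isBlank(""));      // true
        System.out.println(isBlank("   "));   // true
        System.out.println(isNotBlank("Mr")); // true
    }
}
```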
Good luck as you progress in Java!
That method has only been available since Java 6 and, unlike the StringUtils methods, is not null safe.
Why does String.valueOf(null) throw a NullPointerException?
The issue is that the String.valueOf method is overloaded; among its overloads are String.valueOf(Object) and String.valueOf(char[]).
The Java Language Specification mandates that in these kinds of cases, the most specific overload is chosen:
JLS 15.12.2.5 Choosing the Most Specific Method
If more than one member method is both accessible and applicable to a method invocation, it is necessary to choose one to provide the descriptor for the run-time method dispatch. The Java programming language uses the rule that the most specific method is chosen.
A char[] is-an Object, but not every Object is-a char[]. Therefore, char[] is more specific than Object, and as specified by the Java language, the String.valueOf(char[]) overload is chosen in this case.
String.valueOf(char[]) expects the array to be non-null, and since null is given in this case, it then throws NullPointerException.
The easy "fix" is to cast the null explicitly to Object as follows:
System.out.println(String.valueOf((Object) null)); // prints "null"
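The same selection rule can be observed with a pair of user-defined overloads. In this sketch the `pick` methods and the class name are hypothetical stand-ins that mirror valueOf(Object) and valueOf(char[]); the casts make the chosen overload explicit.

```java
public class MostSpecificOverload {
    // Two hypothetical overloads mirroring valueOf(Object) and valueOf(char[]).
    static String pick(Object o) { return "Object"; }
    static String pick(char[] c) { return "char[]"; }

    // Returns true if String.valueOf((char[]) null) throws NullPointerException.
    static boolean charArrayOverloadThrows() {
        try {
            String.valueOf((char[]) null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(pick(null));                    // char[]
        System.out.println(pick((Object) null));           // Object
        System.out.println(String.valueOf((Object) null)); // null
        System.out.println(charArrayOverloadThrows());     // true
    }
}
```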
Moral of the story
There are several important ones:
- Effective Java 2nd Edition, Item 41: Use overloading judiciously
  - Just because you can overload, doesn’t mean you should every time
  - They can cause confusion (especially if the methods do wildly different things)
- With Eclipse, you can mouse-hover on the above expression and see that indeed, the valueOf(char[]) overload is selected!
On casting null
There are at least two situations where explicitly casting null to a specific reference type is necessary:
- To select overloading (as given in above example)
- To give null as a single argument to a vararg parameter
A simple example of the latter is the following:
static void vararg(Object... os)
Then, we can have the following:
vararg(null, null, null); // prints "3"
vararg(null, null); // prints "2"
vararg(null); // throws NullPointerException!
vararg((Object) null); // prints "1"
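A runnable sketch of this behavior follows. The method body is an assumption (it simply reports the array length, which is consistent with the printed counts above), and `count` and `VarargNull` are made-up names. A bare null argument is passed as the array itself, which the sketch spells out with an `(Object[])` cast to avoid a compiler warning.

```java
public class VarargNull {
    // Hypothetical body: returns the number of arguments received.
    static int count(Object... os) {
        return os.length; // throws NullPointerException if the array itself is null
    }

    // Returns true if passing null as the whole array throws.
    static boolean nullArrayThrows() {
        try {
            count((Object[]) null); // equivalent to the bare call count(null)
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(count(null, null, null)); // 3
        System.out.println(count(null, null));       // 2
        System.out.println(nullArrayThrows());       // true
        System.out.println(count((Object) null));    // 1
    }
}
```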