Byte to character in java

How to convert a byte array to a string in Java

In this article, you’ll learn how to convert a byte[] array to a string in Java. We will also look at different ways to convert a string into a byte[] array. Conversion between byte array and string is one of the most common tasks in Java while reading files, generating crypto hashes, etc.

The simplest way to convert a byte array to a string is to use the String class constructor with byte[] as an argument:

// create a byte array (demo purpose only)
byte[] bytes = "Hey, there!".getBytes();

// convert byte array to string
String str = new String(bytes);

// print string
System.out.println(str);

By default, new String() uses the platform default character encoding to convert the byte array to a string. If the character encoding is different, you can specify it by passing another argument to new String() as shown below:

String str = new String(bytes, StandardCharsets.UTF_8); 

Since Java 8, the java.util.Base64 class provides static methods for obtaining encoders and decoders for the Base64 encoding scheme. You can also use this class to encode a byte array into a string. Note that the result is a Base64 text representation of the bytes, not the characters those bytes originally encoded:

// create a byte array (demo purpose only)
byte[] bytes = "Hey, there!".getBytes();

// convert byte array to string
String str = Base64.getEncoder().encodeToString(bytes);

// print string
System.out.println(str);

You can use the String.getBytes() method to convert a string to a byte array. This method uses the default character encoding to encode this string into a sequence of bytes. Here is an example:

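// create a string (demo purpose only)
String str = "Hey, there!";

// convert string to byte array
byte[] bytes = str.getBytes();

To use a specific character encoding instead of the default, pass it to getBytes():

byte[] bytes = str.getBytes(StandardCharsets.UTF_8);

To convert a Base64-encoded string back to a byte array, use the Base64 decoder. The input must be valid Base64; a plain string such as "Hey, there!" would cause an IllegalArgumentException:

// create a Base64-encoded string (demo purpose only)
String str = "SGV5LCB0aGVyZSE=";

// convert Base64 string to byte array
byte[] bytes = Base64.getDecoder().decode(str);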
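To see the Base64 conversion in both directions, here is a minimal round-trip sketch. The class and method names (Base64RoundTrip, encode, decode) are made up for this illustration; only java.util.Base64 and StandardCharsets come from the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative class name, not part of any library.
class Base64RoundTrip {

    // Encode arbitrary bytes as a Base64 string (plain ASCII text).
    static String encode(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Decode a Base64 string back to the original bytes.
    // Throws IllegalArgumentException if the input is not valid Base64.
    static byte[] decode(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    public static void main(String[] args) {
        byte[] original = "Hey, there!".getBytes(StandardCharsets.UTF_8);

        String encoded = encode(original);
        System.out.println(encoded); // prints SGV5LCB0aGVyZSE=

        byte[] decoded = decode(encoded);
        System.out.println(new String(decoded, StandardCharsets.UTF_8)); // prints Hey, there!
    }
}
```

Base64 is useful when the bytes are not text at all (hashes, keys, images), because every possible byte value survives the trip through a string.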


2 Examples to Convert Byte[] array to String in Java

Converting a byte array to a String seems easy, but doing it correctly is harder. Many programmers make the mistake of ignoring character encoding whenever bytes are converted into a String or char, or vice versa. As programmers, we all know that computers only understand binary data, i.e. 0 and 1. Everything we see and use, e.g. images, text files, movies, or any other multimedia, is stored in the form of bytes, but what matters more is the process of encoding or decoding bytes to characters. Data conversion is an important topic in any programming interview, and because of the trickiness of character encoding, this question is one of the most popular String questions in Java interviews.

When reading a String from an input source, e.g. XML files, an HTTP request, a network port, or a database, you must pay attention to which character encoding (e.g. UTF-8, UTF-16, or ISO 8859-1) the data is encoded in. If you do not use the same character encoding when converting bytes to a String, you will end up with a corrupt String that may contain totally incorrect values. You might have seen '?' or square boxes after converting byte[] to String; they stand for values that the character encoding in use does not support.

I tried to understand why programmers make character encoding mistakes so often, and my little research and own experience suggest two reasons: first, not dealing enough with internationalization and character encodings, and second, the fact that ASCII characters are supported by almost all popular encoding schemes and have the same values in each of them. Since we mostly deal with encodings like UTF-8 and Windows-1252 (Cp1252), which represent ASCII characters (mostly letters and digits) identically, such text displays without fail even if you use a different encoding scheme. The real issue comes when your text contains special characters, e.g. 'é', which is often used in French names. If your platform's character encoding doesn't recognize that character, you will see either a different character or garbage, and sadly, until you get your hands burned, you are unlikely to be careful with character encoding.

In Java, things are a little trickier because many IO classes, e.g. InputStreamReader, use the platform's character encoding by default. This means that if you run your program on a different machine, you will likely get different output because of the different character encoding used on that machine. In this article, we will learn how to convert byte[] to String in Java both by using the JDK API and with the help of Guava and Apache Commons.
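The ASCII-versus-'é' point is easy to reproduce. The sketch below is my own illustration (the class EncodingMismatchDemo and the method transcode are made-up names). It writes text as UTF-8 and reads it back with a different charset. I use ISO-8859-1 as a stand-in for Cp1252 because it is guaranteed to exist on every JVM, and for 'é' both give the same result. The source uses \u escapes so it compiles regardless of the file's own encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative class name, not part of any library.
class EncodingMismatchDemo {

    // Encode text with one charset, then decode the bytes with another.
    static String transcode(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        String ascii = "Credit";
        String french = "Cr\u00e9dit"; // "Credit" with e-acute (U+00E9)

        // Pure ASCII survives the mismatch because both charsets agree on it.
        System.out.println(transcode(ascii, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));

        // e-acute is two bytes in UTF-8 (0xC3 0xA9); read as ISO-8859-1 they
        // become two separate characters, U+00C3 and U+00A9.
        System.out.println(transcode(french, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));

        // Same charset on both sides: the text comes back unchanged.
        System.out.println(transcode(french, StandardCharsets.UTF_8, StandardCharsets.UTF_8));
    }
}
```

On a console that can display these characters, the second line should show the familiar "CrÃ©dit" garbage while the first and third lines look fine, which is exactly why the bug goes unnoticed with English-only test data.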


How to convert byte[] to String in Java

String str = new String(bytes, "UTF-8");
String fromStream = IOUtils.toString(fileInputStream, "UTF-8");

In order to correctly convert such a byte array into a String, you must first discover the correct character encoding by reading metadata, e.g. the Content-Type header, depending on the format or protocol of the data you are reading. This is one of the reasons I recommend using XML parsers, e.g. SAX or DOM parsers, to read XML files: they take care of character encoding by themselves.
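As a rough sketch of "discovering" the encoding from metadata, the helper below pulls the charset parameter out of a Content-Type value and falls back to a default when it is missing or unknown. The class ContentTypeCharset and its methods are hypothetical names of mine, and the parsing is deliberately simplistic; a real HTTP client library will usually do this for you.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Locale;

// Hypothetical helper, written only for this illustration.
class ContentTypeCharset {

    // Extract the charset from a value such as "text/html; charset=ISO-8859-1".
    // Returns the fallback when no usable charset parameter is present.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                    return fallback; // malformed or unknown charset name
                }
            }
        }
        return fallback;
    }

    // Decode a body using the charset announced in its Content-Type,
    // assuming UTF-8 when nothing usable is announced.
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] body = { 'C', 'r', (byte) 0xE9, 'd', 'i', 't' }; // ISO-8859-1 bytes
        System.out.println(decode(body, "text/plain; charset=ISO-8859-1"));
    }
}
```

The UTF-8 fallback is an assumption of this sketch; the right default depends on the protocol you are reading.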

Some programmers also recommend using a Charset instead of a String to specify the character encoding, e.g. StandardCharsets.UTF_8 instead of "UTF-8", mainly to avoid the checked UnsupportedEncodingException. There are six standard Charset implementations guaranteed to be supported by all Java platform implementations, and you can use them instead of specifying the encoding scheme as a String. In short, always prefer a constant such as StandardCharsets.ISO_8859_1 over the name "ISO-8859-1", for example:

String str = IOUtils.toString(fis, StandardCharsets.UTF_8);

The six constants are:

  1. StandardCharsets.ISO_8859_1
  2. StandardCharsets.US_ASCII
  3. StandardCharsets.UTF_8
  4. StandardCharsets.UTF_16
  5. StandardCharsets.UTF_16BE
  6. StandardCharsets.UTF_16LE
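The difference between the two styles shows up in the method signatures. In this small sketch (CharsetConstantDemo and its method names are mine), the String-name variant has to declare a checked exception and can fail at runtime on a typo, while the constant variant cannot.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Illustrative class name, not part of any library.
class CharsetConstantDemo {

    // Charset given by name: the compiler forces us to deal with a checked
    // exception, and a misspelled name is only detected at runtime.
    static String decodeByName(byte[] bytes, String charsetName)
            throws UnsupportedEncodingException {
        return new String(bytes, charsetName);
    }

    // Charset given as a constant: no checked exception, and a misspelled
    // constant does not even compile.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = "Hey, there!".getBytes(StandardCharsets.UTF_8);

        System.out.println(decodeUtf8(bytes)); // prints Hey, there!

        try {
            decodeByName(bytes, "UTF-88"); // deliberate typo
        } catch (UnsupportedEncodingException e) {
            System.out.println("Unsupported encoding: " + e.getMessage());
        }
    }
}
```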

If you are reading bytes from an input stream, you can also check my earlier post about 5 ways to convert InputStream to String in Java for details.
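For the stream case, here is one JDK-only way to do it, shown as a minimal sketch rather than the method from that post. It wraps the stream in an InputStreamReader with an explicit charset, so the platform default never comes into play. StreamToString and read are names I made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative class name, not part of any library.
class StreamToString {

    // Read the whole stream as text using the given charset.
    // Closing the reader also closes the underlying stream.
    static String read(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[4096];
        try (Reader reader = new InputStreamReader(in, charset)) {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for a file or socket here.
        byte[] data = "Soci\u00e9t\u00e9 G\u00e9n\u00e9rale".getBytes(StandardCharsets.UTF_8);
        System.out.println(read(new ByteArrayInputStream(data), StandardCharsets.UTF_8));
    }
}
```

Decoding through a Reader also handles multi-byte characters that straddle a buffer boundary, which a naive "new String(chunk)" loop over raw byte chunks would break.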

Original XML

Here is our sample XML snippet to demonstrate the issues with using the default character encoding. This file contains the letter 'é', which is not displayed correctly in Eclipse because its default character encoding is Cp1252.

<?xml version="1.0" encoding="UTF-8"?>
Industrial & Commercial Bank of China Beijing , China
Crédit Agricole SA Montrouge, France
Société Générale Paris, Île-de-France, France

And this is what happens when you convert a byte array to a String without specifying the character encoding, e.g.:

String str = new String(filedata);

This will use the platform's default character encoding, which is Cp1252 in this case because we are running this program in the Eclipse IDE. You can see that the letter 'é' is not displayed correctly:

<?xml version="1.0" encoding="UTF-8"?>
Industrial & Commercial Bank of China Beijing , China
CrÃ©dit Agricole SA Montrouge, France
SociÃ©tÃ© GÃ©nÃ©rale Paris, ÃŽle-de-France, France

To fix this, specify the character encoding when creating the String from the byte array, e.g.:

String str = new String(filedata, "UTF-8");

By the way, let me make it clear that even though I have read the XML file using an InputStream here, this is not good practice; in fact, it's bad practice. You should always use a proper XML parser for reading XML documents. If you don't know how, please check this tutorial. Since this example is mostly meant to show you why character encoding matters, I have chosen one that was easily available and looks practical.
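To back up the point about parsers, here is a minimal sketch using the JDK's built-in DOM parser. The document in it is deliberately stored as ISO-8859-1 and says so in its XML declaration; the parser reads the declaration and decodes the bytes accordingly, so the program never names a charset when parsing. The element name bank and the class XmlEncodingDemo are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Illustrative class name, not part of any library.
class XmlEncodingDemo {

    // Parse raw XML bytes and return the text of the root element.
    // The parser picks the encoding from the XML declaration by itself.
    static String rootText(byte[] xmlBytes) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xmlBytes));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<bank>Cr\u00e9dit Agricole SA</bank>";

        // Store the document in the encoding it declares.
        byte[] latin1Bytes = xml.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(rootText(latin1Bytes));
    }
}
```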

Java Program to Convert Byte array to String in Java


Here is our sample program to show why relying on the default character encoding is a bad idea and why you must specify the character encoding while converting a byte array to a String in Java. In this program, we use the Apache Commons IOUtils class to read the file directly into a byte array. Note that IOUtils.toByteArray() reads the whole stream but does not close it, so the caller remains responsible for closing the stream to avoid leaking file descriptors. How you create the String from that array is the key: if you provide the right character encoding, you will get correct output; otherwise you get output that looks nearly right but is wrong.

import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.commons.io.IOUtils;

/**
 * Java Program to convert byte array to String. In this example, we first
 * read an XML file with character encoding "UTF-8" into a byte array and then
 * create a String from it. When you don't specify a character encoding, Java
 * uses the platform's default encoding, which may not match if the file is an
 * XML document coming from another system, an email, or a plain text file
 * fetched from an HTTP server. You must first discover the correct character
 * encoding and then use it while converting the byte array to a String.
 *
 * @author Javin Paul
 */
public class ByteArrayToString {

    public static void main(String[] args) throws IOException {
        System.out.println("Platform Encoding : " + System.getProperty("file.encoding"));

        // try-with-resources closes the stream; IOUtils.toByteArray() does not
        try (FileInputStream fis = new FileInputStream("info.xml")) {
            // Using Apache Commons IOUtils to read file into byte array
            byte[] filedata = IOUtils.toByteArray(fis);

            // Decode with the file's encoding, not the platform default
            String str = new String(filedata, StandardCharsets.UTF_8);
            System.out.println(str);
        }
    }
}

Output :
Platform Encoding : Cp1252
Industrial & Commercial Bank of China Beijing , China
Crédit Agricole SA Montrouge, France
Société Générale Paris, Île-de-France, France
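If you would rather not pull in Apache Commons just for this, the same read can be sketched with the JDK alone. This is an alternative of mine, not the program above: ReadFileAsString and readUtf8 are made-up names, and the demo writes its own temporary file instead of assuming that info.xml exists.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative class name, not part of any library.
class ReadFileAsString {

    // Read the whole file into memory and decode it as UTF-8.
    // Files.readAllBytes opens and closes the file itself.
    static String readUtf8(Path file) throws IOException {
        byte[] data = Files.readAllBytes(file);
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Temporary file standing in for a real input such as info.xml.
        Path tmp = Files.createTempFile("info", ".xml");
        try {
            Files.write(tmp, "Soci\u00e9t\u00e9 G\u00e9n\u00e9rale".getBytes(StandardCharsets.UTF_8));
            System.out.println(readUtf8(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Reading everything into one array is fine for small files like this one; for large inputs a streaming approach is the safer choice.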

Things to remember and Best Practices

  • Use the character encoding declared by the source, e.g. the Content-Type of an HTML page or the encoding attribute of the XML declaration.
  • Use XML parsers to parse XML files instead of finding the character encoding yourself and reading the file via InputStream; some things are best left for demo code only.
  • Prefer Charset constants, e.g. StandardCharsets.UTF_16, over String names such as "UTF-16".
  • Never rely on the platform's default encoding scheme.

These rules should also be applied when you convert character data to bytes, e.g. when converting a String to a byte array using the String.getBytes() method. Called without arguments, it uses the platform's default character encoding; instead, you should use the overloaded version that takes a character encoding.
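To make the String-to-bytes direction concrete, the sketch below encodes the single character 'é' with several of the standard charsets and reports how many bytes each produces. GetBytesDemo and encodedLength are names I chose for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative class name, not part of any library.
class GetBytesDemo {

    // Number of bytes the text occupies in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // the single character e-acute

        System.out.println(encodedLength(eAcute, StandardCharsets.UTF_8));      // prints 2
        System.out.println(encodedLength(eAcute, StandardCharsets.ISO_8859_1)); // prints 1
        System.out.println(encodedLength(eAcute, StandardCharsets.UTF_16BE));   // prints 2
        System.out.println(encodedLength(eAcute, StandardCharsets.UTF_16));     // prints 4 (byte order mark + 2 bytes)

        // US-ASCII cannot represent e-acute, so getBytes() silently
        // substitutes a '?' byte (0x3F) for it.
        byte[] ascii = eAcute.getBytes(StandardCharsets.US_ASCII);
        System.out.println(ascii.length + " byte, value " + ascii[0]); // prints 1 byte, value 63
    }
}
```

The same String yields different bytes under every charset, so whoever later decodes those bytes needs to know which one you picked.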

That's all on how to convert a byte array to a String in Java. As you can see, the Java API, particularly the java.lang.String class, provides methods and constructors that take a byte[] and return a String (or vice versa), but by default they rely on the platform's character encoding, which may not be correct if the byte array was created from XML files, HTTP request data, or network protocols. You should always get the right encoding from the source itself. If you would like to read more about what every programmer should know about String, you can check out this article.
