Bytes to string encoding java

Byte Encodings and Strings

If a byte array contains non-Unicode text, you can convert the text to Unicode with one of the String constructor methods. Conversely, you can convert a String object into a byte array of non-Unicode characters with the String.getBytes method. When invoking either of these methods, you specify the encoding identifier as one of the parameters.

The example that follows converts characters between UTF-8 and Unicode. UTF-8 is a transmission format for Unicode that is safe for UNIX file systems. The full source code for the example is in the file StringConverter.java.

The StringConverter program starts by creating a String containing Unicode characters:

String original = new String("A" + "\u00ea" + "\u00f1" + "\u00fc" + "C");

When printed, the String named original appears as:

AêñüC

To convert the String object to UTF-8, invoke the getBytes method and specify the appropriate encoding identifier as a parameter. The getBytes method returns an array of bytes in UTF-8 format. To create a String object from an array of non-Unicode bytes, invoke the String constructor with the encoding parameter. The code that makes these calls is enclosed in a try block, in case the specified encoding is unsupported:

try {
    byte[] utf8Bytes = original.getBytes("UTF8");
    byte[] defaultBytes = original.getBytes();

    String roundTrip = new String(utf8Bytes, "UTF8");
    System.out.println("roundTrip = " + roundTrip);
    System.out.println();
    printBytes(utf8Bytes, "utf8Bytes");
    System.out.println();
    printBytes(defaultBytes, "defaultBytes");
} catch (UnsupportedEncodingException e) {
    e.printStackTrace();
}

The StringConverter program prints out the values in the utf8Bytes and defaultBytes arrays to demonstrate an important point: The length of the converted text might not be the same as the length of the source text. Some Unicode characters translate into single bytes, others into pairs or triplets of bytes.
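The point about differing lengths can be checked directly. The following is a minimal sketch, not part of StringConverter.java; the class name EncodedLength and the helper encodedLength are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedLength {

    // Hypothetical helper: number of bytes the text occupies in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // "A" (U+0041), "ê" (U+00EA), "€" (U+20AC), and the tutorial's sample string
        String[] samples = {"A", "\u00ea", "\u20ac", "A\u00ea\u00f1\u00fcC"};
        for (String s : samples) {
            System.out.println(s.length() + " char(s): "
                + encodedLength(s, StandardCharsets.UTF_8) + " byte(s) in UTF-8, "
                + encodedLength(s, StandardCharsets.UTF_16BE) + " byte(s) in UTF-16BE");
        }
    }
}
```

In UTF-8, ASCII characters take one byte, characters such as ê take two, and characters such as € take three, which is why the five-character sample string becomes eight bytes.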

The printBytes method displays the byte arrays by invoking the byteToHex method, which is defined in the source file UnicodeFormatter.java. Here is the printBytes method:

public static void printBytes(byte[] array, String name) {
    for (int k = 0; k < array.length; k++) {
        System.out.println(name + "[" + k + "] = " + "0x" +
            UnicodeFormatter.byteToHex(array[k]));
    }
}

The output of the printBytes method follows. Note that only the first and last bytes, the A and C characters, are the same in both arrays:

utf8Bytes[0] = 0x41
utf8Bytes[1] = 0xc3
utf8Bytes[2] = 0xaa
utf8Bytes[3] = 0xc3
utf8Bytes[4] = 0xb1
utf8Bytes[5] = 0xc3
utf8Bytes[6] = 0xbc
utf8Bytes[7] = 0x43

defaultBytes[0] = 0x41
defaultBytes[1] = 0xea
defaultBytes[2] = 0xf1
defaultBytes[3] = 0xfc
defaultBytes[4] = 0x43
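The byteToHex helper lives in UnicodeFormatter.java, which is not reproduced here. The following is a possible stand-in, assuming the helper returns two lowercase hex digits per byte as the output above suggests; the class name HexSketch is hypothetical and the real UnicodeFormatter may be implemented differently.

```java
public class HexSketch {

    // Hypothetical stand-in for UnicodeFormatter.byteToHex (not the original source).
    static String byteToHex(byte b) {
        // Mask with 0xff so that negative byte values (0x80 to 0xff) are not sign-extended.
        return String.format("%02x", b & 0xff);
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0x41, (byte) 0xc3, (byte) 0xaa};
        for (int k = 0; k < sample.length; k++) {
            System.out.println("sample[" + k + "] = 0x" + byteToHex(sample[k]));
        }
    }
}
```

The mask matters because Java bytes are signed: without it, a byte such as 0xc3 would be formatted from a negative int.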

2 Examples to Convert Byte[] array to String in Java

Converting a byte array to a String seems easy; the hard part is doing it correctly. Many programmers make the mistake of ignoring character encoding whenever bytes are converted into a String or char, or vice versa. As programmers, we all know that computers only understand binary data, i.e. 0 and 1. Everything we see and use, e.g. images, text files, movies, or any other multimedia, is stored in the form of bytes, but what matters more is the process of encoding or decoding bytes to characters. Data conversion is an important topic in programming interviews, and because character encoding is tricky, this is one of the most popular String questions in Java interviews.

While reading a String from an input source, e.g. XML files, an HTTP request, a network port, or a database, you must pay attention to which character encoding (e.g. UTF-8, UTF-16, or ISO 8859-1) the data is encoded in. If you don’t use the same character encoding while converting bytes to a String, you will end up with a corrupt String that may contain totally incorrect values. You might have seen question marks or square boxes after converting a byte[] to a String; they appear because the bytes have no valid meaning in the character encoding being used, so placeholder characters are shown instead.

I tried to understand why programmers make character encoding mistakes so often, and my little research and own experience suggest two reasons: first, not dealing enough with internationalization and character encodings, and second, the fact that ASCII characters are supported by almost all popular encoding schemes and have the same values in each. Since we mostly deal with encodings like UTF-8, ISO 8859-1, and Windows-1252 (Cp1252), ASCII characters (mostly letters and digits) are displayed without fail even if you use a different encoding scheme. The real issue comes when your text contains special characters, e.g. ‘é’, which is often used in French names. If your platform’s character encoding doesn’t recognize that character, you will see either a different character or garbage, and sadly, until you get your hands burned, you are unlikely to be careful with character encoding.

In Java, things are a little more tricky because many IO classes, e.g. InputStreamReader, use the platform’s character encoding by default (from JDK 18 onward that default is UTF-8 unless it is overridden, but older JDKs take it from the operating system). This means that if you run your program on a different machine, you will likely get different output because of the different character encoding used on that machine. In this article, we will learn how to convert byte[] to String in Java both by using the JDK API and with the help of Guava and Apache Commons.
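The corruption described above is easy to reproduce. The sketch below, with a made-up class name MojibakeDemo and helper transcode, encodes text with one charset and deliberately decodes it with another; it assumes the windows-1252 charset is available, which is the case on common JDK builds but is not guaranteed by the specification.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Hypothetical helper: encode with one charset, decode the bytes with another.
    static String transcode(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        // Assumption: windows-1252 is present in this JDK.
        Charset cp1252 = Charset.forName("windows-1252");
        String name = "Soci\u00e9t\u00e9 G\u00e9n\u00e9rale";

        // Same charset both ways: the text survives the round trip.
        System.out.println(transcode(name, StandardCharsets.UTF_8, StandardCharsets.UTF_8));

        // UTF-8 bytes read as Cp1252: every 'é' becomes two wrong characters.
        System.out.println(transcode(name, StandardCharsets.UTF_8, cp1252));

        // Pure ASCII text is unaffected, which is why the bug often goes unnoticed.
        System.out.println(transcode("Societe Generale", StandardCharsets.UTF_8, cp1252));
    }
}
```

What the console actually shows also depends on the console's own encoding, so the printed characters may differ from machine to machine.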

How to convert byte[] to String in Java

String str = new String(bytes, "UTF-8");
String fromStream = IOUtils.toString(fileInputStream, "UTF-8");

In order to correctly convert a byte array into a String, you must first discover the correct character encoding by reading metadata, e.g. Content-Type, depending on the format or protocol of the data you are reading. This is one of the reasons I recommend using XML parsers, e.g. SAX or DOM parsers, to read XML files: they take care of character encoding by themselves.
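As an illustration of reading the encoding from metadata, here is a rough sketch that pulls the charset parameter out of a Content-Type header value and uses it to decode a body. The class ContentTypeCharset and its methods are hypothetical, the parsing is deliberately simplistic, and falling back to UTF-8 is an assumption made for the example, not a rule from any specification.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ContentTypeCharset {

    // Hypothetical helper: returns the charset named in a Content-Type value,
    // or the fallback when none is present or the name is not usable.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Decodes the body with the declared charset (assumed fallback: UTF-8).
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] body = "Cr\u00e9dit Agricole".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decode(body, "text/plain; charset=ISO-8859-1"));
    }
}
```

Real HTTP clients usually expose the declared charset directly, so hand-written parsing like this is only needed when working with raw header strings.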

Some programmers also recommend using a Charset rather than a String to specify the character encoding, e.g. StandardCharsets.UTF_8 instead of “UTF-8”, mainly to avoid UnsupportedEncodingException. There are six standard Charset implementations guaranteed to be supported by all Java platform implementations, and you can use them instead of specifying the encoding scheme as a String. In short, always prefer StandardCharsets.ISO_8859_1 over “ISO-8859-1”, as shown below:

String str = IOUtils.toString(fis, StandardCharsets.UTF_8);

The six standard charsets are:

  1. StandardCharsets.ISO_8859_1
  2. StandardCharsets.US_ASCII
  3. StandardCharsets.UTF_8
  4. StandardCharsets.UTF_16
  5. StandardCharsets.UTF_16BE
  6. StandardCharsets.UTF_16LE
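The practical difference between the two styles shows up in exception handling. The sketch below uses only the JDK; the class name CharsetConstantDemo and its two methods are made up for this comparison.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class CharsetConstantDemo {

    // String-name overload: the compiler forces you to handle a checked exception,
    // and a typo in the name is only discovered at run time.
    static String decodeByName(byte[] bytes) {
        try {
            return new String(bytes, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be supported", e);
        }
    }

    // Charset overload: no checked exception, and the constant is checked at compile time.
    static String decodeByConstant(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = "Cr\u00e9dit Agricole".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeByName(bytes).equals(decodeByConstant(bytes)));
    }
}
```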

If you are reading bytes from an input stream, you can also check my earlier post about 5 ways to convert InputStream to String in Java for details.
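For reference, one JDK-only way to read a stream into a String is sketched below. It assumes Java 9 or later, where InputStream.readAllBytes() is available; the class name StreamToString and the helper readAll are illustrative, and a ByteArrayInputStream stands in for a real file or network stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StreamToString {

    // Hypothetical helper: reads the whole stream and decodes it with an explicit charset.
    // The caller remains responsible for closing the stream.
    static String readAll(InputStream in, Charset charset) throws IOException {
        return new String(in.readAllBytes(), charset);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "Soci\u00e9t\u00e9 G\u00e9n\u00e9rale".getBytes(StandardCharsets.UTF_8);
        // try-with-resources closes the stream when the block ends.
        try (InputStream in = new ByteArrayInputStream(data)) {
            System.out.println(readAll(in, StandardCharsets.UTF_8));
        }
    }
}
```

Reading everything into memory is fine for small inputs; for large ones a Reader created with an explicit charset is usually the better fit.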

Original XML

Here is our sample XML snippet to demonstrate the issues with using the default character encoding. This file contains the letter ‘é’, which is not correctly displayed in Eclipse because its default character encoding is Cp1252.

<?xml version="1.0" encoding="UTF-8"?>
Industrial & Commercial Bank of China Beijing , China
Crédit Agricole SA Montrouge, France
Société Générale Paris, Île-de-France, France

And this is what happens when you convert a byte array to a String without specifying the character encoding, e.g.:

String str = new String(filedata);

This will use the platform’s default character encoding, which is Cp1252 in this case because we are running this program in the Eclipse IDE. You can see that the letter ‘é’ is not displayed correctly.

<?xml version="1.0" encoding="UTF-8"?>
Industrial & Commercial Bank of China Beijing , China
CrÃ©dit Agricole SA Montrouge, France
SociÃ©tÃ© GÃ©nÃ©rale Paris, ÃŽle-de-France, France

To fix this, specify the character encoding while creating the String from the byte array, e.g.:

String str = new String(filedata, "UTF-8");

By the way, let me make it clear that even though I have read XML files using an InputStream here, it’s not a good practice; in fact, it’s a bad practice. You should always use proper XML parsers for reading XML documents. If you don’t know how, please check this tutorial. Since this example is mostly meant to show you why character encoding matters, I have chosen an example that was easily available and looks practical.
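To show what “letting the parser handle it” can look like, here is a small sketch using the JDK’s DOM parser. The element names banks, bank and name are invented for the example (the tags of the sample file are not shown here), and the document is deliberately encoded as ISO-8859-1 to show that the parser follows the encoding declaration rather than the platform default.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XmlEncodingDemo {

    // Parses raw XML bytes and returns the text of the first <name> element.
    // The parser works out the encoding from the XML declaration on its own.
    static String firstName(byte[] xmlBytes) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xmlBytes));
        return doc.getElementsByTagName("name").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document: the element names are assumptions for illustration.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
            + "<banks><bank><name>Cr\u00e9dit Agricole SA</name></bank></banks>";
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(firstName(bytes));
    }
}
```

Note that the code never mentions a charset when parsing; the bytes go straight to the parser.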

Java Program to Convert Byte array to String in Java

Here is our sample program to show why relying on the default character encoding is a bad idea and why you must specify the character encoding while converting a byte array to a String in Java. In this program, we use the Apache Commons IOUtils class to read the file directly into a byte array. IOUtils.toByteArray() reads the stream fully but does not close it, so the stream is opened in a try-with-resources block to avoid leaking file descriptors. How you create the String from that array is the key. If you provide the right character encoding, you will get the correct output; otherwise you will get a nearly correct but still incorrect output.

import java.io.FileInputStream;
import java.io.IOException;

import org.apache.commons.io.IOUtils;

/**
 * Java Program to convert byte array to String. In this example, we first
 * read an XML file with character encoding "UTF-8" into a byte array and then
 * create a String from it. When you don't specify a character encoding, Java
 * uses the platform's default encoding, which may not be the same if the file
 * is an XML document coming from another system, an email, or a plain text
 * file fetched from an HTTP server, etc. You must first discover the correct
 * character encoding and then use it while converting the byte array to a String.
 *
 * @author Javin Paul
 */
public class ByteArrayToString {

    public static void main(String args[]) throws IOException {

        System.out.println("Platform Encoding : " + System.getProperty("file.encoding"));

        // try-with-resources closes the stream; IOUtils.toByteArray() does not
        try (FileInputStream fis = new FileInputStream("info.xml")) {

            // Using Apache Commons IOUtils to read the file into a byte array
            byte[] filedata = IOUtils.toByteArray(fis);

            String str = new String(filedata, "UTF-8");
            System.out.println(str);
        }
    }
}

Output :
Platform Encoding : Cp1252
Industrial & Commercial Bank of China Beijing , China
Crédit Agricole SA Montrouge, France
Société Générale Paris, Île-de-France, France

Things to remember and Best Practices

  • Use the character encoding declared by the source, e.g. Content-Type in HTML files, or the encoding attribute of the XML declaration in XML files.
  • Use XML parsers to parse XML files instead of finding the character encoding and reading the file via an InputStream; some things are best left for demo code only.
  • Prefer Charset constants, e.g. StandardCharsets.UTF_16, over String names such as “UTF-16”.
  • Never rely on the platform’s default encoding scheme.

These rules should also be applied when you convert character data to bytes, e.g. converting a String to a byte array using the String.getBytes() method. The no-argument version uses the platform’s default character encoding; instead, use the overloaded version that takes a character encoding.
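The same care applies in the String-to-bytes direction. The sketch below (class name GetBytesDemo and helper encode are made up) encodes the single character ‘é’ with three charsets; note how a charset that cannot represent the character silently substitutes a replacement byte.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class GetBytesDemo {

    // Thin wrapper around the Charset overload of String.getBytes.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // the letter 'é'

        // Two bytes in UTF-8 (0xc3 0xa9).
        System.out.println(Arrays.toString(encode(text, StandardCharsets.UTF_8)));

        // One byte in ISO-8859-1 (0xe9).
        System.out.println(Arrays.toString(encode(text, StandardCharsets.ISO_8859_1)));

        // US-ASCII cannot represent 'é', so getBytes substitutes '?' (0x3f)
        // without raising any error.
        System.out.println(Arrays.toString(encode(text, StandardCharsets.US_ASCII)));
    }
}
```

Because the substitution happens silently, data lost this way cannot be recovered by decoding with the “right” charset later.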

That’s all on how to convert a byte array to a String in Java. As you can see, the Java API, particularly the java.lang.String class, provides methods and constructors that take a byte[] and return a String (or vice versa), but by default they rely on the platform’s character encoding, which may not be correct if the byte array was created from XML files, HTTP request data, or network protocols. You should always get the right encoding from the source itself. If you would like to read more about what every programmer should know about String, you can check out this article.

Converting byte[] Array to String in Java

This tutorial will teach you how to convert an array of bytes to a string of characters in Java.

Converting String to byte[]

Let’s assume we have a simple line of text converted to an array of bytes.

byte[] helloWorldBytes = "Hello world".getBytes(StandardCharsets.UTF_8);

Now that we have an array of bytes, we would like to convert it back to a String.

Converting byte[] to String

String helloWorldString = new String(helloWorldBytes, StandardCharsets.UTF_8);

Below is a complete code example that demonstrates:

  1. How to convert a plain text to an array of bytes in Java. And then,
  2. How to convert a byte[] array back to a String value.

import java.nio.charset.StandardCharsets;

public class App {
    public static void main(String[] args) {
        byte[] helloWorldBytes = "Hello world".getBytes(StandardCharsets.UTF_8);
        String helloWorldString = new String(helloWorldBytes, StandardCharsets.UTF_8);
        System.out.println(helloWorldString);
    }
}

Converting byte[] to a Base64 Encoded String

Sometimes we need to convert the byte[] array to a Base64 encoded String. This is helpful when you need to send an array of bytes over the network.

To convert an array of bytes to a Base64 encoded String, use the Base64 class from the java.util package.

String base64EncodedString = Base64.getEncoder().encodeToString(helloWorldBytes);

Below is a complete code example demonstrating how to encode an array of bytes to a Base64 encoded String in Java.

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class App {
    public static void main(String[] args) {
        byte[] helloWorldBytes = "Hello world".getBytes(StandardCharsets.UTF_8);
        String base64EncodedString = Base64.getEncoder().encodeToString(helloWorldBytes);
        System.out.println(base64EncodedString);
    }
}

Decoding Base64 String to byte[]

A Base64 encoded string can be easily decoded back to a byte[] array.

byte[] bytesFromBase64String = Base64.getDecoder().decode(base64EncodedString);
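Putting the two directions together, a complete round trip might look like the sketch below. The class name Base64RoundTrip and its helper methods are illustrative; UTF-8 is assumed for converting between text and bytes, as in the earlier examples.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64RoundTrip {

    // Text -> UTF-8 bytes -> Base64 text.
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Base64 text -> bytes -> String decoded as UTF-8.
    static String decode(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encode("Hello world");
        System.out.println(encoded);          // prints SGVsbG8gd29ybGQ=
        System.out.println(decode(encoded));  // prints Hello world
    }
}
```

Base64 only turns bytes into ASCII text and back; the character encoding used to produce those bytes still has to match on both sides.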

I hope this very short blog post was helpful to you.
