Java byte array to string

Java Byte Array to String to Byte Array

I'm trying to understand the conversion from a byte[] to a string, and from the string representation of a byte[] back to a byte[]. I convert my byte[] to a string to send it, and I then expect my web service (written in Python) to echo the data straight back to the client. I send the data from my Java application using:

Arrays.toString(data.toByteArray()) 

Sent (this is the result of Arrays.toString(), which should be a string representation of my byte data; this data will be sent across the wire):

[-47, 1, 16, 84, 2, 101, 110, 83, 111, 109, 101, 32, 78, 70, 67, 32, 68, 97, 116, 97] 

On the Python side, the Python server returns a string to the caller (which I can see is the same as the string I sent to the server):

[-47, 1, 16, 84, 2, 101, 110, 83, 111, 109, 101, 32, 78, 70, 67, 32, 68, 97, 116, 97] 

The server should return this data to the client, where it can be verified. The response my client receives (as a string) looks like

[-47, 1, 16, 84, 2, 101, 110, 83, 111, 109, 101, 32, 78, 70, 67, 32, 68, 97, 116, 97] 

I can't seem to figure out how to get the received string back into a byte[]. Whatever I try, I end up with a byte array that looks as follows:

[91, 45, 52, 55, 44, 32, 49, 44, 32, 49, 54, 44, 32, 56, 52, 44, 32, 50, 44, 32, 49, 48, 49, 44, 32, 49, 49, 48, 44, 32, 56, 51, 44, 32, 49, 49, 49, 44, 32, 49, 48, 57, 44, 32, 49, 48, 49, 44, 32, 51, 50, 44, 32, 55, 56, 44, 32, 55, 48, 44, 32, 54, 55, 44, 32, 51, 50, 44, 32, 54, 56, 44, 32, 57, 55, 44, 32, 49, 49, 54, 44, 32, 57, 55, 93] 

This is different from the data I sent. I'm sure I'm missing something truly simple. Any help?


Note that normally you would use a Base64 encoding (0QEQVAJlblNvbWUgTkZDIERhdGE=) or hexadecimal encoding (d101105402656e536f6d65204e46432044617461) of the bytes, not an array encoding with separators, warts and whatnot. So this Q/A is not, or should not be, applicable to 99% of situations. Also note that TCP and HTTP are fully capable of carrying binary data directly; for larger amounts of data, use POST.
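To make the comment above concrete, here is a minimal sketch of both encodings applied to the byte values from the question. The class and method names (WireEncodingSketch, toHex, fromHex and so on) are made up for illustration; only java.util.Base64 (Java 8+) and standard String/Integer methods are used.

```java
import java.util.Base64;

class WireEncodingSketch {
    // The byte values shown in the question.
    static final byte[] DATA = {-47, 1, 16, 84, 2, 101, 110, 83, 111, 109,
                                101, 32, 78, 70, 67, 32, 68, 97, 116, 97};

    static String toBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }

    // Hypothetical helper: lower-case hex, two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            // Mask to 0..255 so negative bytes do not print as ffffffd1.
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Hypothetical helper: assumes an even-length string of hex digits.
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(toBase64(DATA)); // 0QEQVAJlblNvbWUgTkZDIERhdGE=
        System.out.println(toHex(DATA));    // d101105402656e536f6d65204e46432044617461
    }
}
```

Either string can be sent as ordinary text and decoded on the other side without any bracket or comma parsing.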

12 Answers

You can't just take the returned string and construct a string from it. It's not a byte[] anymore; it's already a string, so you need to parse it. For example:

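    // response from the Python script
    String response = "[-47, 1, 16, 84, 2, 101, 110, 83, 111, 109, 101, 32, 78, 70, 67, 32, 68, 97, 116, 97]";

    // strip the surrounding brackets and split on the commas
    String[] byteValues = response.substring(1, response.length() - 1).split(",");
    byte[] bytes = new byte[byteValues.length];

    for (int i = 0, len = bytes.length; i < len; i++) {
        // trim() removes the space that follows each comma
        bytes[i] = Byte.parseByte(byteValues[i].trim());
    }

    String str = new String(bytes);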

You get a hint of your problem in your question, where you say "Whatever I seem to try I end up getting a byte array which looks as follows... [91, 45, ...", because 91 is the byte value for [, so [91, 45, ... is the byte array of the string "[-47, 1, 16, ...".

The method Arrays.toString() returns a String representation of the specified array, meaning that the returned value is not an array anymore. For example:

    byte[] b1 = new byte[] {97, 98, 99};

    String s1 = Arrays.toString(b1);
    String s2 = new String(b1);

    System.out.println(s1);        // -> "[97, 98, 99]"
    System.out.println(s2);        // -> "abc"

As you can see, s1 holds the string representation of the array b1 , while s2 holds the string representation of the bytes contained in b1 .

Now, in your problem, your server returns a string similar to s1 , therefore to get the array representation back, you need the opposite constructor method. If s2.getBytes() is the opposite of new String(b1) , you need to find the opposite of Arrays.toString(b1) , thus the code I pasted in the first snippet of this answer.
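As a rough illustration of that "opposite of Arrays.toString(b1)", the parsing step can be wrapped in a small helper. This is a sketch, not a library method: the name parseByteArrayString is invented here, and it assumes the input was produced by Arrays.toString(byte[]) on a non-null array.

```java
import java.util.Arrays;

class ArraysToStringRoundTrip {
    // Hypothetical helper: the inverse of Arrays.toString(byte[]).
    static byte[] parseByteArrayString(String text) {
        String inner = text.trim();
        if (!inner.startsWith("[") || !inner.endsWith("]")) {
            throw new IllegalArgumentException("Not in Arrays.toString format: " + text);
        }
        inner = inner.substring(1, inner.length() - 1).trim();
        if (inner.isEmpty()) {
            return new byte[0]; // Arrays.toString(new byte[0]) is "[]"
        }
        String[] parts = inner.split(",");
        byte[] result = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // Byte.parseByte accepts -128..127 and throws NumberFormatException otherwise.
            result[i] = Byte.parseByte(parts[i].trim());
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] original = {-47, 1, 16, 84, 2};
        String wire = Arrays.toString(original);
        byte[] parsed = parseByteArrayString(wire);
        System.out.println(wire);
        System.out.println(Arrays.equals(original, parsed));
    }
}
```

The empty-array check matters because "".split(",") yields one empty element, which Byte.parseByte would reject.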


How to convert byte array to string and vice versa?

I have to convert a byte array to string in Android, but my byte array contains negative values. If I convert that string again to byte array, values I am getting are different from original byte array values. What can I do to get proper conversion? Code I am using to do the conversion is as follows:

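    // Code to convert byte arr to str:
    byte[] by_original = { /* ... */ };  // values elided in the source; some are negative
    String str1 = new String(by_original);
    System.out.println("str1 >> " + str1);

    // Code to convert str to byte arr:
    byte[] by_new = str1.getBytes();
    for (int i = 0; i < by_new.length; i++)
        System.out.println("by_new[" + i + "] >> " + by_new[i]);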

Why are you trying to convert arbitrary binary data to a String in the first place? Apart from all the charset problems the answers already mention, there’s also the fact that you’re abusing String if you do this. What’s wrong with using a byte[] for your binary data and String for your text?

@Joachim — sometimes you have external tools that can do things like store strings. You want to be able to turn a byte array into a (encoded in some way) string in that case.

25 Answers

Your byte array must have some encoding. The encoding cannot be ASCII if you’ve got negative values. Once you figure that out, you can convert a set of bytes to a String using:

    byte[] bytes = { /* ... */ };  // byte values elided in the source
    String str = new String(bytes, StandardCharsets.UTF_8); // for UTF-8 encoding

There are a bunch of encodings you can use, look at the supported encodings in the Oracle javadocs.

@UnKnown because UTF-8 encodes some characters as 2- or 3-byte sequences. Not every byte array is a valid UTF-8-encoded string. ISO-8859-1 would be a better choice: here each character is encoded as a single byte.

to map one byte to one char (with 8859-1) and no exception handling (with nio.charset): String str = new String(bytes, java.nio.charset.StandardCharsets.ISO_8859_1);

The "proper conversion" between byte[] and String is to explicitly state the encoding you want to use. If you start with a byte[] and it does not in fact contain text data, there is no "proper conversion". Strings are for text, byte[] is for binary data, and the only really sensible thing to do is to avoid converting between them unless you absolutely have to.

If you really must use a String to hold binary data then the safest way is to use Base64 encoding.

The root problem is (I think) that you are unwittingly using a character set for which:

    bytes != encode(decode(bytes))

in some cases. UTF-8 is an example of such a character set. Specifically, certain sequences of bytes are not valid encodings in UTF-8. If the UTF-8 decoder encounters one of these sequences, it is liable to discard the offending bytes or decode them as the Unicode code point for "no such character". Naturally, when you then try to encode the characters as bytes, the result will be different.

The solution is:

  1. Be explicit about the character encoding you are using; i.e. use a String constructor and the String.getBytes method with an explicit charset.
  2. Use the right character set for your byte data, or alternatively one (such as "Latin-1") where all byte sequences map to valid Unicode characters.
  3. If your bytes are (really) binary data and you want to be able to transmit / receive them over a "text based" channel, use something like Base64 encoding, which is designed for this purpose.
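The difference between these options can be checked with a small round-trip experiment. The sketch below uses an arbitrary four-byte sample (not taken from the question) containing 0xFE, a byte that never occurs in valid UTF-8; the class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class RoundTripSketch {
    // True if decoding as UTF-8 and re-encoding gives back the same bytes.
    static boolean survivesUtf8(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        return Arrays.equals(bytes, s.getBytes(StandardCharsets.UTF_8));
    }

    // ISO-8859-1 maps every byte value to exactly one char, so this is always true.
    static boolean survivesLatin1(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.ISO_8859_1);
        return Arrays.equals(bytes, s.getBytes(StandardCharsets.ISO_8859_1));
    }

    // Base64 is designed for carrying binary data as text.
    static boolean survivesBase64(byte[] bytes) {
        String s = Base64.getEncoder().encodeToString(bytes);
        return Arrays.equals(bytes, Base64.getDecoder().decode(s));
    }

    public static void main(String[] args) {
        byte[] binary = {0, 1, -2, 3}; // -2 is 0xFE, invalid in UTF-8
        System.out.println("UTF-8:   " + survivesUtf8(binary));   // false
        System.out.println("Latin-1: " + survivesLatin1(binary)); // true
        System.out.println("Base64:  " + survivesBase64(binary)); // true
    }
}
```

The UTF-8 round trip fails because the decoder replaces the invalid byte with U+FFFD, which re-encodes as three different bytes.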

For Java, the most common character sets are in java.nio.charset.StandardCharsets . If you are encoding a string that can contain any Unicode character value then UTF-8 encoding ( UTF_8 ) is recommended.

If you want a 1:1 mapping in Java then you can use ISO Latin Alphabet No. 1, more commonly just called "Latin 1" or simply "Latin" (ISO_8859_1). Note that Latin-1 in Java is the IANA version of Latin-1, which assigns characters to all 256 possible values, including the control blocks C0 and C1. These are not printable: you won't see them in any output.

From Java 8 onwards Java contains java.util.Base64 for Base64 encoding / decoding. For URL-safe encoding you may want to use Base64.getUrlEncoder instead of the standard encoder. This class is also present in Android since Android Oreo (8), API level 26.
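A short sketch of how the two encoders differ, using a two-byte sample chosen so that the standard alphabet produces '+' and '/'. The class and method names are illustrative, not part of the JDK.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64VariantsSketch {
    static String standard(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    // The URL-safe alphabet uses '-' and '_' in place of '+' and '/'.
    static String urlSafe(byte[] bytes) {
        return Base64.getUrlEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xFB, (byte) 0xFF};

        System.out.println(standard(sample)); // +/8=
        System.out.println(urlSafe(sample));  // -_8=

        // Each variant must be decoded with its matching decoder.
        byte[] back = Base64.getUrlDecoder().decode(urlSafe(sample));
        System.out.println(Arrays.equals(sample, back));
    }
}
```

If the trailing '=' padding is unwanted in a URL, Base64.getUrlEncoder().withoutPadding() drops it.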


How to convert Strings to and from UTF8 byte arrays in Java

In Java, I have a String and I want to encode it as a byte array (in UTF8, or some other encoding). Alternately, I have a byte array (in some known encoding) and I want to convert it into a Java String. How do I do these conversions?

13 Answers

Convert from String to byte[] :

    String s = "some text here";
    byte[] b = s.getBytes(StandardCharsets.UTF_8);

Convert from byte[] to String :

    byte[] b = { /* ... */ };  // byte values elided in the source
    String s = new String(b, StandardCharsets.US_ASCII);

You should, of course, use the correct encoding name. My examples used US-ASCII and UTF-8, two commonly-used encodings.

This method, however, will not report any problems in the conversion. This may be what you want. If not, it is recommended to use CharsetEncoder instead.
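For cases where silent replacement is not acceptable, a stricter conversion can be sketched with CharsetEncoder and CharsetDecoder configured to report errors. The helper names encodeStrict and decodeStrict are invented for this example; the JDK calls used (newEncoder, newDecoder, onMalformedInput, onUnmappableCharacter, encode, decode) are standard java.nio.charset API.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictCodecSketch {
    // Throws CharacterCodingException if the text cannot be represented in the charset.
    static byte[] encodeStrict(String text, Charset charset) throws CharacterCodingException {
        ByteBuffer buffer = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    // Throws CharacterCodingException if the bytes are not valid in the charset.
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) {
        try {
            decodeStrict(new byte[] {0, 0, 0, -127}, StandardCharsets.UTF_8);
            System.out.println("decoded without error");
        } catch (CharacterCodingException e) {
            System.out.println("invalid UTF-8: " + e);
        }
    }
}
```

By contrast, new String(bytes, charset) and String.getBytes(charset) substitute a replacement character or byte and carry on.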

@Pacerier because the docs for Charset list «UTF-8» as one of the standard charsets. I believe that your spelling is also accepted, but I went with what the docs said.

Here’s a solution that avoids performing the Charset lookup for every conversion:

    import java.nio.charset.Charset;

    private final Charset UTF8_CHARSET = Charset.forName("UTF-8");

    String decodeUTF8(byte[] bytes) {
        return new String(bytes, UTF8_CHARSET);
    }

    byte[] encodeUTF8(String string) {
        return string.getBytes(UTF8_CHARSET);
    }

@mcherm: Even if the performance difference is small, I prefer using objects (Charset, URL, etc) over their string forms when possible.

Regarding «avoids performing the Charset lookup for every conversion». please cite some source. Isn’t java.nio.charset.Charset built on top of String.getBytes and therefore has more overhead than String.getBytes?

The docs do state: «The behavior of this method when this string cannot be encoded in the given charset is unspecified. The CharsetEncoder class should be used when more control over the encoding process is required.»

Note: since Java 1.7, you can use StandardCharsets.UTF_8 for a constant way of accessing the UTF-8 charset.

    String original = "hello world";
    byte[] utf8Bytes = original.getBytes("UTF-8");

You can convert directly via the String(byte[], String) constructor and getBytes(String) method. Java exposes available character sets via the Charset class. The JDK documentation lists supported encodings.

90% of the time, such conversions are performed on streams, so you’d use the Reader/Writer classes. You would not incrementally decode using the String methods on arbitrary byte streams — you would leave yourself open to bugs involving multibyte characters.

Can you elaborate? If my application encodes and decodes Strings in UTF-8, what's the concern regarding multibyte characters?

@raffian Problems can occur if you don’t transform all the character data in one go. See here for an example.
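One way to see the problem is to decode a UTF-8 byte sequence in fixed-size chunks. In the sketch below (all names are illustrative), a two-byte character straddles a chunk boundary, so decoding each chunk on its own corrupts it, while a Reader carries the decoder state across reads.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ChunkedDecodeSketch {
    // Naive approach: each chunk is decoded in isolation.
    static String decodePerChunk(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sb.append(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Reader approach: the decoder remembers partial characters between reads.
    static String decodeWithReader(byte[] data, int bufferSize) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            char[] buffer = new char[bufferSize];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo"; // \u00e9 is two bytes in UTF-8
        byte[] data = text.getBytes(StandardCharsets.UTF_8);

        System.out.println(text.equals(decodePerChunk(data, 2)));   // false
        System.out.println(text.equals(decodeWithReader(data, 2))); // true
    }
}
```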

My tomcat7 implementation is accepting strings as ISO-8859-1; despite the content-type of the HTTP request. The following solution worked for me when trying to correctly interpret characters like ‘é’ .

    byte[] b1 = szP1.getBytes("ISO-8859-1");
    System.out.println(b1.toString());  // prints the array's identity ([B@...), not its contents

    String szUT8 = new String(b1, "UTF-8");
    System.out.println(szUT8);

When trying to interpret the string as US-ASCII, the byte info wasn’t correctly interpreted.

    b1 = szP1.getBytes("US-ASCII");
    System.out.println(b1.toString());  // prints the array's identity ([B@...), not its contents
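The effect described in this answer can be reproduced without a servlet container. The sketch below simulates a server that decodes UTF-8 request bytes as ISO-8859-1 and then repairs the result; the class and method names are invented, and the character is written as a Unicode escape so the example does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class MisdecodedRequestSketch {
    // Recover text that was really UTF-8 but was decoded as ISO-8859-1.
    static String redecodeAsUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sent = "\u00e9"; // the character 'e' with an acute accent
        byte[] wire = sent.getBytes(StandardCharsets.UTF_8); // bytes C3 A9

        // What the server sees if it assumes ISO-8859-1: two chars, U+00C3 and U+00A9.
        String seen = new String(wire, StandardCharsets.ISO_8859_1);
        System.out.println(seen.length());                    // 2
        System.out.println(sent.equals(redecodeAsUtf8(seen))); // true

        // US-ASCII cannot represent either char, so both become '?' (63).
        System.out.println(Arrays.toString(seen.getBytes(StandardCharsets.US_ASCII)));
    }
}
```

This repair only works because ISO-8859-1 maps every byte to a char and back without loss; going through US-ASCII discards the information.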

FYI, as of Java 7 you can use constants for those charsets, such as StandardCharsets.UTF_8 and StandardCharsets.ISO_8859_1.

As an alternative, StringUtils from Apache Commons can be used.

    byte[] bytes = { /* ... */ };  // byte values elided in the source
    String convertedString = StringUtils.newStringUtf8(bytes);

    String myString = "example";
    byte[] convertedBytes = StringUtils.getBytesUtf8(myString);

If you have non-standard charset, you can use getBytesUnchecked() or newString() accordingly.

Yes, bit of a gotcha! For Gradle, Maven users: "commons-codec:commons-codec:1.10" (at time of writing). This also comes bundled as a dependency with Apache POI, for example. Apart from that, Apache Commons to the rescue, as ever!

I can’t comment but don’t want to start a new thread. But this isn’t working. A simple round trip:

    byte[] b = new byte[]{ 0, 0, 0, -127 };            // 0x00000081
    String s = new String(b, StandardCharsets.UTF_8);  // UTF8 = 0x0000, 0x0000, 0x0000, 0xfffd
    b = s.getBytes(StandardCharsets.UTF_8);            // [0, 0, 0, -17, -65, -67] 0x000000efbfbd != 0x00000081

I need b[] to be the same array before and after encoding, which it isn't (this refers to the first answer).

For decoding a series of bytes to a normal string message I finally got it working with UTF-8 encoding with this code:

    /* Convert a list of UTF-8 numbers to a normal String.
     * Useful for decoding a JMS message that is delivered as a sequence of bytes instead of plain text.
     */
    public String convertUtf8NumbersToString(String[] numbers) {
        int length = numbers.length;
        byte[] data = new byte[length];
        for (int i = 0; i < length; i++) {
            data[i] = Byte.parseByte(numbers[i]);
        }
        return new String(data, Charset.forName("UTF-8"));
    }

If you are using 7-bit ASCII or ISO-8859-1 (an amazingly common format) then you don’t have to create a new java.lang.String at all. It’s much much more performant to simply cast the byte into char:
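The answer's own snippet is not preserved here, so the following is only a sketch of what such a cast could look like, with invented names. It masks each byte with 0xFF because Java bytes are signed: without the mask, a value such as 0xE9 would be sign-extended and become the wrong char.

```java
import java.nio.charset.StandardCharsets;

class Latin1CastSketch {
    // Widen each byte to the char with the same code point (valid for ISO-8859-1 only).
    static char[] latin1ToChars(byte[] bytes) {
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            chars[i] = (char) (bytes[i] & 0xFF);
        }
        return chars;
    }

    public static void main(String[] args) {
        byte[] bytes = {72, 105, (byte) 0xE9}; // 'H', 'i', U+00E9 in ISO-8859-1
        char[] chars = latin1ToChars(bytes);

        // Same result as the charset-based conversion.
        String viaCharset = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(viaCharset.equals(new String(chars)));
    }
}
```

For pure 7-bit ASCII input the mask is a no-op, since every value is already non-negative.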

If you are not using extended-characters like Ä, Æ, Å, Ç, Ï, Ê and can be sure that the only transmitted values are of the first 128 Unicode characters, then this code will also work for UTF-8 and extended ASCII (like cp-1252).

