How to convert byte array to string in Java

How to convert byte array to string and vice versa?

I have to convert a byte array to string in Android, but my byte array contains negative values. If I convert that string again to byte array, values I am getting are different from original byte array values. What can I do to get proper conversion? Code I am using to do the conversion is as follows:

    // Code to convert byte arr to str:
    byte[] by_original = { 0, 1, -2, 3, -4, -5, 6 };  // sample values (the original initializer was lost); note the negatives
    String str1 = new String(by_original);
    System.out.println("str1 >> " + str1);

    // Code to convert str to byte arr:
    byte[] by_new = str1.getBytes();
    for (int i = 0; i < by_new.length; i++)
        System.out.println("by_new >> " + by_new[i]);

Why are you trying to convert arbitrary binary data to a String in the first place? Apart from all the charset problems the answers already mention, there’s also the fact that you’re abusing String if you do this. What’s wrong with using a byte[] for your binary data and String for your text?

@Joachim: sometimes you have external tools that can do things like store strings. In that case you want to be able to turn a byte array into a string (encoded in some way).

25 Answers

Your byte array must have some encoding. The encoding cannot be ASCII if you’ve got negative values. Once you figure that out, you can convert a set of bytes to a String using:

    byte[] bytes = ...;  // your byte data
    String str = new String(bytes, StandardCharsets.UTF_8); // for UTF-8 encoding

There are a bunch of encodings you can use; look at the supported encodings in the Oracle javadocs.

@UnKnown because UTF-8 encodes some characters as two- or three-byte sequences. Not every byte array is a valid UTF-8-encoded string. ISO-8859-1 would be a better choice here: each character is encoded as a single byte.

To map one byte to one char (with 8859-1), and with no checked exception to handle (using nio.charset): String str = new String(bytes, java.nio.charset.StandardCharsets.ISO_8859_1);

The "proper conversion" between byte[] and String is to explicitly state the encoding you want to use. If you start with a byte[] and it does not in fact contain text data, there is no "proper conversion". Strings are for text, byte[] is for binary data, and the only really sensible thing to do is to avoid converting between them unless you absolutely have to.

If you really must use a String to hold binary data then the safest way is to use Base64 encoding.

The root problem is (I think) that you are unwittingly using a character set for which the byte → character → byte conversion is not lossless in some cases. UTF-8 is an example of such a character set. Specifically, certain sequences of bytes are not valid encodings in UTF-8. If the UTF-8 decoder encounters one of these sequences, it is liable to discard the offending bytes or decode them as the Unicode codepoint for "no such character". Naturally, when you then try to encode the characters as bytes the result will be different.
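To illustrate (a small sketch of my own, not from the question): an invalid UTF-8 byte does not survive a byte → String → byte round trip.

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    public class LossyRoundTrip {
        public static void main(String[] args) {
            // 0x80 on its own is not a valid UTF-8 sequence
            byte[] original = { (byte) 0x80, 'a', 'b' };

            String decoded = new String(original, StandardCharsets.UTF_8);
            byte[] roundTripped = decoded.getBytes(StandardCharsets.UTF_8);

            // The invalid byte is replaced by U+FFFD, so the arrays differ
            System.out.println(Arrays.toString(original));      // [-128, 97, 98]
            System.out.println(Arrays.toString(roundTripped));  // [-17, -65, -67, 97, 98]
        }
    }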

  1. Be explicit about the character encoding you are using; i.e. use a String constructor and the String.getBytes method with an explicit charset.
  2. Use the right character set for your byte data, or alternatively one (such as "Latin-1") where all byte sequences map to valid Unicode characters.
  3. If your bytes are (really) binary data and you want to be able to transmit / receive them over a "text based" channel, use something like Base64 encoding, which is designed for this purpose.

For Java, the most common character sets are in java.nio.charset.StandardCharsets. If you are encoding a string that can contain any Unicode character value then UTF-8 encoding (UTF_8) is recommended.
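For example, a minimal sketch (the sample text is my own, not from the original answer) showing that text survives the round trip when the charset is stated explicitly on both sides:

    import java.nio.charset.StandardCharsets;

    public class ExplicitCharset {
        public static void main(String[] args) {
            String text = "héllo wörld";  // sample text with non-ASCII characters

            byte[] encoded = text.getBytes(StandardCharsets.UTF_8);       // String -> byte[]
            String decoded = new String(encoded, StandardCharsets.UTF_8); // byte[] -> String

            System.out.println(text.equals(decoded));  // true: text survives the round trip
        }
    }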

If you want a 1:1 mapping in Java then you can use ISO Latin Alphabet No. 1, more commonly just called "Latin-1" (ISO_8859_1). Note that Latin-1 in Java is the IANA version of Latin-1, which assigns characters to all 256 possible byte values, including the C0 and C1 control blocks. These are non-printing control characters: you won't see them in any output.
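As a sketch of that 1:1 property (the byte values below are arbitrary examples), any byte array, including one with negative values, survives an ISO-8859-1 round trip unchanged:

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    public class Latin1RoundTrip {
        public static void main(String[] args) {
            byte[] original = { 0, 1, -2, 3, -4, -5, 6 };  // arbitrary bytes, including negatives

            String asText = new String(original, StandardCharsets.ISO_8859_1);
            byte[] back = asText.getBytes(StandardCharsets.ISO_8859_1);

            System.out.println(Arrays.equals(original, back));  // true: every byte maps 1:1 to a char
        }
    }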

From Java 8 onwards, Java contains java.util.Base64 for Base64 encoding / decoding. For URL-safe encoding you may want to use Base64.getUrlEncoder instead of the standard encoder. This class is also available on Android since Android Oreo (8), API level 26.
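A minimal sketch of that API (the byte values are just an example):

    import java.util.Arrays;
    import java.util.Base64;

    public class Base64RoundTrip {
        public static void main(String[] args) {
            byte[] binary = { 0, 1, -2, 3, -4, -5, 6 };  // arbitrary binary data

            String text = Base64.getEncoder().encodeToString(binary);  // safe to store/transmit as text
            byte[] restored = Base64.getDecoder().decode(text);

            System.out.println(text);                             // "AAH+A/z7Bg==" for these bytes
            System.out.println(Arrays.equals(binary, restored));  // true: data restored exactly
        }
    }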


byte array to string

I have a small problem and would very much appreciate help 🙂 I need to convert a byte array to a string and get this output string: "[0, 0, 0, 0]". After that, another method should take the string as input and retrieve the byte array from the first one. I'm getting a NumberFormatException, so I guess I should write the convertToString method in some other way. This is what I have so far:

    import java.io.ByteArrayOutputStream;
    import java.util.StringTokenizer;

    public class byteToString {

        public String convertToString() {
            byte[] byteArray = new byte[] { 'O', 'O', 'O', 'O' };  // note: the letter 'O', not the digit zero (see answer below)
            String holder = new String(byteArray);
            return holder;
        }

        /* was told to use this code to convert back */
        private static byte[] toByteArray(String myString) {
            myString = myString.substring(0, myString.length() - 1).substring(1);
            ByteArrayOutputStream myStream = new ByteArrayOutputStream();
            for (StringTokenizer myTok = new StringTokenizer(myString, ","); myTok.hasMoreTokens();) {
                myStream.write(Byte.parseByte(myTok.nextToken().trim()));
            }
            return myStream.toByteArray();
        }

        public static void main(String[] args) {
            String myString = new byteToString().convertToString();
            toByteArray(myString);
        }
    }

3 Answers

new byte[] {'O', 'O', 'O', 'O'} is actually [O, O, O, O], an array of Ohs (the letter), not zeroes!

Also note that you can use:

    myString = myString.substring(1, myString.length() - 1);

instead of:

    myString = myString.substring(0, myString.length() - 1).substring(1);
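Putting the answers together, here is a sketch of how the round trip could look if the goal really is the "[0, 0, 0, 0]" text form. It uses java.util.Arrays.toString for the forward direction; the class and variable names are my own, not from the question:

    import java.io.ByteArrayOutputStream;
    import java.util.Arrays;
    import java.util.StringTokenizer;

    public class ByteArrayRoundTrip {

        // byte[] -> "[0, 0, 0, 0]"
        static String convertToString(byte[] byteArray) {
            return Arrays.toString(byteArray);
        }

        // "[0, 0, 0, 0]" -> byte[]
        static byte[] toByteArray(String myString) {
            myString = myString.substring(1, myString.length() - 1);  // strip the brackets
            ByteArrayOutputStream myStream = new ByteArrayOutputStream();
            for (StringTokenizer myTok = new StringTokenizer(myString, ","); myTok.hasMoreTokens();) {
                myStream.write(Byte.parseByte(myTok.nextToken().trim()));
            }
            return myStream.toByteArray();
        }

        public static void main(String[] args) {
            byte[] original = { 0, 0, 0, 0 };
            String text = convertToString(original);  // "[0, 0, 0, 0]"
            byte[] back = toByteArray(text);
            System.out.println(text + " -> " + Arrays.toString(back));
        }
    }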


Converting a byte array to string without using new operator in Java

Is there a way to convert a byte array to a string other than using new String(bytearray)? The exact problem is that I transmit a JSON-formatted string over the network through a UDP connection. At the other end, I receive it into a fixed-size byte array (since I don't know the size in advance) and create a new string from the byte array. If I do this, all the memory I allocated for the buffer is held unnecessarily. To avoid this, I take the byte array, convert it to a string, truncate the string at the last valid character, then convert it back to a byte array and create a new string out of that. If I do this, only the required memory is used, but the garbage-collection frequency becomes very high because it involves more allocations. What is the best way to do this?

What is the destination? You might want to write the data straight to the destination instead of holding it in an intermediate byte[] or String.

4 Answers

Would

    String s = new String(bytearray, 0, lenOfValidData, "US-ASCII");

do what you want? (Change the charset to whatever encoding is appropriate.)

Based on your comments, you might want to try:

    socket.receive(packet);
    String strPacket = new String(packet.getData(), 0, packet.getLength(), "US-ASCII");
    receiver.onReceive(strPacket);

I’m not familiar enough with Java’s datagram support to know if packet.getLength() returns the truncated length or the original length of the datagram (before truncation to fit in the receive buffer). It might be safer to create the string like so:

String strPacket = new String( packet.getData(), 0, Math.min( packet.getLength(), packet.getData().length), "US-ASCII"); 

Then again, it might be unnecessary.

The problem here is that the length of the string in the packet varies, so I don't know lenOfValidData here. Is there a way to do this without knowing it? Moreover, using new String is causing a lot of GC, as my string size is normally 8k-10k.

How do you determine the length of the valid data now (somehow after the 1st conversion to a string)? As far as avoiding new String — if you need the data in a string, that’ll have to happen at some point (even if it’s hidden inside of a method that returns a string object). What you should be able to do is avoid creation of intermediate String and/or byte array objects.

Also, it seems to me that whatever is filling in the byte array from the UDP packet should let you know how much data it put into the buffer/array; you should use that information as the lenOfValidData argument.

I know the last character of my String, as it is JSON-formatted, so I can use the substring() method on String to get only the required characters.

These are the two snippets I tried.

#1:

    socket.receive(packet);
    String strPacket = new String(packet.getData());
    receiver.onReceive(strPacket.substring(0, strPacket.indexOf('}') + 1));

This one holds up unnecessary memory.

#2:

    socket.receive(packet);
    String strPacket = new String(packet.getData());
    receiver.onReceive(new String(strPacket.substring(0, strPacket.indexOf('}') + 1)).getBytes());

This one causes GC twice as frequently as the first one.

The simplest and most reliable way to do this is to use the length of the packet that you read from the UDP socket. The javadoc for DatagramSocket.receive(...) says this:

Receives a datagram packet from this socket. When this method returns, the DatagramPacket’s buffer is filled with the data received. The datagram packet also contains the sender’s IP address, and the port number on the sender’s machine.

This method blocks until a datagram is received. The length field of the datagram packet object contains the length of the received message. If the message is longer than the packet’s length, the message is truncated.

If you cannot do that, then the following will allocate a minimum sized String with no unnecessary allocation of temporaries.

    byte[] buff = ...;  // read from socket

    // Find byte offset of first 'non-character' in buff
    int i;
    for (i = 0; i < buff.length && buff[i] != 0 /* buff[i] represents a character */; i++) {
        /* do nothing */
    }

    // Allocate String
    String res = new String(buff, 0, i, charsetName);

Note that the criterion for determining a non-character is character set and application specific. But probably testing for a zero byte is sufficient.

What does the javadoc mean exactly by "The length of the new String is a function of the charset, and hence may not be equal to the length of the subarray"?

It is pointing to the fact that for some character encodings (for example UTF-8, UTF-16, JIS, etc) some characters are represented by two or more bytes. So for example, 10 bytes of UTF-8 might represent fewer than 10 characters.
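A quick sketch of that effect (the bytes below are my own example): two UTF-8 bytes can decode to a single character, so the byte length and the string length differ.

    import java.nio.charset.StandardCharsets;

    public class MultiByteLength {
        public static void main(String[] args) {
            // "é" is a single character but two bytes in UTF-8 (0xC3 0xA9)
            byte[] utf8 = { (byte) 0xC3, (byte) 0xA9, 'a' };

            String s = new String(utf8, StandardCharsets.UTF_8);
            System.out.println(utf8.length);  // 3 bytes
            System.out.println(s.length());   // 2 characters: "éa"
        }
    }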


Converting byte array to String (Java)

I'm writing a web application on Google App Engine. It allows people to edit HTML code that gets stored as an .html file in the blobstore. I'm using fetchData to return a byte[] of all the characters in the file, and I print it into the page so the user can edit the HTML code. Everything works great! Here's my only problem: the byte array is having some issues when converting back to a string. Smart quotes and a couple of other characters come out looking funky (?'s or Japanese symbols, etc.). Specifically, several bytes have negative values, and those are causing the problem. The smart quotes are coming back as -108 and -109 in the byte array. Why is this, and how can I decode the negative bytes to show the correct characters?

Hi, I know it is a really old post but I am facing similar problems. I am making a man-in-the-middle proxy for SSL. The problem I am facing is the same as yours. I listen to the socket and get the data into an InputStream and then into a byte[]. Now when I try to convert the byte[] into a String (I need to use the response body for attacks), I get really funny characters full of smart quotes and question marks and what not. I believe your problem is the same as mine, as we are both dealing with HTML in a byte[]. Can you please advise?

By the way, I went to the extent of finding the encoding of my system from the system properties and found it to be "Cp1252". I then used String str = new String(buffer, "Cp1252"); but it did not help.
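For what it's worth, here is a small sketch (my own example, not from the original thread) of why those particular negative bytes look like smart quotes: in windows-1252, the bytes 0x93 (-109) and 0x94 (-108) are the left and right curly double quotes, so decoding with that charset recovers them, while decoding the same bytes as UTF-8 does not.

    public class SmartQuotes {
        public static void main(String[] args) throws Exception {
            byte[] bytes = { (byte) 0x93, 'h', 'i', (byte) 0x94 };  // -109, 'h', 'i', -108

            String cp1252 = new String(bytes, "windows-1252");  // "hi" wrapped in curly quotes
            String utf8   = new String(bytes, "UTF-8");         // invalid bytes become U+FFFD

            System.out.println(cp1252);  // “hi”
            System.out.println(utf8);    // �hi�
        }
    }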

