Java string to bytes size

Contents

Tech Tutorials
Sunday, November 14, 2021
Convert String to Byte Array Java Program
Converting String to byte[] in Java
Conversion of string to byte array with encoding
Converting byte array to String in Java
byte size of a string?
Converting a String to a Byte Array and Back in Java
1. Introduction
2. Converting a String to a Byte Array
2.1. Using String.getBytes()
2.2. Using Charset.encode()
2.3. Using CharsetEncoder
3. Converting a Byte Array to a String
3.1. Using the String Constructor
3.2. Using Charset.decode()
3.3. Using CharsetDecoder
4. Conclusion

Tech Tutorials


Sunday, November 14, 2021

Convert String to Byte Array Java Program

In this post we’ll see a Java program to convert a String to byte array and byte array to String in Java.

Converting String to byte[] in Java

The String class has a getBytes() method that can be used to convert a String to a byte array in Java.

getBytes(): Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array.

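```java
import java.util.Arrays;

public class StringToByte {
    public static void main(String[] args) {
        String str = "Example String";
        // Encode using the platform's default charset
        byte[] b = str.getBytes();
        // Printing the array reference shows its type and hash code, not its contents
        System.out.println("Array " + b);
        // Arrays.toString prints the actual byte values
        System.out.println("Array as String" + Arrays.toString(b));
    }
}
```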

```
Array [B@2a139a55
Array as String[69, 120, 97, 109, 112, 108, 101, 32, 83, 116, 114, 105, 110, 103]
```

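As you can see, printing the byte array directly does not show its contents. It prints the array's type ([B, meaning byte[]), then an @ sign, then its hash code in hexadecimal. That is why Arrays.toString() is used to print the array values.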

Conversion of string to byte array with encoding

If you want to use a specific encoding such as UTF-8, there are three ways to do it:

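```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StringToByteUtf8 {
    public static void main(String[] args) {
        String str = "Example String";
        byte[] b;
        // 1. Charset name as a String: declares the checked UnsupportedEncodingException
        try {
            b = str.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        // 2. Charset object: no checked exception to handle
        b = str.getBytes(Charset.forName("UTF-8"));
        // 3. Java 7 and later: predefined constant, no lookup by name
        b = str.getBytes(StandardCharsets.UTF_8);
    }
}
```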

Calling str.getBytes("UTF-8") requires a try-catch block, because this method declares the checked UnsupportedEncodingException. To avoid that, you can use str.getBytes(Charset.forName("UTF-8")). In Java 7 and above, you can also use str.getBytes(StandardCharsets.UTF_8).
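The following minimal sketch shows the exception difference described above. The charset name "NO-SUCH-CHARSET" and the class name are made up for illustration. The String-name overload throws the checked UnsupportedEncodingException. Charset.forName throws the unchecked UnsupportedCharsetException instead. The compiler therefore stops forcing a try-catch, but a bad name still fails at run time.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;

public class CharsetNameErrors {
    // Hypothetical name chosen so that no JVM will recognize it
    static final String BOGUS = "NO-SUCH-CHARSET";

    // Returns the simple name of the exception thrown by getBytes(String)
    static String viaStringName(String s) {
        try {
            s.getBytes(BOGUS);
            return "none";
        } catch (UnsupportedEncodingException e) { // checked: must be caught or declared
            return e.getClass().getSimpleName();
        }
    }

    // Returns the simple name of the exception thrown by Charset.forName
    static String viaCharsetObject(String s) {
        try {
            s.getBytes(Charset.forName(BOGUS));
            return "none";
        } catch (UnsupportedCharsetException e) { // unchecked: the compiler does not require this catch
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(viaStringName("Example String"));    // UnsupportedEncodingException
        System.out.println(viaCharsetObject("Example String")); // UnsupportedCharsetException
    }
}
```

StandardCharsets.UTF_8 involves no name lookup at all. That is why it is usually the simplest choice when the charset is known in advance.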

Converting byte array to String in Java

The String class has a constructor that takes a byte array as an argument. Using it, you can get the String from a byte array.

String(byte[] bytes): Constructs a new String by decoding the specified array of bytes using the platform's default charset.

If you want to provide a specific encoding, you can use the following constructor:

String(byte[] bytes, Charset charset): Constructs a new String by decoding the specified array of bytes using the specified charset.

```java
public class StringToByte {
    public static void main(String[] args) {
        String str = "Example String";
        // converting to byte array
        byte[] b = str.getBytes();
        // Getting the string from a byte array
        String s = new String(b);
        System.out.println("String - " + s);
    }
}
```
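The program above relies on the platform default charset in both directions. Here is a minimal sketch of the String(byte[] bytes, Charset charset) constructor instead, assuming UTF-8 is the desired encoding. The class name, the helper method and the sample text are only illustrative. The sample text contains non-ASCII letters written as Unicode escapes.

```java
import java.nio.charset.StandardCharsets;

public class ByteToStringWithCharset {
    // Encodes and decodes with the same explicit charset, independent of platform settings
    static String roundTrip(String original) {
        byte[] b = original.getBytes(StandardCharsets.UTF_8);
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String str = "Example String h\u00e9llo w\u00f6rld"; // includes two non-ASCII letters
        String s = roundTrip(str);
        System.out.println("Equal to original: " + s.equals(str)); // Equal to original: true
    }
}
```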



byte size of a string?


Would you please let me know:

1- How can the byte size of a string be determined?
For instance, what is the byte size of s in the following statement:
String s = new String("ABC");

2- How could I create a String with a specific size? (for instance, a String with the size of 100 bytes)

Best Regards,
Pourang Emami



Pourang,
1) s.length() will give you the number of bytes. Since characters are one byte (at least in ASCII), the number of characters is the same as the number of bytes. Another way is to get the bytes themselves and count them: s.getBytes().length.

2) As Strings are immutable, you would need to have a 100 byte array with data on hand at string creation time and pass it to the constructor.


Characters are 2 bytes in Java (not 1).
This does not give you the ability to measure the memory consumption of a String instance. This is impossible to achieve, since it is undefined.

Tony Morris


As Jeanne and Tony’s answers make clear, it depends on what you want to do with the String.

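In Java, the primitive char type is 2 bytes, since it holds a 16-bit Unicode (UTF-16) value. As String uses a char[] to hold the characters, in memory the characters themselves take 2 * s.length() bytes. This doesn't count the overhead of the array itself or the String. This is just the block of bytes for the characters. (This describes the JDKs of the time. Since Java 9, compact strings store a String whose characters all fit in Latin-1 in a byte[] with one byte per character.)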

If you want to write a String to a file or send it over a Socket, you need to first convert it to bytes. By default, Java encodes char[] to byte[] (and decodes back) with the platform's default charset. A modified form of UTF-8 is used only by specific APIs such as DataOutputStream.writeUTF. For standard ASCII characters (everything in my post), ASCII-compatible charsets such as UTF-8 map the lower byte of the char directly to a byte.

For example (\u. is a Unicode char constant and 0x?? is a byte constant, both in hexadecimal, that is, base 16 rather than base 10):

The short of it is, if you want to create a String that will encode into 100 bytes, create a String of 100 ASCII characters.

Of course, if your goal is to send/write byte arrays, why not just create one and skip the String entirely?
[ January 31, 2005: Message edited by: David Harkness ]
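To make the thread's point concrete, a String has a byte size only with respect to a particular charset. This minimal sketch measures that size as getBytes(charset).length for a few standard charsets. It also checks that 100 ASCII characters encode to 100 bytes in UTF-8. The class and method names are hypothetical. The sketch uses String.repeat, so it assumes Java 11 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedSize {
    // Number of bytes the string occupies once encoded with the given charset
    static int byteSize(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String ascii = "ABC";
        String accented = "h\u00e9llo"; // 5 chars; the second one (U+00E9) is not ASCII

        System.out.println(ascii.length());                                  // 3 (chars, not bytes)
        System.out.println(byteSize(ascii, StandardCharsets.UTF_8));         // 3
        System.out.println(byteSize(accented, StandardCharsets.UTF_8));      // 6
        System.out.println(byteSize(accented, StandardCharsets.ISO_8859_1)); // 5
        System.out.println(byteSize(accented, StandardCharsets.UTF_16BE));   // 10

        // 100 ASCII characters encode to exactly 100 bytes in UTF-8
        String hundred = "a".repeat(100); // String.repeat requires Java 11+
        System.out.println(byteSize(hundred, StandardCharsets.UTF_8));       // 100
    }
}
```

This measures the encoded form, which is what matters for files and sockets. It says nothing about the memory a String object occupies on the heap.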


Converting a String to a Byte Array and Back in Java

1. Introduction

We often need to convert between a String and a byte array in Java. In this tutorial, we'll examine these operations in detail.

First, we'll look at various ways to convert a String to a byte array. Then we'll look at similar operations in reverse.

2. Converting a String to a Byte Array

In Java, a String is stored as an array of Unicode characters. To convert it to a byte array, we translate the sequence of characters into a sequence of bytes. For this translation, we use an instance of Charset. This class specifies a mapping between a sequence of chars and a sequence of bytes.

We refer to the above process as encoding.
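Because a Charset defines the mapping, the same character can encode to different bytes. The small illustration below assumes é (U+00E9) as the sample character and uses a hypothetical helper class. Arrays.toString prints bytes above 127 as negative numbers because Java's byte type is signed. In US-ASCII, which has no é, getBytes(Charset) substitutes the replacement byte 63 ('?').

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SameCharDifferentBytes {
    // Encodes the string with the given charset
    static byte[] encode(String s, Charset charset) {
        return s.getBytes(charset);
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
        System.out.println(Arrays.toString(encode(eAcute, StandardCharsets.UTF_8)));      // [-61, -87]
        System.out.println(Arrays.toString(encode(eAcute, StandardCharsets.ISO_8859_1))); // [-23]
        System.out.println(Arrays.toString(encode(eAcute, StandardCharsets.UTF_16BE)));   // [0, -23]
        System.out.println(Arrays.toString(encode(eAcute, StandardCharsets.US_ASCII)));   // [63]
    }
}
```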

We can encode a String into a byte array in Java in a number of ways. Let's look at each of them in detail with examples.

2.1. Using String.getBytes()

The String class provides three overloaded getBytes methods to encode a String into a byte array:

getBytes(): encodes using the platform's default charset
getBytes(String charsetName): encodes using the named charset
getBytes(Charset charset): encodes using the provided charset

First, let's encode a string using the platform's default charset:

```java
String inputString = "Hello World!";
byte[] byteArray = inputString.getBytes();
```

The above method is platform-dependent, as it uses the platform's default charset. We can get this charset by calling Charset.defaultCharset().
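The minimal sketch below checks which charset getBytes() picks up on a given machine. The class and method names are illustrative. Per JEP 400, the default is UTF-8 on JDK 18 and later unless it is overridden. On older releases it depends on the operating system and locale.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class DefaultCharsetDemo {
    // getBytes() with no argument encodes using Charset.defaultCharset()
    static boolean sameAsDefault(String s) {
        byte[] implicit = s.getBytes();
        byte[] explicit = s.getBytes(Charset.defaultCharset());
        return Arrays.equals(implicit, explicit);
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("Same bytes: " + sameAsDefault("Hello World!")); // Same bytes: true
    }
}
```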

Second, let's encode a string using a named charset:

```java
@Test
public void whenGetBytesWithNamedCharset_thenOK()
  throws UnsupportedEncodingException {
    String inputString = "Hello World!";
    String charsetName = "IBM01140";

    byte[] byteArray = inputString.getBytes(charsetName);

    assertArrayEquals(
      new byte[] { -56, -123, -109, -109, -106, 64, -26, -106,
        -103, -109, -124, 90 },
      byteArray);
}
```

This method throws UnsupportedEncodingException if the named charset is not supported.

The behavior of these two versions is unspecified if the input contains characters that the charset does not support. In contrast, the third version uses the charset's default replacement byte array to encode unsupported input.

Next, let's call the third version of the getBytes() method and pass an instance of Charset:

```java
@Test
public void whenGetBytesWithCharset_thenOK() {
    String inputString = "Hello ਸੰਸਾਰ!";
    Charset charset = Charset.forName("ASCII");

    byte[] byteArray = inputString.getBytes(charset);

    assertArrayEquals(
      new byte[] { 72, 101, 108, 108, 111, 32, 63, 63, 63, 63, 63, 33 },
      byteArray);
}
```

Here, we use the factory method Charset.forName to get an instance of Charset. This method throws a runtime exception if the name of the requested charset is invalid. It also throws a runtime exception if the charset is not supported in the current JVM.

However, some charsets are guaranteed to be available on every Java platform. The StandardCharsets class defines constants for these charsets.

Finally, let's encode using one of the standard charsets:

```java
@Test
public void whenGetBytesWithStandardCharset_thenOK() {
    String inputString = "Hello World!";
    Charset charset = StandardCharsets.UTF_16;

    byte[] byteArray = inputString.getBytes(charset);

    assertArrayEquals(
      new byte[] { -2, -1, 0, 72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 32,
        0, 87, 0, 111, 0, 114, 0, 108, 0, 100, 0, 33 },
      byteArray);
}
```
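The UTF_16 result above starts with -2, -1 (0xFE 0xFF). This is a big-endian byte-order mark that Java's UTF-16 encoder writes before the data. The minimal sketch below compares the encoded length of the same string under all six charsets that StandardCharsets provides. The class name is illustrative. UTF_16BE and UTF_16LE omit the mark, so their output is two bytes shorter than UTF_16.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class StandardCharsetSizes {
    // Encoded length of s for each charset constant in StandardCharsets, in declaration order
    static Map<String, Integer> sizes(String s) {
        Charset[] all = {
            StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE, StandardCharsets.UTF_16LE, StandardCharsets.UTF_16
        };
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Charset cs : all) {
            result.put(cs.name(), s.getBytes(cs).length);
        }
        return result;
    }

    public static void main(String[] args) {
        // {US-ASCII=12, ISO-8859-1=12, UTF-8=12, UTF-16BE=24, UTF-16LE=24, UTF-16=26}
        System.out.println(sizes("Hello World!"));
    }
}
```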

With this, we complete our review of the various getBytes versions. Next, let's look at the method provided by Charset itself.

2.2. Using Charset.encode()

The Charset class provides encode(), a convenient method that encodes Unicode characters into bytes. This method always replaces invalid input and unmappable characters using the charset's default replacement byte array.

Let's use the encode method to convert a String into a byte array:

```java
@Test
public void whenEncodeWithCharset_thenOK() {
    String inputString = "Hello ਸੰਸਾਰ!";
    Charset charset = StandardCharsets.US_ASCII;

    byte[] byteArray = charset.encode(inputString).array();

    assertArrayEquals(
      new byte[] { 72, 101, 108, 108, 111, 32, 63, 63, 63, 63, 63, 33 },
      byteArray);
}
```

As we can see above, unsupported characters have been replaced with the charset's default replacement byte 63 (the '?' character).
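There is one caveat about calling array() on the result of Charset.encode(). It returns the buffer's entire backing array, and the encoder sizes that array from an estimate, so the array can be longer than the encoded data. The lengths happen to match for US_ASCII in the test above. That is not guaranteed for other charsets such as UTF-8. The safer sketch below copies only the bytes between the buffer's position and limit. The class name is illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodeToExactArray {
    // Copies exactly the encoded bytes, ignoring any unused capacity in the backing array
    static byte[] encode(String s, Charset charset) {
        ByteBuffer buffer = charset.encode(s);
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        String inputString = "Hello World!";
        ByteBuffer raw = StandardCharsets.UTF_8.encode(inputString);
        // The backing array may be longer than the data; compare these two numbers
        System.out.println("array().length = " + raw.array().length + ", remaining() = " + raw.remaining());
        System.out.println("exact length = " + encode(inputString, StandardCharsets.UTF_8).length); // exact length = 12
    }
}
```

The same caveat applies to calling array() on the result of CharsetEncoder.encode().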

The approaches so far all use the CharsetEncoder class internally to perform encoding. Let's examine this class in the next section.

2.3. Using CharsetEncoder

CharsetEncoder transforms Unicode characters into a sequence of bytes for a given charset. Moreover, it provides fine-grained control over the encoding process.

Let's use this class to convert a String into a byte array:

```java
@Test
public void whenUsingCharsetEncoder_thenOK()
  throws CharacterCodingException {
    String inputString = "Hello ਸੰਸਾਰ!";
    CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder();
    encoder.onMalformedInput(CodingErrorAction.IGNORE)
      .onUnmappableCharacter(CodingErrorAction.REPLACE)
      .replaceWith(new byte[] { 0 });

    byte[] byteArray = encoder.encode(CharBuffer.wrap(inputString))
      .array();

    assertArrayEquals(
      new byte[] { 72, 101, 108, 108, 111, 32, 0, 0, 0, 0, 0, 33 },
      byteArray);
}
```

Here, we create an instance of CharsetEncoder by calling the newEncoder method on a Charset object.

Then we specify actions for error conditions by calling the onMalformedInput() and onUnmappableCharacter() methods. We can specify the following actions:

IGNORE: drop the erroneous input
REPLACE: replace the erroneous input
REPORT: report the error by returning a CoderResult object or throwing a CharacterCodingException

In addition, we use the replaceWith() method to specify the replacement byte array.
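The list above also mentions REPORT, which is what a freshly created encoder uses by default. In the minimal sketch below, encoding non-ASCII text to US-ASCII throws a CharacterCodingException instead of substituting bytes. The class name and sample text are illustrative.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictEncoding {
    // Returns true if the text cannot be encoded in US-ASCII without loss
    static boolean failsStrictAscii(String input) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // REPORT is already the default;
                .onUnmappableCharacter(CodingErrorAction.REPORT); // set explicitly for clarity
        try {
            encoder.encode(CharBuffer.wrap(input));
            return false;
        } catch (CharacterCodingException e) { // a subclass such as UnmappableCharacterException
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsStrictAscii("Hello World!")); // false
        System.out.println(failsStrictAscii("caf\u00e9"));    // true
    }
}
```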

With that, we've completed our review of the various approaches for converting a String to a byte array. Let's now look at the reverse operation.

3. Converting a Byte Array to a String

We refer to the process of converting a byte array to a String as decoding. Like encoding, this process requires a Charset.

However, we can't use just any charset to decode a byte array. We should use the charset that was used to encode the String into the byte array.
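The minimal sketch below shows what goes wrong otherwise. It assumes a string encoded as UTF-8 but decoded as ISO-8859-1. The class and method names are illustrative. Each byte of the two-byte UTF-8 sequence for é is read as a separate Latin-1 character, so the decoded string has six characters instead of five. The comparisons use Unicode escapes so the result does not depend on the console encoding.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatch {
    // Encode and decode with the same charset: the text survives
    static String matched(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Encode as UTF-8 but decode as ISO-8859-1: a deliberate mismatch
    static String mismatched(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "h\u00e9llo";
        // U+00E9 is 0xC3 0xA9 in UTF-8; Latin-1 reads those as U+00C3 and U+00A9
        System.out.println(matched(original).equals(original));              // true
        System.out.println(mismatched(original).equals("h\u00c3\u00a9llo")); // true
        System.out.println(mismatched(original).length());                   // 6
    }
}
```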

We can convert a byte array to a String in many ways. Let's examine each of them in detail.

3.1. Using the String Constructor

The String class has a few constructors that take a byte array as input. They are all similar to the getBytes method but work in reverse.

First, let's convert a byte array to a String using the platform's default charset:

```java
@Test
public void whenStringConstructorWithDefaultCharset_thenOK() {
    byte[] byteArray = { 72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33 };

    String string = new String(byteArray);

    assertNotNull(string);
}
```

Note that we don't assert anything about the contents of the decoded string here. That is because it may decode to something different, depending on the platform's default charset.

For this reason, we should generally avoid this method.

Second, let's use a named charset for decoding:

```java
@Test
public void whenStringConstructorWithNamedCharset_thenOK()
  throws UnsupportedEncodingException {
    String charsetName = "IBM01140";
    byte[] byteArray = { -56, -123, -109, -109, -106, 64, -26, -106,
      -103, -109, -124, 90 };

    String string = new String(byteArray, charsetName);

    assertEquals("Hello World!", string);
}
```

This constructor throws an UnsupportedEncodingException if the named charset is not available in the JVM.

Third, let's use a Charset object to do the decoding:

```java
@Test
public void whenStringConstructorWithCharSet_thenOK() {
    Charset charset = Charset.forName("UTF-8");
    byte[] byteArray = { 72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33 };

    String string = new String(byteArray, charset);

    assertEquals("Hello World!", string);
}
```

Finally, let's do the same with a standard Charset:

```java
@Test
public void whenStringConstructorWithStandardCharSet_thenOK() {
    Charset charset = StandardCharsets.UTF_16;
    byte[] byteArray = { -2, -1, 0, 72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 32,
      0, 87, 0, 111, 0, 114, 0, 108, 0, 100, 0, 33 };

    String string = new String(byteArray, charset);

    assertEquals("Hello World!", string);
}
```

So far, we've converted a byte array into a String using the constructor. Let's now look at the other approaches.

3.2. Using Charset.decode()

The Charset class provides the decode() method, which converts a ByteBuffer into a CharBuffer. We can then turn the CharBuffer into a String with toString():

```java
@Test
public void whenDecodeWithCharset_thenOK() {
    byte[] byteArray = { 72, 101, 108, 108, 111, 32, -10, 111, 114, 108, -63, 33 };
    Charset charset = StandardCharsets.US_ASCII;
    String string = charset.decode(ByteBuffer.wrap(byteArray))
      .toString();

    assertEquals("Hello �orl�!", string);
}
```

Here, the invalid input is replaced with the charset's default replacement character (U+FFFD, shown as �).

3.3. Using CharsetDecoder

All the previous approaches use the CharsetDecoder class internally for decoding. We can use this class directly for fine-grained control over the decoding process:

```java
@Test
public void whenUsingCharsetDecoder_thenOK()
  throws CharacterCodingException {
    byte[] byteArray = { 72, 101, 108, 108, 111, 32, -10, 111, 114, 108, -63, 33 };
    CharsetDecoder decoder = StandardCharsets.US_ASCII.newDecoder();

    decoder.onMalformedInput(CodingErrorAction.REPLACE)
      .onUnmappableCharacter(CodingErrorAction.REPLACE)
      .replaceWith("?");

    String string = decoder.decode(ByteBuffer.wrap(byteArray))
      .toString();

    assertEquals("Hello ?orl?!", string);
}
```

Here, we replace invalid input and unsupported characters with "?".

If we want to be informed of invalid input, we can change the decoder as follows:

```java
decoder.onMalformedInput(CodingErrorAction.REPORT)
  .onUnmappableCharacter(CodingErrorAction.REPORT);
```
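The minimal sketch below shows the REPORT configuration in action, reusing the byte values from the test above. The class name and the convention of returning null on failure are illustrative choices. Decoding now throws a CharacterCodingException instead of silently substituting characters. For US-ASCII bytes outside the 0-127 range, the specific exception is a MalformedInputException.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecoding {
    // Decodes strictly; returns null (a convention chosen for this sketch) if the bytes are not valid US-ASCII
    static String decodeOrNull(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.US_ASCII.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) { // MalformedInputException for bytes outside 0..127
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] valid = { 72, 101, 108, 108, 111 };
        byte[] invalid = { 72, 101, 108, 108, 111, 32, -10, 111, 114, 108, -63, 33 };
        System.out.println(decodeOrNull(valid));   // Hello
        System.out.println(decodeOrNull(invalid)); // null
    }
}
```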

4. Conclusion

In this article, we explored multiple ways to convert a String to a byte array and back. We should choose the appropriate method based on the input data, as well as the level of control required for invalid input.

As usual, the full source code can be found over on GitHub.
