How to split a byte array that contains multiple «lines» in Java?
Say we have a file like so:
(but this file got encrypted)
My crypto method returns the whole file in memory, as a byte[] type.
I know byte arrays don’t have a concept of «lines», that’s something a Scanner (for example) could have.
I would like to traverse each line, convert it to a String and perform my operation on it, but I don’t know how to:
- Find lines in a byte array
- Slice the original byte array to «lines» (I would convert those slices to String, to send to my other methods)
- Correctly traverse a byte array, where each iteration is a new «line»
Also: do I need to consider the different operating systems the file might have been composed on? I know that new lines differ between Windows and Linux, and I don’t want my method to work with only one format.
Edit: Following some tips from answers here, I was able to write some code that gets the job done. I still wonder whether this code is worth keeping or whether I am doing something that can fail in the future:
    byte[] decryptedBytes = doMyCrypto(fileName, accessKey);
    ByteArrayInputStream byteArrInStrm = new ByteArrayInputStream(decryptedBytes);
    InputStreamReader inStrmReader = new InputStreamReader(byteArrInStrm);
    BufferedReader buffReader = new BufferedReader(inStrmReader);
    String delimRegex = ",";
    String line;
    String[] values = null;
    while ((line = buffReader.readLine()) != null) {
        values = line.split(delimRegex);
        if (Objects.equals(values[0], tableKey)) {
            return values;
        }
    }
    System.out.println(String.format("No entry with key %s in %s", tableKey, fileName));
    return values;
In particular, I was advised to explicitly set the encoding, but I could not see exactly where.
If you want to stream this, I’d suggest:
- Create a ByteArrayInputStream to wrap your array
- Wrap that in an InputStreamReader to convert binary data to text — I suggest you explicitly specify the text encoding being used
- Create a BufferedReader around that to read a line at a time
    String line;
    while ((line = bufferedReader.readLine()) != null) {
        // Do something with the line
    }
BufferedReader handles line breaks from all operating systems.
    byte[] data = ...;
    ByteArrayInputStream stream = new ByteArrayInputStream(data);
    InputStreamReader streamReader = new InputStreamReader(stream, StandardCharsets.UTF_8);
    BufferedReader bufferedReader = new BufferedReader(streamReader);
    String line;
    while ((line = bufferedReader.readLine()) != null) {
        // Do something with the line
    }
Note that in general you’d want to use try-with-resources blocks for the streams and readers — but it doesn’t matter in this case, because it’s just in memory.
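As a minimal sketch of the steps above, the wrapping can be put into one helper that uses try-with-resources and an explicit charset. The class name ByteLines and the method toLines are hypothetical, and UTF-8 is only an assumption about how the decrypted text was encoded:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ByteLines {
    /** Decodes the bytes with the given charset and returns one String per line. */
    static List<String> toLines(byte[] data, Charset charset) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream, but readLine() declares it.
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Mixed line endings: Unix (\n), Windows (\r\n) and a bare carriage return (\r).
        byte[] data = "a,1\nb,2\r\nc,3\rd,4".getBytes(StandardCharsets.UTF_8);
        for (String line : toLines(data, StandardCharsets.UTF_8)) {
            System.out.println(line);
        }
    }
}
```

Because readLine() treats \n, \r and \r\n as line terminators, the same helper should cope with files written on different operating systems.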
As Scott says, I would like to see what you came up with, so we can help you adapt it to your needs.
Regarding your last comment about the OS: if you want to support multiple file types, consider writing a separate function for each file extension. As far as I know, you do need to specify which file, and what type of file, your code is reading.
Split a byte array at a delimiter
I’m having a bit of an issue, and the other questions here didn’t help me much.
I am a security student and I am trying to write a crypter for a project. For those who don’t know what that is, you can read up on it here: http://www.gamekiller.net/tutorials-guides/17187-tut-making-crypter-vb6-using-rc4.html
Anyway, a quick explanation: crypters are programs meant to bypass antiviruses by encrypting a program and then affixing a «stub», a program that decrypts it, to the front. I’m having a very annoying issue with splitting my file up.
The big annoyance is that I have to put the crypted executable into a byte array, since strings kill certain characters in my crypted executable, making it unexecutable. To make matters worse, I still have to «split» the exe, and this is where the trouble begins.
The basic idea of the stub is to:
- Get the current exe path
- Read all the bytes through File.ReadAllBytes
- Split the file at the delimiter, «EVILDELIMITER»
- Get the last field (since that’s the crypted EXE)
- Decrypt it using RC4
- Run using RunPE.
I have everything working except the splitting part, which is the most annoying. How do I split a byte array at the delimiter? Is there an easier way to do this?
Here’s the code for the stub I have so far.
    public void main()
    {
        string outpath = RandomString(8) + ".exe";
        byte[] key = { 33, 44, 55, 66, 77 };
        string apppath = Assembly.GetEntryAssembly();
        byte[] exe = File.ReadAllBytes(apppath);
        string strseperate = "EVILSEPERATOREVIL";
        System.Text.UTF8Encoding encoding = new System.Text.UTF8Encoding();
        byte[] seperator = encoding.GetBytes(strseperate);
        //Split code should go here
        exe = Decrypt(key, encrypted);
        Process.Start(outpath);
    }

    byte[] SeparateAndGetLast(byte[] source, byte[] separator)
    {
        for (var i = 0; i < source.Length; ++i)
        {
            if (Equals(source, separator, i))
            {
                var index = i + separator.Length;
                var part = new byte[source.Length - index];
                Array.Copy(source, index, part, 0, part.Length);
                return part;
            }
        }
        throw new Exception("not found");
    }

    public static byte[][] Separate(byte[] source, byte[] separator)
    {
        var Parts = new List<byte[]>();
        var Index = 0;
        byte[] Part;
        for (var I = 0; I < source.Length; ++I)
        {
            if (Equals(source, separator, I))
            {
                Part = new byte[I - Index];
                Array.Copy(source, Index, Part, 0, Part.Length);
                Parts.Add(Part);
                Index = I + separator.Length;
                I += separator.Length - 1;
            }
        }
        Part = new byte[source.Length - Index];
        Array.Copy(source, Index, Part, 0, Part.Length);
        Parts.Add(Part);
        return Parts.ToArray();
    }

    bool Equals(byte[] source, byte[] separator, int index)
    {
        for (int i = 0; i < separator.Length; ++i)
            if (index + i >= source.Length || source[index + i] != separator[i])
                return false;
        return true;
    }
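For readers following the Java threads on this page, the same scan-and-copy idea can be sketched in Java. This is an illustrative port, not the answerer's code; the names ByteSplitter, matchesAt and split are hypothetical, and rejecting an empty separator is an added assumption:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ByteSplitter {
    /** Returns true if separator occurs in source starting at index. */
    static boolean matchesAt(byte[] source, byte[] separator, int index) {
        if (index + separator.length > source.length) {
            return false;
        }
        for (int i = 0; i < separator.length; i++) {
            if (source[index + i] != separator[i]) {
                return false;
            }
        }
        return true;
    }

    /** Splits source at every occurrence of separator; parts may be empty. */
    static List<byte[]> split(byte[] source, byte[] separator) {
        if (separator.length == 0) {
            throw new IllegalArgumentException("separator must not be empty");
        }
        List<byte[]> parts = new ArrayList<>();
        int start = 0; // first byte of the part currently being collected
        int i = 0;
        while (i < source.length) {
            if (matchesAt(source, separator, i)) {
                parts.add(Arrays.copyOfRange(source, start, i));
                i += separator.length; // jump over the separator itself
                start = i;
            } else {
                i++;
            }
        }
        // Whatever follows the last separator (possibly nothing) is the final part.
        parts.add(Arrays.copyOfRange(source, start, source.length));
        return parts;
    }
}
```

With this sketch, parts.get(parts.size() - 1) holds the bytes after the final separator. Each part is a copy made by Arrays.copyOfRange, so the source array is left untouched.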
Your approach has a number of flaws — you’re reading an entire Byte[] into memory, but decryption is a streamable process, so you’re needlessly wasting memory. Secondly you cannot «split» an array (or a string, for that matter) in the CLR. When you split a CLR String it creates copies, which wastes memory.
    public static void Main(String[] args)
    {
        // FileMode has no Read/Write members; use FileMode plus FileAccess instead.
        using (FileStream fs = new FileStream(@"path\to\fileName.exe", FileMode.Open, FileAccess.Read))
        {
            BinaryReader rdr = new BinaryReader(fs);
            SeekToEndOfDelimiter(rdr);

            // Use an implementation of RC4 decryption that accepts Streams as arguments,
            // then pass fs directly as an argument:
            using (FileStream output = new FileStream(@"path\to\out.exe", FileMode.Create, FileAccess.Write))
            {
                // Providing the key arguments is an exercise for the reader
                MyRc4Implementation.DecryptStream(fs, output, key);
            }
        }
    }

    private static void SeekToEndOfDelimiter(BinaryReader rdr)
    {
        // Implementing this code is an exercise left up to the reader.
        // But just iterate through each byte (assuming ASCII-compatible encoding)
        // until you encounter the end of the delimiter
    }
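The general idea of advancing a stream to just past a delimiter can be sketched in Java as follows. This is only one possible approach, not the answerer's implementation; DelimiterSeek and skipPast are hypothetical names, and the sliding-window comparison is chosen for simplicity rather than speed:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class DelimiterSeek {
    /**
     * Reads from in until the delimiter has been consumed.
     * Returns true if the delimiter was found, false if the stream ended first.
     */
    static boolean skipPast(InputStream in, byte[] delimiter) {
        if (delimiter.length == 0) {
            return true;
        }
        // window always holds the most recent delimiter.length bytes read.
        byte[] window = new byte[delimiter.length];
        int filled = 0;
        try {
            int b;
            while ((b = in.read()) != -1) {
                if (filled < window.length) {
                    window[filled++] = (byte) b;
                } else {
                    // Drop the oldest byte and append the new one.
                    System.arraycopy(window, 1, window, 0, window.length - 1);
                    window[window.length - 1] = (byte) b;
                }
                if (filled == window.length && Arrays.equals(window, delimiter)) {
                    return true;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return false;
    }
}
```

Comparing the whole window after every byte keeps the logic correct for delimiters that overlap themselves (for example 7, 7, 8 searched for 7, 8), at the cost of extra comparisons. For unbuffered sources, wrapping the stream in a BufferedInputStream avoids one system call per byte.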
There, no messy byte[] arrays 🙂
I know I’m really, really late to the party, but... You can of course modify this to return a List<byte[]> easily if preferred. I left comments/writelines in case it would be helpful. This may not be the most optimal/optimized code but works well for my specific use case and I thought I would share.
    public static byte[][] SplitBytesByDelimiter(byte[] data, byte delimiter)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (data.Length < 1) return null;

        List<byte[]> retList = new List<byte[]>();
        int start = 0;
        int pos = 0;
        byte[] remainder = null; // in case data found at end without terminating delimiter

        while (true)
        {
            // Console.WriteLine("pos " + pos + " start " + start);
            if (pos >= data.Length) break;

            if (data[pos] == delimiter)
            {
                // Console.WriteLine("delimiter found at pos " + pos + " start " + start);
                // separator found
                if (pos == start)
                {
                    // Console.WriteLine("first char is delimiter, skipping");
                    // skip if first character is delimiter
                    pos++;
                    start++;
                    if (pos >= data.Length)
                    {
                        // last character is a delimiter, yay!
                        remainder = null;
                        break;
                    }
                    else
                    {
                        // remainder exists
                        remainder = new byte[data.Length - start];
                        Buffer.BlockCopy(data, start, remainder, 0, (data.Length - start));
                        continue;
                    }
                }
                else
                {
                    // Console.WriteLine("creating new byte[] at pos " + pos + " start " + start);
                    byte[] ba = new byte[(pos - start)];
                    Buffer.BlockCopy(data, start, ba, 0, (pos - start));
                    retList.Add(ba);
                    start = pos + 1;
                    pos = start;
                    if (pos >= data.Length)
                    {
                        // last character is a delimiter, yay!
                        remainder = null;
                        break;
                    }
                    else
                    {
                        // remainder exists
                        remainder = new byte[data.Length - start];
                        Buffer.BlockCopy(data, start, remainder, 0, (data.Length - start));
                    }
                }
            }
            else
            {
                // payload character, continue;
                pos++;
            }
        }

        if (remainder != null)
        {
            // Console.WriteLine("adding remainder");
            retList.Add(remainder);
        }
        return retList.ToArray();
    }
Here’s mine. It only does the split once though. I made no attempt at making it performant.
    public static byte[][] Split(this byte[] composite, byte[] seperator)
    {
        bool found = false;
        int i = 0;
        // <= rather than <, so that a separator at the very end of composite is also found
        for (; i <= composite.Length - seperator.Length; i++)
        {
            var compositeView = new byte[seperator.Length];
            Array.Copy(composite, i, compositeView, 0, seperator.Length);
            found = compositeView.SequenceEqual(seperator);
            if (found) break;
        }
        if (found == false)
        {
            return null;
        }
        var component1Length = i;
        var component1 = new byte[component1Length];
        var component2Length = composite.Length - seperator.Length - component1Length;
        var component2 = new byte[component2Length];
        var component2Index = i + seperator.Length;
        Array.Copy(composite, 0, component1, 0, component1Length);
        Array.Copy(composite, component2Index, component2, 0, component2Length);
        return new byte[][] { component1, component2 };
    }

    byte[] b1 = new byte[] { 1, 2, 3, 4, 1, 1, 5 };
    byte[] b2 = new byte[] { 1, 1 };
    var parts1 = b1.Split(b2); // [1,2,3,4],[5]

    byte[] b3 = new byte[] { 1, 1, 3, 4, 4, 1, 5 };
    byte[] b4 = new byte[] { 1, 1 };
    var parts2 = b3.Split(b4); // [],[3,4,4,1,5]

    byte[] b5 = new byte[] { 0, 0, 3, 4, 4, 1, 1 };
    byte[] b6 = new byte[] { 1, 1 };
    var parts3 = b5.Split(b6); // [0,0,3,4,4],[]

    byte[] b7 = new byte[] { 1, 2, 3, 4, 5 };
    byte[] b8 = new byte[] { 1, 2, 3, 4 };
    var parts4 = b7.Split(b8); // [],[5]

    byte[] b9 = new byte[] { 1, 2, 3, 4, 5 };
    byte[] b0 = new byte[] { 2, 3, 4, 5 };
    var parts5 = b9.Split(b0); // [1],[]

    byte[] c1 = new byte[] { 1, 2, 3, 4, 5 };
    byte[] c2 = new byte[] { 6 };
    var parts6 = c1.Split(c2); // null
Split an array of Bytes previously read from a file
I have a recursive function, which takes as an argument an array of Bytes (read from a file) and then tries to split it into two parts for recursion calls.
The file format is for example like this:
word1 word2 word3 word4 . word97 word98 word99 word100
So, to read it I use this code:
    byte[] content = new byte[(int) file.length()];
    FileInputStream fileInputStream = null;
    fileInputStream = new FileInputStream(file);
    fileInputStream.read(content);
But then I would like to Split the content array into two parts that each would represent the half of the original file. For example if the original file contains four lines, then the firstHalf array would contain the first two lines and the secondHalf array the last two lines of the original file.
    int halfTheLengthOfContent = (int) Math.ceil(content.length / 2.0);
    firstHalf = Arrays.copyOfRange(content, 0, halfTheLengthOfContent)
and
    secondHalf = Arrays.copyOfRange(content, halfTheLengthOfContent + 1, content.length)
But it doesn’t work since the resulting arrays don’t correspond to what I want. What I want is that the firstHalf array would contain the same content as if I’ve done this (with file1 this time containing the first half of the content of the original file and file2 the second half of it):
    byte[] firstHalf = new byte[(int) file1.length()];
    FileInputStream fileInputStream = null;
    fileInputStream = new FileInputStream(file1);
    fileInputStream.read(firstHalf);

    byte[] secondHalf = new byte[(int) file2.length()];
    FileInputStream fileInputStream = null;
    fileInputStream = new FileInputStream(file2);
    fileInputStream.read(secondHalf);
For example if the original file is this:
word1 word2 word3 word4 word5 word6 word7 word8
The concepts of «bytes» and «lines» do not work well together.
Bytes are the core elements of a file. Depending on the encoding, a character may take a single byte, or it may take two or more.
Splitting the byte[] that contains the entire file into two even portions is very unlikely to result in splitting into two byte[] containing the same number of lines. It is unlikely the split will even be on a linebreak, much more likely in the middle of a word.
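A small sketch makes the problem visible. It halves a text by byte count and decodes both halves again; the class name NaiveHalf, the method halveByBytes and the sample text are made up for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class NaiveHalf {
    /** Halves the UTF-8 bytes of text at the midpoint and decodes both halves. */
    static String[] halveByBytes(String text) {
        byte[] content = text.getBytes(StandardCharsets.UTF_8);
        int middle = content.length / 2;
        byte[] first = Arrays.copyOfRange(content, 0, middle);
        byte[] second = Arrays.copyOfRange(content, middle, content.length);
        return new String[] {
            new String(first, StandardCharsets.UTF_8),
            new String(second, StandardCharsets.UTF_8)
        };
    }

    public static void main(String[] args) {
        // Four lines of different lengths, 30 bytes in total.
        String[] halves = halveByBytes("word1\nword22\nword333\nword4444\n");
        System.out.println("first:  [" + halves[0] + "]");
        System.out.println("second: [" + halves[1] + "]");
    }
}
```

Here the midpoint is byte 15, which falls inside the third line, so the first half ends with a fragment of «word333» and the second half starts with the rest of it. With a multi-byte encoding the cut could even land inside a single character.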
What you need is to read the file as lines and manipulate them. For example:
    final List<String> lines = Files.readAllLines(Paths.get("path", "to", "file"));
    final int middle = lines.size() / 2;
    final List<String> firstHalf = lines.subList(0, middle);
    final List<String> secondHalf = lines.subList(middle, lines.size());
If the file has 4 lines then middle will be 2. The first half will contain lines 0, 1 and the second half will contain lines 2, 3. Remember that a List is zero-indexed and subList excludes the upper bound.
With an odd number of lines the second half will contain the extra element.
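If the halves are needed as byte[] again, as in the question's recursive function, one option is to stay with the bytes but move the cut to a line break. The following is a rough sketch under the assumption that the encoding is ASCII-compatible (for example UTF-8 or ISO-8859-1), so that the byte 0x0A always means a line feed; HalfByLines and splitInHalf are hypothetical names:

```java
import java.util.Arrays;

final class HalfByLines {
    /**
     * Splits content right after the line break that ends the middle line.
     * With an odd number of lines the second half gets the extra line.
     */
    static byte[][] splitInHalf(byte[] content) {
        int lineBreaks = 0;
        for (byte b : content) {
            if (b == '\n') {
                lineBreaks++;
            }
        }
        // A final line without a trailing line break still counts as a line.
        boolean unterminatedLastLine =
                content.length > 0 && content[content.length - 1] != '\n';
        int lines = lineBreaks + (unterminatedLastLine ? 1 : 0);
        int target = lines / 2; // number of lines that go into the first half

        int cut = 0;  // index of the first byte of the second half
        int seen = 0; // line breaks passed so far
        for (int i = 0; i < content.length && seen < target; i++) {
            if (content[i] == '\n') {
                seen++;
                cut = i + 1;
            }
        }
        return new byte[][] {
            Arrays.copyOfRange(content, 0, cut),
            Arrays.copyOfRange(content, cut, content.length)
        };
    }
}
```

Windows line endings (\r\n) work as well, because the cut is placed after the \n. Files that use a bare \r as the line terminator are not handled by this sketch.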