Compare two Java files

Comparing Files in Java

I am creating a series of video tutorials for PACKT about network programming in Java. There is a whole section about Java NIO. One sample program copies a file over a raw socket connection from a client to a server. The client reads the file from the disk, and the server saves the bytes to the disk as they arrive. Because this is a demo, the server and the client run on the same machine, and the file is copied into the same directory under a different name. The proof of the pudding is in the eating: the files have to be compared.

The file I wanted to copy was created to contain random bytes. Transferring only text information can sometimes leave tricky bugs lurking in the code. The random file was created using the simple Java class:

    package packt.java9.network.niodemo;

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.Random;

    public class SampleMaker {
        public static void main(String[] args) throws IOException {
            byte[] buffer = new byte[1024 * 1024 * 10];
            try (FileOutputStream fos = new FileOutputStream("sample.txt")) {
                Random random = new Random();
                for (int i = 0; i < 16; i++) {
                    random.nextBytes(buffer);
                    fos.write(buffer);
                }
            }
        }
    }

Using IntelliJ, comparing files is fairly easy, but since the files are binary and large, this approach is not really optimal. I decided to write a short program that will not only signal that the files are different, but also where the difference is. The code is extremely simple:

    package packt.java9.network.niodemo;

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;

    public class SampleCompare {
        public static void main(String[] args) throws IOException {
            long start = System.nanoTime();
            // try-with-resources closes both streams even if reading fails
            try (BufferedInputStream fis1 = new BufferedInputStream(new FileInputStream("sample.txt"));
                 BufferedInputStream fis2 = new BufferedInputStream(new FileInputStream("sample-copy.txt"))) {
                long pos = 0;           // zero-based offset of the bytes being compared
                boolean differ = false; // set once any differing byte has been reported
                int b1 = fis1.read();
                int b2 = fis2.read();
                while (b1 != -1 && b2 != -1) {
                    if (b1 != b2) {
                        System.out.println("Files differ at position " + pos);
                        differ = true;
                    }
                    pos++;
                    b1 = fis1.read();
                    b2 = fis2.read();
                }
                if (b1 != b2) {
                    // exactly one stream reached its end
                    System.out.println("Files have different length");
                } else if (!differ) {
                    System.out.println("Files are identical, you can delete one of them.");
                }
            }
            long end = System.nanoTime();
            System.out.print("Execution time: " + (end - start) / 1000000 + "ms");
        }
    }

Comparing the two 160MB files takes around 6 seconds on my SSD-equipped MacBook, and it does not improve significantly if I specify a large buffer, say 10MB, as the second argument to the BufferedInputStream constructor. (On the other hand, without BufferedInputStream the running time is approximately ten times longer.) This is acceptable, but if I simply issue diff sample.txt sample-copy.txt from the command line, the response comes back significantly faster than 6 seconds.
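As an aside that the article itself does not measure: the loop above makes one read() call per byte. A minimal sketch of a block-wise variant follows, assuming Java 9 or later for InputStream.readNBytes and Arrays.mismatch. The class name BlockCompare, the method firstMismatch and the 64KB block size are hypothetical choices made for illustration, not part of the original demo.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of the original demo code.
class BlockCompare {
    private static final int BLOCK = 64 * 1024; // assumed block size, not tuned

    /** Returns the zero-based offset of the first difference, or -1 if the files are identical. */
    static long firstMismatch(Path p1, Path p2) throws IOException {
        try (InputStream in1 = Files.newInputStream(p1);
             InputStream in2 = Files.newInputStream(p2)) {
            byte[] buf1 = new byte[BLOCK];
            byte[] buf2 = new byte[BLOCK];
            long offset = 0;
            while (true) {
                // readNBytes fills the buffer unless the end of the stream is reached
                int n1 = in1.readNBytes(buf1, 0, BLOCK);
                int n2 = in2.readNBytes(buf2, 0, BLOCK);
                int common = Math.min(n1, n2);
                int diff = Arrays.mismatch(buf1, 0, common, buf2, 0, common);
                if (diff >= 0) {
                    return offset + diff;
                }
                if (n1 != n2) {
                    return offset + common; // one file is a prefix of the other
                }
                if (n1 < BLOCK) {
                    return -1; // both streams ended at the same point
                }
                offset += n1;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        long pos = firstMismatch(Paths.get("sample.txt"), Paths.get("sample-copy.txt"));
        System.out.println(pos < 0 ? "Files are identical" : "Files differ at position " + pos);
    }
}
```

Unlike SampleCompare, this sketch stops at the first difference and treats a length difference as a mismatch at the end of the shorter file. Whether it is actually faster on a given machine would have to be measured.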

There can be many factors, like Java’s startup time or the code being interpreted at the start of the while loop until the JIT compiler decides it is time to start working. My hunch, however, is that the code spends most of its time reading the file into memory. Reading the bytes into the buffer is a complex process. It involves the operating system, the device drivers, the JVM implementation, and moving bytes from one place to another, and in the end we only compare the bytes, nothing else. It can be done in a simpler way. We can ask the operating system to do it for us and skip most of the Java runtime activities, file buffers, and other frills.

We can ask the operating system to read the file into memory and then just fetch the bytes one by one from where they are. We do not need a buffer that belongs to a Java object and consumes heap space. We can use memory-mapped files. After all, memory-mapped files are part of Java NIO, and that is exactly the topic of the part of the tutorial videos that is currently in the making.

Memory-mapped files are read into the memory by the operating system and the bytes are available to the Java program. The memory is allocated by the operating system and it does not consume the heap memory. If the Java code modifies the content of the mapped memory, then the operating system writes the change to the disk in an optimized way when it thinks it is due.

This, however, does not mean that the data is lost if the JVM crashes. When the Java code modifies the memory-mapped file memory, then it modifies memory that belongs to the operating system and is available (and is valid) after the JVM stops.

There is no 100% guarantee of protection against power outages and hardware crashes, but that is a very low risk. If anyone is afraid of those, then the protection should be on the hardware level, which Java has nothing to do with anyway. With memory-mapped files, we can be sure that the data is saved to the disk with a very high probability, one that can only be increased by failure-tolerant hardware, clusters, uninterruptible power supplies, and so on. These are not Java’s responsibility. If you really have to do something from Java to have the data written to disk, then you can call the MappedByteBuffer.force() method, which asks the operating system to write the changes to the disk. Calling this too often and unnecessarily may hinder performance, though, simply because it writes the data to disk and returns only when the operating system says that the data was written.
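A minimal sketch of that call, under stated assumptions: the class name ForceDemo and the method overwriteByte are hypothetical, and the file is opened through FileChannel.open rather than the RandomAccessFile route the article uses elsewhere. It maps an existing file in read-write mode, changes one byte, and then calls force().

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of the original demo code.
class ForceDemo {
    /** Overwrites one byte of an existing file through a mapping and flushes the change. */
    static void overwriteByte(Path file, int index, byte value) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0L, ch.size());
            map.put(index, value); // changes the mapped memory; the disk may lag behind
            map.force();           // asks the OS to write the change to the storage device
        }
    }

    public static void main(String[] args) throws IOException {
        // Work on a scratch file so that no real data is touched.
        Path scratch = Files.createTempFile("force-demo", ".bin");
        Files.write(scratch, new byte[16]);
        overwriteByte(scratch, 0, (byte) 42);
        System.out.println("First byte is now " + Files.readAllBytes(scratch)[0]);
    }
}
```

For the read-only comparison in this article, force() is not needed at all; it only matters when the mapping is written to.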

Reading and writing data using memory-mapped files is usually much faster in the case of large files. To get this performance, the machine should have plenty of memory; otherwise only part of the file is kept in memory and the number of page faults increases. One of the good things is that if the same file is mapped into memory by two or more different processes, then the same memory area is used. That way, processes can even communicate with each other.

The comparison application using memory-mapped files is the following:

    package packt.java9.network.niodemo;

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    public class MapCompare {
        public static void main(String[] args) throws IOException {
            long start = System.nanoTime();
            // try-with-resources closes the files (and their channels) on every exit path
            try (RandomAccessFile f1 = new RandomAccessFile("sample.txt", "r");
                 RandomAccessFile f2 = new RandomAccessFile("sample-copy.txt", "r")) {
                FileChannel ch1 = f1.getChannel();
                FileChannel ch2 = f2.getChannel();
                if (ch1.size() != ch2.size()) {
                    System.out.println("Files have different length");
                    return;
                }
                long size = ch1.size();
                // map both files completely; this requires size <= Integer.MAX_VALUE
                ByteBuffer m1 = ch1.map(FileChannel.MapMode.READ_ONLY, 0L, size);
                ByteBuffer m2 = ch2.map(FileChannel.MapMode.READ_ONLY, 0L, size);
                for (int pos = 0; pos < size; pos++) {
                    if (m1.get(pos) != m2.get(pos)) {
                        System.out.println("Files differ at position " + pos);
                        return;
                    }
                }
                System.out.println("Files are identical, you can delete one of them.");
            }
            long end = System.nanoTime();
            System.out.print("Execution time: " + (end - start) / 1000000 + "ms");
        }
    }

To memory map the files, we first have to open them using the RandomAccessFile class and ask that object for its channel. The channel can be used to create a MappedByteBuffer, which represents the memory area where the file content is loaded. The map method in the example maps the file in read-only mode, from the start of the file to its end. We try to map the whole file, which works only if the file is not larger than 2GB: the start position is a long, and although the size parameter is also declared as long, it must not exceed Integer.MAX_VALUE.
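For files beyond that limit, one possible workaround is to map and compare one window at a time. The following is a sketch under stated assumptions rather than part of the original demo: the class name ChunkedMapCompare, the method firstMismatch and the chunk parameter are hypothetical, and a length difference is reported as a mismatch at the end of the shorter file instead of being checked up front.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of the original demo code.
class ChunkedMapCompare {
    /**
     * Returns the zero-based offset of the first difference, or -1 if the files are identical.
     * chunk is the window size in bytes and must be in the range 1..Integer.MAX_VALUE.
     */
    static long firstMismatch(Path p1, Path p2, long chunk) throws IOException {
        if (chunk <= 0 || chunk > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("chunk must be in 1..Integer.MAX_VALUE");
        }
        try (FileChannel ch1 = FileChannel.open(p1, StandardOpenOption.READ);
             FileChannel ch2 = FileChannel.open(p2, StandardOpenOption.READ)) {
            long size1 = ch1.size();
            long size2 = ch2.size();
            long common = Math.min(size1, size2);
            for (long start = 0; start < common; start += chunk) {
                long len = Math.min(chunk, common - start);
                // each window stays within the size limit of FileChannel.map
                MappedByteBuffer m1 = ch1.map(FileChannel.MapMode.READ_ONLY, start, len);
                MappedByteBuffer m2 = ch2.map(FileChannel.MapMode.READ_ONLY, start, len);
                for (int i = 0; i < len; i++) {
                    if (m1.get(i) != m2.get(i)) {
                        return start + i;
                    }
                }
            }
            // no differing byte in the common part: equal sizes mean identical files
            return size1 == size2 ? -1 : common;
        }
    }

    public static void main(String[] args) throws IOException {
        long pos = firstMismatch(Paths.get("sample.txt"), Paths.get("sample-copy.txt"), 1L << 30);
        System.out.println(pos < 0 ? "Files are identical" : "Files differ at position " + pos);
    }
}
```

The 1GB window in main is an arbitrary choice. The position is tracked as a long, so the reported offset stays correct past the 2GB mark.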

Generally, this is it… Oh yes, the running time comparing the 160MB random content files is around 1 sec.

Published at DZone with permission of Peter Verhas, DZone MVB. See the original article here.

Comparing Files In Java

By Peter Hill, 15 November, 2018

Java is one of the most popular programming languages, and among many programmers it has already overtaken C++ in popularity. I am not saying this only as a passionate Java developer: over the last 20 years, Java has proved itself against other programming languages. Nowadays, Android app development adds yet another reason to learn Java.

When you are learning Java, network programming is a great way to practice. You can use a raw socket connection to copy a specific file from a client to a server. For that, you need a solid understanding of Java NIO. I have seen many Java programmers struggle to compare files in Java.

You may want to check whether a file already has a duplicate on either the client or the server before fetching it. This can happen when the client and the server run on the same machine. In that case, you need to compare the files after copying them under a different name. Comparing files in Java can also help you tell local and remote files apart. You can quickly identify duplicated content, which allows you to remove the redundant file entirely.

In this tutorial, I am going to show you how to compare files in Java. You can use any IDE to write the code. Here, I am using a highly simplified approach to compare two files. With this program, you will not only learn whether the files match, but also where they differ.

Steps

Step 1: First, define two BufferedReader objects, reader1 and reader2. These objects read your chosen files.

    BufferedReader reader1 = new BufferedReader(new FileReader("Pass the path of file1 here"));
    BufferedReader reader2 = new BufferedReader(new FileReader("Pass the path of file2 here"));

Step 2: Now initialize the boolean variable areEqual to true and the integer variable lineNum to 1. areEqual flags whether a difference has been found between the two files, whereas lineNum tracks the current line number.

    boolean areEqual = true;
    int lineNum = 1;

Step 3: Read the first line of file1 and file2 into line1 and line2 respectively. The loop in the next step keeps reading lines until the end of the files.

    String line1 = reader1.readLine();
    String line2 = reader2.readLine();

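Step 4: If either line1 or line2 is null, one file has ended before the other, so assign false to areEqual and break out of the loop. If neither is null, compare them using the equalsIgnoreCase() method. Continue the loop if equalsIgnoreCase() returns true; otherwise assign false to areEqual and break out of the loop. Note that equalsIgnoreCase() treats lines that differ only in letter case as equal; use equals() if the comparison must be exact.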

    while (line1 != null || line2 != null) {
        if (line1 == null || line2 == null) {
            areEqual = false;
            break;
        } else if (!line1.equalsIgnoreCase(line2)) {
            areEqual = false;
            break;
        }
        line1 = reader1.readLine();
        line2 = reader2.readLine();
        lineNum++;
    }

Step 5: If areEqual is still true after the loop, the files have the same contents; otherwise they do not.

Final Java Code:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class CompareTextFiles {
        public static void main(String[] args) throws IOException {
            BufferedReader reader1 = new BufferedReader(new FileReader("C:\\file1.txt"));
            BufferedReader reader2 = new BufferedReader(new FileReader("C:\\file2.txt"));
            String line1 = reader1.readLine();
            String line2 = reader2.readLine();
            boolean areEqual = true;
            int lineNum = 1;
            while (line1 != null || line2 != null) {
                if (line1 == null || line2 == null) {
                    areEqual = false;
                    break;
                } else if (!line1.equalsIgnoreCase(line2)) {
                    areEqual = false;
                    break;
                }
                line1 = reader1.readLine();
                line2 = reader2.readLine();
                lineNum++;
            }
            if (areEqual) {
                System.out.println("Two files have same content.");
            } else {
                System.out.println("Two files have different content. They differ at line " + lineNum);
                System.out.println("File1 has " + line1 + " and File2 has " + line2 + " at line " + lineNum);
            }
            reader1.close();
            reader2.close();
        }
    }

This program is simple, but it may take a lot of time to execute on large files. If you’re looking for a very fast and efficient solution, you can use a more advanced technique called memory mapping. Memory-mapped files are read into memory directly by your operating system. The OS allocates the memory itself, and hence the program doesn’t consume heap memory.

To memory map the two files, open them using the RandomAccessFile class and ask that object for its channel. From the channel you can create a MappedByteBuffer, which represents the memory area holding your file’s contents.

The program below outlines what you need to do.

    package packt.java9.network.niodemo;

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    public class MapCompare {
        public static void main(String[] args) throws IOException {
            long start = System.nanoTime();
            FileChannel ch1 = new RandomAccessFile("sample.txt", "r").getChannel();
            FileChannel ch2 = new RandomAccessFile("sample-copy.txt", "r").getChannel();
            if (ch1.size() != ch2.size()) {
                System.out.println("Files have different length");
                return;
            }
            long size = ch1.size();
            ByteBuffer m1 = ch1.map(FileChannel.MapMode.READ_ONLY, 0L, size);
            ByteBuffer m2 = ch2.map(FileChannel.MapMode.READ_ONLY, 0L, size);
            for (int pos = 0; pos < size; pos++) {
                if (m1.get(pos) != m2.get(pos)) {
                    System.out.println("Files differ at position " + pos);
                    return;
                }
            }
            System.out.println("Files are identical, you can delete one of them.");
            long end = System.nanoTime();
            System.out.print("Execution time: " + (end - start) / 1000000 + "ms");
        }
    }

You can also use the java-diff-utils library to do this job without writing the comparison code yourself. Follow this post carefully to do this efficiently by yourself.
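If pulling in a library is not an option and only the position of the first differing byte is needed, the JDK itself offers Files.mismatch on Java 12 or later. This is a different technique from both java-diff-utils and the memory-mapped program above. The sketch below assumes that JDK version; the class name MismatchDemo and the method describe are hypothetical names used for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not taken from the article.
class MismatchDemo {
    /** Describes the result of a byte-wise comparison of two files. */
    static String describe(Path p1, Path p2) throws IOException {
        // Files.mismatch returns -1 when the contents are identical,
        // otherwise the zero-based position of the first differing byte.
        long pos = Files.mismatch(p1, p2);
        return pos == -1L
                ? "Files are identical"
                : "Files differ at position " + pos;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(describe(Paths.get("sample.txt"), Paths.get("sample-copy.txt")));
    }
}
```

When one file is a prefix of the other, the reported position is the length of the shorter file.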

Java programs often run on both the server and the client at the same time. This opens up the possibility of duplicates that waste memory and processing power, so eliminating them is essential for efficient and productive workstations.

You can use the programs mentioned above to find such duplicate files pretty easily. The first program is more than enough for small files. However, for files that are very large, the memory-mapped program is the better choice.

About The Author

Peter Hill is the best editor of BestEssayTips. He is a socially active person, likes traveling and photo/video editing. He finds himself in writing.
