Java check file is zip

Java Utility Methods Zip File Check

Method

if (file.isDirectory()) { return false; } byte[] bytes = new byte[4]; try (FileInputStream fIn = new FileInputStream(file)) { if (fIn.read(bytes) != 4) { return false; } final int header = bytes[0] + (bytes[1] ... return 0x04034b50 == header;
if (fileName == null || fileName.trim().length() == 0) { return false; } ZipFile zipFile = null; try { zipFile = new ZipFile(fileName.trim()); } catch (IOException ioException) { return false; ...
if (fileName == null) { return false; } File file = new File(fileName); if (!file.exists() || !file.isFile()) { return false; } InputStream in = null; ...
if (path == null) return false; try { new ZipFile(path).close(); return true; } catch (IOException e) { return false; ...
boolean isArchive = true; ZipFile zipFile = null; try { zipFile = new ZipFile(file); } catch (ZipException zipCurrupted) { isArchive = false; } catch (IOException anyIOError) { isArchive = false; ...
try { RandomAccessFile raf = new RandomAccessFile(f, "r"); long n = raf.readInt(); raf.close(); if (n == 0x504B0304) { return true; } else { return false; ...
try { return new ZipInputStream(new FileInputStream(file)).getNextEntry() != null; } catch (IOException e) { return false; }
final String name = file.getName().toLowerCase(); for (String ext : extList) { if (name.endsWith(ext)) return true; } return false;
DataInputStream dis = new DataInputStream(inputStream); try { return dis.readInt() == 0x504b0304; } catch (EOFException e) { return false; } finally { dis.close(); }
long n = 0x0; File file = null; RandomAccessFile raf = null; try { try { file = new File(name); if (!file.exists()) { return false; ...
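The fragments above boil down to two techniques: compare the first four bytes with the local file header signature (read big-endian with DataInputStream that is 0x504B0304; assembled little-endian it is 0x04034b50), or let java.util.zip.ZipFile try to open the file. A minimal sketch of both, assuming Java 7+; the class and method names are hypothetical and not taken from any of the listed sources:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipFile;

public class ZipCheck {
    // Local file header signature "PK\3\4" read as a big-endian int.
    private static final int LOCAL_HEADER_SIG = 0x504B0304;

    /** Cheap check: do the first four bytes match the local file header signature? */
    public static boolean hasZipHeader(Path path) {
        if (!Files.isRegularFile(path)) {
            return false; // directories and missing files are not ZIP files
        }
        try (InputStream in = Files.newInputStream(path);
             DataInputStream dis = new DataInputStream(in)) {
            return dis.readInt() == LOCAL_HEADER_SIG; // EOFException if shorter than 4 bytes
        } catch (IOException e) {
            return false;
        }
    }

    /** Stricter check: can ZipFile open it? Opening reads the central directory. */
    public static boolean opensAsZipFile(Path path) {
        try (ZipFile zf = new ZipFile(path.toFile())) {
            return true;
        } catch (IOException e) { // ZipException is a subclass of IOException
            return false;
        }
    }
}
```

The header check reads only four bytes but trusts them; the ZipFile check costs more but rejects files whose end of central directory record is missing.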

Best way to detect if a stream is zipped in Java

What is the best way to find out if a java.io.InputStream contains zipped data?

Introduction

Since all the answers are 5 years old, I feel a duty to write down what's going on today. I seriously doubt one should read the magic bytes of the stream! That's low-level code, and it should be avoided in general.

Simple answer

If the Stream can be read via ZipInputStream, it should be zipped.

Yes, but in the case of ZipInputStream, "can be read" means that the first call to .getNextEntry() returns a non-null value. No exception catching et cetera. So instead of parsing magic bytes you can just do:

boolean isZipped = new ZipInputStream(yourInputStream).getNextEntry() != null; 
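The one-liner above can be wrapped in a small helper that also closes the stream. A minimal sketch, assuming the stream does not need to be reused afterwards; StreamZipCheck is a hypothetical name. For non-ZIP data, including data shorter than a local header, ZipInputStream.getNextEntry() returns null rather than throwing:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipInputStream;

public class StreamZipCheck {
    /**
     * Returns true if the first getNextEntry() call finds an entry.
     * This consumes bytes from the stream and closes it, so only use it on
     * a stream you can re-open or on data you already hold in memory.
     */
    public static boolean isZipped(InputStream in) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(in)) {
            return zis.getNextEntry() != null;
        }
    }

    /** Convenience overload for data already in memory. */
    public static boolean isZipped(byte[] data) throws IOException {
        return isZipped(new ByteArrayInputStream(data));
    }
}
```

Note that a valid but empty archive has no entries, so this check reports it as not zipped.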

General unzipping thoughts

In general, it turned out to be much more convenient to work with files while [un]zipping than with streams. There are several useful libraries, plus ZipFile has more functionality than ZipInputStream. Handling of zip files is discussed here: What is a good Java library to zip/unzip files? So if you can work with files, you'd better do so!

Code sample

My application needed to work with streams only, so this is the method I wrote for unzipping:

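import org.apache.commons.io.IOUtils;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Returns true if at least one entry was found, i.e. the stream was a non-empty ZIP.
// Note: entry names are not checked for "../" segments (zip slip).
public boolean unzip(InputStream inputStream, File outputFolder) throws IOException {
    boolean isEmpty = true;
    try (ZipInputStream zis = new ZipInputStream(inputStream)) {
        ZipEntry entry;
        while ((entry = zis.getNextEntry()) != null) {
            isEmpty = false;
            File newFile = new File(outputFolder, entry.getName());
            if (entry.isDirectory()) {
                newFile.mkdirs();
                continue;
            }
            // mkdirs() returns false if the folder already exists, so it must not guard the write
            newFile.getParentFile().mkdirs();
            try (FileOutputStream fos = new FileOutputStream(newFile)) {
                IOUtils.copy(zis, fos); // copies only the current entry
            }
        }
    }
    return !isEmpty;
}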

The magic bytes for the ZIP format are 50 4B (the ASCII letters PK). You could test the stream (using mark and reset; you may need to buffer), but I wouldn't expect this to be a 100% reliable approach: there would be no way to distinguish it from a US-ASCII encoded text file that began with the letters PK.

The best way would be to provide metadata on the content format prior to opening the stream and then treat it appropriately.

You could check that the first four bytes of the stream are the local file header signature, which starts the local file header that precedes every file in a ZIP file; the spec gives it as 50 4B 03 04.

A little test code shows that this works:

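import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

byte[] buffer = new byte[4];
try {
    // write a tiny ZIP with a single entry
    try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream("so.zip"))) {
        zos.putNextEntry(new ZipEntry("HelloWorld.txt"));
        zos.write("Hello world".getBytes());
    }
    // read back its first four bytes
    try (FileInputStream is = new FileInputStream("so.zip")) {
        is.read(buffer);
    }
} catch (IOException e) {
    e.printStackTrace();
}
for (byte b : buffer) {
    // the loop body was cut off in the source; printing each byte in hex is assumed
    System.out.printf("%02X ", b); // prints 50 4B 03 04
}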

How to detect file type from its content in zip archive?

I have a zip archive that contains several gzip files, but the gzip files' extensions are also .zip. I walk through the zip archive with ZipInputStream. How can I detect an inner file's type by reading its content rather than its extension? I also must not change (or reset) the ZipInputStream position.

  • Read the files in the zip using an InputStream (ZipInputStream in my case), because a zip inside a zip is possible.
  • Find the file type from its content.
  • While finding the file type from its content, the InputStream position should not change, because I will continue reading the next files.

root/1.zip/2.zip/3.zip(actually 3 is gzip)/4.txt

import org.apache.tika.Tika;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public static void main(String[] args) {
    // root/1.zip/2.zip/3.zip(actually 3 is gzip)/4.txt
    String file = "root/1.zip";
    File rootZip = new File(file);
    try (FileInputStream fis = new FileInputStream(rootZip)) {
        lookupInZip(fis)
                .stream()
                .forEach(System.out::println);
    } catch (IOException e) {
        System.out.println("Failed to get files");
    }
}

public static List<String> lookupInZip(InputStream inputStream) throws IOException {
    Tika tika = new Tika();
    List<String> paths = new ArrayList<>();
    ZipInputStream zipInputStream = new ZipInputStream(inputStream);
    ZipEntry entry = zipInputStream.getNextEntry();
    while (entry != null) {
        String entryName = entry.getName();
        if (!entry.isDirectory()) {
            // Option 1: detect by entry name
            // String fileType = tika.detect(entryName);
            // Option 2: detect by content
            String fileType = tika.detect(zipInputStream);
            if ("application/zip".equals(fileType)) {
                List<String> innerPaths = lookupInZip(zipInputStream);
                paths.addAll(innerPaths);
            } else {
                paths.add(entryName);
            }
        }
        entry = zipInputStream.getNextEntry();
    }
    return paths;
}

If I use option 1, '3.zip' is evaluated as a zip file, but it is gzip. If I use option 2, '2.zip' is correctly evaluated as zip by its content. But when lookupInZip() is called recursively for '3.zip', zipInputStream.getNextEntry() returns null, because in the previous step we used the InputStream content to detect the type and the stream position changed.

Note: tika.detect() uses a BufferedInputStream internally to reset the InputStream position, but that does not solve my problem.

The first two bytes are enough to see if it is likely a zip file, likely a gzip file, or certainly something else.

If the first two bytes are 0x50 0x4b, then it is likely a zip file. If the first two bytes are 0x1f 0x8b, then it is likely a gzip file. If it is neither, then the file is something else.

A match on the first two bytes is not a guarantee that the file is of that type, but from your structure it appears to usually be one or the other, and you can use the extension as further evidence that it is compressed.

As for not changing the position, you need a way to peek at the first two bytes without advancing the position, or a way to get them and then unget them to return the position to where it was.
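The "peek, then unget" idea can be sketched with java.io.PushbackInputStream wrapped around each entry of the ZipInputStream. This is a minimal sketch, not the asker's code: it assumes Java 9+ (for readNBytes and List.of), only checks for the ZIP signature "PK" (a 0x1f 0x8b check for gzip could be added the same way), and the class, method and helper names are hypothetical. The nested ZipInputStream is deliberately never closed, because closing it would also close the outer stream:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class NestedZipLister {
    /** Reads up to two bytes and pushes them back, so the stream position is unchanged. */
    static boolean startsWithPK(PushbackInputStream pb) throws IOException {
        byte[] sig = new byte[2];
        int len = pb.readNBytes(sig, 0, 2); // loops until 2 bytes or end of entry
        if (len > 0) {
            pb.unread(sig, 0, len);
        }
        return len == 2 && sig[0] == 0x50 && sig[1] == 0x4b; // "PK"
    }

    static List<String> lookupInZip(InputStream in) throws IOException {
        List<String> paths = new ArrayList<>();
        ZipInputStream zis = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zis.getNextEntry()) != null) {
            if (entry.isDirectory()) continue;
            // Reads through zis stop at the end of the current entry.
            PushbackInputStream pb = new PushbackInputStream(zis, 2);
            if (startsWithPK(pb)) {
                paths.addAll(lookupInZip(pb)); // recurse; the peeked bytes are still there
            } else {
                paths.add(entry.getName());
            }
        }
        return paths;
    }

    // Test helper: builds an in-memory ZIP with one entry per (name, data) pair.
    static byte[] zip(String[] names, byte[][] data) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(baos)) {
            for (int i = 0; i < names.length; i++) {
                zos.putNextEntry(new ZipEntry(names[i]));
                zos.write(data[i]);
                zos.closeEntry();
            }
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] inner = zip(new String[] {"4.txt"}, new byte[][] {"hi".getBytes()});
        byte[] outer = zip(new String[] {"2.zip", "a.txt"}, new byte[][] {inner, "x".getBytes()});
        System.out.println(lookupInZip(new ByteArrayInputStream(outer))); // [4.txt, a.txt]
    }
}
```

After the recursive call returns, the outer getNextEntry() skips whatever is left of the current entry, so the walk continues with the next file.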

How to check if InputStream is Gzipped?

Is there any way to check if InputStream has been gzipped? Here's the code:

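public static InputStream decompressStream(InputStream input) {
    try {
        GZIPInputStream gs = new GZIPInputStream(input);
        return gs;
    } catch (IOException e) {
        // the GZIPInputStream constructor has already read header bytes from input,
        // so the stream returned here no longer starts at its beginning
        logger.info("Input stream not in the GZIP format, using standard format");
        return input;
    }
}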

I tried this, but it doesn't work as expected: values read from the stream are invalid. EDIT: Added the method I use to compress data:

public static byte[] compress(byte[] content) {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try {
        GZIPOutputStream gs = new GZIPOutputStream(baos);
        gs.write(content);
        gs.close();
    } catch (IOException e) {
        logger.error("Fatal error occurred while compressing data");
        throw new RuntimeException(e);
    }
    double ratio = (1.0f * content.length / baos.size());
    if (ratio > 1) {
        logger.info("Compression ratio equals " + ratio);
        return baos.toByteArray();
    }
    logger.info("Compression not needed");
    return content;
}

It's not foolproof but it's probably the easiest and doesn't rely on any external data. Like all decent formats, GZip too begins with a magic number which can be quickly checked without reading the entire stream.

public static InputStream decompressStream(InputStream input) throws IOException {
    PushbackInputStream pb = new PushbackInputStream(input, 2); // we need a pushback stream to look ahead
    byte[] signature = new byte[2];
    int len = pb.read(signature); // read the signature (may return fewer than 2 bytes, or -1 at EOF)
    if (len > 0) {
        pb.unread(signature, 0, len); // push back the signature to the stream
    }
    // check if it matches the standard gzip magic number
    if (len == 2 && signature[0] == (byte) 0x1f && signature[1] == (byte) 0x8b) {
        return new GZIPInputStream(pb);
    }
    return pb;
}

(Source for the magic number: GZip file format specification)

Update: I've just discovered that there is also a constant called GZIP_MAGIC in GZIPInputStream which contains this value, so if you really want to, you can use the lower two bytes of it.

The InputStream comes from HttpURLConnection#getInputStream()

In that case you need to check whether the HTTP Content-Encoding response header equals gzip.

URLConnection connection = url.openConnection();
InputStream input = connection.getInputStream();
if ("gzip".equals(connection.getContentEncoding())) {
    input = new GZIPInputStream(input);
}
// ...

This is all clearly specified in the HTTP spec.

Update: as to the way you compressed the source of the stream, this ratio check is pretty... insane. Get rid of it. The same length does not necessarily mean that the bytes are the same. Let it always return the gzipped stream so that you can always expect a gzipped stream and just apply GZIPInputStream without nasty checks.
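Following that advice, compress can always gzip and the reader can always decompress, with no format guessing. A minimal sketch assuming Java 9+ (for readAllBytes); AlwaysGzip is a hypothetical class name. For very small inputs the gzipped form is larger than the original, which is the price of a consistent format:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class AlwaysGzip {
    /** Always gzips, so readers never have to guess the format. */
    public static byte[] compress(byte[] content) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gs = new GZIPOutputStream(baos)) {
            gs.write(content);
        } // closing writes the gzip trailer before we take the bytes
        return baos.toByteArray();
    }

    /** The counterpart: always apply GZIPInputStream. */
    public static byte[] decompress(byte[] compressed) throws IOException {
        try (GZIPInputStream gs = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gs.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "Hello world".getBytes(StandardCharsets.UTF_8);
        byte[] roundTrip = decompress(compress(original));
        System.out.println(new String(roundTrip, StandardCharsets.UTF_8)); // Hello world
    }
}
```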

I found this useful example that provides a clean implementation of isCompressed():

/*
 * Determines if a byte array is compressed. The java.util.zip GZip
 * implementation does not expose the GZip header so it is difficult to determine
 * if a string is compressed.
 *
 * @param bytes an array of bytes
 * @return true if the array is compressed or false otherwise
 * @throws java.io.IOException if the byte array couldn't be read
 */
public boolean isCompressed(byte[] bytes) {
    if ((bytes == null) || (bytes.length < 2)) {
        return false;
    } else {
        return ((bytes[0] == (byte) (GZIPInputStream.GZIP_MAGIC))
                && (bytes[1] == (byte) (GZIPInputStream.GZIP_MAGIC >> 8)));
    }
}
@Test public void testIsCompressed()

I believe this is the simplest way to check whether a byte array is GZIP formatted or not; it does not depend on any HTTP entity or MIME type support:

public static boolean isGzipStream(byte[] bytes)
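Only the signature of isGzipStream survived, so its body is unknown. One possible body, assumed here rather than recovered from the answer, assembles the first two bytes little-endian and compares them with GZIPInputStream.GZIP_MAGIC (0x8b1f); GzipCheck is a hypothetical class name:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipCheck {
    // Assumed body: the original answer's implementation was not preserved.
    public static boolean isGzipStream(byte[] bytes) {
        if (bytes == null || bytes.length < 2) {
            return false;
        }
        // GZIP_MAGIC is 0x8b1f; the format stores it little-endian: 0x1f, then 0x8b
        int head = (bytes[0] & 0xff) | ((bytes[1] & 0xff) << 8);
        return head == GZIPInputStream.GZIP_MAGIC;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(baos)) {
            gz.write("hello".getBytes());
        }
        System.out.println(isGzipStream(baos.toByteArray())); // true
        System.out.println(isGzipStream("hello".getBytes()));  // false
    }
}
```

Masking with 0xff avoids sign extension when the bytes are widened to int.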

My Tech Notes

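The first four bytes of a valid ZIP file should be {'P', 'K', 0x3, 0x4}, a.k.a. the magic bytes for ZIP. The best way to detect whether a file is a ZIP archive is to check the magic bytes. (An empty archive is an exception: it contains only the end of central directory record, whose signature is {'P', 'K', 0x5, 0x6}.)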

The following Java code includes two methods to detect whether a file or an InputStream is a ZIP archive:

package example;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;

public class ZipUtil {
    public static final byte[] MAGIC = { 'P', 'K', 0x3, 0x4 };

    /**
     * Tests if an input stream is a zip archive.
     *
     * @param in the input stream to test.
     * @return true if the stream starts with the ZIP magic bytes.
     */
    public static boolean isZipStream(InputStream in) {
        if (!in.markSupported()) {
            // This wrapper is local, so the caller's original stream will have been
            // advanced after this call. Pass a mark-capable stream to avoid that.
            in = new BufferedInputStream(in);
        }
        boolean isZip = true;
        try {
            in.mark(MAGIC.length);
            for (int i = 0; i < MAGIC.length; i++) {
                if (MAGIC[i] != (byte) in.read()) {
                    isZip = false;
                    break;
                }
            }
            in.reset(); // rewind to the mark
        } catch (IOException e) {
            isZip = false;
        }
        return isZip;
    }

    /**
     * Tests if a file is a zip file.
     *
     * @param f the file to test.
     * @return true if the file starts with the ZIP magic bytes.
     */
    public static boolean isZipFile(File f) {
        byte[] buffer = new byte[MAGIC.length];
        // try-with-resources closes the file even if readFully fails
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            raf.readFully(buffer); // EOFException for files shorter than 4 bytes
        } catch (IOException e) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (buffer[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // test if an input stream is a zip stream.
        System.out.println(isZipStream(new FileInputStream(new File("/tmp/1.zip"))));
        // test if a file is a zip file.
        System.out.println(isZipFile(new File("/tmp/1.zip")));
    }
}

Is there a way to check if a file is a zip file in java?

posted 16 years ago

  • Is there a way to check if a file is a zip file in Java?
    Checking the file extension is one way, but is there a better way?

    jQuery

    posted 16 years ago

  • Use file filters. There is a file filter utility in the Apache Commons IO package.

    IOFileFilter suffixFilter = FileFilterUtils.suffixFileFilter(".zip");

    Wanderer

    posted 16 years ago

  • Aside from looking at the file extension, a zip file (or jar file) should begin with a standard "magic number" in the first four bytes. You can check for it thus:

    After that, to see if the whole thing is really a valid zip file, the easiest way is probably to open it up using the ZipFile class, read all the entries, and see if it throws an exception.
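The ZipFile approach described in the last reply can be sketched as follows. This is a minimal sketch, not code from the thread; ZipValidator is a hypothetical name. Opening the file fails if the central directory cannot be found, and reading each entry to the end means that damaged compressed data usually surfaces as an IOException (often a ZipException) as well:

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipValidator {
    /** Opens the file with ZipFile and reads every entry to the end. */
    public static boolean isValidZip(File file) {
        byte[] buf = new byte[8192];
        try (ZipFile zf = new ZipFile(file)) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                try (InputStream in = zf.getInputStream(entry)) {
                    while (in.read(buf) != -1) {
                        // discard the data; we only care whether reading fails
                    }
                }
            }
            return true;
        } catch (IOException e) { // includes ZipException
            return false;
        }
    }
}
```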

    "I'm not back." - Bill Harding, Twister
