File encoding in Java

How to force UTF-16 while reading/writing in Java?

I see that you can specify UTF-16 as the charset via Charset.forName("UTF-16"), and that you can create a new UTF-16 decoder via Charset.forName("UTF-16").newDecoder(), but I only see the ability to specify a CharsetDecoder in InputStreamReader's constructor. So how do you specify UTF-16 while reading any stream in Java?

If the class allows it, you do so at the boundary between a byte stream and a character stream. (InputStreamReader is one such class; for any other Reader that doesn't let you specify a character set, wrap an InputStreamReader instead.) Lower-level constructs at the InputStream (byte stream) level have no concept of a character set.
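As a concrete sketch of that boundary, Files.newBufferedReader does the InputStreamReader wrapping for you and takes the charset directly (the file name and content here are made up for the demo; the program writes a small UTF-16 file first so the read has input):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf16ReadDemo {
    public static void main(String[] args) throws IOException {
        Path file = Paths.get("utf16-demo.txt");
        // Write a small UTF-16 file so the read below has input.
        Files.write(file, "héllo".getBytes(StandardCharsets.UTF_16));
        // Files.newBufferedReader wraps the byte stream in a decoding Reader,
        // so every read goes through the UTF-16 charset.
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_16)) {
            System.out.println("héllo".equals(reader.readLine())); // true
        }
        Files.delete(file);
    }
}
```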


Input streams deal with raw bytes. When you read directly from an input stream, all you get is raw bytes where character sets are irrelevant.

The interpretation of raw bytes into characters, by definition, requires some sort of translation: how do I translate from raw bytes into a readable string? That "translation" comes in the form of a character set.

This "added" layer is implemented by Readers. Therefore, to read characters (rather than bytes) from a stream, you need to construct a Reader of some sort (depending on your needs) on top of the stream. For example:

    InputStream is = ...;
    Reader reader = new InputStreamReader(is, Charset.forName("UTF-16"));

This will cause reader.read() to read characters using the character set you specified. If you would like to read entire lines, use BufferedReader on top:

    BufferedReader reader = new BufferedReader(
        new InputStreamReader(is, Charset.forName("UTF-16")));
    String line = reader.readLine();
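The same layering works in the other direction for writing: OutputStreamWriter is the writing-side twin of InputStreamReader, encoding characters into bytes with whatever charset you hand it. A minimal sketch (the file name is made up for the demo):

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf16WriteDemo {
    public static void main(String[] args) throws IOException {
        // OutputStreamWriter encodes characters into bytes using the given charset.
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream("utf16-out.txt"),
                                       StandardCharsets.UTF_16))) {
            out.write("hello");
        }
        byte[] bytes = Files.readAllBytes(Paths.get("utf16-out.txt"));
        // Java's UTF-16 encoder writes a byte order mark (0xFE 0xFF) and then
        // two bytes per character, so "hello" becomes 12 bytes, not 5.
        System.out.println(bytes.length); // 12
        Files.delete(Paths.get("utf16-out.txt"));
    }
}
```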


set encoding as UTF-8 for a FileWriter [duplicate]

Below is my code. It is intended to take two .ckl files, compare them, add the new items, and create a new merged file. The program executes correctly when run in Netbeans; however, when executing the .jar, the program doesn't appear to be encoding the file in UTF-8. I am rather new to programming and would like to know where or how I need to enforce this encoding. (I have removed the Swing code and other lines so that only the method that does all of the comparing and merging is shown.)

    public void mergeFiles(File[] files, File mergedFile) {
        ArrayList<String> list = new ArrayList<String>();
        FileWriter fstream = null;
        BufferedWriter out = null;
        try {
            fstream = new FileWriter(mergedFile, false);
            out = new BufferedWriter(fstream);
        } catch (IOException e1) {
            e1.printStackTrace();
        }
        // Going in a different direction. We are using a couple of booleans to tell
        // us when we want to copy or not. At the beginning, since we start with our
        // source file, we set copy to true: we want to copy everything and insert
        // vuln names into our list as we go. After that first file we set the boolean
        // to false so that we don't start copying anything from the second file until
        // it is a vuln. We set it to true when we see a vuln, and to false if we
        // already have that vuln in our list. tmpCopy stores away the value of copy
        // when we see a vuln, and copy is reset to that value afterwards.
        Boolean copy = true;
        Boolean tmpCopy = true;
        for (File f : files) {
            textArea1.append("merging files into: " + mergedFilePathway + "\n");
            FileInputStream fis;
            try {
                fis = new FileInputStream(f);
                // BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(mergedFile), "UTF-8"));
                BufferedReader in = new BufferedReader(new InputStreamReader(fis));
                String aLine;
                while ((aLine = in.readLine()) != null) {
                    // Skip the close-checklist tags; we can write them in at the end
                    if (aLine.trim().equals("")) {
                        continue;
                    }
                    if (aLine.trim().equals("")) {
                        continue;
                    }
                    if (aLine.trim().equals("")) {
                        continue;
                    }
                    if (aLine.trim().equals("")) {
                        // Store our current value of copy
                        tmpCopy = copy;
                        copy = true;
                        String aLine2 = in.readLine();
                        String aLine3 = in.readLine();
                        String nameLine = in.readLine();
                        if (list.contains(nameLine.trim())) {
                            textArea1.append("Skipping: " + nameLine + "\n");
                            copy = false;
                            while (!(aLine.trim().equals(""))) {
                                aLine = in.readLine();
                            }
                            continue; // this would skip the writing-out-to-file part
                        } else {
                            list.add(nameLine.trim());
                            textArea1.append(". List is now . \n");
                            textArea1.append(list.toString() + "\n");
                        }
                        if (copy) {
                            out.write(aLine);
                            out.newLine();
                            out.write(aLine2);
                            out.newLine();
                            out.write(aLine3);
                            out.newLine();
                            out.write(nameLine);
                            out.newLine();
                        }
                    } else if (copy) {
                        out.write(aLine);
                        out.newLine();
                    }
                    // After we have written to file, if the line was a close vuln,
                    // switch copy back to its original value
                    if (aLine.trim().equals("")) {
                        copy = tmpCopy;
                    }
                }
                in.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
            copy = false;
        }
        // Now let's add the close-checklist tags we omitted before
        try {
            out.write("");
            out.write("");
            out.write("");
        } catch (IOException e) {
            e.printStackTrace();
        }
        try {
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

Источник

Write a file in UTF-8 using FileWriter (Java)?

I have the following code; however, I want it to write to a UTF-8 file to handle foreign characters. Is there a way of doing this? Is there some parameter I need to pass? I would really appreciate your help with this. Thanks.

    try {
        BufferedReader reader = new BufferedReader(new FileReader("C:/Users/Jess/My Documents/actresses.list"));
        writer = new BufferedWriter(new FileWriter("C:/Users/Jess/My Documents/actressesFormatted.csv"));
        while ((line = reader.readLine()) != null) {
            // If the line starts with a tab then we just want to add a movie
            // using the current actor's name.
            if (line.length() == 0)
                continue;
            else if (line.charAt(0) == '\t') {
                readMovieLine2(0, line, surname.toString(), forename.toString());
            }
            // Else we've reached a new actor
            else {
                readActorName(line);
            }
        }
    } catch (IOException e)


Safe Encoding Constructors

Getting Java to properly notify you of encoding errors is tricky. You must use the most verbose and, alas, the least used of the four alternate constructors for each of InputStreamReader and OutputStreamWriter to receive a proper exception on an encoding glitch.

For file I/O, always make sure to use, as the second argument to both OutputStreamWriter and InputStreamReader, the fancy encoder argument:

 Charset.forName("UTF-8").newEncoder() 

There are other even fancier possibilities, but none of the three simpler possibilities work for exception handling. These do:

    OutputStreamWriter char_output = new OutputStreamWriter(
        new FileOutputStream("some_output.utf8"),
        Charset.forName("UTF-8").newEncoder()
    );

    InputStreamReader char_input = new InputStreamReader(
        new FileInputStream("some_input.utf8"),
        Charset.forName("UTF-8").newDecoder()
    );
You might be tempted instead to set the default encoding for the whole JVM on the command line:

    $ java -Dfile.encoding=utf8 SomeTrulyRemarkablyLongcLassNameGoeShere

The problem is that this will not use the full encoder argument form for the character streams, and so you will again miss encoding problems.
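The difference the answer insists on can be seen directly: with the plain Charset argument, malformed input is silently replaced with U+FFFD, while the newDecoder() form throws. A small sketch (the byte values are an arbitrary invalid UTF-8 sequence chosen for the demo):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

public class StrictDecodeDemo {
    public static void main(String[] args) throws IOException {
        byte[] bad = { (byte) 0xC3, (byte) 0x28 }; // invalid UTF-8 sequence

        // Lenient form: passing the Charset alone substitutes U+FFFD and never complains.
        InputStreamReader lenient = new InputStreamReader(
            new ByteArrayInputStream(bad), StandardCharsets.UTF_8);
        System.out.println((char) lenient.read() == '\uFFFD'); // true

        // Strict form: the explicit decoder reports the error as an exception.
        InputStreamReader strict = new InputStreamReader(
            new ByteArrayInputStream(bad), StandardCharsets.UTF_8.newDecoder());
        try {
            strict.read();
            System.out.println("no exception");
        } catch (MalformedInputException e) {
            System.out.println("MalformedInputException");
        }
    }
}
```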

Longer Example

Here's a longer example, this one managing a process instead of a file, where we promote two different input byte streams and one output byte stream, all to UTF-8 character streams with full exception handling:

    // this runs a perl script with UTF-8 STD streams
    Process slave_process = Runtime.getRuntime().exec("perl -CS script args");

    // fetch his stdin byte stream
    OutputStream __bytes_into_his_stdin = slave_process.getOutputStream();

    // and make a character stream with exceptions on encoding errors
    OutputStreamWriter chars_into_his_stdin = new OutputStreamWriter(
        __bytes_into_his_stdin,
        /* DO NOT OMIT! */ Charset.forName("UTF-8").newEncoder()
    );

    // fetch his stdout byte stream
    InputStream __bytes_from_his_stdout = slave_process.getInputStream();

    // and make a character stream with exceptions on encoding errors
    InputStreamReader chars_from_his_stdout = new InputStreamReader(
        __bytes_from_his_stdout,
        /* DO NOT OMIT! */ Charset.forName("UTF-8").newDecoder()
    );

    // fetch his stderr byte stream
    InputStream __bytes_from_his_stderr = slave_process.getErrorStream();

    // and make a character stream with exceptions on encoding errors
    InputStreamReader chars_from_his_stderr = new InputStreamReader(
        __bytes_from_his_stderr,
        /* DO NOT OMIT! */ Charset.forName("UTF-8").newDecoder()
    );

Now you have three character streams that all raise exceptions on encoding errors, respectively called chars_into_his_stdin, chars_from_his_stdout, and chars_from_his_stderr.

This is only slightly more complicated than what you need for your problem, whose solution I gave in the first half of this answer. The key point is that this is the only way to detect encoding errors.

Just don't get me started about PrintStreams eating exceptions.


How to write a UTF-8 file with Java?

I have some current code, and the problem is that it creates a file in codepage 1252; I want to force it to create a UTF-8 file. Can anyone help me with this code? As I say, it currently works, but I need to force the save to UTF-8. Can I pass a parameter or something? This is what I have; any help is really appreciated.

    var out = new java.io.FileWriter(new java.io.File(path)),
        text = new java.lang.String(src || "");
    out.write(text, 0, text.length());
    out.flush();
    out.close();


Instead of using FileWriter, create a FileOutputStream. You can then wrap this in an OutputStreamWriter, which allows you to pass an encoding in the constructor. Then you can write your data to that inside a try-with-resources statement:

    try (OutputStreamWriter writer = new OutputStreamWriter(
            new FileOutputStream(PROPERTIES_FILE), StandardCharsets.UTF_8)) {
        // do stuff
    }

@Jon Skeet: Given that FileWriter is a wrapper for FileOutputStream that assumes the default encoding and buffer size, wouldn’t that defeat the point?

I recommend a separate declaration for every type that implements the Closeable interface, especially if you use try-with-resources, like the new FileOutputStream; it is good practice and avoids future errors like "IOException: Too many open files".
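A sketch of that advice: each resource gets its own declaration in the try header, so if a later constructor throws, the streams already opened above it are still closed automatically (the file name is made up for the demo):

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SeparateResourcesDemo {
    public static void main(String[] args) throws IOException {
        // Each Closeable is declared separately; try-with-resources closes them
        // in reverse order, even if a later constructor had thrown.
        try (FileOutputStream fos = new FileOutputStream("sep-demo.txt");
             OutputStreamWriter osw = new OutputStreamWriter(fos, StandardCharsets.UTF_8);
             BufferedWriter writer = new BufferedWriter(osw)) {
            writer.write("ok");
        }
        System.out.println(Files.readAllLines(Paths.get("sep-demo.txt"), StandardCharsets.UTF_8).get(0)); // ok
        Files.delete(Paths.get("sep-demo.txt"));
    }
}
```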

    Writer out = new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream("outfilename"), "UTF-8"));
    try {
        out.write(aString);
    } finally {
        out.close();
    }

Try using FileUtils.write from Apache Commons.

You should be able to do something like:

    File f = new File("output.txt");
    FileUtils.writeStringToFile(f, document.outerHtml(), "UTF-8");

This will create the file if it does not exist.

@Smarty only if you are already using Apache Commons. Otherwise it seems an awful waste to include yet another jar just because you don’t want to write a few more characters.

If you read the Java docs on the link shown in the question, then it tells you the version of the Commons IO API where the write APIs were introduced. It looks like the write APIs were introduced from v2.0 onwards.

Just would like to mention that I used the method FileUtils.writeStringToFile(...) (with commons-io-1.3.1.jar) instead of FileUtils.write(...).

Since Java 7 you can do the same with Files.newBufferedWriter a little more succinctly:

    Path logFile = Paths.get("/tmp/example.txt");
    try (BufferedWriter writer = Files.newBufferedWriter(logFile, StandardCharsets.UTF_8)) {
        writer.write("Hello World!");
        // ...
    }

All of the answers given here won't work, since Java's UTF-8 writing is bugged.

As far as I can tell, the bug is this one (since the author of that article doesn’t bother to mention it): bugs.sun.com/view_bug.do?bug_id=4508058

The only issue when writing is the missing BOM. No big deal. Reading a file with a BOM on the other hand requires stripping it manually.

UTF-8 doesn't need a BOM, so technically the written file is still a valid UTF-8 encoded text file. The bug is with reading a UTF-8 file with a BOM.
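Since Java hands a UTF-8 BOM back as an ordinary leading U+FEFF character rather than stripping it, the manual strip the comments describe is a one-liner. A sketch (the file name is made up, and the BOM is written deliberately to simulate a file saved by an editor that prepends one):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BomDemo {
    public static void main(String[] args) throws IOException {
        Path file = Paths.get("bom-demo.txt");
        // Simulate a file saved by an editor that prepends a UTF-8 BOM.
        Files.write(file, "\uFEFFhello".getBytes(StandardCharsets.UTF_8));

        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        // The BOM comes back as an ordinary U+FEFF character at position 0...
        System.out.println(text.charAt(0) == '\uFEFF'); // true
        // ...so stripping it manually is straightforward:
        if (!text.isEmpty() && text.charAt(0) == '\uFEFF') {
            text = text.substring(1);
        }
        System.out.println(text); // hello
        Files.delete(file);
    }
}
```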

    var out = new java.io.PrintWriter(new java.io.File(path), "UTF-8");
    text = new java.lang.String(src || "");
    out.print(text);
    out.flush();
    out.close();

The Java 7 Files utility type is useful for working with files:

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.io.IOException;
    import java.util.*;

    public class WriteReadUtf8 {
        public static void main(String[] args) throws IOException {
            List<String> lines = Arrays.asList("These", "are", "lines");
            Path textFile = Paths.get("foo.txt");
            Files.write(textFile, lines, StandardCharsets.UTF_8);
            List<String> read = Files.readAllLines(textFile, StandardCharsets.UTF_8);
            System.out.println(lines.equals(read));
        }
    }

The Java 8 version allows you to omit the Charset argument — the methods default to UTF-8.
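A sketch of those charset-less Java 8 overloads, which use UTF-8 regardless of the platform default (the file name is made up for the demo):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class DefaultUtf8Demo {
    public static void main(String[] args) throws IOException {
        Path file = Paths.get("default-utf8.txt");
        List<String> lines = Arrays.asList("These", "are", "lines");
        // No Charset argument: these Java 8 overloads always use UTF-8,
        // regardless of the platform default encoding.
        Files.write(file, lines);
        List<String> read = Files.readAllLines(file);
        System.out.println(lines.equals(read)); // true
        Files.delete(file);
    }
}
```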
