Read one char in Java

Contents

Take a char input from the Scanner
Is it possible to read a single character from an input stream?
How do I read input character-by-character in Java?
Reading a single UTF-8 character with RandomAccessFile

Take a char input from the Scanner

Scanner has no method for reading a single char (there is no nextChar()). I tried taking c as a String, but that does not work in every case, since the other method I am calling from my method requires a char as input. Therefore I have to find a way to explicitly take a char as input. Any help?

24 Answers

You could take the first character from Scanner.next():

char c = reader.next().charAt(0);

To consume exactly one character you could use:

char c = reader.findInLine(".").charAt(0);

To consume strictly one character you could use:

char c = reader.next(".").charAt(0);

Well, the next() version crashes on the gibberish input bhjgfbergq35987t%$#%$# and the findInLine() version doesn’t. My question was really more about the words "strictly" and "exactly" and not the corresponding code fragments. I’m unaware of a context in which those words are not synonyms.
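The difference discussed in the comment above can be reproduced with a short sketch. The class and method names here are made up for illustration, and the Scanner reads from a String rather than System.in so the behaviour is repeatable. findInLine(".") consumes one character of the current line regardless of what follows, while next(".") accepts the next token only if the whole token is one character long.

```java
import java.util.NoSuchElementException;
import java.util.Scanner;

class SingleCharScannerDemo {
    // Consumes exactly one character of the current line, whatever follows it.
    static char viaFindInLine(Scanner in) {
        String s = in.findInLine(".");   // null if nothing is left on this line
        if (s == null) {
            throw new NoSuchElementException("no character left on this line");
        }
        return s.charAt(0);
    }

    // Accepts the next token only if the whole token is a single character.
    static char viaNextPattern(Scanner in) {
        return in.next(".").charAt(0);   // throws if the token is longer
    }

    public static void main(String[] args) {
        // Takes the first character and leaves the rest of the line unread.
        System.out.println(viaFindInLine(new Scanner("bhjg")));   // prints b

        try {
            viaNextPattern(new Scanner("bhjg"));
        } catch (NoSuchElementException e) {
            // InputMismatchException is a subclass of NoSuchElementException.
            System.out.println("rejected: token is not a single character");
        }
    }
}
```

Note that findInLine() returns null when the rest of the line is empty, so calling charAt(0) directly on its result can throw a NullPointerException.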

After calling reader.useDelimiter(""), reader.next() will return a single-character string.

There is no API method to get a character from the Scanner. You should get the String using scanner.next() and invoke the charAt(0) method on the returned String.

Scanner reader = new Scanner(System.in);
char c = reader.next().charAt(0);

To be safe with whitespace, you could also call trim() on the string first. (With the default delimiter, next() already returns a token without surrounding whitespace, so this only matters with a custom delimiter.)

Scanner reader = new Scanner(System.in);
char c = reader.next().trim().charAt(0);

There are three ways to approach this problem:

1. Call next() on the Scanner, and extract the first character of the String (e.g. charAt(0)). If you want to read the rest of the line as characters, iterate over the remaining characters in the String. Other answers have this code.
2. Use useDelimiter("") to set the delimiter to an empty string. This will cause next() to tokenize into strings that are exactly one character long, so you can repeatedly call next().charAt(0) to iterate over the characters. You can then set the delimiter back to its original value and resume scanning in the normal way.
3. Use the Reader API instead of the Scanner API. The Reader.read() method delivers a single character read from the input stream. For example:

Reader reader = new InputStreamReader(System.in);
int ch = reader.read();
if (ch != -1) {  // check for EOF
    // we have a character ...
}
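The empty-delimiter approach from the list above can be sketched as follows. The class and method names are hypothetical, and the Scanner is built over a String so the example is repeatable. It reads two single characters, restores the original delimiter, and then reads a normal whitespace-delimited token.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

class EmptyDelimiterDemo {
    // Reads two single characters, then one ordinary token.
    static String firstTwoCharsThenToken(Scanner in) {
        Pattern original = in.delimiter();   // remember the current delimiter
        in.useDelimiter("");                 // every token is now one character
        char a = in.next().charAt(0);
        char b = in.next().charAt(0);
        in.useDelimiter(original);           // resume normal tokenizing
        String rest = in.next();
        return a + "|" + b + "|" + rest;
    }

    public static void main(String[] args) {
        System.out.println(firstTwoCharsThenToken(new Scanner("abcd efg")));   // prints a|b|cd
    }
}
```

With an empty delimiter, whitespace characters are returned as tokens too, so the caller has to decide whether to skip them.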

When you read from the console via System.in, the input is typically buffered by the operating system and only "released" to the application when the user presses ENTER. So if you intend your application to respond to individual keystrokes, this is not going to work. You would need OS-specific native code to turn off or work around line buffering for the console at the OS level.


Is it possible to read a single character from an input stream?

I started with an InputStreamReader, but this buffered its input, reading more than was required from the input stream (as mentioned in its Javadoc). Delving into the source code (java version "1.7.0_147-icedtea") I got to the sun.nio.cs.StreamDecoder class, which contained the comment:

// In order to handle surrogates properly we must never try to produce
// fewer than two characters at a time. If we're only asked to return one
// character then the other is saved here to be returned later.

So I guess the question becomes "is this true, and if so, why?" From my (very basic!) understanding of the 6 charsets required by the JLS, it is always possible to determine the exact number of bytes required to read a single character, so no read-ahead would be necessary.

Background: I had a binary file containing a bunch of data with different encodings (numbers, strings, single-byte tokens etc.). The basic format was a repeating set of byte markers (indicating the type of data), each followed by optional data if required for that type. The two types containing character data were null-terminated strings and strings with a preceding 2-byte length. So for null-terminated strings I thought something like this would do the trick:

String readStringWithNull(InputStream in) throws IOException {
    StringWriter sw = new StringWriter();
    InputStreamReader isr = new InputStreamReader(in, "UTF-16LE");
    for (int i; (i = isr.read()) > 0; ) {
        sw.write(i);
    }
    return sw.toString();
}

But the InputStreamReader read ahead from the buffer, so subsequent read operations on the base InputStream missed data. For my particular case I knew that all characters would be UTF-16LE BMP (sort of UCS-2LE), so I just coded around that, but I’m still interested in the general case above. Also, I’ve seen the question "InputStreamReader buffering issue", which is similar but does not appear to answer this specific question. Cheers,
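For the UTF-16LE case described above, one way to avoid read-ahead is to skip the decoder entirely and assemble 16-bit code units from the raw stream two bytes at a time. This is only a sketch of that idea, not the asker's actual workaround. The class name is made up, and unlike the snippet above it treats end of stream before the terminator as an error.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class NullTerminatedUtf16 {
    // Reads UTF-16LE code units two bytes at a time until a 0x0000 terminator.
    // Nothing beyond the terminator is consumed from the stream.
    static String readStringWithNull(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        while (true) {
            int lo = in.read();   // little-endian: low byte comes first
            int hi = in.read();
            if (lo < 0 || hi < 0) {
                throw new EOFException("stream ended before the terminator");
            }
            char unit = (char) ((hi << 8) | lo);
            if (unit == 0) {
                return sb.toString();
            }
            sb.append(unit);      // a surrogate pair is just two consecutive units
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {'h', 0, 'i', 0, 0, 0, 42};
        InputStream in = new ByteArrayInputStream(data);
        System.out.println(readStringWithNull(in));   // prints hi
        System.out.println(in.read());                // prints 42, the byte after the terminator
    }
}
```

Because the terminator is a whole code unit, this also works for characters outside the BMP: each half of a surrogate pair is nonzero and is appended as it arrives.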


How do I read input character-by-character in Java?

I am used to the C-style getchar(), but it seems like there is nothing comparable for Java. I am building a lexical analyzer, and I need to read in the input character by character. I know I can use the Scanner to scan in a token or line and parse through the token char-by-char, but that seems unwieldy for strings spanning multiple lines. Is there a way to just get the next character from the input buffer in Java, or should I just plug away with the Scanner class? The input is a file, not the keyboard.

9 Answers

Use Reader.read(). A return value of -1 means end of stream; else, cast to char.

This code reads character data from a list of file arguments:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class CharacterHandler {
    // Java 7 source level
    public static void main(String[] args) throws IOException {
        // replace this with a known encoding if possible
        Charset encoding = Charset.defaultCharset();
        for (String filename : args) {
            File file = new File(filename);
            handleFile(file, encoding);
        }
    }

    private static void handleFile(File file, Charset encoding) throws IOException {
        try (InputStream in = new FileInputStream(file);
             Reader reader = new InputStreamReader(in, encoding);
             // buffer for efficiency
             Reader buffer = new BufferedReader(reader)) {
            handleCharacters(buffer);
        }
    }

    private static void handleCharacters(Reader reader) throws IOException {
        int r;
        while ((r = reader.read()) != -1) {
            char ch = (char) r;
            System.out.println("Do something with " + ch);
        }
    }
}

The bad thing about the above code is that it uses the system’s default character set. Wherever possible, prefer a known encoding (ideally, a Unicode encoding if you have a choice). See the Charset class for more. (If you feel masochistic, you can read this guide to character encoding.)

(One thing you might want to look out for is supplementary Unicode characters, those that require two char values to store. See the Character class for more details; this is an edge case that probably won’t apply to homework.)
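If supplementary characters do matter, a minimal sketch of reading whole code points from a Reader might look like this. The class and method names are hypothetical. It combines a high surrogate with the low surrogate that follows it, and it passes a lone low surrogate through unchanged rather than validating it.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class CodePointReader {
    // Returns the next Unicode code point, or -1 at end of stream.
    static int readCodePoint(Reader reader) throws IOException {
        int first = reader.read();
        if (first == -1 || !Character.isHighSurrogate((char) first)) {
            return first;   // EOF, or a character that fits in one char
        }
        int second = reader.read();
        if (second != -1 && Character.isLowSurrogate((char) second)) {
            return Character.toCodePoint((char) first, (char) second);
        }
        throw new IOException("unpaired surrogate in input");
    }

    public static void main(String[] args) throws IOException {
        // "a", then U+1F600 written as a surrogate pair, then "b"
        Reader reader = new StringReader("a\uD83D\uDE00b");
        int cp;
        while ((cp = readCodePoint(reader)) != -1) {
            System.out.println(Integer.toHexString(cp));   // prints 61, 1f600, 62
        }
    }
}
```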


Reading a single UTF-8 character with RandomAccessFile

I’ve set up a sequential scanner, where a RandomAccessFile pointing to my file is able to read a single character, via the below method:

public char nextChar() {
    try {
        seekPointer++;
        int i = source.read();
        return i > -1 ? (char) i : '\0'; // read() returns -1 at EOF
    } catch (IOException e) {
        e.printStackTrace();
    }
    return '\0';
}

The seekPointer is just a reference for my program. The method stores the result of source.read() in an int and then returns it cast to a char if it is not the end of the file. But the chars I’m receiving are in ASCII format; in fact, it’s so bad that I can’t even use a symbol such as ç. Is there a way to receive a single character in UTF-8 format, or at least something standardised that allows more than just the ASCII character set? I know I can use readUTF(), but that returns an entire line as a String, which is not what I am after. Also, I can’t simply use another stream reader, because my program requires a seek(int) function that lets me move back and forth in the file.

As @WillisBlackburn points out in his detailed answer below, you cannot select a random byte offset in a UTF-8 file and be guaranteed to get a "character". You might have to back up to find the start of a multi-byte sequence. Is this what you had in mind?

@JimGarrison Well, I’m trying to make an algorithm out of his answer, but it’s not going very well. So no, not what I had in mind; something more along the lines of Adam’s answer. I’m just seeing what works at the moment.

That’s because you need to use the String(byte[] bytes, Charset c) constructor and specify UTF-8. Otherwise it will assume your platform default character set.

3 Answers

Building on Willis Blackburn’s answer, I can simply compare the first byte against a few integer thresholds to work out how many bytes I need to read ahead.

Judging by the following table:

First byte starts with    Integer value        Meaning
0                         < 128                1-byte char
10                        >= 128 && <= 191     not a first byte (continuation byte)
11                        >= 192               2-byte char
111                       >= 224               3-byte char
1111                      >= 240               4-byte char

We can check the integer read from RandomAccessFile.read() by comparing it against the numbers in the middle column, which are literally just the integer representations of a byte. This allows us to skip byte conversion completely, saving time.

The following code reads a character with a byte length of 1 to 4 from a RandomAccessFile:

int seekPointer = 0;
RandomAccessFile source; // initialise in your own way

public void seek(int shift) {
    seekPointer += shift;
    if (seekPointer < 0) seekPointer = 0;
    try {
        source.seek(seekPointer);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

private int byteCheck(int chr) {
    if (chr == -1) return 1; // EOF
    int i = 1; // there is always at least one byte
    if (chr >= 192) i++; // 2 bytes
    if (chr >= 224) i++; // 3 bytes
    if (chr >= 240) i++; // 4 bytes
    if (chr >= 128 && chr <= 191) i = -1; // continuation byte, not a first byte
    return i;
}

public char nextChar() {
    try {
        seekPointer++;
        int i = source.read();
        if (byteCheck(i) == -1) {
            boolean malformed = true;
            for (int k = 0; k < 4; k++) {
                // Step back one byte per iteration. The first step re-reads the
                // byte we just read, so this looks back at most 3 bytes: a UTF-8
                // char is at most 4 bytes long, and going further could run
                // into the preceding chars.
                seek(-1);
                i = source.read();
                if (byteCheck(i) != -1) {
                    seekPointer++; // keep seekPointer in step with the file pointer
                    malformed = false;
                    break;
                }
            }
            if (malformed) {
                seek(3);
                throw new UTFDataFormatException("Malformed UTF char at position: " + seekPointer);
            }
        }
        byte[] chrs = new byte[byteCheck(i)];
        chrs[0] = (byte) i;
        for (int j = 1; j < chrs.length; j++) {
            seekPointer++;
            chrs[j] = (byte) source.read();
        }
        // read() returns -1 at EOF
        return i > -1 ? new String(chrs, Charset.forName("UTF-8")).charAt(0) : '\0';
    } catch (IOException e) {
        e.printStackTrace();
    }
    return '\0';
}

This is probably about right. You should decide what you want to do if the byte starts with 10 (in other words, a value from 128 to 191). In that case you’re looking at a byte in the middle of a character and should either back up or read forward until you find a starting byte.

@WillisBlackburn Well, the way I designed my program, I won’t actually need it, but it’s going to be a good learning exercise, so I’ll go do that now!

@WillisBlackburn Already have. You got a few downvotes. I’ll accept your answer too though, because without it I would be stuck. Thank you very much.

I enjoyed answering your question because UTF-8 is such an elegant character-encoding solution and it’s fun to explain how it works. It can read ASCII directly, it’s as efficient as ASCII for encoding characters in the ASCII set, and the reader can distinguish initial from subsequent bytes in multibyte characters. Supposedly Ken Thompson designed it on a placemat at a diner in New Jersey.

I’m not entirely sure what you’re trying to do, but let me give you some information that might help.

The UTF-8 encoding represents characters as either 1, 2, 3, or 4 bytes depending on the Unicode value of the character.

For characters 0x00-0x7F, UTF-8 encodes the character as a single byte. This is a very useful property because if you’re only dealing with 7-bit ASCII characters, the UTF-8 and ASCII encodings are identical.
For characters 0x80-0x7FF, UTF-8 uses 2 bytes: the first byte is binary 110 followed by the 5 high bits of the character, while the second byte is binary 10 followed by the 6 low bits of the character.
The 3- and 4-byte encodings are similar to the 2-byte encoding, except that the first byte of the 3-byte encoding starts with 1110 and the first byte of the 4-byte encoding starts with 11110.
See Wikipedia for all the details.
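The 2-byte layout described above can be checked by hand. The following sketch (class and method names are made up) builds the two bytes for a character in the range 0x80-0x7FF with bit operations. For ç (U+00E7) the result should match what the JDK's own UTF-8 encoder produces.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8TwoByteLayout {
    // Encodes a character in the range 0x80-0x7FF by hand.
    static byte[] encodeTwoBytes(int codePoint) {
        if (codePoint < 0x80 || codePoint > 0x7FF) {
            throw new IllegalArgumentException("not in the 2-byte range");
        }
        int first = 0b1100_0000 | (codePoint >> 6);     // 110 + the 5 high bits
        int second = 0b1000_0000 | (codePoint & 0x3F);  // 10 + the 6 low bits
        return new byte[] {(byte) first, (byte) second};
    }

    public static void main(String[] args) {
        byte[] manual = encodeTwoBytes(0xE7);   // U+00E7, the letter ç
        byte[] jdk = "\u00E7".getBytes(StandardCharsets.UTF_8);
        System.out.println(String.format("%02X %02X", manual[0] & 0xFF, manual[1] & 0xFF));   // prints C3 A7
        System.out.println(Arrays.equals(manual, jdk));                                       // prints true
    }
}
```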

Now this may seem pretty byzantine but the upshot of it is this: you can read any byte in a UTF-8 file and know whether you’re looking at a standalone character, the first byte of a multibyte character, or one of the other bytes of a multibyte character.

If the byte you read starts with binary 0, you’re looking at a single-byte character. If it starts with 110, 1110, or 11110, then you have the first byte of a multibyte character of 2, 3, or 4 bytes, respectively. If it starts with 10, then it’s one of the subsequent bytes of a multibyte character; scan backwards to find the start of it.

So if you want to let your caller seek to any random position in a file and read the UTF-8 character there, you can just apply the algorithm above to find the first byte of that character (if it’s not the one at the specified position) and then read and decode the value.
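The algorithm described above could be sketched roughly like this. This is an illustration, not the code from this answer: the class and method names are hypothetical, it returns a code point instead of a char so that 4-byte sequences survive, and it does not validate lead bytes above 0xF7 or check that the following bytes really are continuation bytes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UTFDataFormatException;
import java.nio.charset.StandardCharsets;

class Utf8RandomAccess {
    // Length of the sequence introduced by this byte, or -1 for a continuation byte.
    static int sequenceLength(int b) {
        if (b < 0x80) return 1;    // 0xxxxxxx
        if (b < 0xC0) return -1;   // 10xxxxxx
        if (b < 0xE0) return 2;    // 110xxxxx
        if (b < 0xF0) return 3;    // 1110xxxx
        return 4;                  // 11110xxx
    }

    // Returns the code point of the character covering byte offset pos, or -1 at EOF.
    static int codePointAt(RandomAccessFile file, long pos) throws IOException {
        file.seek(pos);
        int b = file.read();
        if (b == -1) return -1;
        // Back up over continuation bytes (at most 3) until the lead byte is found.
        for (int back = 0; sequenceLength(b) == -1; back++) {
            if (back == 3 || pos == 0) {
                throw new UTFDataFormatException("no lead byte near offset " + pos);
            }
            file.seek(--pos);
            b = file.read();
        }
        byte[] bytes = new byte[sequenceLength(b)];
        bytes[0] = (byte) b;
        // The file pointer now sits just after the lead byte.
        file.readFully(bytes, 1, bytes.length - 1);   // EOFException if truncated
        return new String(bytes, StandardCharsets.UTF_8).codePointAt(0);
    }
}
```

For a file containing a, ç, €, z (byte offsets 0, 1-2, 3-5 and 6), calling codePointAt(file, 5) lands on the last byte of €, backs up two bytes to its lead byte, and returns 0x20AC.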

See the Java Charset class for a method to decode UTF-8 from the source bytes. There may be easier ways but Charset will work.

Update: This code should handle the 1- and 2-byte UTF-8 cases. Not tested at all, YMMV.

I wouldn’t bother with seekPointer. The RandomAccessFile knows what it is; just call getFilePointer() when you need it.
