Regex java new line

Split Java String by New Line

I’m trying to split text in a JTextArea using a regex to split the String by \n. However, this does not work, and I also tried \r\n|\r|n and many other combinations of regexes. Code:

public void insertUpdate(DocumentEvent e) {
    String split[], docStr = null;
    Document textAreaDoc = (Document) e.getDocument();
    try {
        docStr = textAreaDoc.getText(textAreaDoc.getStartPosition().getOffset(), textAreaDoc.getEndPosition().getOffset());
    } catch (BadLocationException e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    }
    split = docStr.split("\\n");
}

What is the error that you get? Don't say "does not work"; that doesn't mean anything. Tell us the error or result you get. That is the first step in debugging code: figure out what the wrong result is, and how your program got to that.

What do you really want to do? - break lines as they are entered in the JTextArea? - finding where the JTextArea is doing line wraps? - ...

21 Answers

String lines[] = string.split("\\r?\\n"); 

There are really only two newlines (UNIX and Windows) that you need to worry about.

A JTextArea document SHOULD use only '\n'; its Views completely ignore '\r'. But if you’re going to look for more than one kind of separator, you might as well look for all three: "\r?\n|\r".

@antak yes, split by default removes trailing empty strings if they were the result of the split. To turn this mechanism off you need to use the overloaded version split(regex, limit) with a negative limit, like text.split("\\r?\\n", -1). More info: Java String split removed empty values
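A minimal sketch of the two split variants discussed above. The class and method names (SplitLinesDemo, splitLines, splitLinesKeepTrailing) are hypothetical and only serve the illustration; the regex and the negative limit are the ones from the answer and the comment.

```java
import java.util.Arrays;

class SplitLinesDemo {
    // Splits on \n or \r\n. With the default limit (0),
    // trailing empty strings are removed from the result.
    static String[] splitLines(String text) {
        return text.split("\\r?\\n");
    }

    // Same regex, but a negative limit keeps trailing empty strings.
    static String[] splitLinesKeepTrailing(String text) {
        return text.split("\\r?\\n", -1);
    }

    public static void main(String[] args) {
        String text = "a\r\nb\n\n";
        System.out.println(Arrays.toString(splitLines(text)));             // [a, b]
        System.out.println(Arrays.toString(splitLinesKeepTrailing(text))); // [a, b, , ]
    }
}
```

Which variant fits depends on whether blank lines at the end of the text carry meaning for the caller.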


String[] lines = string.split(System.getProperty("line.separator")); This will work fine while you use strings generated in your same OS/app, but if, for example, you are running your Java application under Linux and you retrieve a text from a database that was stored as Windows text, then it could fail.

The comment by @stivlo is misinformation, and it is unfortunate that it has so many upvotes. As @Raekye pointed out, OS X (now known as macOS) has used \n as its line separator since it was released in 2001. Mac OS 9 was released in 1999, and I have never seen a Mac OS 9 or below machine used in production. There is not a single modern operating system that uses \r as a line separator. NEVER write code that expects \r to be the line separator on Mac, unless a) you’re into retro computing, b) have an OS 9 machine spun up, and c) can reliably determine that the machine is actually OS 9.


RegEx in Java: how to deal with newline

I am currently trying to learn how to use regular expressions so please bear with my simple question. For example, say I have an input file containing a bunch of links separated by a newline:

www.foo.com/Archives/monkeys.htm
Description of Monkey’s website.
www.foo.com/Archives/pigs.txt
Description of Pig’s website.
www.foo.com/Archives/kitty.txt
Description of Kitty’s website.
www.foo.com/Archives/apple.htm
Description of Apple’s website.

If I wanted to get one website along with its description, this regex seems to work on a testing tool: .*www.*\\s.*Pig.* However, when I try running it within my code it doesn’t seem to work. Is this expression correct? I tried replacing "\s" with "\n" and it still doesn’t seem to work.

Just a reminder of potentially simpler solutions: for my own case with explicit \n's, even with the suggestions of Pattern.DOTALL / (?s) and double-escaping (\\) as noted below, I found this fiddly enough to just fall back to the non-regexp string methods. str.contains("\n") worked fine. str.replaceAll("\n", replacement) worked as well. I couldn’t find a variant of String.matches or Pattern.compile that returned true, though, in Java 11. (Unlike the solutions below, this won’t help if you need to catch various kinds of newlines.)

6 Answers

The lines are probably separated by \r\n in your file. Both \r (carriage return) and \n (linefeed) are considered line-separator characters in Java regexes, and the . metacharacter won’t match either of them. \s will match those characters, so it consumes the \r , but that leaves .* to match the \n , which fails. Your tester probably used just \n to separate the lines, which was consumed by \s .

If I’m right, changing the \s to \s+ or [\r\n]+ should get it to work. That’s probably all you need to do in this case, but sometimes you have to match exactly one line separator, or at least keep track of how many you’re matching. In that case you need a regex that matches exactly one of any of the three most common line separator types: \r\n (Windows/DOS), \n (Unix/Linux/OSX) and \r (older Macs). Either of these will do:
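As a sketch of what such a regex can look like, the two spellings below are assumptions chosen for illustration, and the class and method names (OneLineSeparatorDemo, count) are hypothetical. Each pattern consumes exactly one \r\n, \n, or \r per match, so counting matches counts line separators.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OneLineSeparatorDemo {
    // Try the two-character \r\n first, then fall back to a lone \r or \n.
    static final Pattern CRLF_FIRST = Pattern.compile("\\r\\n|[\\r\\n]");

    // Equivalent idea: a lone \n, or a \r optionally followed by \n.
    static final Pattern OPTIONAL_LF = Pattern.compile("\\n|\\r\\n?");

    // Counts how many line separators the pattern finds in the text.
    static int count(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String mixed = "a\r\nb\nc\rd"; // one separator of each kind
        System.out.println(count(CRLF_FIRST, mixed));
        System.out.println(count(OPTIONAL_LF, mixed));
    }
}
```

Putting \r\n ahead of the single-character alternatives matters: otherwise a Windows line ending would be counted as two separators.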

Update: As of Java 8 we have another option, \R . It matches any line separator, including not just \r\n , but several others as defined by the Unicode standard. It’s equivalent to this:

\r\n|[\n\x0B\x0C\r\u0085\u2028\u2029] 

Here’s how you might use it:

The i option makes it case-insensitive, and the m puts it in multiline mode, allowing ^ and $ to match at line boundaries.
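A hedged sketch of \R combined with the i and m options, assuming Java 8 or later. The class name, the helper findPigEntry, and the sample input are hypothetical; the regex is one plausible way to adapt the question's pattern, not necessarily the exact one the answer showed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineBreakMatcherDemo {
    // (?i) case-insensitive, (?m) lets ^ and $ match at line boundaries,
    // \R matches exactly one line separator of any kind.
    static final Pattern LINK_WITH_DESCRIPTION =
            Pattern.compile("(?im)^.*www.*\\R.*Pig.*$");

    // Returns the link line plus its description line, or null if absent.
    static String findPigEntry(String text) {
        Matcher m = LINK_WITH_DESCRIPTION.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String windowsText =
                "www.foo.com/Archives/monkeys.htm\r\n"
              + "Description of Monkey's website.\r\n"
              + "www.foo.com/Archives/pigs.txt\r\n"
              + "Description of Pig's website.\r\n";
        System.out.println(findPigEntry(windowsText));
    }
}
```

Because \R consumes the whole \r\n pair, the same pattern works whether the file uses Windows or Unix line endings.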


How to specify new line OR the end of string in regex?

My Pattern does not work for all strings: when it reaches a string that has no further newline \n, it throws an exception. How can I modify (?:L.*?)\\n so that it will match until \\n OR the end of the String?

Pattern patternL = Pattern.compile("(?:L of .*?)\\n", Pattern.DOTALL);
Matcher matcherL = patternL.matcher(text);
matcherL.find();

3 Answers

Simply use: (?:\\n|$) , so your regular expression becomes:

Pattern patternL = Pattern.compile("(?:L of .*?)(?:\\n|$)", Pattern.DOTALL); 

@hwnd It isn’t. I thought the OP didn’t want to use the entire word to keep the code small and I did that too.

For Java, this is what you need to match the end of the line or a single LF character:

If you want to be precise about the line break, and include CRLF too
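One way this could look, as a sketch under assumptions: the pattern below accepts LF, CRLF, or the end of the input as the terminator. The class name, the helper locations, and the capture group are hypothetical additions for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineEndOrStringEndDemo {
    // The lazy group stops at the first LF, CRLF, or end of input.
    static final Pattern ENTRY =
            Pattern.compile("Location of (.*?)(?:\\r?\\n|$)");

    // Collects the text that follows each "Location of " prefix.
    static List<String> locations(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = ENTRY.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // The last entry has no trailing newline and is still found.
        String text = "Location of A\r\nLocation of B\nLocation of C";
        System.out.println(locations(text)); // [A, B, C]
    }
}
```

Looping on find() rather than calling group() unconditionally also avoids the exception described in the question when nothing matches.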

@Sniffer’s answer on matching a line break or end of line is correct, but the code that you posted above, (?:L of .*?), will not match Location, or any other word for that matter, only the letter L.

Pattern patternL = Pattern.compile("Location of .*?(?:\\n|$)", Pattern.DOTALL); 

Pattern.MULTILINE tells Java to let the anchors ^ and $ match at the start and end of each line (otherwise they only match at the start/end of the entire string).

Pattern patternL = Pattern.compile("^Location of .*", Pattern.MULTILINE); 

I went from a non-greedy to a greedy match above to match as much as possible; a non-greedy match will match as little as possible unless you use an end-of-line anchor $.
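The effect of Pattern.MULTILINE and of greedy versus non-greedy matching can be checked with a small sketch. The class name, the findAll helper, and the sample text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineAnchorsDemo {
    // Returns every match of the regex in the text, compiled with the given flags.
    static List<String> findAll(String regex, int flags, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Name: x\nLocation of A\nLocation of B";

        // Greedy, MULTILINE: each full line that starts with the prefix.
        System.out.println(findAll("^Location of .*", Pattern.MULTILINE, text));

        // Non-greedy without an end anchor: only the prefix itself.
        System.out.println(findAll("^Location of .*?", Pattern.MULTILINE, text));

        // Non-greedy with $: forced to run to the end of the line.
        System.out.println(findAll("^Location of .*?$", Pattern.MULTILINE, text));

        // No MULTILINE: ^ matches only at the start of the whole input.
        System.out.println(findAll("^Location of .*", 0, text));
    }
}
```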


New line and dollar sign in Java regular expression

I know $ is used to check if a line end follows in a Java regular expression. For the following code:

String test_domain = "http://www.google.com/path\nline2\nline3";
test_domain = test_domain.replaceFirst("(\\.[^:/]+).*$?", "$1");
System.out.println(test_domain);

http://www.google.com
line2
line3

I assume that the pattern (\\.[^:/]+).*$? matches the first line, which is http://www.google.com/path , and the $1 is http://www.google.com . The ? makes a reluctant match (so it matches the first line). However, if I remove the ? in the pattern and run the following code:

String test_domain = "http://www.google.com/path\nline2\nline3";
test_domain = test_domain.replaceFirst("(\\.[^:/]+).*$", "$1");
System.out.println(test_domain);

http://www.google.com/path
line2
line3

Where is my misunderstanding of the regex here?

2 Answers

You have multiline input and are trying to use the anchor $ in your regex for each line, but you are not using the MULTILINE flag. All you need is the (?m) mode modifier in front of your regex:

String test_domain = "http://www.google.com/path\nline2\nline3";
test_domain = test_domain.replaceFirst("(?m)(\\.[^:/]+).*$", "$1");
System.out.println(test_domain);

http://www.google.com
line2
line3

Without MULTILINE or DOTALL mode, your regex (\.[^:/]+).*$ fails to match the input because of the .*$ part: the dot does not match newlines, and $ (end of input) only comes after two more newlines.

Your regex does not match the input string. In fact, $ matches the end of string (at the end of line3 ). Since you are not using an s flag, the . cannot get there.

NOTE that the $ anchor, even without the Pattern.MULTILINE option, can match a position before the final line feed char; see What is the difference between ^ and \A , $ and \Z in regex?. This can be easily tested with "a\nb\n".replaceAll("$", "X") , which results in "a\nbX\nX" .
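The behaviour described in the note can be reproduced with a short sketch. The class and method names are hypothetical; the \z comparison is added for contrast, since \z matches only at the very end of the input.

```java
class DollarAnchorDemo {
    // Inserts the marker wherever the given anchor regex matches.
    static String mark(String anchor, String text) {
        return text.replaceAll(anchor, "X");
    }

    public static void main(String[] args) {
        String text = "a\nb\n";

        // Default $: matches before the final line feed and at the very end.
        String withDollar = mark("$", text);   // "a\nbX\nX"

        // \z: matches only at the very end of the input.
        String withZ = mark("\\z", text);      // "a\nb\nX"

        // Escape the newlines so the positions are visible when printed.
        System.out.println(withDollar.replace("\n", "\\n"));
        System.out.println(withZ.replace("\n", "\\n"));
    }
}
```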

Moreover, the $ end of line/string anchor cannot have a ? quantifier after it. It makes no sense for the regex engine, and is ignored in Java.

To make it work at all, you need to use the s flag if you want to just return http://www.google.com :

String test_domain = "http://www.google.com/path\nline2\nline3";
test_domain = test_domain.replaceFirst("(?s)(\\.[^:/]+).*$", "$1");
System.out.println(test_domain);

With a multiline (?m) flag, the regex will process each line looking for a literal . and then a sequence of characters other than : and / . When one of these characters is found, the rest of the characters on that line will be omitted.

String test_domain = "http://www.google.com/path\nline2\nline3";
test_domain = test_domain.replaceFirst("(?m)(\\.[^:/]+).*$", "$1");
System.out.println(test_domain);

http://www.google.com
line2
line3
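To compare the three variants from this thread side by side, here is a small sketch. The class name and the helper stripPath are hypothetical; the regex and the input string are the ones from the question.

```java
class DollarFlagsDemo {
    // Applies the question's regex with an optional inline flag prefix
    // such as "", "(?m)" or "(?s)".
    static String stripPath(String flags, String input) {
        return input.replaceFirst(flags + "(\\.[^:/]+).*$", "$1");
    }

    public static void main(String[] args) {
        String input = "http://www.google.com/path\nline2\nline3";

        // No flag: . cannot cross the newline to reach $, so nothing is replaced.
        System.out.println(stripPath("", input));
        System.out.println("---");

        // (?m): $ matches at the end of the first line, so only /path is removed.
        System.out.println(stripPath("(?m)", input));
        System.out.println("---");

        // (?s): . also matches newlines, so everything after the host is removed.
        System.out.println(stripPath("(?s)", input));
    }
}
```

Which flag is right depends on the goal: (?m) keeps the remaining lines, while (?s) discards them.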

