Regex – Match Any Character(s)
In regular expressions, we can match any character using the period «.» character. To match one character from a given set, we use character classes; to match multiple characters, we add a quantifier.
1. Matching a Single Character Using Regex
By default, the ‘.’ (dot) character in a regular expression matches any single character, whatever it is, except a line terminator. The matched character can be a letter, a digit, or any special character.
To create more meaningful patterns, we can combine the dot character with other regular expression constructs.
Pattern | Description |
---|---|
. (Dot) | Matches only a single character. |
A.B | Matches only a single character at second place in a 3 character long string where the string starts with ‘A’ and ends with ‘B’. |
[abc] | Matches only a single character from a set of given characters. |
[aA] | Matches only a single character ‘a’, case-insensitive. |
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        Pattern.compile(".").matcher("a").matches();          //true
        Pattern.compile(".").matcher("ab").matches();         //false
        Pattern.compile("A.B").matcher("AIB").matches();      //true
        Pattern.compile("A.B").matcher("ABI").matches();      //false
        Pattern.compile("A[abc]B").matcher("AaB").matches();  //true
        Pattern.compile("A[abc]B").matcher("AkB").matches();  //false
    }
}
2. Matching Range of Characters
If we want to match a range of characters at any place, we need to use character classes with a hyphen between the range. e.g. ‘[a-f]’ will match a single character which can be either of ‘a’, ‘b’, ‘c’, ‘d’, ‘e’ or ‘f’.
Pattern | Description |
---|---|
[a-f] | Matches only a single character in the range from ‘a’ to ‘f’. |
[a-z] | Matches only a single lowercase character in the range from ‘a’ to ‘z’. |
[A-Z] | Matches only a single uppercase character in the range from ‘A’ to ‘Z’. |
[a-zA-Z] | Matches only a single character in the range from ‘a’ to ‘z’, case-insensitive. |
[0-9] | Matches only a single digit in the range from ‘0’ to ‘9’. |
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        System.out.println(Pattern.compile("[a-f]").matcher("b").matches());     //true
        System.out.println(Pattern.compile("[a-f]").matcher("g").matches());     //false
        System.out.println(Pattern.compile("[a-zA-Z]").matcher("a").matches());  //true
        System.out.println(Pattern.compile("[a-zA-Z]").matcher("B").matches());  //true
        System.out.println(Pattern.compile("[a-zA-Z]").matcher("4").matches());  //false
        System.out.println(Pattern.compile("[0-9]").matcher("9").matches());     //true
        System.out.println(Pattern.compile("[0-9]").matcher("91").matches());    //false
    }
}
3. Matching Multiple Characters
If we want to match a sequence of characters of any length, we can use the ‘*’ (asterisk) quantifier, which matches zero or more occurrences of the preceding element.
Pattern | Description |
---|---|
.* | Matches any number of characters including special characters. |
[0-9]* | Matches any number of digits. |
[a-zA-Z]* | Matches any number of letters. |
[a-zA-Z0-9]* | Matches any number of alphanumeric characters. |
Pattern.compile(".*").matcher("abcd").matches();               //true
Pattern.compile("[a-zA-Z]*").matcher("abcd").matches();        //true
Pattern.compile("[0-9]*").matcher("01234").matches();          //true
Pattern.compile("[a-zA-Z0-9]*").matcher("a1b2c3").matches();   //true
Regular expressions in Java — Tutorial
A regular expression (regex) defines a search pattern for strings. The search pattern can be anything from a simple character, a fixed string or a complex expression containing special characters describing the pattern.
A regex can be used to search, edit, and manipulate text. This process is described as applying the regular expression to the text/string.
The regex is applied on the text from left to right. Once a source character has been used in a match, it cannot be reused. For example, the regex aba will match ababababa only two times (aba_aba__).
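The left-to-right, non-overlapping behaviour can be checked with a small sketch. The class and method names below are made up for illustration; only `Pattern` and `Matcher` come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonOverlappingMatches {

    // Counts how often the regex is found when scanning left to right.
    // Matcher.find() resumes after the end of the previous match,
    // so characters that were already consumed are not reused.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("aba", "ababababa")); // prints 2
    }
}
```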
1.2. Regex examples
A simple example for a regular expression is a (literal) string. For example, the Hello World regex matches the «Hello World» string. . (dot) is another example for a regular expression. A dot matches any single character; it would match, for example, «a» or «1».
The following table lists several regular expressions and describes which pattern they would match.
Regex | Matches |
---|---|
this is text | Matches exactly «this is text» |
this\s+is\s+text | Matches the word «this» followed by one or more whitespace characters followed by the word «is» followed by one or more whitespace characters followed by the word «text». |
^\d+(\.\d+)? | ^ defines that the pattern must start at the beginning of a new line. \d+ matches one or several digits. The ? makes the statement in brackets optional. \. matches «.», parentheses are used for grouping. Matches for example «5», «1.5» and «2.21». |
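As a quick check of the descriptions above, the following sketch tries the whitespace pattern and the decimal-number pattern against a few inputs. The regexes `this\s+is\s+text` and `^\d+(\.\d+)?` are written out here as they are described in the text; the class name, method names, and sample strings are illustrative assumptions.

```java
class RegexExamples {

    // "this", "is" and "text" separated by one or more whitespace characters
    static boolean isThisIsText(String s) {
        return s.matches("this\\s+is\\s+text");
    }

    // one or more digits, optionally followed by a dot and more digits
    static boolean isDecimal(String s) {
        return s.matches("^\\d+(\\.\\d+)?");
    }

    public static void main(String[] args) {
        System.out.println(isThisIsText("this   is\ttext")); // true
        System.out.println(isThisIsText("thisistext"));      // false
        System.out.println(isDecimal("5"));                  // true
        System.out.println(isDecimal("2.21"));               // true
        System.out.println(isDecimal("1."));                 // false
    }
}
```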
2. Prerequisites
The following tutorial assumes that you have basic knowledge of the Java programming language.
Some of the following examples use JUnit to validate the result. You should be able to adjust them if you do not want to use JUnit.
3. Rules of writing regular expressions
The following description is an overview of the meta characters which can be used in regular expressions. This chapter is intended as a reference for the different regex elements.
3.1. Common matching symbols
Regular expression | Description |
---|---|
^regex | Finds regex that must match at the beginning of the line. |
regex$ | Finds regex that must match at the end of the line. |
[abc] | Set definition, can match the letter a or b or c. |
[abc][vz] | Set definition, can match a or b or c followed by either v or z. |
[^abc] | When a caret appears as the first character inside square brackets, it negates the pattern. This pattern matches any character except a or b or c. |
[a-d1-7] | Ranges: matches a single letter between a and d or a single digit from 1 to 7, but not the two-character string d1. |
XZ | Finds X directly followed by Z. |
$ | Checks if a line end follows. |
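The symbols above can be tried out with a short sketch. It assumes the anchors, sets, and ranges as described in this section; the helper names `found` and `matchesWhole` and the sample inputs are invented for illustration.

```java
import java.util.regex.Pattern;

class MatchingSymbols {

    // true if the regex is found anywhere in the input
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // true only if the regex matches the whole input
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(found("^Hello", "Hello World"));   // true, at the beginning
        System.out.println(found("^World", "Hello World"));   // false
        System.out.println(found("World$", "Hello World"));   // true, at the end
        System.out.println(matchesWhole("[abc][vz]", "av"));  // true
        System.out.println(found("[^abc]", "abc"));           // false, only a, b, c present
        System.out.println(matchesWhole("[a-d1-7]", "d"));    // true
        System.out.println(matchesWhole("[a-d1-7]", "d1"));   // false, two characters
    }
}
```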
3.2. Meta characters
The following meta characters have a pre-defined meaning and make certain common patterns easier to use. For example, you can use \d as a simplified definition for [0-9].
Regular expression | Description |
---|---|
\s | A whitespace character, short for [ \t\n\x0b\r\f] |
\S | A non-whitespace character, short for [^\s] |
\w | A word character, short for [a-zA-Z_0-9] |
\S+ | Several non-whitespace characters |
\b | Matches a word boundary where a word character is [a-zA-Z0-9_] |
These meta characters have the same first letter as their representation, e.g., digit, space, word, and boundary. Uppercase symbols define the opposite.
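A minimal sketch of these meta characters in use. Note the doubled backslashes required inside Java string literals; the class name, method names, and sample strings are assumptions made for this example.

```java
class MetaCharacters {

    // splits on runs of whitespace (\s+)
    static String[] words(String s) {
        return s.split("\\s+");
    }

    // replaces the standalone word "cat", using word boundaries (\b)
    static String replaceCat(String s) {
        return s.replaceAll("\\bcat\\b", "dog");
    }

    // \D is the opposite of \d: removing all non-digits keeps only digits
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    public static void main(String[] args) {
        System.out.println(words("a b\tc").length);        // 3
        System.out.println("_".matches("\\w"));            // true
        System.out.println("-".matches("\\w"));            // false
        System.out.println("hello world".matches("\\S+")); // false, contains a space
        System.out.println(replaceCat("cat concat cat"));  // dog concat dog
        System.out.println(digitsOnly("abc123"));          // 123
    }
}
```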
3.3. Quantifier
A quantifier defines how often an element can occur. The symbols ?, *, + and {} are quantifiers.
Regular expression | Description | Examples |
---|---|---|
* | Occurs zero or more times, is short for {0,} | X* finds no or several letter X, .* finds any character sequence |
+ | Occurs one or more times, is short for {1,} | X+ finds one or several letter X |
? | Occurs no or one times, ? is short for {0,1}. | X? finds no or exactly one letter X |
{X} | Occurs exactly X times; {} gives the repetition count of the preceding literal | \d{3} searches for three digits, .{10} for any character sequence of length 10. |
{X,Y} | Occurs between X and Y times | \d{1,4} means \d must occur at least once and at a maximum of four times. |
? after a quantifier makes it a reluctant quantifier. It tries to find the smallest match. This makes the regular expression stop at the first match.
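The difference between a greedy and a reluctant quantifier, together with the counted forms `{X}` and `{X,Y}`, can be sketched as follows. The helper `firstMatch` and the sample input are hypothetical and only serve this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Quantifiers {

    // returns the text of the first match, or null if nothing matches
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<b>one</b><b>two</b>";
        // greedy: .* takes as much as possible
        System.out.println(firstMatch("<b>.*</b>", input));  // <b>one</b><b>two</b>
        // reluctant: .*? stops at the first possible end
        System.out.println(firstMatch("<b>.*?</b>", input)); // <b>one</b>

        System.out.println("123".matches("\\d{3}"));     // true
        System.out.println("12345".matches("\\d{1,4}")); // false, five digits
        System.out.println("".matches("X*"));            // true, zero occurrences
        System.out.println("".matches("X+"));            // false, needs at least one
    }
}
```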
3.4. Grouping and back reference
You can group parts of your regular expression. In your pattern you group elements with round brackets, e.g., (). This allows you to assign a repetition operator to a complete group.
In addition these groups also create a back reference to the part of the regular expression. This captures the group. A back reference stores the part of the String which matched the group. This allows you to use this part in the replacement.
Via the $ you can refer to a group. $1 is the first group, $2 the second, etc.
Let’s, for example, assume you want to remove all whitespace between a letter and a following period or comma. The period or comma has to be part of the pattern, but it should still be included in the result.
// Removes whitespace between a word character and . or ,
String pattern = "(\\w)(\\s+)([\\.,])";
System.out.println(EXAMPLE_TEST.replaceAll(pattern, "$1$3"));
This example extracts the text between the opening and closing title tags.
// Extract the text between the two title elements
pattern = "(?i)(<title.*?>)(.+?)(</title>)";
String updated = EXAMPLE_TEST.replaceAll(pattern, "$2");
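The two snippets above depend on an EXAMPLE_TEST constant defined elsewhere. A self-contained sketch of the same ideas might look like this; the title pattern assumes literal `<title>` and `</title>` tags, and the class name, method names, and sample strings are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupingExample {

    // Keeps group 1 (the word character) and group 3 (the punctuation),
    // dropping group 2 (the whitespace between them).
    static String removeSpaceBeforePunctuation(String s) {
        return s.replaceAll("(\\w)(\\s+)([\\.,])", "$1$3");
    }

    // Returns the text captured by group 2, i.e. the content between
    // the title tags, or null if no title element is found.
    static String extractTitle(String html) {
        Matcher m = Pattern.compile("(?i)(<title.*?>)(.+?)(</title>)").matcher(html);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(removeSpaceBeforePunctuation("Hello , world ."));    // Hello, world.
        System.out.println(extractTitle("<html><TITLE>My Page</TITLE></html>")); // My Page
    }
}
```

In replacement strings groups are addressed as `$1`, `$2`, and so on, while `Matcher.group(int)` gives direct access to the captured text.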
3.5. Negative look ahead
Negative look ahead provides the possibility to exclude a pattern. With this you can say that a string should not be followed by another string.
Negative look aheads are defined via (?!pattern). For example, a(?!b) will match «a» only if «a» is not followed by «b».
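A small sketch of a negative look ahead, using the pattern `a(?!b)`. The look ahead only checks what follows and does not consume it, so the character after «a» is left in place when replacing. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

class NegativeLookAhead {

    // true if the input contains an "a" that is not directly followed by "b"
    static boolean hasAWithoutB(String input) {
        return Pattern.compile("a(?!b)").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(hasAWithoutB("ab")); // false
        System.out.println(hasAWithoutB("ac")); // true
        System.out.println(hasAWithoutB("a"));  // true, nothing follows
        // only the "a" itself is replaced; the following character is kept
        System.out.println("ab ac ad".replaceAll("a(?!b)", "X")); // ab Xc Xd
    }
}
```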
3.6. Specifying modes inside the regular expression
You can add the mode modifiers to the start of the regex. To specify multiple modes, simply put them together as in (?ismx).
- (?i) makes the regex case insensitive.
- (?s) for «single line mode» makes the dot match all characters, including line breaks.
- (?m) for «multi-line mode» makes the caret and dollar match at the start and end of each line in the subject string.
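The effect of the three modes listed above can be sketched as follows. The helper `count` and the sample strings are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ModeModifiers {

    // counts the non-overlapping matches of regex in input
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // (?i): case-insensitive matching
        System.out.println("HELLO".matches("hello"));      // false
        System.out.println("HELLO".matches("(?i)hello"));  // true

        // (?s): the dot also matches line breaks
        System.out.println("a\nb".matches("a.b"));         // false
        System.out.println("a\nb".matches("(?s)a.b"));     // true

        // (?m): ^ and $ match at the start and end of each line
        String lines = "one\ntwo\nthree";
        System.out.println(count("^\\w+$", lines));        // 0
        System.out.println(count("(?m)^\\w+$", lines));    // 3
    }
}
```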
3.7. Backslashes in Java
The backslash \ is an escape character in Java Strings, which means it has a predefined meaning in Java. You have to write a double backslash \\ to define a single backslash. If you want to define \w, you must write \\w in your Java string. If you want to match a literal backslash, you have to type \\\\, as \ is also an escape character in regular expressions.
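The two levels of escaping can be made visible with a short sketch. The class name, method name, and the sample path are invented for this illustration.

```java
class BackslashExample {

    // splits a Windows-style path at each literal backslash;
    // the Java literal "\\\\" is the two-character regex \\ ,
    // which in turn matches one backslash in the input
    static String[] splitPath(String path) {
        return path.split("\\\\");
    }

    public static void main(String[] args) {
        // the Java literal "\\w" is the two-character string \w
        System.out.println("\\w".length());       // 2
        System.out.println("a".matches("\\w"));   // true

        String[] parts = splitPath("C:\\temp");   // the string C:\temp
        System.out.println(parts.length);         // 2
        System.out.println(parts[1]);             // temp
    }
}
```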
4. Using regular expressions with String methods
4.1. Redefined methods on String for processing regular expressions
Strings in Java have built-in support for regular expressions. Strings have four built-in methods for regular expressions:
- matches()
- split()
- replaceFirst()
- replaceAll()
The replace() method does NOT support regular expressions.
These methods are not optimized for performance. We will later use classes which are optimized for performance.
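Method | Description |
---|---|
s.matches("regex") | Evaluates if «regex» matches s. Returns true only if the WHOLE string can be matched. |
s.split("regex") | Creates an array with substrings of s divided at each occurrence of «regex». «regex» is not included in the result. |
s.replaceFirst("regex", "replacement") | Replaces the first occurrence of «regex» with «replacement». |
s.replaceAll("regex", "replacement") | Replaces all occurrences of «regex» with «replacement». |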
For the following example, create the Java project de.vogella.regex.test.
package de.vogella.regex.test;

public class RegexTestStrings {
    public static final String EXAMPLE_TEST = "This is my small example "
            + "string which I'm going to " + "use for pattern matching.";

    public static void main(String[] args) {
        System.out.println(EXAMPLE_TEST.matches("\\w.*"));
        String[] splitString = (EXAMPLE_TEST.split("\\s+"));
        System.out.println(splitString.length); // should be 14
        for (String string : splitString) {
            System.out.println(string);
        }
        // replace all whitespace with tabs
        System.out.println(EXAMPLE_TEST.replaceAll("\\s+", "\t"));
    }
}
4.2. Examples
The following class gives several examples for the usage of regular expressions with strings. See the comments for the purpose of each method.
If you want to test these examples, create the Java project de.vogella.regex.string.
package de.vogella.regex.string;

public class StringMatcher {
    // returns true if the string matches exactly "true"
    public boolean isTrue(String s) {
        return s.matches("true");
    }

    // returns true if the string matches exactly "true" or "True"
    public boolean isTrueVersion2(String s) {
        return s.matches("[tT]rue");
    }

    // returns true if the string matches exactly "true" or "True"
    // or "yes" or "Yes"
    public boolean isTrueOrYes(String s) {
        return s.matches("[tT]rue|[yY]es");
    }

    // returns true if the string contains "true"
    public boolean containsTrue(String s) {
        return s.matches(".*true.*");
    }

    // returns true if the string consists of exactly three letters
    public boolean isThreeLetters(String s) {
        return s.matches("[a-zA-Z]{3}");
        // simpler form for
        // return s.matches("[a-zA-Z][a-zA-Z][a-zA-Z]");
    }

    // returns true if the string does not have a number at the beginning
    public boolean isNoNumberAtBeginning(String s) {
        return s.matches("^[^\\d].*");
    }

    // returns true if the string consists of an arbitrary number of
    // word characters except b
    public boolean isIntersection(String s) {
        return s.matches("([\\w&&[^b]])*");
    }

    // returns true if the string contains a number less than 300
    public boolean isLessThenThreeHundred(String s) {
        return s.matches("[^0-9]*[12]?[0-9]{1,2}[^0-9]*");
    }
}
A small JUnit test validates the examples.
package de.vogella.regex.string;

import org.junit.Before;
import org.junit.Test;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class StringMatcherTest {
    private StringMatcher m;

    @Before
    public void setup() {
        m = new StringMatcher();
    }

    @Test
    public void testIsTrue() {
        assertTrue(m.isTrue("true"));
        assertFalse(m.isTrue("true2"));
        assertFalse(m.isTrue("True"));
    }

    @Test
    public void testIsTrueVersion2() {
        assertTrue(m.isTrueVersion2("true"));
        assertFalse(m.isTrueVersion2("true2"));
        assertTrue(m.isTrueVersion2("True"));
    }

    @Test
    public void testIsTrueOrYes() {
        assertTrue(m.isTrueOrYes("true"));
        assertTrue(m.isTrueOrYes("yes"));
        assertTrue(m.isTrueOrYes("Yes"));
        assertFalse(m.isTrueOrYes("no"));
    }

    @Test
    public void testContainsTrue() {
        assertTrue(m.containsTrue("thetruewithin"));
    }

    @Test
    public void testIsThreeLetters() {
        assertTrue(m.isThreeLetters("abc"));
        assertFalse(m.isThreeLetters("abcd"));
    }

    @Test
    public void testisNoNumberAtBeginning() {
        assertTrue(m.isNoNumberAtBeginning("abc"));
        assertFalse(m.isNoNumberAtBeginning("1abcd"));
        assertTrue(m.isNoNumberAtBeginning("a1bcd"));
        assertTrue(m.isNoNumberAtBeginning("asdfdsf"));
    }

    @Test
    public void testisIntersection() {
        assertTrue(m.isIntersection("1"));
        assertFalse(m.isIntersection("abcksdfkdskfsdfdsf"));
        assertTrue(m.isIntersection("skdskfjsmcnxmvjwque484242"));
    }

    @Test
    public void testLessThenThreeHundred() {
        assertTrue(m.isLessThenThreeHundred("288"));
        assertFalse(m.isLessThenThreeHundred("3288"));
        assertFalse(m.isLessThenThreeHundred("328 8"));
        assertTrue(m.isLessThenThreeHundred("1"));
        assertTrue(m.isLessThenThreeHundred("99"));
        assertFalse(m.isLessThenThreeHundred("300"));
    }
}