Replace regular expression java

Search and replace with regular expressions

It is possible to perform search and replace operations on strings in Java using regular expressions. The Java String and Matcher classes offer relatively simple methods for matching and replacing text, and they bring string-matching optimisations that would be cumbersome to implement from scratch. The complexity of using these methods depends on how much flexibility you need:

  • to find and replace instances of one fixed substring with another, we can use String.replaceAll(); we just need to take a little care (see below);
  • to search for and replace instances of a regular expression in a string with a fixed string, we can generally use a simple call to String.replaceAll();
  • if the replacement string isn’t fixed, then we can use Matcher.replaceAll() with a lambda expression (available since Java 9) to specify a dynamic replacement and/or use the Java Pattern and Matcher classes explicitly, giving complete control over the find and replace operation.

Replacing one "fixed" substring with another

This is the "simplest" form of search and replace. We want to find exact instances of a specific substring and replace them with another given substring. To do so, we can call replaceAll() on the String, but we need to put Pattern.quote() around the substring we are searching for. For example, this will replace all instances of the substring "1+" with "one plus":

str = str.replaceAll(Pattern.quote("1+"), "one plus");

If you are familiar with regular expressions, then you will know that a plus sign normally has a special meaning. But provided you remember to put Pattern.quote() around the first string, we can use replaceAll() as a simple search and replace call. (If the replacement substring contains a dollar sign or backslash, then we also need to use Matcher.quoteReplacement(): see below.)
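
To make the effect of Pattern.quote() concrete, here is a minimal sketch that runs the same replacement with and without quoting. The class and method names (QuoteDemo, unquoted, quoted) are illustrative only and are not part of the original article.

```java
import java.util.regex.Pattern;

public class QuoteDemo {

    // Unquoted: "1+" is read as the regex "one or more '1' characters".
    static String unquoted(String s) {
        return s.replaceAll("1+", "one plus");
    }

    // Quoted: Pattern.quote() makes the regex match the literal text "1+".
    static String quoted(String s) {
        return s.replaceAll(Pattern.quote("1+"), "one plus");
    }

    public static void main(String[] args) {
        String s = "1+1=2";
        System.out.println(unquoted(s)); // one plus+one plus=2
        System.out.println(quoted(s));   // one plus1=2
    }
}
```

In the unquoted call the plus sign in the input is never matched at all; only the runs of the digit 1 are replaced.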

Replacing substrings with a fixed string

If you simply want to replace all instances of a given expression within a Java string with another fixed string, then things are fairly straightforward. For example, the following replaces all instances of digits with a letter X:
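
A minimal sketch of such a call, assuming the character class [0-9] as the digit pattern (the class and method names are illustrative only):

```java
public class DigitMask {

    // Replaces every decimal digit 0-9 with the letter X.
    static String maskDigits(String s) {
        return s.replaceAll("[0-9]", "X");
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("Call 555-0199")); // Call XXX-XXXX
    }
}
```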

We’ll see in the next section that we should be careful about passing "raw" strings as the second parameter, since certain characters in this string actually have special meanings.

Replacing with a sub-part of the matched portion

In the replacement string, we can refer to captured groups from the regular expression. For example, the following expression removes instances of the HTML ‘bold’ tag from a string, but leaves the text inside the tag intact:
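
A sketch of what such a call could look like, assuming the tag is written exactly as <b>...</b> and using the capture expression ([^<]*) discussed next. The class and method names are hypothetical.

```java
public class StripBold {

    // Matches <b>text</b> where text contains no '<', and keeps only
    // the captured text (group 1) in the result.
    static String stripBold(String html) {
        return html.replaceAll("<b>([^<]*)</b>", "$1");
    }

    public static void main(String[] args) {
        System.out.println(stripBold("This is <b>bold</b> text")); // This is bold text
    }
}
```

Because the group excludes the '<' character, this sketch does not handle bold tags that contain other nested tags.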

In the expression ([^<]*), we capture the text between the open and close tags as group 1. Then, in the replacement string, we can refer to the text of group 1 with the expression $1. (The second group would be $2 etc.)

Including a dollar sign or backslashes in the replacement string

To actually include a dollar sign or backslash in the replacement string, we need to put another backslash before the dollar symbol or backslash to "escape" it, remembering that within a string literal, a single backslash also needs to be doubled up! For example:
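
The following sketch illustrates the escaping rule under the assumption that we want to turn "USD" into a dollar sign and forward slashes into backslashes. The class and method names are illustrative only.

```java
public class DollarDemo {

    // The source literal "\\$" is the two-character replacement \$,
    // which the regex engine reads as a literal dollar sign.
    static String usdToSymbol(String s) {
        return s.replaceAll("USD", "\\$");
    }

    // The source literal "\\\\" is the two-character replacement \\,
    // which the regex engine reads as a single literal backslash.
    static String slashToBackslash(String s) {
        return s.replaceAll("/", "\\\\");
    }

    public static void main(String[] args) {
        System.out.println(usdToSymbol("USD 10"));     // $ 10
        System.out.println(slashToBackslash("a/b/c")); // a\b\c
    }
}
```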

The static method Matcher.quoteReplacement() will replace instances of dollar signs and backslashes in a given string with the correct form to allow them to be used as literal replacements:

str = str.replaceAll("USD", Matcher.quoteReplacement("$"));

  • If there is a chance that the replacement string will include a dollar sign or a backslash character, then you should wrap it in Matcher.quoteReplacement().

Further information: more flexible find and replacement operations

  • the Matcher.replaceAll() method can be used with a lambda expression (since Java 9): see the accompanying page and example of using replaceAll() with a lambda expression;
  • the Matcher.find() method can be used to provide further control over the operation.
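
As a rough sketch of both options, the example below doubles every number found in a string. It assumes Java 9 or later for the lambda form, where Matcher.replaceAll() accepts a Function<MatchResult, String>. The class name, method names and the doubling task itself are hypothetical choices made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DynamicReplace {

    private static final Pattern NUMBER = Pattern.compile("[0-9]+");

    // Java 9+: the lambda receives each match and returns its replacement.
    static String doubleNumbers(String s) {
        return NUMBER.matcher(s).replaceAll(
                mr -> String.valueOf(Integer.parseInt(mr.group()) * 2));
    }

    // The same result with an explicit find() loop, copying the text
    // between matches by hand using start() and end().
    static String doubleNumbersLoop(String s) {
        Matcher m = NUMBER.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(s, last, m.start());              // text before the match
            out.append(Integer.parseInt(m.group()) * 2); // computed replacement
            last = m.end();
        }
        out.append(s, last, s.length());                 // text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("3 apples and 12 pears"));     // 6 apples and 24 pears
        System.out.println(doubleNumbersLoop("3 apples and 12 pears")); // 6 apples and 24 pears
    }
}
```

Note that the string returned by the lambda is still interpreted as a replacement string, so a result that may contain a dollar sign or backslash should be wrapped in Matcher.quoteReplacement().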

Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.

Methods of the Matcher Class

This section describes some additional useful methods of the Matcher class. For convenience, the methods listed below are grouped according to functionality.

Index Methods

Index methods provide useful index values that show precisely where the match was found in the input string:

  • public int start() : Returns the start index of the previous match.
  • public int start(int group) : Returns the start index of the subsequence captured by the given group during the previous match operation.
  • public int end() : Returns the offset after the last character matched.
  • public int end(int group) : Returns the offset after the last character of the subsequence captured by the given group during the previous match operation.
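
To illustrate the group-based variants, here is a small sketch that reports the offsets of a capturing group within the first match. The pattern (\w+)=(\w+), the class name and the helper method are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupIndexDemo {

    private static final Pattern PAIR = Pattern.compile("(\\w+)=(\\w+)");

    // Returns "start-end" for the given group of the first key=value match,
    // or null if the input contains no match. Group 0 is the whole match.
    static String span(String input, int group) {
        Matcher m = PAIR.matcher(input);
        if (!m.find()) {
            return null;
        }
        return m.start(group) + "-" + m.end(group);
    }

    public static void main(String[] args) {
        String input = "x: key=value";
        System.out.println(span(input, 0)); // 3-12
        System.out.println(span(input, 1)); // 3-6
        System.out.println(span(input, 2)); // 7-12
    }
}
```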

Study Methods

Study methods review the input string and return a boolean indicating whether or not the pattern is found.

  • public boolean lookingAt() : Attempts to match the input sequence, starting at the beginning of the region, against the pattern.
  • public boolean find() : Attempts to find the next subsequence of the input sequence that matches the pattern.
  • public boolean find(int start) : Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index.
  • public boolean matches() : Attempts to match the entire region against the pattern.
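
The find(int start) variant is not demonstrated elsewhere in this section, so here is a minimal sketch of it. The helper indexOfMatch and its class are hypothetical; the sketch assumes the start index lies between 0 and the input length, since find(int) throws an IndexOutOfBoundsException otherwise.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindFromDemo {

    // Returns the index of the first match of regex at or after 'from',
    // or -1 if there is none.
    static int indexOfMatch(String input, String regex, int from) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find(from) ? m.start() : -1;
    }

    public static void main(String[] args) {
        String input = "dog cat dog";
        System.out.println(indexOfMatch(input, "dog", 0)); // 0
        System.out.println(indexOfMatch(input, "dog", 1)); // 8
        System.out.println(indexOfMatch(input, "dog", 9)); // -1
    }
}
```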

Replacement Methods

Replacement methods are useful methods for replacing text in an input string.

  • public Matcher appendReplacement(StringBuffer sb, String replacement) : Implements a non-terminal append-and-replace step.
  • public StringBuffer appendTail(StringBuffer sb) : Implements a terminal append-and-replace step.
  • public String replaceAll(String replacement) : Replaces every subsequence of the input sequence that matches the pattern with the given replacement string.
  • public String replaceFirst(String replacement) : Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string.
  • public static String quoteReplacement(String s) : Returns a literal replacement String for the specified String. This method produces a String that will work as a literal replacement s in the appendReplacement method of the Matcher class. The String produced will match the sequence of characters in s treated as a literal sequence. Backslashes ('\') and dollar signs ('$') will be given no special meaning.
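
As an illustration of quoteReplacement, the sketch below substitutes a caller-supplied value into a template. The placeholder PRICE, the class name and the method name are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteReplacementDemo {

    private static final Pattern PLACEHOLDER = Pattern.compile("PRICE");

    // Inserts 'price' literally. Without quoteReplacement, the "$1" in a
    // value such as "$1.50" would be read as a reference to group 1.
    static String fill(String template, String price) {
        return PLACEHOLDER.matcher(template)
                .replaceAll(Matcher.quoteReplacement(price));
    }

    public static void main(String[] args) {
        System.out.println(fill("Total: PRICE", "$1.50")); // Total: $1.50
    }
}
```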

Using the start and end Methods

Here’s an example, MatcherDemo.java, that counts the number of times the word "dog" appears in the input string.

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MatcherDemo {

    private static final String REGEX = "\\bdog\\b";
    private static final String INPUT = "dog dog dog doggie dogg";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        // get a matcher object
        Matcher m = p.matcher(INPUT);
        int count = 0;
        while (m.find()) {
            count++;
            System.out.println("Match number " + count);
            System.out.println("start(): " + m.start());
            System.out.println("end(): " + m.end());
        }
    }
}

OUTPUT:
Match number 1
start(): 0
end(): 3
Match number 2
start(): 4
end(): 7
Match number 3
start(): 8
end(): 11

You can see that this example uses word boundaries to ensure that the letters "d" "o" "g" are not merely a substring in a longer word. It also gives some useful information about where in the input string the match has occurred. The start method returns the start index of the previous match, and end returns the index of the last character matched, plus one.

Using the matches and lookingAt Methods

The matches and lookingAt methods both attempt to match an input sequence against a pattern. The difference, however, is that matches requires the entire input sequence to be matched, while lookingAt does not. Both methods always start at the beginning of the input string (more precisely, at the beginning of the matcher’s region, which by default is the whole input). Here’s the full code, MatchesLooking.java:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MatchesLooking {

    private static final String REGEX = "foo";
    private static final String INPUT = "fooooooooooooooooo";
    private static Pattern pattern;
    private static Matcher matcher;

    public static void main(String[] args) {
        // Initialize
        pattern = Pattern.compile(REGEX);
        matcher = pattern.matcher(INPUT);

        System.out.println("Current REGEX is: " + REGEX);
        System.out.println("Current INPUT is: " + INPUT);

        System.out.println("lookingAt(): " + matcher.lookingAt());
        System.out.println("matches(): " + matcher.matches());
    }
}

Current REGEX is: foo
Current INPUT is: fooooooooooooooooo
lookingAt(): true
matches(): false

Using replaceFirst(String) and replaceAll(String)

The replaceFirst and replaceAll methods replace text that matches a given regular expression. As their names indicate, replaceFirst replaces the first occurrence, and replaceAll replaces all occurrences. Here’s the ReplaceDemo.java code:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class ReplaceDemo {

    private static String REGEX = "dog";
    private static String INPUT = "The dog says meow. All dogs say meow.";
    private static String REPLACE = "cat";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        // get a matcher object
        Matcher m = p.matcher(INPUT);
        INPUT = m.replaceAll(REPLACE);
        System.out.println(INPUT);
    }
}

OUTPUT: The cat says meow. All cats say meow.

In this first version, all occurrences of dog are replaced with cat. But why stop here? Rather than replace a simple literal like dog, you can replace text that matches any regular expression. The API for this method states that "given the regular expression a*b, the input aabfooaabfooabfoob, and the replacement string -, an invocation of this method on a matcher for that expression would yield the string -foo-foo-foo-."

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class ReplaceDemo2 {

    private static String REGEX = "a*b";
    private static String INPUT = "aabfooaabfooabfoob";
    private static String REPLACE = "-";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        // get a matcher object
        Matcher m = p.matcher(INPUT);
        INPUT = m.replaceAll(REPLACE);
        System.out.println(INPUT);
    }
}

To replace only the first occurrence of the pattern, simply call replaceFirst instead of replaceAll. It accepts the same parameter.
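
A short sketch of the replaceFirst variant, reusing the a*b pattern and input from above (the class and method names are illustrative only):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceFirstDemo {

    // Replaces only the first subsequence matching a*b with "-".
    static String replaceFirstMatch(String input) {
        Matcher m = Pattern.compile("a*b").matcher(input);
        return m.replaceFirst("-");
    }

    public static void main(String[] args) {
        System.out.println(replaceFirstMatch("aabfooaabfooabfoob")); // -fooaabfooabfoob
    }
}
```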

Using appendReplacement(StringBuffer,String) and appendTail(StringBuffer)

The Matcher class also provides appendReplacement and appendTail methods for text replacement. The following example, RegexDemo.java, uses these two methods to achieve the same effect as replaceAll.

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class RegexDemo {

    private static String REGEX = "a*b";
    private static String INPUT = "aabfooaabfooabfoob";
    private static String REPLACE = "-";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        Matcher m = p.matcher(INPUT); // get a matcher object
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, REPLACE);
        }
        m.appendTail(sb);
        System.out.println(sb.toString());
    }
}
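
The appendReplacement loop becomes more useful when the replacement differs from match to match, which a fixed replacement string cannot express. The sketch below tags each match with a running counter; the class name, the method name and the "#n" suffix are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberMatches {

    // Appends "#1", "#2", ... to successive matches of regex in input.
    static String numberMatches(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        int count = 0;
        while (m.find()) {
            count++;
            // quoteReplacement keeps any '$' or '\' in the matched text literal.
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group() + "#" + count));
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(numberMatches("dog dog doggie", "\\bdog\\b")); // dog#1 dog#2 doggie
    }
}
```

Since Java 9, appendReplacement and appendTail also have overloads that take a StringBuilder, which can be used in place of StringBuffer here.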

Matcher Method Equivalents in java.lang.String

For convenience, the String class mimics a couple of Matcher methods as well:

  • public String replaceFirst(String regex, String replacement) : Replaces the first substring of this string that matches the given regular expression with the given replacement. An invocation of this method of the form str.replaceFirst(regex, repl) yields exactly the same result as the expression Pattern.compile(regex).matcher(str).replaceFirst(repl).
  • public String replaceAll(String regex, String replacement) : Replaces each substring of this string that matches the given regular expression with the given replacement. An invocation of this method of the form str.replaceAll(regex, repl) yields exactly the same result as the expression Pattern.compile(regex).matcher(str).replaceAll(repl).
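
The stated equivalence can be checked with a small sketch like the following. The class and helper method names are hypothetical.

```java
import java.util.regex.Pattern;

public class StringEquivalents {

    // Compares the String convenience methods with the explicit
    // Pattern/Matcher form for both replaceAll and replaceFirst.
    static boolean sameResults(String str, String regex, String repl) {
        boolean all = str.replaceAll(regex, repl)
                .equals(Pattern.compile(regex).matcher(str).replaceAll(repl));
        boolean first = str.replaceFirst(regex, repl)
                .equals(Pattern.compile(regex).matcher(str).replaceFirst(repl));
        return all && first;
    }

    public static void main(String[] args) {
        System.out.println(sameResults("aabfooaabfooabfoob", "a*b", "-")); // true
    }
}
```

One practical difference is that the explicit form lets a compiled Pattern be stored and reused across many calls, whereas the String methods take the regular expression as a string on every call.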
