Regular expressions python replace

Contents

Python Regex Replace Pattern in a string using re.sub()
Table of contents
How to use re.sub() method
Regex example to replace all whitespace with an underscore
Regex to remove whitespaces from a string
Substitute multiple whitespaces with single whitespace using regex
Limit the maximum number of pattern occurrences to be replaced
Regex replacement function
Regex replace group/multiple regex patterns
Replace multiple regex patterns with different replacement
RE’s subn() method

Python Regex Replace Pattern in a string using re.sub()

In this article, you will learn how to use regular expressions to perform search and replace operations on strings in Python.

Python's re module offers the sub() and subn() methods to search and replace patterns in a string. Using these methods we can replace one or more occurrences of a regex pattern in the target string with a substitute string.

After reading this article you will be able to perform the following regex replacement operations in Python.

Operation	Description
re.sub(pattern, replacement, string)	Finds and replaces all occurrences of pattern with replacement
re.sub(pattern, replacement, string, count=1)	Finds and replaces only the first occurrence of pattern with replacement
re.sub(pattern, replacement, string, count=n)	Finds and replaces the first n occurrences of pattern with replacement

Python regex replace operations

Before moving further, let’s see the syntax of the sub() method.

How to use re.sub() method

To understand how to use the re.sub() for regex replacement, we first need to understand its syntax.

Syntax of re.sub()

re.sub(pattern, replacement, string[, count, flags])

The regular expression pattern, replacement, and target string are the mandatory arguments. The count and flags are optional.

pattern : The regular expression pattern to find inside the target string.
replacement: The replacement that we are going to insert for each occurrence of a pattern. The replacement can be a string or function.
string : The variable pointing to the target string (In which we want to perform the replacement).
count : Maximum number of pattern occurrences to be replaced. If specified, the count must be a non-negative integer. By default, the count is zero, which means the re.sub() method will replace all pattern occurrences in the target string.
flags : Finally, the last argument is optional and refers to regex flags. By default, no flags are applied.
There are many flag values we can use. For example, the re.I is used for performing case-insensitive searching and replacing.
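As a minimal sketch of the flags argument, the following compares a case-sensitive replacement with one that uses re.I. The sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample text containing three spellings of the same word
text = "Python is fun. I like PYTHON and python."

# Without flags, only the exact-case spelling "python" is replaced
case_sensitive = re.sub(r"python", "regex", text)

# With re.I (short for re.IGNORECASE), every spelling is replaced
case_insensitive = re.sub(r"python", "regex", text, flags=re.I)

print(case_sensitive)    # Python is fun. I like PYTHON and regex.
print(case_insensitive)  # regex is fun. I like regex and regex.
```

Passing flags by keyword, as done here, avoids accidentally putting a flag value in the count position.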

Return value

It returns the string obtained by replacing the pattern occurrences in the string with the replacement string. If the pattern isn’t found, the string is returned unchanged.
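The "returned unchanged" case can be checked with a short sketch. The sample string below is an assumption chosen so that the pattern finds nothing.

```python
import re

# Hypothetical input that contains no digits
text = "no digits here"

# \d+ matches runs of digits; there are none, so nothing is replaced
result = re.sub(r"\d+", "#", text)

print(result)          # no digits here
print(result == text)  # True
```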

Regex example to replace all whitespace with an underscore

Now, let’s see how to use re.sub() with the help of a simple example. Here, we will perform two replacement operations: replacing whitespace with an underscore, and removing whitespace from a string.

Let’s see the first scenario first.

Pattern to replace: \s

In this example, we will use the \s regex special sequence, which matches any whitespace character. For ASCII text it is equivalent to [ \t\n\r\f\v]; for Unicode string patterns it also matches other Unicode whitespace characters.

Let’s assume you have the following string and you want to replace all the whitespace with an underscore.

target_string = "Jessa knows testing and machine learning"

import re

target_str = "Jessa knows testing and machine learning"
res_str = re.sub(r"\s", "_", target_str)

# String after replacement
print(res_str)
# Output 'Jessa_knows_testing_and_machine_learning'

Regex to remove whitespaces from a string

Now, let’s move to the second scenario, where we remove whitespace from a string using regex. This removal covers the following four cases.

Remove all spaces, including single or multiple spaces ( pattern to remove \s+ )
Remove leading spaces ( pattern to remove ^\s+ )
Remove trailing spaces ( pattern to remove \s+$ )
Remove both leading and trailing spaces. (pattern to remove ^\s+|\s+$ )

Example 1: Remove all spaces

import re

target_str = " Jessa Knows Testing And Machine Learning \t ."

# \s+ to remove all spaces
# + indicates 1 or more occurrences of a whitespace character
res_str = re.sub(r"\s+", "", target_str)

# String after replacement
print(res_str)
# Output 'JessaKnowsTestingAndMachineLearning.'

Example 2: Remove leading spaces

import re

target_str = " Jessa Knows Testing And Machine Learning \t ."

# ^\s+ removes only leading spaces
# caret (^) matches only at the start of the string
res_str = re.sub(r"^\s+", "", target_str)

# String after replacement (the whitespace before the final dot is kept)
print(res_str)
# Output 'Jessa Knows Testing And Machine Learning \t .'

Example 3: Remove trailing spaces

import re

target_str = " Jessa Knows Testing And Machine Learning \t\n"

# \s+$ removes only trailing spaces
# dollar ($) matches spaces only at the end of the string
res_str = re.sub(r"\s+$", "", target_str)

# String after replacement (the leading space is kept)
print(res_str)
# Output ' Jessa Knows Testing And Machine Learning'

Example 4: Remove both leading and trailing spaces

import re

target_str = " Jessa Knows Testing And Machine Learning \t\n"

# ^\s+ removes leading spaces
# \s+$ removes trailing spaces
# | operator combines both patterns
res_str = re.sub(r"^\s+|\s+$", "", target_str)

# String after replacement
print(res_str)
# Output 'Jessa Knows Testing And Machine Learning'

Substitute multiple whitespaces with single whitespace using regex

import re

target_str = "Jessa Knows Testing And Machine Learning \t \n"

# \s+ matches each run of one or more whitespace characters
# replace each run with a single space " "
res_str = re.sub(r"\s+", " ", target_str)

# String after replacement (the trailing run collapses to one space)
print(res_str)
# Output 'Jessa Knows Testing And Machine Learning '

Limit the maximum number of pattern occurrences to be replaced

As mentioned earlier, the count argument of the re.sub() method is optional. It sets the maximum number of replacements to make inside the string. By default, the count is zero, which means the re.sub() method will replace all pattern occurrences in the target string.

Replace only the first occurrence of a pattern

By setting count=1 in re.sub(), we can replace only the first occurrence of a pattern in the target string with another string.

Replace the first n occurrences of a pattern

Set the count value to the number of replacements you want to perform.

import re

# original string
target_str = "Jessa knows testing and machine learning"

# replace only the first occurrence
res_str = re.sub(r"\s", "-", target_str, count=1)

# String after replacement
print(res_str)
# Output 'Jessa-knows testing and machine learning'

# replace the first three occurrences
res_str = re.sub(r"\s", "-", target_str, count=3)
print(res_str)
# Output 'Jessa-knows-testing-and machine learning'

Regex replacement function

We saw how to find and replace a regex pattern with a fixed string in the earlier examples. In this example, we will see how to replace a pattern with the output of a function.

For example, suppose you want to replace every uppercase letter with its lowercase equivalent. To achieve this we need the following two things:

A regular expression pattern that matches all uppercase letters
A replacement function that converts the matched uppercase letters to lowercase

Pattern to replace: [A-Z]

This pattern will match any uppercase letters inside a target string.

replacement function

You can pass a function to re.sub(). When you execute re.sub(), your function will receive a match object as the argument. It can perform the replacement by extracting the matched value from the match object.

If the replacement is a function, it is called for every non-overlapping occurrence of the pattern. The function takes a single match object argument and returns the replacement string.

So in our case, we will do the following:

First, we need to create a function to replace uppercase letters with a lowercase letter
Next, we need to pass this function as the replacement argument to the re.sub()
Whenever re.sub() matches the pattern, it will send the corresponding match object to the replacement function
Inside a replacement function, we will use the group() method to extract an uppercase letter and convert it into a lowercase letter

import re

# replacement function to convert an uppercase letter to lowercase
def convert_to_lower(match_obj):
    # group() returns the matched text (one uppercase letter here)
    return match_obj.group().lower()

# Original string (named target_str to avoid shadowing the built-in str)
target_str = "Emma LOves PINEAPPLE DEssert and COCONUT Ice Cream"

# pass replacement function to re.sub()
res_str = re.sub(r"[A-Z]", convert_to_lower, target_str)

# String after replacement: every uppercase letter is lowercased
print(res_str)
# Output 'emma loves pineapple dessert and coconut ice cream'

Regex replace group/multiple regex patterns

We saw how to find and replace the single regex pattern in the earlier examples. In this section, we will learn how to search and replace multiple patterns in the target string.

To understand this take the example of the following string

student_names = "Emma-Kelly Jessa Joy Scott-Joe Jerry"

Here, we want to find and replace two distinct patterns at the same time.

We want to replace each whitespace character and each hyphen (-) with a comma (,) inside the target string. To achieve this, we first write two regular expression patterns, one for each, and then combine them with the OR operator (|).

import re

# Original string
student_names = "Emma-Kelly Jessa Joy Scott-Joe Jerry"

# replace two patterns at the same time
# use OR (|) to separate the two patterns
res = re.sub(r"(\s)|(-)", ",", student_names)
print(res)
# Output 'Emma,Kelly,Jessa,Joy,Scott,Joe,Jerry'
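When both patterns are single characters and share the same replacement, a character class is one possible alternative to the OR operator. The sketch below is a variation on the example above, not a replacement for it, and the name res_class is made up for illustration.

```python
import re

student_names = "Emma-Kelly Jessa Joy Scott-Joe Jerry"

# [\s-] matches one character that is either whitespace or a hyphen;
# a hyphen placed last inside the brackets is treated as a literal
res_class = re.sub(r"[\s-]", ",", student_names)

print(res_class)  # Emma,Kelly,Jessa,Joy,Scott,Joe,Jerry
```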

Replace multiple regex patterns with different replacement

To understand this take the example of the following string

target_string = "EMMA loves PINEAPPLE dessert and COCONUT ice CREAM"

The above string contains a combination of uppercase and lowercase words.

Here, we want to match and replace two distinct patterns with two different replacements.

Replace each uppercase word with a lowercase
And replace each lowercase word with uppercase

So we will first capture two groups and then replace each group using a replacement function. If you are not familiar with replacement functions, see the "Regex replacement function" section above.

Group 1: ([A-Z]+)

To capture each uppercase word and replace it with its lowercase form.
The [A-Z] character class matches any single character from capital A to capital Z, and the + quantifier extends the match to one or more such characters.

Group 2: ([a-z]+)

To capture each lowercase word and replace it with its uppercase form.
The [a-z] character class matches any single character from lowercase a to lowercase z, and the + quantifier extends the match to one or more such characters.

Note: Whenever you want to capture groups, always write them in parentheses ( ).

import re

# replacement function to convert uppercase words to lowercase
# and lowercase words to uppercase
def convert_case(match_obj):
    # group 1 is set when an uppercase word matched
    if match_obj.group(1) is not None:
        return match_obj.group(1).lower()
    # group 2 is set when a lowercase word matched
    if match_obj.group(2) is not None:
        return match_obj.group(2).upper()

# Original string (named target_str to avoid shadowing the built-in str)
target_str = "EMMA loves PINEAPPLE dessert and COCONUT ice CREAM"

# group 1 [A-Z]+ matches uppercase words
# group 2 [a-z]+ matches lowercase words
# pass replacement function 'convert_case' to re.sub()
res_str = re.sub(r"([A-Z]+)|([a-z]+)", convert_case, target_str)

# String after replacement
print(res_str)
# Output 'emma LOVES pineapple DESSERT AND coconut ICE cream'

RE’s subn() method

The re.subn() method performs the same task as the re.sub() method, but the result it returns is a bit different.

The re.subn() method returns a tuple of two elements.

The first element of the result is the new version of the target string after all the replacements have been made.
The second element is the number of replacements it has made

Let’s test this with an example that replaces every fully uppercase word. The pattern \b[A-Z]+\b matches a run of uppercase letters only when it forms a whole word, so the capital E in "Emma" is left alone.

import re

target_string = "Emma loves PINEAPPLE, COCONUT, BANANA ice cream"

# \b[A-Z]+\b matches whole words made only of uppercase letters
result = re.subn(r"\b[A-Z]+\b", "MANGO", target_string)
print(result)
# Output ('Emma loves MANGO, MANGO, MANGO ice cream', 3)

Note: The resulting string is the same one re.sub() would return for this pattern, only this time it is the first element of a tuple. The second element is the number of replacements made, which here is three.

We can also use the count argument of the subn() method. So the value of the second element of the result tuple should change accordingly.

import re

target_string = "Emma loves PINEAPPLE, COCONUT, BANANA ice cream"

# \b[A-Z]+\b matches whole uppercase words; count=2 stops after two of them
result = re.subn(r"\b[A-Z]+\b", "MANGO", target_string, count=2)
print(result)
# Output ('Emma loves MANGO, MANGO, BANANA ice cream', 2)
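Because re.subn() returns a tuple, it can be unpacked directly into two variables. The sketch below assumes a whole-word uppercase pattern (\b[A-Z]+\b), and the names new_string and replacements are made up for illustration.

```python
import re

target_string = "Emma loves PINEAPPLE, COCONUT, BANANA ice cream"

# Unpack the (new_string, number_of_replacements) tuple in one step.
# Assumed pattern: \b[A-Z]+\b matches whole words made of uppercase letters.
new_string, replacements = re.subn(r"\b[A-Z]+\b", "MANGO", target_string)

# The count tells us whether anything was replaced at all
if replacements:
    print(f"{replacements} replacement(s): {new_string}")
else:
    print("Nothing was replaced")
```

The count is handy when the code needs to know whether the pattern matched at all, without running a separate search first.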
