Python regexp non greedy

Python Regex Non-Greedy

Summary: in this tutorial, you’ll learn about the regex non-greedy (or lazy) quantifiers that match their preceding elements as few times as possible.

Introduction to the regex non-greedy (or lazy) quantifiers

Quantifiers allow you to match their preceding elements a number of times. Quantifiers work in one of two modes: greedy and non-greedy (lazy).

When quantifiers work in the greedy mode, they are called greedy quantifiers. Similarly, when quantifiers work in the non-greedy mode, they’re called non-greedy quantifiers or lazy quantifiers.

By default, quantifiers work in the greedy mode. This means greedy quantifiers match their preceding elements as many times as possible to return the longest possible match.

On the other hand, non-greedy quantifiers match as few times as possible to return the shortest possible match. Non-greedy quantifiers are the opposite of greedy ones.
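
As a minimal sketch of the difference, the snippet below runs a greedy and a non-greedy pattern over the same input. The sample string and variable names are made up for illustration and do not come from the tutorial.

```python
import re

# Hypothetical sample input containing two quoted words.
text = 'say "hello" and "bye"'

# Greedy: .+ runs to the last quote it can reach.
greedy = re.search(r'".+"', text).group()

# Non-greedy: .+? stops at the first closing quote.
lazy = re.search(r'".+?"', text).group()

print(greedy)  # "hello" and "bye"
print(lazy)    # "hello"
```

Both patterns start matching at the first quote; they differ only in where they agree to stop.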

To turn greedy quantifiers into non-greedy quantifiers, you add an extra question mark (?) to the quantifiers. The following table shows the greedy quantifiers and their corresponding non-greedy quantifiers:

Greedy quantifier   Lazy quantifier   Meaning
*                   *?                Match its preceding element zero or more times.
+                   +?                Match its preceding element one or more times.
?                   ??                Match its preceding element zero or one time.
{n}                 {n}?              Match its preceding element exactly n times.
{n,}                {n,}?             Match its preceding element at least n times.
{n,m}               {n,m}?            Match its preceding element from n to m times.
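
To make the table concrete, here is a small sketch that applies each greedy and lazy pair to the same input with re.match. The sample string "aaaa" and the bounds 2 and 3 are arbitrary choices for illustration.

```python
import re

sample = "aaaa"  # hypothetical input: four repetitions of one character

# Each tuple is (greedy pattern, lazy pattern) for one table row.
pairs = [
    ("a*", "a*?"),
    ("a+", "a+?"),
    ("a?", "a??"),
    ("a{2}", "a{2}?"),
    ("a{2,}", "a{2,}?"),
    ("a{2,3}", "a{2,3}?"),
]

results = {}
for greedy, lazy in pairs:
    # re.match anchors at the start, so only the amount consumed differs.
    results[greedy] = re.match(greedy, sample).group()
    results[lazy] = re.match(lazy, sample).group()
    print(greedy, repr(results[greedy]), "|", lazy, repr(results[lazy]))
```

Note that {n} and {n}? consume the same text, because an exact count leaves the engine no choice.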

Python regex non-greedy quantifiers example

The following program uses the non-greedy quantifier (+?) to match the text within the quotes ("") of a button element:

    import re

    # A button element; the attribute values are the quoted text to extract.
    s = '<button type="submit" class="btn">Send</button>'

    pattern = '".+?"'
    matches = re.finditer(pattern, s)

    for match in matches:
        print(match.group())

Output:

    "submit"
    "btn"

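For contrast, the following sketch runs the greedy and non-greedy patterns side by side. The button markup here is an assumed sample chosen to be consistent with the output shown above.

```python
import re

# Assumed sample markup for a button element.
s = '<button type="submit" class="btn">Send</button>'

# Greedy: one match stretching from the first quote to the last.
greedy_matches = re.findall('".+"', s)

# Non-greedy: one match per quoted attribute value.
lazy_matches = re.findall('".+?"', s)

print(greedy_matches)  # ['"submit" class="btn"']
print(lazy_matches)    # ['"submit"', '"btn"']
```
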
Summary

  • Non-greedy quantifiers match their preceding elements as little as possible to return the smallest possible match.
  • Add a question mark (?) to a quantifier to turn it into a non-greedy quantifier.


Python Regular Expression – Greedy vs Non Greedy quantifiers

So far we have talked about various quantifiers in regular expressions, such as the asterisk, plus, question mark, and curly braces. In this post we go one step further and look at the difference between greedy and non-greedy quantifiers.

Greedy Match –

A greedy match in regular expression tries to match as many characters as possible.

For example, \d+ will try to match as many digits as possible. It never gets enough. It’s too greedy.

    In [2]: re.findall(r'\d+', '12345678910')
    Out[2]: ['12345678910']

By default all quantifiers are greedy. They will try to match as many characters as possible.

    In [3]: # zero or more occurrences
    In [4]: re.findall(r'\d*', '12345678910')
    Out[4]: ['12345678910', '']

    In [5]: # one or more occurrences
    In [6]: re.findall(r'\d+', '12345678910')
    Out[6]: ['12345678910']

    In [7]: # zero or one occurrence
    In [8]: re.findall(r'\d?', '12345678910')
    Out[8]: ['1', '2', '3', '4', '5', '6', '7', '8', '9', '1', '0', '']

    In [9]: # exactly 5 occurrences
    In [10]: re.findall(r'\d{5}', '12345678910')
    Out[10]: ['12345', '67891']

    In [11]: # at least 2 but not more than 5
    In [12]: re.findall(r'\d{2,5}', '12345678910')
    Out[12]: ['12345', '67891']

Non Greedy Match –

A non-greedy match tries to match as few characters as possible. You can make the default quantifiers *, +, ?, {m}, and {m,n} non-greedy by appending a question mark to them: *?, +?, ??, {m}?, and {m,n}?.

Non Greedy Asterisk (*?) –

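    In [15]: re.findall(r'\d*', '12345678910')
    Out[15]: ['12345678910', '']

    In [16]: re.findall(r'\d*?', '12345678910')
    Out[16]: ['', '1', '', '2', '', '3', '', '4', '', '5', '', '6', '', '7', '', '8', '', '9', '', '1', '', '0', '']

The greedy asterisk \d* matches zero or more digits and takes as many as it can, so it consumes the whole number in one match. The non-greedy asterisk \d*? prefers the shortest match, which is the empty string. The single digits appear in the output because, after an empty match, findall has to make progress, so the next match at that position is extended by one digit (this is the behavior in Python 3.7 and later).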
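
The alternation of empty strings and digits is easier to follow when the match positions are printed too. The sketch below uses re.finditer on a shortened input; the string "123" is an illustrative choice, and the behavior described assumes Python 3.7 or later.

```python
import re

text = "123"  # shortened sample so the spans are easy to read

# Collect (start, end, matched text) for every match of the lazy asterisk.
spans = [(m.start(), m.end(), m.group()) for m in re.finditer(r"\d*?", text)]

for start, end, value in spans:
    print(start, end, repr(value))
```

Each empty match has equal start and end positions, and each one-digit match begins at the position where the preceding empty match was found.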

Non Greedy Plus (+? ) –

    In [17]: re.findall(r'\d+', '12345678910')
    Out[17]: ['12345678910']

    In [18]: re.findall(r'\d+?', '12345678910')
    Out[18]: ['1', '2', '3', '4', '5', '6', '7', '8', '9', '1', '0']

The greedy plus \d+ matches one or more digits and takes as many as it can. The non-greedy plus \d+? matches exactly one digit at a time, because one occurrence is the minimum it is allowed to match.
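
A non-greedy quantifier still matches more than its minimum when the rest of the pattern requires it. The following sketch, which adds a literal 0 after the lazy plus purely for illustration, shows this on the same input.

```python
import re

text = "12345678910"

# On its own, the lazy plus is satisfied with a single digit.
alone = re.search(r"\d+?", text).group()

# Followed by a literal 0, it must keep extending until a 0 comes next.
before_zero = re.search(r"\d+?0", text).group()

print(alone)        # 1
print(before_zero)  # 12345678910
```

"As few as possible" therefore means the fewest repetitions that still let the whole pattern match.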

Non Greedy Question mark ( ?? ) –

    In [19]: re.findall(r'\d?', '12345678910')
    Out[19]: ['1', '2', '3', '4', '5', '6', '7', '8', '9', '1', '0', '']

    In [20]: re.findall(r'\d??', '12345678910')
    Out[20]: ['', '1', '', '2', '', '3', '', '4', '', '5', '', '6', '', '7', '', '8', '', '9', '', '1', '', '0', '']

The greedy question mark \d? matches zero or one digit and prefers one. So it consumes 1, then 2, then 3, and so on, and finally matches an empty string at the end of the input. The non-greedy question mark \d?? prefers zero, so it matches an empty string, then a digit, then an empty string again, and so on. It tries to match as few digits as possible, which is why the output alternates between empty strings and digits.

Non Greedy curly braces –

    In [25]: re.findall(r'\d{5}', '12345678910')
    Out[25]: ['12345', '67891']

    In [26]: re.findall(r'\d{5}?', '12345678910')
    Out[26]: ['12345', '67891']

The greedy and non-greedy versions both match five digits, because {5} matches exactly the specified number of occurrences and leaves nothing to choose between.

    In [27]: re.findall(r'\d{2,5}', '12345678910')
    Out[27]: ['12345', '67891']

    In [28]: re.findall(r'\d{2,5}?', '12345678910')
    Out[28]: ['12', '34', '56', '78', '91']

Here the greedy version matches five digits at a time, while the non-greedy version stops at two digits, the minimum that {2,5}? allows.
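
The open-ended form {n,} behaves the same way. As a brief sketch on the same input, with the lower bound of 2 picked only for illustration:

```python
import re

text = "12345678910"

# Greedy: at least two digits, as many as possible.
greedy_open = re.findall(r"\d{2,}", text)

# Non-greedy: at least two digits, as few as possible.
lazy_open = re.findall(r"\d{2,}?", text)

print(greedy_open)  # ['12345678910']
print(lazy_open)    # ['12', '34', '56', '78', '91']
```

The trailing 0 is left unmatched in the non-greedy case because a single remaining digit cannot satisfy the minimum of two.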
