Brackets in python regex

regex to get all text outside of brackets

But I’m having trouble getting anything outside of the square brackets. I’ve tried something like the following:

names = re.findall(r"(.*?)\[.*\]+", example_str) 

So far I’ve only seen a string containing one to two name [address] combos, but I’m assuming there could be any number of them in a string.


If there are no nested brackets, you can just do this:
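Here is a minimal sketch of that non-nested case. It assumes a hypothetical example_str built from "name [address]" pairs, because the question's actual string is not shown here. The lazy .*? inside the brackets keeps each match to a single pair:

```python
import re

# Hypothetical input: "name [address]" pairs, as described in the question.
example_str = "Alice Smith [12 Oak St] Bob Jones [34 Elm Ave]"

# (.*?) captures the text before a bracket pair; \[.*?\] consumes one pair.
names = re.findall(r"(.*?)\[.*?\]", example_str)
print(names)  # ['Alice Smith ', ' Bob Jones ']

# Strip the surrounding spaces if only the bare names are wanted.
print([n.strip() for n in names])  # ['Alice Smith', 'Bob Jones']
```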

However, you don’t even really need a regex here. Just split on brackets:

(s.split(']')[-1] for s in example_str.split('[')) 
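The split approach can be checked the same way. This sketch uses hypothetical inputs and collects the generator from the answer into a list. Text after the last ] is kept, and a string ending in ] produces a trailing empty string:

```python
example_str = "Alice Smith [12 Oak St] Bob Jones [34 Elm Ave] trailing"  # hypothetical

# Split on "[" first; every piece after the first starts with bracket content,
# so keep only what follows its "]".
outside = [s.split("]")[-1] for s in example_str.split("[")]
print(outside)  # ['Alice Smith ', ' Bob Jones ', ' trailing']

# When the string ends with "]", the last piece is an empty string.
ends_in_bracket = "Alice Smith [12 Oak St]"
tail_case = [s.split("]")[-1] for s in ends_in_bracket.split("[")]
print(tail_case)  # ['Alice Smith ', '']
```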

The only reason your attempt, re.findall(r"(.*?)\[.*\]+", example_str), didn’t work is that you were doing a greedy match (.*) within the brackets. That means it captured everything from the first open bracket to the last close bracket, instead of capturing just the first pair of brackets.
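You can see the greedy/lazy difference with re.search on a hypothetical two-pair string. .* runs to the last ], while .*? stops at the first one, so the original pattern can only match once here:

```python
import re

s = "Alice Smith [12 Oak St] Bob Jones [34 Elm Ave]"  # hypothetical input

greedy = re.search(r"\[.*\]", s).group()   # .* grabs as much as possible
lazy = re.search(r"\[.*?\]", s).group()    # .*? stops at the first ]
attempt = re.findall(r"(.*?)\[.*\]+", s)   # the pattern from the question

print(greedy)   # [12 Oak St] Bob Jones [34 Elm Ave]
print(lazy)     # [12 Oak St]
print(attempt)  # ['Alice Smith ']
```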

Also, the + on the end seems wrong. If you had 'abc [def][ghi] jkl[mno]', would you want to get back ['abc ', '', ' jkl'] or ['abc ', ' jkl']? If the former, don’t add the +. If it’s the latter, do, but then you need to put the whole bracketed pattern in a non-capturing group: r'(.*?)(?:\[.*?\])+'.
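Here are both variants run on the answer's own example string. This is a sketch, and the variable names are illustrative:

```python
import re

s = "abc [def][ghi] jkl[mno]"

without_plus = re.findall(r"(.*?)\[.*?\]", s)    # one match per bracket pair
with_plus = re.findall(r"(.*?)(?:\[.*?\])+", s)  # adjacent pairs consumed together

print(without_plus)  # ['abc ', '', ' jkl']
print(with_plus)     # ['abc ', ' jkl']
```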

If there might be additional text after the last bracket, the split method will work fine, or you could use re.split instead of re.findall … but if you want to adjust your original regex to work with that, you can.
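Here is a sketch of the re.split alternative on hypothetical inputs. Splitting on the bracketed chunks keeps any text after the last bracket. As with the findall pattern without the +, adjacent brackets or a trailing bracket produce empty strings:

```python
import re

# No capturing groups in the pattern, so re.split returns only the outside text.
trailing = re.split(r"\[.*?\]", "abc [def] jkl [mno] xyz")
adjacent = re.split(r"\[.*?\]", "abc [def][ghi] jkl[mno]")

print(trailing)  # ['abc ', ' jkl ', ' xyz']
print(adjacent)  # ['abc ', '', ' jkl', '']
```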


In English, what you want is any (non-greedy) substring before a bracket-enclosed substring or the end of the string, right?

So, you need an alternation between \[.*?\] and $ . Of course you need to group that in order to write the alternation, and you don’t want to capture the group. So:
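Putting that alternation into the pattern gives r'(.*?)(?:\[.*?\]|$)'. This sketch runs it on a hypothetical string. Note that findall also returns one final empty match at the end of the string, which you may want to drop:

```python
import re

s = "abc [def] jkl [mno] xyz"  # hypothetical input with text after the last bracket

# Lazy prefix, then either a bracketed chunk or the end of the string,
# wrapped in a non-capturing group so findall returns only the prefix.
parts = re.findall(r"(.*?)(?:\[.*?\]|$)", s)
print(parts)  # ['abc ', ' jkl ', ' xyz', '']
```

Filtering with [p for p in parts if p] removes that trailing '', but it would also remove the empty strings produced between adjacent bracket pairs.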

Source

Regular Expression to find brackets in a string

Oh. When I do findall it gives me a list of matches, which I can then join. Search and match only give the first match, is that right? They won’t go further and check for other matches? And why is that? I used + precisely so that it could check for one or more.

+ means 1 or more matching characters in a row. If there are non-matching characters between groups of matching characters, re.search only finds the first group, while re.match only finds the first group and then only if it’s at the beginning of the string.
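Here is a sketch of the difference between re.search, re.match and re.findall on a hypothetical string. The class escapes both square brackets, which is one valid spelling:

```python
import re

s = "foo(bar)[baz]<qux>"   # hypothetical input
pattern = r"[()<>\[\]]+"   # one or more bracket characters in a row

first_run = re.search(pattern, s).group()  # first run anywhere in s
anchored = re.match(pattern, s)            # only tries at index 0
runs = re.findall(pattern, s)              # every run, left to right
joined = "".join(runs)

print(first_run)  # (
print(anchored)   # None
print(runs)       # ['(', ')[', ']<', '>']
print(joined)     # ()[]<>
```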


You have to escape the first closing square bracket.

To combine all of them into a string, you can search for anything that doesn’t match and remove it.
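Here is a minimal sketch of that removal approach. It assumes the goal is to keep only the ()<>[] characters. re.sub with the negated class deletes every run of other characters (the input string is hypothetical):

```python
import re

s = "foo(bar)[baz]<qux>"  # hypothetical input
# [^...] matches any character that is NOT a bracket; + groups such runs.
only_brackets = re.sub(r"[^()<>\[\]]+", "", s)
print(only_brackets)  # ()[]<>
```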

Use the following (the closing square bracket must be escaped inside a character class):

Hi, thanks for your comment. But I don’t understand why I need to escape the last square bracket. Shouldn’t the character class search for any of those characters, and + make it one or more?

@user You are right, but how will the regex know which square bracket is the closing one, the inner one or the outer one? That’s why you need to escape the inner one. Hope you got my point.

Yup, got that. But why does search only fetch one match? I am using +, so shouldn’t it fetch one or more?

@user + means one or more of the specified pattern, so the regex [()<>[\]]+ would match < or <] or (<]<)> etc., but only the first occurrence of the match is returned. To get all the matches, you have to use re.findall.

The regular expression [()<>[]]+ (or rather []()<>[]+ or [()<>[\]]+, as others have suggested, because in the first form the class ends at the first ] and the remaining ]+ only matches literal ] characters) finds a sequence of consecutive characters. What you need to do is find all of these sequences and join them.

import re
brackets = ''.join(re.findall(r"[]()<>[]+", s))  # s is the input string

Note also that I rearranged the order of characters in the class. An unescaped ] has to be at the beginning of a class so that it is not interpreted as the end of the class definition (escaping it as \] works anywhere).
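Here is a quick check of the three spellings from this answer on a hypothetical string. It shows why the unescaped version misbehaves: its class ends at the first ], and the following ]+ then requires literal ] characters after each match:

```python
import re

s = "a<]]b(c"  # hypothetical input

unescaped = re.findall(r"[()<>[]]+", s)      # class [()<>[] followed by ]+
escaped = re.findall(r"[()<>[\]]+", s)       # ] escaped inside the class
bracket_first = re.findall(r"[]()<>[]+", s)  # ] placed first in the class

print(unescaped)      # ['<]]']
print(escaped)        # ['<]]', '(']
print(bracket_first)  # ['<]]', '(']
```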


Get the string within brackets in Python

I have a sample string , created=1324336085, description='Customer for My Test App', livemode=False>. I only want the value cus_Y4o9qMEZAugtnW and NOT card (which is inside another []). How could I do it in the easiest possible way in Python? Maybe by using RegEx (which I am not good at)?


import re
s = "alpha.Customer[cus_Y4o9qMEZAugtnW] . "
m = re.search(r"\[([A-Za-z0-9_]+)\]", s)
print(m.group(1))

Note that re.search() finds only the first match of the regular expression, so it doesn’t find [card] unless you search a second time (or use re.findall()).

Edit: The regular expression here is written as a Python raw string literal. This means Python does not treat the backslashes as escape characters and passes them through to the re.search() method unchanged. The parts of the regular expression are:

  1. \[ matches a literal [ character
  2. ( begins a new group
  3. [A-Za-z0-9_] is a character set matching any letter (capital or lower case), digit or underscore
  4. + matches the preceding element (the character set) one or more times.
  5. ) ends the group
  6. \] matches a literal ] character
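The same breakdown can be written into the pattern itself with re.VERBOSE, which ignores whitespace and allows # comments. This sketch uses a hypothetical stand-in for the truncated sample string. It also shows that findall picks up [card]:

```python
import re

# re.VERBOSE: whitespace outside character classes is ignored and
# "#" starts a comment, so each part of the pattern can be annotated.
pattern = re.compile(r"""
    \[               # a literal [ character
    (                # begin capturing group 1
      [A-Za-z0-9_]+  # one or more ASCII letters, digits or underscores
    )                # end group 1
    \]               # a literal ] character
""", re.VERBOSE)

# Hypothetical sample standing in for the truncated string in the question.
s = "alpha.Customer[cus_Y4o9qMEZAugtnW] alpha.AlphaObject[card]"

first = pattern.search(s).group(1)  # only the first match
every = pattern.findall(s)          # all matches, including [card]
print(first)  # cus_Y4o9qMEZAugtnW
print(every)  # ['cus_Y4o9qMEZAugtnW', 'card']
```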

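Edit: As D K has pointed out, the regular expression could be simplified to:

m = re.search(r"\[(\w+)\]", s)

since \w is a special sequence that means the same thing as [a-zA-Z0-9_] when re.ASCII is set (Python 3) or when neither re.LOCALE nor re.UNICODE is set (Python 2). In a Python 3 str pattern without re.ASCII, \w also matches non-ASCII letters and digits.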
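Here is a sketch of where \w and [A-Za-z0-9_] diverge under Python 3. It uses a hypothetical input containing a non-ASCII letter:

```python
import re

s = "[café] [cus_Y4o9qMEZAugtnW]"  # hypothetical input

unicode_default = re.findall(r"\[(\w+)\]", s)            # \w is Unicode-aware by default
ascii_only = re.findall(r"\[(\w+)\]", s, re.ASCII)       # \w == [a-zA-Z0-9_]
explicit_class = re.findall(r"\[([A-Za-z0-9_]+)\]", s)   # the original class

print(unicode_default)  # ['café', 'cus_Y4o9qMEZAugtnW']
print(ascii_only)       # ['cus_Y4o9qMEZAugtnW']
print(explicit_class)   # ['cus_Y4o9qMEZAugtnW']
```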

Источник

Splitting a string with brackets using regular expression in python

Suppose I have a string like str = "[Hi all], [this is] [an example] ". I want to split it into several pieces, each consisting of the content inside a pair of brackets. In other words, I want to grab the phrases inside each pair of brackets. The result should be:

['Hi all', 'this is', 'an example'] 


import re
data = "[Hi all], [this is] [an example] "
print(re.findall(r"\[(.*?)\]", data))  # ['Hi all', 'this is', 'an example']


The efficiency of any given regex/target string combination is best measured directly by benchmarking. However, the book Mastering Regular Expressions (3rd Edition) provides an in-depth look at precisely how a regex engine goes about its job, and it culminates with a chapter describing how to write an efficient regex. Once this book is digested, one will never look at a dot-star in the same way again. The book discusses benchmarking as well.

Long story short, with the \[(.*?)\] lazy-dot-star expression the regex engine must backtrack on each and every character, whereas the \[([^[\]]*)\] expression matches the entire contents between the brackets in one gulp (and no backtracking). Note also that some regex engines (e.g. Perl) have optimized code that very efficiently handles lazy quantifiers and in this case, your expression may actually be faster. But only benchmarking will tell for sure.
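Following the advice to benchmark, here is a timeit sketch comparing the two patterns on a hypothetical input (the example string repeated). The negated class is written as [^\[\]], which escapes both brackets and is equivalent to [^[\]]. No timings are shown because they depend on the machine and Python version. The two patterns also match differently when a stray [ sits inside a pair:

```python
import re
import timeit

# Hypothetical benchmark input: the example string repeated many times.
data = "[Hi all], [this is] [an example] " * 200

lazy = re.compile(r"\[(.*?)\]")          # lazy dot-star
negated = re.compile(r"\[([^\[\]]*)\]")  # negated class: no brackets inside

# On this input both patterns extract the same phrases...
assert lazy.findall(data) == negated.findall(data)

# ...but they are not interchangeable in general: a stray "[" inside a pair
# is swallowed by the lazy version and rejected by the negated class.
stray_lazy = lazy.findall("[a[b]")
stray_negated = negated.findall("[a[b]")
print(stray_lazy)     # ['a[b']
print(stray_negated)  # ['b']

for name, pat in (("lazy", lazy), ("negated", negated)):
    # The lambda is called immediately by timeit, so it uses this iteration's pat.
    seconds = timeit.timeit(lambda: pat.findall(data), number=200)
    print(f"{name}: {seconds:.4f} s for 200 runs")
```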

import re
text = "[Hi all], [this is] [an example] "  # avoid naming it str, which shadows the built-in
contents = re.findall(r'\[(.*?)\]', text)

I’ve run into this problem a few times: regular expressions will work unless you have nested brackets. In the more general case where you might have nested brackets, the following will work:

def bracketed_split(string, delimiter, strip_brackets=False):
    """ Split a string by the delimiter unless it is inside brackets.
    e.g.
        list(bracketed_split('abc,(def,ghi),jkl', delimiter=',')) == ['abc', '(def,ghi)', 'jkl']
    """
    openers = '[{(<'
    closers = ']})>'
    opener_to_closer = dict(zip(openers, closers))
    opening_bracket = dict()  # depth -> opening bracket seen at that depth
    current_string = ''
    depth = 0
    for c in string:
        if c in openers:
            depth += 1
            opening_bracket[depth] = c
            if strip_brackets and depth == 1:
                continue
        elif c in closers:
            assert depth > 0, f"You exited more brackets than you entered in string {string!r}"
            assert c == opener_to_closer[opening_bracket[depth]], f"Closing bracket {c!r} did not match opening bracket {opening_bracket[depth]!r} in string {string!r}"
            depth -= 1
            if strip_brackets and depth == 0:
                continue
        # Only split on the delimiter when we are outside all brackets.
        if depth == 0 and c == delimiter:
            yield current_string
            current_string = ''
        else:
            current_string += c
    assert depth == 0, f"You did not close all brackets in string {string!r}"
    yield current_string
>>> list(bracketed_split("[Hi all], [this is] [an example]", delimiter=' '))
['[Hi all],', '[this is]', '[an example]']
>>> list(bracketed_split("[Hi all], [this is] [a [nested] example]", delimiter=' '))
['[Hi all],', '[this is]', '[a [nested] example]']


Regular expression curvy brackets in Python

You can’t do this with Python’s regex flavor. Arbitrarily nested structures are beyond its capabilities. Walk the string character by character, and maintain a depth count that you increase when encountering { and decrease when encountering }. When you get back to 0, take the substring from where you found the first { to there.
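Here is a minimal sketch of that depth-counting walk, assuming you want each outermost {...} substring. The function name top_level_groups and its error handling are hypothetical:

```python
def top_level_groups(text, opener="{", closer="}"):
    """Return every outermost opener...closer substring of text (hypothetical helper)."""
    groups = []
    depth = 0
    start = 0
    for i, ch in enumerate(text):
        if ch == opener:
            if depth == 0:
                start = i  # remember where this top-level group begins
            depth += 1
        elif ch == closer:
            if depth == 0:
                raise ValueError(f"unmatched {closer!r} at index {i}")
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])  # back at depth 0: group complete
    if depth != 0:
        raise ValueError(f"unclosed {opener!r} in {text!r}")
    return groups


print(top_level_groups("a {b {c} d} e {f}"))  # ['{b {c} d}', '{f}']
```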

@m.buettner is correct. You need to write a parser that works on the tokens (in this case the open and close braces) to produce your result. It should be easy to do, given the simplicity of the problem.

I thought Python could do it for me :D, but yes, it won’t be any problem to write a parser like that, thanks 🙂

What’s the idea exactly? I don’t completely see the logic in the result you want to get. Some regex engines do support recursive patterns, though, so maybe it is possible, but I just don’t completely understand what you want.


As @m.buettner points out in the comments, Python’s implementation of regular expressions can’t match pairs of symbols nested to an arbitrary degree. (Other languages can, notably current versions of Perl.) The Pythonic thing to do when you have text that regexes can’t parse is to use a recursive-descent parser.
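Here is a small recursive-descent sketch for the brace case. It assumes a hypothetical grammar in which words are separated by whitespace and groups are wrapped in { }. Each { triggers a recursive call, so nesting depth is limited only by Python's recursion limit. The function name parse_braces is illustrative:

```python
def parse_braces(text):
    """Parse words and nested {...} groups into nested Python lists.

    Hypothetical grammar, with parse_items handling both rules:
        items := (word | group)*
        group := "{" items "}"
    Words are runs of characters other than whitespace and braces.
    """
    pos = 0

    def skip_spaces():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def parse_items(inside_group):
        nonlocal pos
        items = []
        while True:
            skip_spaces()
            if pos == len(text):
                if inside_group:
                    raise ValueError("missing closing '}'")
                return items
            ch = text[pos]
            if ch == "{":
                pos += 1
                items.append(parse_items(inside_group=True))  # recurse into the group
            elif ch == "}":
                if not inside_group:
                    raise ValueError(f"unexpected '}}' at index {pos}")
                pos += 1
                return items
            else:
                start = pos
                while pos < len(text) and not text[pos].isspace() and text[pos] not in "{}":
                    pos += 1
                items.append(text[start:pos])

    return parse_items(inside_group=False)


print(parse_braces("a {b {c d} e} f"))  # ['a', ['b', ['c', 'd'], 'e'], 'f']
```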

There’s no need to reinvent the wheel by writing your own, however; there are a number of easy-to-use parsing libraries out there. I recommend pyparsing, which lets you define a grammar directly in your code and easily attach actions to matched tokens. Your code would look something like this:

from pyparsing import Forward, Literal, Word, printables
lbrace = Literal('{')
rbrace = Literal('}')
contents = Word(printables)
expr = Forward()
expr

