Return substring in python

Extract a substring from a string in Python (position, regex)

This article explains how to extract a substring from a string in Python. You can extract a substring by specifying its position and length, or by using regular expression patterns.

  • Extract a substring by specifying the position and number of characters
    • Extract a character by index
    • Extract a substring by slicing
    • Extract based on the number of characters
  • Extract a substring with regular expressions: re.search(), re.findall()
  • Regex pattern examples
    • Wildcard-like patterns
    • Greedy and non-greedy
    • Extract part of the pattern with parentheses
    • Match any single character
    • Match the start/end of the string
    • Extract by multiple patterns
    • Case-insensitive

    To search a string to get the position of a given substring or replace a substring in a string with another string, see the following articles.

    If you want to extract from a text file, read the file as a string.
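As a minimal sketch of that approach, read the file with open() and then apply slicing or re functions to the resulting str. The file name and contents below are made up, and the file is created first only so the example runs on its own:

```python
import os
import re
import tempfile

# Create a small sample file so the example is self-contained
# (hypothetical file name and contents, for illustration only).
path = os.path.join(tempfile.gettempdir(), 'substring_example.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('id: 012-3456-7890\nname: Alice\n')

# Read the entire file into one str; from here on it is ordinary string work.
with open(path, encoding='utf-8') as f:
    text = f.read()

first_line = text.splitlines()[0]  # 'id: 012-3456-7890'
number = first_line[4:]            # drop the 4-character 'id: ' prefix
digits = re.findall(r'\d+', text)  # every run of digits in the file

print(number)  # 012-3456-7890
print(digits)  # ['012', '3456', '7890']

os.remove(path)  # clean up the sample file
```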

    Extract a substring by specifying the position and number of characters

    Extract a character by index

    You can get the character at a given position by specifying its index in []. Indexes start at 0 (zero-based indexing).

    s = 'abcde'
    print(s[0])  # a
    print(s[4])  # e

    You can specify a backward position with negative values. -1 represents the last character.
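A short illustration using the same s = 'abcde' as above. The comparison on the last line is only there to show how a negative index maps to a positive one:

```python
s = 'abcde'

print(s[-1])  # e  (last character)
print(s[-2])  # d
print(s[-5])  # a  (same as s[0] for a 5-character string)

# A negative index -k refers to position len(s) - k.
print(s[-2] == s[len(s) - 2])  # True
```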

    An IndexError is raised if you specify an index that does not exist.

    # print(s[5])
    # IndexError: string index out of range

    # print(s[-6])
    # IndexError: string index out of range

    Extract a substring by slicing

    s = 'abcde'
    print(s[1:3])  # bc
    print(s[:3])   # abc
    print(s[1:])   # bcde

    You can also use negative values.

    print(s[-4:-2])  # bc
    print(s[:-2])    # abc
    print(s[-4:])    # bcde

    If start > end, no error is raised, and an empty string '' is extracted.

    print(s[3:1])        # (prints an empty line)
    print(s[3:1] == '')  # True

    Out-of-range slice positions are ignored: they are clamped to the bounds of the string, so slicing never raises an IndexError.
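For example, with s = 'abcde', slice positions far outside the string are simply clamped. A small sketch; the specific numbers are arbitrary:

```python
s = 'abcde'

# Slice positions beyond the string are clamped to its bounds.
print(s[2:100])         # cde
print(s[-100:2])        # ab
print(s[-100:100])      # abcde
print(s[10:20] == '')   # True (start is past the end, so the result is empty)
```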

    In addition to the start position start and the end position stop, you can also specify an increment step using the syntax [start:stop:step]. If step is negative, the substring is extracted in reverse order.

    print(s[1:4:2])  # bd
    print(s[::2])    # ace
    print(s[::3])    # ad
    print(s[::-1])   # edcba
    print(s[::-2])   # eca

    For more information on slicing, see the following article.

    Extract based on the number of characters

    The built-in function len() returns the number of characters in a string. You can use this to get the central character or extract the first or second half of the string using slicing.

    Note that indexes and slice positions must be integers (int). Dividing with / raises an error because its result is always a floating-point number (float), even when the division is exact.

    The following example uses integer division //, which rounds the result down to an integer.

    s = 'abcdefghi'
    print(len(s))  # 9

    # print(s[len(s) / 2])
    # TypeError: string indices must be integers

    print(s[len(s) // 2])   # e
    print(s[:len(s) // 2])  # abcd
    print(s[len(s) // 2:])  # efghi
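Along the same lines, taking the first or last n characters is a matter of slicing with n. The helper names first_n and last_n are hypothetical, just for illustration, and they assume n >= 0. The point worth noting is that the common idiom s[-n:] returns the whole string when n is 0, because -0 is simply 0:

```python
def first_n(s: str, n: int) -> str:
    """Return the first n characters of s (all of s if it is shorter). Assumes n >= 0."""
    return s[:n]


def last_n(s: str, n: int) -> str:
    """Return the last n characters of s (all of s if it is shorter). Assumes n >= 0."""
    # s[-n:] alone would be wrong for n == 0: s[-0:] is s[0:], the whole string.
    return s[-n:] if n > 0 else ''


s = 'abcdefghi'
print(first_n(s, 3))        # abc
print(last_n(s, 3))         # ghi
print(last_n(s, 0) == '')   # True
print(s[-0:])               # abcdefghi  (the pitfall the helper avoids)
print(last_n(s, 20))        # abcdefghi  (n larger than len(s) is clamped)
```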

    Extract a substring with regular expressions: re.search(), re.findall()

    You can use regular expressions with the re module of the standard library.

    Use re.search() to extract a substring matching a regular expression pattern. Specify the regular expression pattern as the first parameter and the target string as the second parameter.

    import re

    s = '012-3456-7890'
    print(re.search(r'\d+', s))
    # <re.Match object; span=(0, 3), match='012'>

    In regular expressions, \d matches a digit character, while + matches one or more repetitions of the preceding pattern. Therefore, \d+ matches one or more consecutive digits.

    Since the backslash \ is also used in special sequences such as \d, it is convenient to write the pattern as a raw string by adding r before the opening quote, as in r'\d+'.

    When a string matches the pattern, re.search() returns a match object. You can get the matched part as a string str by the group() method of the match object.

    m = re.search(r'\d+', s)
    print(m.group())        # 012
    print(type(m.group()))  # <class 'str'>
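If the pattern is not found, re.search() returns None instead of a match object, and calling group() on None raises an AttributeError. A minimal sketch of checking for that first (the [a-z]+ pattern is just an example that cannot match s):

```python
import re

s = '012-3456-7890'

m = re.search(r'[a-z]+', s)  # no lowercase letters in s
print(m)  # None

# Check the result before calling group(); None has no group() method.
m = re.search(r'\d+', s)
if m:
    print(m.group())  # 012
else:
    print('no match')
```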

    For more information on regular expression match objects, see the following article.

    As shown in the example above, re.search() returns the match object for the first occurrence only, even if there are multiple matching parts in the string.

    re.findall() returns a list of all matching substrings.

    print(re.findall(r'\d+', s)) # ['012', '3456', '7890'] 
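If you also need the position of every match, re.finditer() returns an iterator of match objects rather than plain strings. A brief sketch with the same s:

```python
import re

s = '012-3456-7890'

# re.finditer() yields a match object for every non-overlapping match,
# so both the matched text and its position are available.
for m in re.finditer(r'\d+', s):
    print(m.group(), m.span())
# 012 (0, 3)
# 3456 (4, 8)
# 7890 (9, 13)
```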

    Regex pattern examples

    This section provides examples of regular expression patterns using metacharacters and special sequences.

    Wildcard-like patterns

    . matches any single character except a newline, and * matches zero or more repetitions of the preceding pattern.

    For example, a.*b matches a substring that starts with a and ends with b. Since * also allows zero repetitions, it matches ab as well.

    print(re.findall('a.*b', 'axyzb'))        # ['axyzb']
    print(re.findall('a.*b', 'a---b'))        # ['a---b']
    print(re.findall('a.*b', 'aあいうえおb'))  # ['aあいうえおb']
    print(re.findall('a.*b', 'ab'))           # ['ab']

    + matches one or more repetitions of the preceding pattern, so a.+b does not match ab.

    print(re.findall('a.+b', 'ab'))        # []
    print(re.findall('a.+b', 'axb'))       # ['axb']
    print(re.findall('a.+b', 'axxxxxxb'))  # ['axxxxxxb']

    ? matches zero or one repetition of the preceding pattern. So a.?b matches ab and any string with exactly one character between a and b.

    print(re.findall('a.?b', 'ab'))    # ['ab']
    print(re.findall('a.?b', 'axb'))   # ['axb']
    print(re.findall('a.?b', 'axxb'))  # []

    Greedy and non-greedy

    *, +, and ? are greedy, matching as much text as possible. In contrast, *?, +?, and ?? are non-greedy (minimal), matching as few characters as possible.

    s = 'axb-axxxxxxb'
    print(re.findall('a.*b', s))   # ['axb-axxxxxxb']
    print(re.findall('a.*?b', s))  # ['axb', 'axxxxxxb']

    Extract part of the pattern with parentheses

    If you enclose part of a regular expression pattern in parentheses (), re.findall() returns only the substring matched by that part.

    print(re.findall('a(.*)b', 'axyzb')) # ['xyz'] 
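When the pattern contains more than one group, re.findall() returns a list of tuples, one tuple per match, and with re.search() you can pick individual groups with group(n). The sample string here is made up for illustration:

```python
import re

s = '012-3456 7890-12'

# One group: findall() returns that group's text for each match.
print(re.findall(r'(\d+)-\d+', s))    # ['012', '7890']

# Two or more groups: findall() returns a tuple per match.
print(re.findall(r'(\d+)-(\d+)', s))  # [('012', '3456'), ('7890', '12')]

# With re.search(), group(n) picks a single group from the first match.
m = re.search(r'(\d+)-(\d+)', s)
print(m.group(1), m.group(2))         # 012 3456
```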

    To match parentheses () as literal characters, escape them with a backslash \.

    print(re.findall(r'\(.+\)', 'abc(def)ghi'))    # ['(def)']
    print(re.findall(r'\((.+)\)', 'abc(def)ghi'))  # ['def']

    Match any single character

    Square brackets [] in a pattern match any single character listed inside them.

    A hyphen - between two characters, as in [a-z], specifies a range of Unicode code points. For example, [a-z] matches any single lowercase ASCII letter.

    print(re.findall('[abc]x', 'ax-bx-cx'))     # ['ax', 'bx', 'cx']
    print(re.findall('[abc]+', 'abc-aaa-cba'))  # ['abc', 'aaa', 'cba']
    print(re.findall('[a-z]+', 'abc-xyz'))      # ['abc', 'xyz']

    Match the start/end of the string

    ^ matches the start of the string, and $ matches the end of the string.

    s = 'abc-def-ghi'
    print(re.findall('[a-z]+', s))   # ['abc', 'def', 'ghi']
    print(re.findall('^[a-z]+', s))  # ['abc']
    print(re.findall('[a-z]+$', s))  # ['ghi']

    Extract by multiple patterns

    Use | to match a substring that conforms to any of several patterns. For example, to match substrings that follow either pattern A or pattern B, use A|B.

    s = 'axxxb-012'
    print(re.findall('a.*b', s))       # ['axxxb']
    print(re.findall(r'\d+', s))       # ['012']
    print(re.findall(r'a.*b|\d+', s))  # ['axxxb', '012']

    Case-insensitive

    The re module is case-sensitive by default. To match case-insensitively, set the flags argument to re.IGNORECASE.

    s = 'abc-Abc-ABC'
    print(re.findall('[a-z]+', s))                        # ['abc', 'bc']
    print(re.findall('[A-Z]+', s))                        # ['A', 'ABC']
    print(re.findall('[a-z]+', s, flags=re.IGNORECASE))  # ['abc', 'Abc', 'ABC']
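If the same case-insensitive pattern is used repeatedly, one option is to compile it once with re.compile(); the inline flag (?i) at the start of a pattern is another way to request the same behavior. A small sketch:

```python
import re

s = 'abc-Abc-ABC'

# Compile once with the flag, then reuse the pattern object.
pattern = re.compile('[a-z]+', re.IGNORECASE)
print(pattern.findall(s))  # ['abc', 'Abc', 'ABC']

# The inline flag (?i) at the start of the pattern has the same effect.
print(re.findall('(?i)[a-z]+', s))  # ['abc', 'Abc', 'ABC']
```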


    Extract Substring From a String in Python


    1. Extract Substring Using String Slicing in Python
    2. Extract Substring Using the slice() Constructor in Python
    3. Extract Substring Using Regular Expression in Python

    A string is a sequence of characters. We deal with strings all the time, whether we are doing software development or competitive programming. Sometimes, while writing programs, we have to access parts of a string. These parts are known as substrings: a substring is a contiguous sequence of characters within a string.

    In Python, we can easily do this using string slicing or regular expressions (regex).

    Extract Substring Using String Slicing in Python

    There are a few ways to slice a string in Python. Bracket notation with indexes is the most basic and most commonly used. Refer to the following code.

    myString = "Mississippi"
    print(myString[:])        # Mississippi
    print(myString[4 : ])     # issippi
    print(myString[ : 8])     # Mississi
    print(myString[2 : 7])    # ssiss
    print(myString[4 : -1])   # issipp
    print(myString[-6 : -1])  # ssipp

    In the above code, we add [] brackets at the end of the variable storing the string. We use this notation for indexing. Inside these brackets, we add some integer values that represent indexes.

    The format inside the brackets is [start : stop : step], with the values separated by colons (:).

    By default (with a positive step), start is 0, the first index; stop is the end of the string, so the last character is included; and step is 1. start is the index where the substring begins, stop is the index where it ends (exclusive), and step is the amount added to the index after each character.

    The substring runs from index start up to, but not including, index stop; that is, its last character is at index stop - 1. So, to retrieve Miss from Mississippi, we should use [0 : 4].

    • [:] -> Returns the whole string.
    • [4 : ] -> Returns a substring starting from index 4 till the last index.
    • [ : 8] -> Returns a substring starting from index 0 till index 7 .
    • [2 : 7] -> Returns a substring starting from index 2 till index 6 .
    • [4 : -1] -> Returns a substring starting from index 4 till second last index. -1 can be used to define the last index in Python.
    • [-6 : -1] -> Returns a substring starting from the sixth character from the end (index -6) till the second last index.
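The exclusive stop can be checked directly. This sketch shows the Miss example and the fact that joining myString[:k] and myString[k:] reproduces the original string (k = 4 is an arbitrary split point):

```python
myString = "Mississippi"

print(myString[0:4])  # Miss  (indexes 0, 1, 2, 3)

# Because stop is excluded, splitting at any index k and rejoining
# gives back the original string, with no character duplicated or lost.
k = 4
print(myString[:k] + myString[k:] == myString)  # True

# For in-range, non-negative values, len(s[start:stop]) == stop - start.
print(len(myString[2:7]))  # 5
```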

    Extract Substring Using the slice() Constructor in Python

    Instead of writing the indexes inside the brackets, we can use the slice() constructor to create a slice object and use it to slice a string or any other sequence, such as a list or tuple.

    The slice(start, stop, step) constructor accepts three parameters, namely, start, stop, and step. They mean exactly the same as explained above.

    The difference lies in how it is used: the slice object is passed inside the brackets after the string variable, as in myString[slice1] below.

    If a single integer value, say x, is passed to the slice() constructor, the resulting slice retrieves the substring from index 0 to index x - 1. Refer to the following code.

    myString = "Mississippi"
    slice1 = slice(3)
    slice2 = slice(4)
    slice3 = slice(0, 8)
    slice4 = slice(2, 7)
    slice5 = slice(4, -1)
    slice6 = slice(-6, -1)

    print(myString[slice1])  # Mis
    print(myString[slice2])  # Miss
    print(myString[slice3])  # Mississi
    print(myString[slice4])  # ssiss
    print(myString[slice5])  # issipp
    print(myString[slice6])  # ssipp

    The outputs received are self-explanatory. The indexes follow the same rules as defined for brackets notation.
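Because a slice object is an ordinary value, it can be stored and reused on other sequences, and its indices() method shows how None and negative values resolve for a given length. A sketch reusing the Mississippi string (the name middle is just for illustration):

```python
myString = "Mississippi"
middle = slice(2, 7)

# The same slice object works on any sequence, not only strings.
print(myString[middle])        # ssiss
print(list(myString)[middle])  # ['s', 's', 'i', 's', 's']

# slice.indices(length) returns the concrete (start, stop, step) for that length.
print(slice(4, -1).indices(len(myString)))           # (4, 10, 1)
print(slice(None, None, -1).indices(len(myString)))  # (10, -1, -1)
```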

    Extract Substring Using Regular Expression in Python

    For regular expressions, we’ll use Python’s built-in re module.

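    import re

    string = "123AAAMississippiZZZ123"

    try:
        # group(1) is the text captured by (.+?) between AAA and ZZZ
        found = re.search('AAA(.+?)ZZZ', string).group(1)
        print(found)  # Mississippi
    except AttributeError:
        # re.search() returned None: the pattern was not found
        pass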

    In the above code, search() looks for the first location in string where the pattern matches and returns a Match object, or None if there is no match. Calling .group(1) on None raises an AttributeError, which is why the code catches it. A Match object describes the match through methods such as span(), start(), and end(), which give the starting and ending indexes of the substring.
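To see those details, you can inspect the match directly. A sketch using the same string and pattern, checking for None explicitly instead of catching AttributeError:

```python
import re

string = "123AAAMississippiZZZ123"
m = re.search('AAA(.+?)ZZZ', string)

if m is not None:                # search() returns None when nothing matches
    print(m.group(0))            # AAAMississippiZZZ  (the whole match)
    print(m.span())              # (3, 20)
    print(m.group(1))            # Mississippi        (the captured group)
    print(m.start(1), m.end(1))  # 6 17
```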

    print(dir(re.search('AAA(.+?)ZZZ', string))) outputs the attributes of the Match object. Note that dir() is not guaranteed to list every attribute: it calls the object's __dir__() method, which a class can override to return a customized list.

    Vaibhav is an artificial intelligence and cloud computing stan. He likes to build end-to-end full-stack web and mobile applications. Besides computer science and technology, he loves playing cricket and badminton, going on bike rides, and doodling.
