Splitting with delimiter python

Split strings in Python (delimiter, line break, regex, etc.)

This article explains how to split strings by delimiters, line breaks, regular expressions, and the number of characters in Python.

Split by delimiter: split()

Use the split() method to split by delimiter.

If the argument is omitted, it splits on whitespace (spaces, newlines \n, tabs \t, etc.), treats consecutive whitespace as a single separator, and ignores leading and trailing whitespace.

A list of the words is returned.

s_blank = 'one two three\nfour\tfive'
print(s_blank)
# one two three
# four	five
print(s_blank.split())
# ['one', 'two', 'three', 'four', 'five']
print(type(s_blank.split()))
# <class 'list'>
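To make the whitespace rule concrete, here is a small illustrative comparison (the sample string is made up) between split() with no argument and split(' ') with an explicit single space. Only the no-argument form collapses runs of whitespace and drops leading and trailing whitespace.

```python
s_padded = '  one  two '

# No argument: runs of whitespace act as one separator,
# and leading/trailing whitespace is ignored
words_default = s_padded.split()

# Explicit ' ': every single space is a separator, so empty strings appear
words_space = s_padded.split(' ')

print(words_default)  # ['one', 'two']
print(words_space)    # ['', '', 'one', '', 'two', '']
```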

Use join(), described below, to concatenate a list into a string.

Specify the delimiter: sep

Specify a delimiter with the first parameter, sep.

s_comma = 'one,two,three,four,five'
print(s_comma.split(','))
# ['one', 'two', 'three', 'four', 'five']
print(s_comma.split('three'))
# ['one,two,', ',four,five']

To specify multiple delimiters, use regular expressions as described later.

Specify the maximum number of splits: maxsplit

Specify the maximum number of splits with the second parameter, maxsplit.

If maxsplit is given, at most maxsplit splits are done (thus, the returned list will have at most maxsplit + 1 elements).

s_comma = 'one,two,three,four,five'
print(s_comma.split(',', 2))
# ['one', 'two', 'three,four,five']
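maxsplit can also be passed by keyword, which is handy when you want to keep the default whitespace splitting and omit sep. The command-like string below is only an illustrative sample; the point is that the remainder after the first split stays intact.

```python
s_cmd = 'git commit -m initial'

# With no sep, splitting is on whitespace; maxsplit=1 stops after the first split
cmd, rest = s_cmd.split(maxsplit=1)

print(cmd)   # git
print(rest)  # commit -m initial

# maxsplit=-1 (the default) means no limit
all_parts = s_cmd.split(maxsplit=-1)
print(all_parts)  # ['git', 'commit', '-m', 'initial']
```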

For example, maxsplit is helpful for removing the first line from a string.

If you specify sep='\n' and maxsplit=1, you get a list split at the first newline character \n. The second element, [1], is the rest of the string without the first line. Since it is the last element, it can also be accessed with [-1].

s_lines = 'one\ntwo\nthree\nfour'
print(s_lines)
# one
# two
# three
# four
print(s_lines.split('\n', 1))
# ['one', 'two\nthree\nfour']
print(s_lines.split('\n', 1)[0])
# one
print(s_lines.split('\n', 1)[1])
# two
# three
# four
print(s_lines.split('\n', 1)[-1])
# two
# three
# four

Similarly, to delete the first two lines:

print(s_lines.split('\n', 2)[-1])
# three
# four
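[1] and [-1] only behave the same when the string actually contains a newline. The illustrative snippet below uses a made-up single-line string to show the difference: [-1] quietly returns the whole string, while [1] raises IndexError.

```python
s_single = 'only one line'

parts = s_single.split('\n', 1)
print(parts)  # ['only one line']

# [-1] returns the whole string when there is no newline ...
last = parts[-1]
print(last)  # only one line

# ... while [1] raises IndexError
try:
    parts[1]
except IndexError as e:
    print('IndexError:', e)  # IndexError: list index out of range
```

Which behavior is preferable depends on whether a string without line breaks should be treated as an error or passed through unchanged.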

Split from right by delimiter: rsplit()

rsplit() splits from the right of the string.

The result differs from split() only when the maxsplit parameter is provided.

Just as split() can remove the first line, rsplit() can remove the last line.

s_lines = 'one\ntwo\nthree\nfour'
print(s_lines.rsplit('\n', 1))
# ['one\ntwo\nthree', 'four']
print(s_lines.rsplit('\n', 1)[0])
# one
# two
# three
print(s_lines.rsplit('\n', 1)[1])
# four

To delete the last two lines:

print(s_lines.rsplit('\n', 2)[0])
# one
# two
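To illustrate that split() and rsplit() differ only when maxsplit is given, here is a small example using a sample file name (chosen only for illustration). With maxsplit=1, rsplit() separates the part after the last dot, which is a common way to get a file extension.

```python
filename = 'archive.tar.gz'

# Without maxsplit, split() and rsplit() give the same result
full_left = filename.split('.')
full_right = filename.rsplit('.')
print(full_left)   # ['archive', 'tar', 'gz']
print(full_right)  # ['archive', 'tar', 'gz']

# With maxsplit=1, split() cuts at the first '.', rsplit() at the last
print(filename.split('.', 1))   # ['archive', 'tar.gz']
print(filename.rsplit('.', 1))  # ['archive.tar', 'gz']

stem, ext = filename.rsplit('.', 1)
print(stem)  # archive.tar
print(ext)   # gz
```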

Split by line break: splitlines()

There is also splitlines(), which splits at line boundaries.

As shown in the previous examples, split() and rsplit() split the string by whitespace, including line breaks, by default. You can also specify line breaks explicitly using the sep parameter.

However, using splitlines() is often more suitable.

For example, consider a string that contains both \n (LF, used on Unix-like systems including macOS) and \r\n (CR + LF, used on Windows).

s_lines_multi = '1 one\n2 two\r\n3 three\n'
print(s_lines_multi)
# 1 one
# 2 two
# 3 three

By default, split() splits not only at line breaks but also at spaces.

print(s_lines_multi.split())
# ['1', 'one', '2', 'two', '3', 'three']

Because sep accepts only a single delimiter string, split() may not work as expected if the string mixes different newline sequences: here the \r of \r\n is left in the result. A trailing newline also produces an empty string at the end of the list.

print(s_lines_multi.split('\n'))
# ['1 one', '2 two\r', '3 three', '']

splitlines() splits at various newline characters but not at other whitespace.

print(s_lines_multi.splitlines())
# ['1 one', '2 two', '3 three']

If the first argument, keepends, is set to True, each element keeps its line break at the end.

print(s_lines_multi.splitlines(True))
# ['1 one\n', '2 two\r\n', '3 three\n']
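splitlines() recognizes more line boundaries than \n and \r\n, including \r, \v (\x0b) and the Unicode line separator \u2028, and it handles trailing and empty input differently from split('\n'). The sample strings below are only illustrative.

```python
s_mixed = 'a\rb\x0bc\u2028d'

# \r, \v (\x0b) and \u2028 are all treated as line boundaries
mixed_lines = s_mixed.splitlines()
print(mixed_lines)  # ['a', 'b', 'c', 'd']

# A trailing line break does not produce an extra empty element ...
print('a\n'.splitlines())  # ['a']
print('a\n'.split('\n'))   # ['a', '']

# ... and an empty string yields an empty list
print(''.splitlines())  # []
print(''.split('\n'))   # ['']
```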

Split by regex: re.split()

split() and rsplit() split only where the string matches sep exactly.

To split wherever the string matches a regular expression (regex) pattern rather than a fixed string, use the split() function of the re module.

In re.split(), specify the regex pattern as the first parameter and the target string as the second parameter.

Here's an example of splitting a string at runs of consecutive digits:

import re

s_nums = 'one1two22three333four'
print(re.split(r'\d+', s_nums))
# ['one', 'two', 'three', 'four']

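The maximum number of splits can be specified with the third parameter, maxsplit. Pass it as a keyword argument, since passing maxsplit positionally is deprecated as of Python 3.13.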

print(re.split(r'\d+', s_nums, maxsplit=2))
# ['one', 'two', 'three333four']
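If you also need the delimiters themselves, wrap the pattern in a capturing group: re.split() then includes the matched text in the result list. This reuses the s_nums example string.

```python
import re

s_nums = 'one1two22three333four'

# A capturing group (...) makes re.split() keep the matched delimiters
parts_with_delims = re.split(r'(\d+)', s_nums)
print(parts_with_delims)
# ['one', '1', 'two', '22', 'three', '333', 'four']
```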

Split by multiple different delimiters

These two examples are helpful to remember, even if you are not familiar with regex:

Enclose characters in [] to match any single one of them. This lets you split a string by multiple different characters.

s_marks = 'one-two+three#four'
print(re.split('[-+#]', s_marks))
# ['one', 'two', 'three', 'four']

If patterns are separated by |, the regex matches any one of them. Each pattern may use regex special characters, but plain strings without special characters can be written as-is. This lets you split by multiple different strings.

s_strs = 'oneXXXtwoYYYthreeZZZfour'
print(re.split('XXX|YYY|ZZZ', s_strs))
# ['one', 'two', 'three', 'four']
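The two techniques combine with other regex syntax. As an illustrative sketch (the input string is made up), adding \s* around a character class of delimiters also removes the whitespace surrounding each delimiter, so the pieces come out trimmed.

```python
import re

s_csv = 'one , two;three ;  four'

# [,;] matches either delimiter; \s* on both sides absorbs nearby whitespace
trimmed = re.split(r'\s*[,;]\s*', s_csv)
print(trimmed)  # ['one', 'two', 'three', 'four']
```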

Concatenate a list of strings

The previous examples split a string into a list.

If you want to concatenate a list of strings into one string, use the string method, join() .

Call join() on the separator string and pass the list of strings to be concatenated.

l = ['one', 'two', 'three']
print(','.join(l))
# one,two,three
print('\n'.join(l))
# one
# two
# three
print(''.join(l))
# onetwothree
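join() accepts only strings, so a list of numbers must be converted first, for example with map(str, ...). With the same separator, split() then reverses join(). The list of numbers here is just sample data.

```python
nums = [1, 2, 3]

# join() raises TypeError for non-string elements
try:
    ','.join(nums)
except TypeError as e:
    print('TypeError:', e)

# Convert each element with str() first
s_joined = ','.join(map(str, nums))
print(s_joined)  # 1,2,3

# split() with the same separator undoes join()
round_trip = s_joined.split(',')
print(round_trip)  # ['1', '2', '3']
```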

Split based on the number of characters: slice

Use slicing to split a string based on the number of characters.

s = 'abcdefghij'
print(s[:5])
# abcde
print(s[5:])
# fghij

The split results can be obtained as a tuple or assigned to individual variables.

s_tuple = s[:5], s[5:]
print(s_tuple)
# ('abcde', 'fghij')
print(type(s_tuple))
# <class 'tuple'>

s_first, s_last = s[:5], s[5:]
print(s_first)
# abcde
print(s_last)
# fghij

s_first, s_second, s_last = s[:3], s[3:6], s[6:]
print(s_first)
# abc
print(s_second)
# def
print(s_last)
# ghij

The number of characters can be obtained with the built-in function len(). You can use it to split a string into halves.

half = len(s) // 2
print(half)
# 5

s_first, s_last = s[:half], s[half:]
print(s_first)
# abcde
print(s_last)
# fghij

If you want to concatenate strings, use the + operator.

print(s_first + s_last)
# abcdefghij
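Slicing generalizes to fixed-width chunks. The helper below, split_every, is a hypothetical name used only for illustration: it steps through the string n characters at a time with range(), and the last chunk is shorter when the length is not a multiple of n. It also shows what len(s) // 2 does for an odd length.

```python
def split_every(s, n):
    """Hypothetical helper: split s into pieces of n characters (n must be > 0).

    The last piece is shorter if len(s) is not a multiple of n.
    """
    return [s[i:i + n] for i in range(0, len(s), n)]


s = 'abcdefghij'
chunks_3 = split_every(s, 3)
chunks_5 = split_every(s, 5)
print(chunks_3)  # ['abc', 'def', 'ghi', 'j']
print(chunks_5)  # ['abcde', 'fghij']

# With an odd length, len(s) // 2 rounds down, so the first half is shorter
s_odd = 'abcde'
half = len(s_odd) // 2
print(s_odd[:half], s_odd[half:])  # ab cde
```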

Last updated: Feb 24, 2023

# Split a string with multiple delimiters in Python

To split a string with multiple delimiters:

  1. Use the re.split() function, e.g. re.split(r',|-', my_str).
  2. re.split() splits the string on every occurrence of any of the delimiters.
import re

# 👇️ split string with 2 delimiters
my_str = 'bobby,hadz-dot,com'
my_list = re.split(r',|-', my_str)  # 👈️ split on comma or hyphen
print(my_list)  # 👉️ ['bobby', 'hadz', 'dot', 'com']

The re.split() function takes a pattern and a string and splits the string on each occurrence of the pattern.

The pipe character | means OR: the pattern A|B matches either A or B.

The example splits a string using 2 delimiters — a comma and a hyphen.

# 👇️ split string with 3 delimiters
my_str = 'bobby,hadz-dot:com'
my_list = re.split(r',|-|:', my_str)  # 👈️ comma, hyphen or colon
print(my_list)  # 👉️ ['bobby', 'hadz', 'dot', 'com']

Here is an example that splits the string using 3 delimiters — a comma, a hyphen and a colon.

You can use as many | characters as necessary in your regular expression.

# Split a string based on multiple delimiters using square brackets []

Alternatively, you can use square brackets [] to indicate a set of characters.

import re

my_str = 'bobby,hadz-dot,com'
my_list = re.split(r'[,-]', my_str)
print(my_list)  # 👉️ ['bobby', 'hadz', 'dot', 'com']

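Make sure to add all of the delimiters between the square brackets. Inside brackets, a hyphen between two characters defines a range, so place a literal hyphen first or last, or escape it as \-.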

import re

# 👇️ split string with 3 delimiters (hyphen placed last so it is literal)
my_str = 'bobby,hadz-dot:com'
my_list = re.split(r'[,:-]', my_str)
print(my_list)  # 👉️ ['bobby', 'hadz', 'dot', 'com']
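The hyphen rule matters because a pattern such as [,-:] is read as the character range from , to :, which in ASCII also includes ., / and the digits 0 to 9. It can look correct on a sample like 'bobby,hadz-dot:com' while silently splitting on other characters. The input below is a made-up string containing a dot.

```python
import re

s_dots = 'bobby.hadz,com'

# ',-:' inside [] is a RANGE from ',' to ':', so '.' is matched too
with_range = re.split(r'[,-:]', s_dots)
print(with_range)  # ['bobby', 'hadz', 'com']

# With the hyphen last, the class means exactly ',', ':' or '-'
literal_hyphen = re.split(r'[,:-]', s_dots)
print(literal_hyphen)  # ['bobby.hadz', 'com']
```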

You might get empty strings in the output list if the string starts or ends with one of the delimiters, or contains two delimiters in a row.

# Handling leading or trailing delimiters

You can use a list comprehension to remove any empty strings from the list.

import re

# 👇️ split string with 3 delimiters
my_str = ',bobby,hadz-dot:com:'
my_list = [
    item for item in re.split(r'[,:-]', my_str) if item
]
print(my_list)  # 👉️ ['bobby', 'hadz', 'dot', 'com']

The list comprehension takes care of removing the empty strings from the list.

List comprehensions are used to perform some operation for every element or select a subset of elements that meet a condition.

An alternative approach is to use the str.replace() method.

# Split a string with multiple delimiters using str.replace()

This is a two-step process:

  1. Use the str.replace() method to replace the first delimiter with the second.
  2. Use the str.split() method to split the string by the second delimiter.
my_str = 'bobby_hadz!dot_com'
my_list = my_str.replace('_', '!').split('!')
print(my_list)  # 👉️ ['bobby', 'hadz', 'dot', 'com']

First, we replace every occurrence of the first delimiter with the second, and then we split on the second delimiter.
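The same idea extends to any number of delimiters by replacing each of them with one common delimiter in a loop. split_by_replace is a hypothetical helper written for this illustration; note that if one delimiter is a substring of another, the order of replacement affects the result.

```python
def split_by_replace(s, delimiters):
    """Hypothetical helper: split s on any of the given delimiters via str.replace()."""
    first, *others = delimiters
    # Turn every other delimiter into the first one ...
    for d in others:
        s = s.replace(d, first)
    # ... so that a single str.split() call handles them all
    return s.split(first)


result = split_by_replace('bobby_hadz!dot#com', ['!', '_', '#'])
print(result)  # ['bobby', 'hadz', 'dot', 'com']
```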

The str.replace method returns a copy of the string with all occurrences of a substring replaced by the provided replacement.

The method takes the following parameters:

| Name | Description |
| --- | --- |
| old | The substring we want to replace in the string |
| new | The replacement for each occurrence of old |
| count | Only the first count occurrences are replaced (optional) |

Note that the method doesn’t change the original string. Strings are immutable in Python.
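Because strings are immutable, replace() always returns a new string and you must keep the return value. This short illustration (with a made-up string) also uses the optional count argument from the table.

```python
s = 'a-b-c'

# replace() returns a new string; count=1 replaces only the first occurrence
t = s.replace('-', '+', 1)

print(t)  # a+b-c
print(s)  # a-b-c  (the original is unchanged)
```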

my_str = 'bobby hadz, dot # com. abc'
my_list = (
    my_str.replace(',', '')
    .replace('#', '')
    .replace('.', '')
    .split()
)
print(my_list)  # 👉️ ['bobby', 'hadz', 'dot', 'com', 'abc']

We used the str.replace() method to remove the punctuation before splitting the string on whitespace characters.

You can chain as many calls to the str.replace() method as necessary.

The last step is to use the str.split() method to split the string into a list of words.

The str.split() method splits the string into a list of substrings using a delimiter.

The method takes the following 2 parameters:

| Name | Description |
| --- | --- |
| sep | Split the string into substrings on each occurrence of the separator (optional) |
| maxsplit | At most maxsplit splits are done (optional) |

When no separator is passed to the str.split() method, it splits the input string on one or more whitespace characters.

my_str = 'bobby hadz com'
print(my_str.split())  # 👉️ ['bobby', 'hadz', 'com']

If the separator is not found in the string, a list containing only one element, the original string, is returned.
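A quick illustration of that rule, plus one related edge case worth knowing: for an empty string, an explicit separator and the default whitespace splitting give different results. The strings are illustrative only.

```python
# Separator not found: a one-element list holding the original string
not_found = 'bobby'.split(',')
print(not_found)  # ['bobby']

# Empty input: an explicit separator yields [''], the default yields []
empty_with_sep = ''.split(',')
empty_default = ''.split()
print(empty_with_sep)  # ['']
print(empty_default)   # []
```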

# Split a string based on multiple delimiters with a reusable function

If you need to split a string based on multiple delimiters often, define a reusable function.

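import re


def split_multiple(string, delimiters):
    # Escape each delimiter so regex special characters are matched literally,
    # then join them with | to build an alternation pattern
    pattern = '|'.join(map(re.escape, delimiters))
    return re.split(pattern, string)


my_str = 'bobby,hadz-dot:com'
print(split_multiple(my_str, [',', '-', ':']))  # 👉️ ['bobby', 'hadz', 'dot', 'com']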

The split_multiple function takes a string and a list of delimiters and splits the string on the delimiters.

re.escape() escapes any regex special characters in each delimiter, and the str.join() method joins the escaped delimiters with a pipe | separator.

This creates a regex pattern that we can use to split the string based on the specified delimiters.
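Two details of this approach are easy to miss. First, re.escape() is what lets delimiters such as . or | work literally. Second, regex alternation tries alternatives from left to right, so when one delimiter is a prefix of another (for example - and --), the shorter one can win and leave empty strings. The split_multiple_longest_first variant below is a hypothetical addition for illustration that sorts delimiters by length first; the split_multiple function is repeated here so the snippet runs on its own.

```python
import re


def split_multiple(string, delimiters):
    # Escape each delimiter and join them into an alternation pattern
    pattern = '|'.join(map(re.escape, delimiters))
    return re.split(pattern, string)


def split_multiple_longest_first(string, delimiters):
    """Hypothetical variant: try longer delimiters before shorter ones."""
    ordered = sorted(delimiters, key=len, reverse=True)
    pattern = '|'.join(map(re.escape, ordered))
    return re.split(pattern, string)


# '.' and '|' are regex special characters; re.escape() makes them literal
result_special = split_multiple('bobby.hadz|com', ['.', '|'])
print(result_special)  # ['bobby', 'hadz', 'com']

# '-' is listed first, so it matches before '--' gets a chance
result_short_first = split_multiple('a--b-c', ['-', '--'])
print(result_short_first)  # ['a', '', 'b', 'c']

result_longest_first = split_multiple_longest_first('a--b-c', ['-', '--'])
print(result_longest_first)  # ['a', 'b', 'c']
```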

If you need to split a string into a list of words with multiple delimiters, you can also use the re.findall() function.
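With re.findall() the pattern describes the pieces you want to keep rather than the delimiters. A sketch, using sample strings from this section: a negated character class keeps runs of non-delimiter characters, which also avoids the empty strings that leading or trailing delimiters produce with re.split(); \w+ keeps runs of word characters (letters, digits, underscore).

```python
import re

my_str = ',bobby,hadz-dot:com:'

# [^,:-]+ matches runs of characters that are NOT a comma, colon or hyphen
words = re.findall(r'[^,:-]+', my_str)
print(words)  # ['bobby', 'hadz', 'dot', 'com']

# \w+ matches runs of letters, digits and underscores
words_w = re.findall(r'\w+', 'bobby hadz, dot # com. abc')
print(words_w)  # ['bobby', 'hadz', 'dot', 'com', 'abc']
```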
