- Split strings in Python (delimiter, line break, regex, etc.)
- Split by delimiter: split()
- Specify the delimiter: sep
- Specify the maximum number of splits: maxsplit
- Split from right by delimiter: rsplit()
- Split by line break: splitlines()
- Split by regex: re.split()
- Split by multiple different delimiters
- Concatenate a list of strings
- Split based on the number of characters: slice
- Splitting with delimiter python
- # Split a string with multiple delimiters in Python
- # Split a string based on multiple delimiters using square brackets []
- # Handling leading or trailing delimiters
- # Split a string with multiple delimiters using str.replace()
- # Split a string based on multiple delimiters with a reusable function
Split strings in Python (delimiter, line break, regex, etc.)
This article explains how to split strings by delimiters, line breaks, regular expressions, and the number of characters in Python.
Split by delimiter: split()
Use the split() method to split by delimiter.
If the argument is omitted, the string is split on whitespace (spaces, newlines \n, tabs \t, etc.), and consecutive whitespace characters are treated as a single delimiter.
A list of the words is returned.
    s_blank = 'one two three\nfour\tfive'
    print(s_blank)
    # one two three
    # four    five

    print(s_blank.split())
    # ['one', 'two', 'three', 'four', 'five']

    print(type(s_blank.split()))
    # <class 'list'>
Use join(), described below, to concatenate a list into a string.
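As a small illustration of how the default differs from an explicit space separator (the sample string and variable names are made up for this sketch): with no argument, runs of whitespace collapse and leading or trailing whitespace is ignored, whereas split(' ') treats every single space as a delimiter and can return empty strings.

```python
s_spaces = '  one  two '

# No argument: runs of whitespace count as one delimiter,
# and leading/trailing whitespace is ignored
default_result = s_spaces.split()
print(default_result)
# ['one', 'two']

# Explicit ' ': every single space is a delimiter,
# so empty strings appear between adjacent spaces
explicit_result = s_spaces.split(' ')
print(explicit_result)
# ['', '', 'one', '', 'two', '']
```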
Specify the delimiter: sep
Pass the delimiter as the first parameter, sep.
    s_comma = 'one,two,three,four,five'

    print(s_comma.split(','))
    # ['one', 'two', 'three', 'four', 'five']

    print(s_comma.split('three'))
    # ['one,two,', ',four,five']
To specify multiple delimiters, use regular expressions as described later.
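A few edge cases of sep are worth seeing side by side. The following is only a sketch with invented sample strings: adjacent delimiters yield empty strings, sep can be longer than one character, and an empty separator is rejected.

```python
s_csv = 'one,,two, three'

# Adjacent delimiters produce an empty string between them
fields = s_csv.split(',')
print(fields)
# ['one', '', 'two', ' three']

# sep may be longer than one character; it must match exactly
s_arrow = 'one -> two -> three'
parts = s_arrow.split(' -> ')
print(parts)
# ['one', 'two', 'three']

# An empty separator is not allowed
try:
    s_csv.split('')
except ValueError as e:
    print(e)
# empty separator
```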
Specify the maximum number of splits: maxsplit
Pass the maximum number of splits as the second parameter, maxsplit.
If maxsplit is given, at most maxsplit splits are done (thus, the returned list will have at most maxsplit + 1 elements).
    s_comma = 'one,two,three,four,five'

    print(s_comma.split(',', 2))
    # ['one', 'two', 'three,four,five']
For example, maxsplit is helpful for removing the first line from a string.
If you specify sep='\n' and maxsplit=1, you get a list of two strings split at the first newline character \n. The second element, [1], is the string without its first line. Since it is also the last element, it can be written as [-1].
    s_lines = 'one\ntwo\nthree\nfour'
    print(s_lines)
    # one
    # two
    # three
    # four

    print(s_lines.split('\n', 1))
    # ['one', 'two\nthree\nfour']

    print(s_lines.split('\n', 1)[0])
    # one

    print(s_lines.split('\n', 1)[1])
    # two
    # three
    # four

    print(s_lines.split('\n', 1)[-1])
    # two
    # three
    # four
Similarly, to delete the first two lines:
    print(s_lines.split('\n', 2)[-1])
    # three
    # four
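Another common use of maxsplit, sketched here with a made-up key=value line, is splitting only at the first delimiter so that later occurrences stay inside the value.

```python
line = 'url=https://example.com/?q=1'  # made-up sample line

# Without maxsplit, every '=' splits the string
all_parts = line.split('=')
print(all_parts)
# ['url', 'https://example.com/?q', '1']

# maxsplit=1 splits only at the first '=', so the value stays intact
key, value = line.split('=', 1)
print(key)
# url
print(value)
# https://example.com/?q=1
```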
Split from right by delimiter: rsplit()
rsplit() splits from the right of the string.
The result differs from split() only when the maxsplit parameter is provided.
Just as split() can remove the first line, rsplit() can remove the last line.
    s_lines = 'one\ntwo\nthree\nfour'

    print(s_lines.rsplit('\n', 1))
    # ['one\ntwo\nthree', 'four']

    print(s_lines.rsplit('\n', 1)[0])
    # one
    # two
    # three

    print(s_lines.rsplit('\n', 1)[1])
    # four
To delete the last two lines:
    print(s_lines.rsplit('\n', 2)[0])
    # one
    # two
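To make the difference between split() and rsplit() concrete, here is a short sketch using a hypothetical file name. The two methods differ only in which end the maxsplit budget is counted from.

```python
filename = 'archive.tar.gz'  # hypothetical file name

# split() with maxsplit=1 cuts at the first dot
from_left = filename.split('.', 1)
print(from_left)
# ['archive', 'tar.gz']

# rsplit() with maxsplit=1 cuts at the last dot
from_right = filename.rsplit('.', 1)
print(from_right)
# ['archive.tar', 'gz']

# Without maxsplit, both give the same result
print(filename.split('.') == filename.rsplit('.'))
# True
```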
Split by line break: splitlines()
The splitlines() method splits a string at line boundaries.
As shown in the previous examples, split() and rsplit() split the string by whitespace, including line breaks, by default. You can also specify line breaks explicitly using the sep parameter.
However, using splitlines() is often more suitable.
For example, consider a string that contains both \n (LF, used on Unix-like systems including macOS) and \r\n (CR + LF, used on Windows).
    s_lines_multi = '1 one\n2 two\r\n3 three\n'
    print(s_lines_multi)
    # 1 one
    # 2 two
    # 3 three
By default, when split() is applied, it splits not only by line breaks but also by spaces.
    print(s_lines_multi.split())
    # ['1', 'one', '2', 'two', '3', 'three']
Because sep accepts only a single separator string, split() may not work as expected if the string contains mixed newline characters. A trailing newline also produces an empty string at the end of the list.
    print(s_lines_multi.split('\n'))
    # ['1 one', '2 two\r', '3 three', '']
splitlines() splits at various newline characters but not at other whitespace.
    print(s_lines_multi.splitlines())
    # ['1 one', '2 two', '3 three']
If the first argument, keepends, is set to True, each element keeps its trailing newline character.
    print(s_lines_multi.splitlines(True))
    # ['1 one\n', '2 two\r\n', '3 three\n']
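Two further differences between splitlines() and split('\n') can be sketched as follows (the sample strings are invented): the handling of an empty string, and the fact that a lone \r also counts as a line boundary.

```python
# An empty string: splitlines() returns an empty list,
# while split('\n') returns a list with one empty string
empty_splitlines = ''.splitlines()
empty_split = ''.split('\n')
print(empty_splitlines)
# []
print(empty_split)
# ['']

# A lone carriage return (\r) also counts as a line boundary
s_cr = 'one\rtwo\nthree\r\n'
lines = s_cr.splitlines()
print(lines)
# ['one', 'two', 'three']
```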
Split by regex: re.split()
split() and rsplit() split only when sep matches completely.
To split on a regular expression (regex) pattern rather than an exact match, use split() from the re module.
In re.split(), specify the regex pattern as the first parameter and the target string as the second.
Here’s an example of splitting a string by consecutive numbers:
    import re

    s_nums = 'one1two22three333four'

    # The raw string r'...' keeps the backslash in \d from being read as a string escape
    print(re.split(r'\d+', s_nums))
    # ['one', 'two', 'three', 'four']
The maximum number of splits can be specified in the third parameter, maxsplit.
    print(re.split(r'\d+', s_nums, 2))
    # ['one', 'two', 'three333four']
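If the delimiters themselves are needed in the result, re.split() keeps them when the pattern is wrapped in a capturing group. This sketch reuses the sample string from above and also shows maxsplit passed by keyword.

```python
import re

s_nums = 'one1two22three333four'

# A capturing group (...) around the pattern keeps the matched delimiters in the result
with_delims = re.split(r'(\d+)', s_nums)
print(with_delims)
# ['one', '1', 'two', '22', 'three', '333', 'four']

# maxsplit can also be passed as a keyword argument
limited = re.split(r'\d+', s_nums, maxsplit=1)
print(limited)
# ['one', 'two22three333four']
```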
Split by multiple different delimiters
These two examples are helpful to remember, even if you are not familiar with regex:
Enclose characters in [] to match any single one of them. This lets you split a string by multiple different characters.
    s_marks = 'one-two+three#four'

    print(re.split('[-+#]', s_marks))
    # ['one', 'two', 'three', 'four']
Patterns separated by | match any one of the alternatives. Each alternative can use regex special characters, but plain strings work as well. This lets you split by multiple different strings.
    s_strs = 'oneXXXtwoYYYthreeZZZfour'

    print(re.split('XXX|YYY|ZZZ', s_strs))
    # ['one', 'two', 'three', 'four']
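When delimiters can appear back to back (for example a comma followed by a space), adding the + quantifier to the character class treats each run as one delimiter. A minimal sketch with made-up sample text:

```python
import re

s_mixed = 'one, two;three   four'  # made-up sample text

# One delimiter character at a time: adjacent delimiters yield empty strings
single = re.split(r'[,; ]', s_mixed)
print(single)
# ['one', '', 'two', 'three', '', '', 'four']

# + matches one or more delimiter characters as a single split point
runs = re.split(r'[,; ]+', s_mixed)
print(runs)
# ['one', 'two', 'three', 'four']
```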
Concatenate a list of strings
The previous examples split a string into a list.
To concatenate a list of strings into a single string, use the string method join().
Call join() on the separator string and pass it the list of strings to concatenate.
    l = ['one', 'two', 'three']

    print(','.join(l))
    # one,two,three

    print('\n'.join(l))
    # one
    # two
    # three

    print(''.join(l))
    # onetwothree
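Note that join() only accepts strings. The sketch below, using an invented list of integers, shows the resulting error and one way around it, along with a round trip through split() and join().

```python
nums = [1, 2, 3]

# join() accepts only strings, so passing integers raises TypeError
try:
    ','.join(nums)
except TypeError as e:
    print(type(e).__name__)
# TypeError

# Convert each element with str() first
joined = ','.join(map(str, nums))
print(joined)
# 1,2,3

# split() and join() with the same separator round-trip
s_comma = 'one,two,three'
round_trip = ','.join(s_comma.split(','))
print(round_trip == s_comma)
# True
```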
Split based on the number of characters: slice
Use slicing to split a string based on the number of characters.
    s = 'abcdefghij'

    print(s[:5])
    # abcde

    print(s[5:])
    # fghij
The split results can be obtained as a tuple or assigned to individual variables.
    s_tuple = s[:5], s[5:]
    print(s_tuple)
    # ('abcde', 'fghij')

    print(type(s_tuple))
    # <class 'tuple'>

    s_first, s_last = s[:5], s[5:]
    print(s_first)
    # abcde
    print(s_last)
    # fghij

    s_first, s_second, s_last = s[:3], s[3:6], s[6:]
    print(s_first)
    # abc
    print(s_second)
    # def
    print(s_last)
    # ghij
The number of characters can be obtained with the built-in function len(). You can use it to split a string into halves.
    half = len(s) // 2
    print(half)
    # 5

    s_first, s_last = s[:half], s[half:]
    print(s_first)
    # abcde
    print(s_last)
    # fghij
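The same slicing idea extends to cutting a string into fixed-size chunks. This is a sketch only; the chunk size n is an arbitrary value chosen for illustration.

```python
s = 'abcdefghij'
n = 3  # chunk size chosen for illustration

# Step through the string n characters at a time;
# the last chunk may be shorter than n
chunks = [s[i:i + n] for i in range(0, len(s), n)]
print(chunks)
# ['abc', 'def', 'ghi', 'j']
```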
If you want to concatenate strings, use the + operator.
    print(s_first + s_last)
    # abcdefghij
Splitting with delimiter python
Last updated: Feb 24, 2023
# Split a string with multiple delimiters in Python
To split a string with multiple delimiters:
- Use the re.split() method, e.g. re.split(r',|-', my_str).
- The re.split() method will split the string on all occurrences of one of the delimiters.
    import re

    # 👇️ split string with 2 delimiters
    my_str = 'bobby,hadz-dot,com'

    my_list = re.split(r',|-', my_str)  # 👈️ split on comma or hyphen
    print(my_list)  # 👉️ ['bobby', 'hadz', 'dot', 'com']
The re.split method takes a pattern and a string and splits the string on each occurrence of the pattern.
The pipe | character is an OR. Either match A or B.
The example splits a string using 2 delimiters — a comma and a hyphen.
Here is an example that splits the string using 3 delimiters: a comma, a hyphen and a colon.
    import re

    # 👇️ split string with 3 delimiters
    my_str = 'bobby,hadz-dot:com'

    my_list = re.split(r',|-|:', my_str)  # 👈️ comma, hyphen or colon
    print(my_list)  # 👉️ ['bobby', 'hadz', 'dot', 'com']
You can use as many | characters as necessary in your regular expression.
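One caveat when listing delimiters this way: characters such as . | and + have a special meaning in a regular expression and need a backslash to be matched literally. A minimal sketch with a sample string made up for the purpose:

```python
import re

my_str = 'bobby.hadz|dot+com'  # sample string made up for this sketch

# . | and + are regex metacharacters, so each one is escaped with a backslash
my_list = re.split(r'\.|\||\+', my_str)
print(my_list)
# ['bobby', 'hadz', 'dot', 'com']
```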
# Split a string based on multiple delimiters using square brackets []
Alternatively, you can use square brackets [] to indicate a set of characters.
    import re

    my_str = 'bobby,hadz-dot,com'

    my_list = re.split(r'[,-]', my_str)
    print(my_list)  # 👉️ ['bobby', 'hadz', 'dot', 'com']
Make sure to add all of the delimiters between the square brackets.
    import re

    # 👇️ split string with 3 delimiters
    my_str = 'bobby,hadz-dot:com'

    # The hyphen goes last so it is a literal character, not a range such as ,-:
    my_list = re.split(r'[,:-]', my_str)
    print(my_list)  # 👉️ ['bobby', 'hadz', 'dot', 'com']
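One detail to watch with square brackets: a hyphen placed between two characters is read as a range, not as a literal hyphen. The sketch below (sample string invented for illustration) compares a class where the hyphen sits between the comma and the colon with one where it comes last.

```python
import re

my_str = 'v1.2,beta-3'  # sample string

# ,-: is read as the range from ',' to ':', which also covers . / and 0-9
as_range = re.split(r'[,-:]', my_str)
print(as_range)
# ['v', '', '', '', 'beta', '', '']

# With the hyphen last (or escaped as \-), only , : and - are delimiters
as_literal = re.split(r'[,:-]', my_str)
print(as_literal)
# ['v1.2', 'beta', '3']
```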
You might get empty string values in the output list if the string starts with or ends with one of the delimiters.
# Handling leading or trailing delimiters
You can use a list comprehension to remove any empty strings from the list.
    import re

    # 👇️ split string with 3 delimiters
    my_str = ',bobby,hadz-dot:com:'

    # The hyphen goes last so it is a literal character, not a range such as ,-:
    my_list = [
        item for item in re.split(r'[,:-]', my_str)
        if item
    ]
    print(my_list)  # 👉️ ['bobby', 'hadz', 'dot', 'com']
The list comprehension takes care of removing the empty strings from the list.
List comprehensions are used to perform some operation for every element or select a subset of elements that meet a condition.
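As a possible alternative to filtering afterwards, the delimiters can be stripped from both ends before splitting, and a + quantifier can absorb doubled delimiters in the middle. This is a sketch under the assumption that empty fields carry no meaning in your data.

```python
import re

my_str = ',bobby,,hadz-dot:com:'  # sample with leading, trailing and doubled delimiters

# strip() removes delimiter characters from both ends only
trimmed = my_str.strip(',:-')
print(trimmed)
# bobby,,hadz-dot:com

# + treats a run of delimiters as one, so the doubled comma leaves no empty string
my_list = re.split(r'[,:-]+', trimmed)
print(my_list)
# ['bobby', 'hadz', 'dot', 'com']
```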
An alternative approach is to use the str.replace() method.
# Split a string with multiple delimiters using str.replace()
This is a two-step process:
- Use the str.replace() method to replace the first delimiter with the second.
- Use the str.split() method to split the string by the second delimiter.
    my_str = 'bobby_hadz!dot_com'

    my_list = my_str.replace('_', '!').split('!')
    print(my_list)  # 👉️ ['bobby', 'hadz', 'dot', 'com']
First, we replace every occurrence of the first delimiter with the second, and then we split on the second delimiter.
The str.replace method returns a copy of the string with all occurrences of a substring replaced by the provided replacement.
The method takes the following parameters:
Name | Description |
---|---|
old | The substring we want to replace in the string |
new | The replacement for each occurrence of old |
count | Only the first count occurrences are replaced (optional) |
Note that the method doesn’t change the original string. Strings are immutable in Python.
    my_str = 'bobby hadz, dot # com. abc'

    my_list = (
        my_str.replace(',', '')
        .replace('#', '')
        .replace('.', '')
        .split()
    )
    print(my_list)  # 👉️ ['bobby', 'hadz', 'dot', 'com', 'abc']
We used the str.replace() method to remove the punctuation before splitting the string on whitespace characters.
You can chain as many calls to the str.replace() method as necessary.
The last step is to use the str.split() method to split the string into a list of words.
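When many single characters need to be removed, a different technique from chained str.replace() calls is str.translate() with a table built by str.maketrans(). The sketch below is an alternative, not the approach used above, and reuses the same sample string.

```python
my_str = 'bobby hadz, dot # com. abc'

# The third argument of str.maketrans lists the characters to delete
table = str.maketrans('', '', ',#.')

my_list = my_str.translate(table).split()
print(my_list)
# ['bobby', 'hadz', 'dot', 'com', 'abc']
```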
The str.split() method splits the string into a list of substrings using a delimiter.
The method takes the following 2 parameters:
Name | Description |
---|---|
sep | Split the string into substrings on each occurrence of the separator |
maxsplit | At most maxsplit splits are done (optional) |
When no separator is passed to the str.split() method, it splits the input string on one or more whitespace characters.
    my_str = 'bobby hadz com'

    print(my_str.split())  # 👉️ ['bobby', 'hadz', 'com']
If the separator is not found in the string, a list containing only 1 element is returned.
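A quick sketch of that behavior, together with the related case of an empty input string (the sample values are made up):

```python
my_str = 'bobby'

# Separator not found: the whole string comes back as the only element
not_found = my_str.split(',')
print(not_found)
# ['bobby']

# An empty string behaves differently depending on the separator
empty_with_sep = ''.split(',')
empty_default = ''.split()
print(empty_with_sep)
# ['']
print(empty_default)
# []
```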
# Split a string based on multiple delimiters with a reusable function
If you need to split a string based on multiple delimiters often, define a reusable function.
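    import re


    def split_multiple(string, delimiters):
        # Escape each delimiter so regex special characters match literally,
        # then join them with | to build an "A or B or C" pattern
        pattern = '|'.join(map(re.escape, delimiters))
        return re.split(pattern, string)


    my_str = 'bobby,hadz-dot:com'

    print(split_multiple(my_str, [',', '-', ':']))
    # 👉️ ['bobby', 'hadz', 'dot', 'com']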
The split_multiple function takes a string and a list of delimiters and splits the string on the delimiters.
The str.join() method is used to join the delimiters with a pipe | separator.
This creates a regex pattern that we can use to split the string based on the specified delimiters.
If you need to split a string into a list of words with multiple delimiters, you can also use the re.findall() method.
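A minimal sketch of the re.findall() approach, assuming the delimiters are a comma, a colon and a hyphen: instead of describing the delimiters to split on, the pattern describes the runs of characters to keep, so leading or trailing delimiters produce no empty strings.

```python
import re

my_str = ',bobby,hadz-dot:com:'

# [^,:-]+ matches runs of characters that are NOT delimiters
my_list = re.findall(r'[^,:-]+', my_str)
print(my_list)
# ['bobby', 'hadz', 'dot', 'com']
```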