Functions with string in python

String Functions in Python with Examples

This tutorial outlines various string (character) functions used in Python. To manipulate strings and character values, python has several in-built functions. It means you don’t need to import or have dependency on any external package to deal with string data type in Python. It’s one of the advantage of using Python over other data science tools. Dealing with string values is very common in real-world. Suppose you have customers’ full name and you were asked by your manager to extract first and last name of customer. Or you want to fetch information of all the products that have code starting with ‘QT’.

Python String Functions

List of frequently used string functions

The table below shows many common string functions along with description and its equivalent function in MS Excel. We all use MS Excel in our workplace and familiar with the functions used in MS Excel. The comparison of string functions in MS EXCEL and Python would help you to learn the functions quickly and mug-up before interview.

Function Description MS EXCEL FUNCTION
mystring[:N] Extract N number of characters from start of string. LEFT( )
mystring[-N:] Extract N number of characters from end of string RIGHT( )
mystring[X:Y] Extract characters from middle of string, starting from X position and ends with Y MID( )
str.split(sep=’ ‘) Split Strings
str.replace(old_substring, new_substring) Replace a part of text with different sub-string REPLACE( )
str.lower() Convert characters to lowercase LOWER( )
str.upper() Convert characters to uppercase UPPER( )
str.contains(‘pattern’, case=False) Check if pattern matches (Pandas Function) SQL LIKE Operator
str.extract(regular_expression) Return matched values (Pandas Function)
str.count(‘sub_string’) Count occurence of pattern in string
str.find( ) Return position of sub-string or pattern FIND( )
str.isalnum() Check whether string consists of only alphanumeric characters
str.islower() Check whether characters are all lower case
str.isupper() Check whether characters are all upper case
str.isnumeric() Check whether string consists of only numeric characters
str.isspace() Check whether string consists of only whitespace characters
len( ) Calculate length of string LEN( )
cat( ) Concatenate Strings (Pandas Function) CONCATENATE( )
separator.join(str) Concatenate Strings CONCATENATE( )
Читайте также:  Connect java and python

LEFT, RIGHT and MID Functions

If you are intermediate MS Excel users, you must have used LEFT, RIGHT and MID Functions. These functions are used to extract N number of characters or letters from string.

mystring = "Hey buddy, wassup?" mystring[:2]
  1. string[start:stop:step] means item start from 0 (default) through (stop-1), step by 1 (default).
  2. mystring[:2] is equivalent to mystring[0:2]
  3. mystring[:2] tells Python to pull first 2 characters from mystring string object.
  4. Indexing starts from zero so it includes first, second element and excluding third.

The above command returns p? .The -2 starts the range from second last position through maximum length of string.

mystring[1:3] returns second and third characters. 1 refers to second character as index begins with 0.

Let’s create a fake data frame for illustration. In the code below, we are creating a dataframe named df containing only 1 variable called var1

import pandas as pd df = pd.DataFrame() var1 0 A_2 1 B_1 2 C_2 3 A_2

To deal text data in Python Pandas Dataframe, we can use str attribute. It can be used for slicing character values.

Output 0 A 1 B 2 C 3 A

Extract Words from String

Suppose you need to take out word(s) instead of characters from string. Generally we consider one blank space as delimiter to find words from string.

  1. split() function breaks string using space as a default separator
  2. mystring.split() returns [‘Hey’, ‘buddy,’, ‘wassup?’]
  3. 0 returns first item or word Hey
custname 0 Priya_Sehgal 1 David_Stevart 2 Kasia_Woja 3 Sandy_Dave
#First Word mydf['fname'] = mydf['custname'].str.split('_').str[0] #Last Word mydf['lname'] = mydf['custname'].str.split('_').str[1]
  1. str.split( ) is similar to split( ) . It is used to activate split function in pandas data frame in Python.
  2. In the code above, we created two new columns named fname and lname storing first and last name.
Output custname fname lname 0 Priya_Sehgal Priya Sehgal 1 David_Stevart David Stevart 2 Kasia_Woja Kasia Woja 3 Sandy_Dave Sandy Dave

SQL LIKE Operator in Pandas DataFrame

In SQL, LIKE Statement is used to find out if a character string matches or contains a pattern. We can implement similar functionality in python using str.contains( ) function.

var1 var2 0 AA_2 X_2 1 B_1 Y_1 2 C_2 Z_2 3 a_2 X2
Output 0 True 1 True 2 False 3 False

The above command returns FALSE against fourth row as the function is case-sensitive. To ignore case-sensitivity, we can use case=False parameter. See the working example below.

df2['var1'].str.contains('A|B', case=False)

In the following program, we are asking Python to subset data with condition — contain character values either A or B. It is equivalent to WHERE keyword in SQL.

df2[df2['var1'].str.contains('A|B', case=False)]
Output var1 var2 0 AA_2 X_2 1 B_1 Y_1 3 a_2 X2
df2[df2['var1'].str.contains('^[A-Z]_', case=False)]
var1 var2 1 B_1 Y_1 2 C_2 Z_2 3 a_2 X2

Find position of a particular character or keyword

Replace substring

str.replace(old_text,new_text,case=False) is used to replace a particular character(s) or pattern with some new value or pattern. In the code below, we are replacing _ with — in variable var1.

df2['var1'].str.replace('_', '--', case=False)
Output 0 AA--2 1 B--1 2 C--2 3 A--2

We can also complex patterns like the following program. + means item occurs one or more times. In this case, alphabet occurring 1 or more times.

df2['var1'].str.replace('[A-Z]+_', 'X', case=False)

Find length of string

len(string) is used to calculate length of string. In pandas data frame, you can apply str.len() for the same.

Output 0 4 1 3 2 3 3 3

To find count of occurrence of a particular character (let’s say, how many time ‘A’ appears in each row), you can use str.count(pattern) function.

Convert to lowercase and uppercase

#Convert to lower case mydf['custname'].str.lower() #Convert to upper case mydf['custname'].str.upper()

Remove Leading and Trailing Spaces

  1. str.strip() removes both leading and trailing spaces.
  2. str.lstrip() removes leading spaces (at beginning).
  3. str.rstrip() removes trailing spaces (at end).
df1 = pd.DataFrame() df1['both']=df1['y1'].str.strip() df1['left']=df1['y1'].str.lstrip() df1['right']=df1['y1'].str.rstrip()
y1 both left right 0 jack jack jack jack 1 jill jill jill jill 2 jesse jesse jesse jesse 3 frank frank frank frank

Convert Numeric to String

myvariable = 4 mystr = str(myvariable)

Concatenate or Join Strings

In case you want to add a space between two strings, you can use this — x+’ ‘+y returns Deepanshu Bhalla Suppose you have a list containing multiple string values and you want to combine them. You can use join( ) function.

string0 = ['Ram', 'Kumar', 'Singh'] ' '.join(string0)
Output 'Ram Kumar Singh'
custname fname lname fullname 0 Priya_Sehgal Priya Sehgal Priya Sehgal 1 David_Stevart David Stevart David Stevart 2 Kasia_Woja Kasia Woja Kasia Woja 3 Sandy_Dave Sandy Dave Sandy Dave

SQL IN Operator in Pandas

mydata = pd.DataFrame() mydata[mydata['product'].isin(['A', 'B'])]
mydata[~mydata['product'].isin(['A', 'B'])]

Extract a particular pattern from string

df2['var1'].str.extract(r'(^[A-Z]_)').dropna()

Источник

Оцените статью