- Python string find, like and contains examples
- Like operator in Python — in
- Python string — find()
- Check list of strings — exact match
- Python find string in list
- Compare two lists of strings
- Python string contains or like operator
- Testing string against list of string (substring)
- Python like function
- Python like/contains operator
- How to Use Like Operator in Pandas DataFrame
- Example 1: Pandas find rows which contain string
- Example 2: Pandas simulate Like operator and regex
- Example 3: Pandas match rows starting with text
- Example 4: Pandas match rows ending with text
- Example 5: Pandas Like operator with Query
- Step 6: Pandas Like operator match numbers only
- Step 7: Pandas SQL Like operator
- Resources
Python string find, like and contains examples
Python offers several ways to check if a string contains a substring. This article covers three of them:
- test_string in other_string — return True/False
- test_word.startswith(word) — return True/False
- word.find(test_word) — return index
Like operator in Python — in
Python has no dedicated contains or like operator; the same check is done with the in operator:
test_string in other_string
This returns True or False depending on whether test_string occurs in other_string.
As the examples below show, the check is case sensitive. It can be made case insensitive by lowercasing the strings first with mystr.lower():
print('lemon' in 'lemon pie')      # True
print('lemon' in 'Lemon')          # False
print('LeMoN'.lower() in 'lemon')  # True
print('lemon' in 'lemon')          # True
print('lemon' in 'Hot lemon pie')  # True
print('lemon' in 'orange juice')   # False
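As a minimal sketch of the case-insensitive check, the helper below wraps the in operator. The name contains_ignore_case is hypothetical (it is not part of Python or of this article), and it uses str.casefold(), a stricter variant of lower() meant for caseless comparison.

```python
def contains_ignore_case(needle, haystack):
    """Return True if needle occurs in haystack, ignoring case."""
    # casefold() normalizes both strings before the substring test.
    return needle.casefold() in haystack.casefold()


print(contains_ignore_case('LeMoN', 'Hot lemon pie'))  # True
print(contains_ignore_case('lemon', 'orange juice'))   # False
```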
Python string — find()
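We can also check if a string contains a substring by using the method .find(), which returns the index of the first match, or -1 if the substring is not found. An optional second argument sets the position where the search starts: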
sentence = "I want a cup of lemon juice"
test_word = "lemon"
test_word_Up = "LEMON"
print(sentence.find(test_word))      # 16
print(sentence.find(test_word_Up))   # -1, the search is case sensitive
print(sentence.find(test_word, 5))   # 16, searching from index 5
print(sentence.find(test_word, 20))  # -1, searching from index 20
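The start argument of .find() also makes it possible to collect every occurrence, not just the first. The sketch below is one way to do that; find_all is a hypothetical helper name introduced here for illustration.

```python
def find_all(text, sub):
    """Return the start index of every non-overlapping occurrence of sub."""
    if not sub:
        # An empty substring would match everywhere; treat it as no match.
        return []
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume the search just past the match that was found.
        start = text.find(sub, start + len(sub))
    return positions


print(find_all("lemon pie with lemon juice", "lemon"))  # [0, 15]
```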
Check list of strings — exact match
If you have a list of strings or sentences you can check them by:
forbidden_list = ['apple juice', 'banana pie', 'orange juice', 'lemon pie', 'lemon']
test_word = 'lemon'
if test_word in forbidden_list:
    print(test_word)
This checks whether any item in the list is exactly the word lemon, and prints lemon.
Testing with the word 'apple' prints nothing, because no item in the list is exactly 'apple'.
This is useful when you have a predefined list of values and you want to verify whether the tested value is one of them.
Python find string in list
In this section we will test whether a string is a substring of any of the list items. To do so we loop over each string and test it with in, .find() or .startswith():
forbidden_list = ['apple juice', 'banana pie', 'orange juice', 'lemon pie', 'lemon']
test_word = 'lemon'
for word in forbidden_list:
    if word.startswith(test_word):
        print(word)
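Because this example uses .startswith(), it prints the items that begin with the word (lemon pie and lemon). Replace the test with test_word in word to match the word anywhere in the item.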
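The three tests can be compared side by side with list comprehensions. This is only an illustrative sketch: the item 'sweet lemon' is added here (it is not in the article's list) so that the prefix test and the substring test give different results.

```python
# 'sweet lemon' is an extra item added for this illustration.
forbidden_list = ['apple juice', 'banana pie', 'lemon pie', 'lemon', 'sweet lemon']
test_word = 'lemon'

exact = [w for w in forbidden_list if w == test_word]             # whole item equals the word
prefix = [w for w in forbidden_list if w.startswith(test_word)]   # item begins with the word
anywhere = [w for w in forbidden_list if test_word in w]          # word occurs anywhere

print(exact)     # ['lemon']
print(prefix)    # ['lemon pie', 'lemon']
print(anywhere)  # ['lemon pie', 'lemon', 'sweet lemon']
```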
Compare two lists of strings
Often there is a need to filter many strings against a list of forbidden values. This can be done by:
- iterating over the list of search words
- going through all forbidden words for each search word
- checking whether the given word is part of a forbidden word:
forbidden_list = ['apple', 'banana', 'orange', 'lemon', 'kiwi', 'mango']
search_words = ['apple', 'orange', 'lemon']
for test_word in search_words:
    if any(word.startswith(test_word) for word in forbidden_list):
        print(test_word)
This way you are free to choose which test to apply: exact match, match at the start of the string, or substring.
For an exact match you can also use:
diff_list = list(set(forbidden_list) & set(search_words))
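A set intersection does not preserve the order of either list. If the order of the search words matters, one alternative (a sketch, not the article's method) is to keep the set only for fast membership tests. The word 'grape' is added here purely as a non-matching example.

```python
forbidden_list = ['apple', 'banana', 'orange', 'lemon', 'kiwi', 'mango']
search_words = ['orange', 'apple', 'lemon', 'grape']  # 'grape' is not forbidden

# Build the set once, then filter the search words in their original order.
forbidden_set = set(forbidden_list)
matches = [word for word in search_words if word in forbidden_set]

print(matches)  # ['orange', 'apple', 'lemon']
```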
Python string contains or like operator
Below you can find two methods which simulate string contains or like behavior in Python:
Testing string against list of string (substring)
If you want to check whether one or more given words are part of a list of strings, this can be done by:
forbidden_list = ['apple', 'banana', 'orange', 'lemon', 'kiwi', 'mango']
forbidden_like_list = ['apple juice', 'banana pie', 'orange juice', 'lemon pie']
search_words = ['apple', 'banana', 'orange', 'lemon']

# Prefix test: print each search word that starts one of the longer items
for test_word in search_words:
    if any(word.startswith(test_word) for word in forbidden_like_list):
        print(test_word)

print('--------------------------------')

# Exact test: print the word if it equals one of the forbidden items
test_word = 'lemon'
if any(test_word == word for word in forbidden_list):
    print(test_word)
Python like function
This function prints every item of one list that contains at least one word from another list. It can be used as a filter for messages.
forbidden_list = ['apple', 'banana', 'orange', 'lemon', 'kiwi', 'mango']
search_words = ['apple', 'banana', 'orange', 'lemon']

def string_like(search_words, forbidden_list):
    for line in forbidden_list:
        if any(word in line for word in search_words):
            print(line)

string_like(search_words, forbidden_list)

Output:
apple
banana
orange
lemon
Python like/contains operator
This implementation is similar to the previous one, except that the list items are longer phrases, so the check verifies that each string contains another string:
forbidden_like_list = ['apple juice', 'banana pie', 'orange juice', 'lemon pie']
search_words = ['apple', 'banana', 'orange', 'lemon']

def string_in(search_words, forbidden_like_list):
    for line in forbidden_like_list:
        if any(word in line for word in search_words):
            print(line)

string_in(search_words, forbidden_like_list)

Output:
apple juice
banana pie
orange juice
lemon pie
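For patterns closer to SQL's LIKE, the wildcards % (any run of characters) and _ (exactly one character) can be translated to a regular expression. The sketch below is an assumption about how this could be done, not a built-in Python feature: sql_like and its case_sensitive parameter are hypothetical names, and SQL escape characters are not handled.

```python
import re


def sql_like(value, pattern, case_sensitive=True):
    """Return True if value matches an SQL LIKE-style pattern."""
    parts = []
    for ch in pattern:
        if ch == '%':
            parts.append('.*')           # % matches any run of characters
        elif ch == '_':
            parts.append('.')            # _ matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    flags = re.DOTALL if case_sensitive else re.DOTALL | re.IGNORECASE
    # fullmatch: the pattern has to cover the whole value, as LIKE does.
    return re.fullmatch(''.join(parts), value, flags) is not None


print(sql_like('lemon pie', 'lemon%'))       # True
print(sql_like('hot lemon pie', '%lemon%'))  # True
print(sql_like('Lemon pie', 'lemon%'))       # False
```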
Working toward a package that will implement a ‘like’ function that compares two strings to determine similarity. See more in the README.
Ipgnosis/like
README.md
This is a work in progress.
A string compare function that analyzes to what extent string A is similar to string B.
This idea came from working on a different project where I kept misspelling the country name ‘Kazakhstan’ as ‘Khazakstan’. I searched for a ‘like’ function in Python and found none. This is strange, because this has been implemented in other languages before: there are even ‘sounds like’ functions (e.g. ‘soundex’). So, just for fun, I thought I would give it a shot. I expect that this will be a lot easier than implementing a spelling checker (or learning how to spell).
An extension of the Python string object that adds a ‘like’ operator to test equivalence. This will enable ‘Khazakstan’ to match to ‘Kazakhstan’.
Note that this is not a spelling checker.
The initial implementation will evaluate some kind of probability function. As a comparison, I will test against the Damerau-Levenshtein distance.
This will be useful for handling ‘typos’ resulting from keyboarding errors (aka transpositions, e.g. ‘typo’ vs. ‘tyop’) that have found their way into data, thereby making it difficult to search upon.
This is particularly useful when a string contains international characters that aren’t available on all keyboards, for example:
- ñ: the Spanish letter ‘eñe’
- ü: the German (etc.) letter u with an umlaut
- ç: the French c-cedilla
- ß: the German letter ‘eszett’
(I suspect that the Cyrillic character set is overly ambitious.)
Note that the eszett (and others also) is a complication of the general problem in that two letters (i.e. ‘ss’) can be substituted for the eszett when the keyboard/character set in use doesn’t contain the eszett. For example: the German word for ‘street’ is ‘straße’, which can also be written ‘strasse’.
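One possible way to handle the characters listed above before comparing strings is to fold them to a plain form using only the standard library. This is a sketch under stated assumptions, not the approach this project has chosen: fold_text is a hypothetical name, and dropping the umlaut (ü becomes u) is a simplification, since German usually transliterates ü as ue.

```python
import unicodedata


def fold_text(text):
    """Reduce text to a plain lowercase form for keyboard-independent comparison."""
    # casefold() lowercases and expands the eszett: 'ß' becomes 'ss'.
    folded = text.casefold()
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize('NFKD', folded)
    # Drop the combining marks, keeping only the base letters.
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_text('Straße') == fold_text('strasse'))  # True
print(fold_text('señor'))                            # senor
```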
def like(strA, strB, args1 = float)
- If args[0] then return True (if similarity >= args[0]) or False
- If not args[0] then return float 0:1 — 0 = completely dissimilar; 1 = exact match.
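To make the intended behavior concrete, here is a minimal sketch of such a function. It is not this project's implementation: it assumes difflib.SequenceMatcher from the standard library as a stand-in similarity measure, and the parameter name threshold is hypothetical (the signature above calls it args1).

```python
from difflib import SequenceMatcher


def like(str_a, str_b, threshold=None):
    """Compare two strings.

    Without a threshold, return a float between 0 (completely dissimilar)
    and 1 (exact match). With a threshold, return True if the similarity
    is at least the threshold, otherwise False.
    """
    # Stand-in measure: SequenceMatcher's ratio of matching characters.
    similarity = SequenceMatcher(None, str_a, str_b).ratio()
    if threshold is None:
        return similarity
    return similarity >= threshold


print(like('Khazakstan', 'Kazakhstan', 0.8))  # True
print(like('Kazakhstan', 'Kazakhstan'))       # 1.0
```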
The test data (data/test_data.json) is extracted from a Wikipedia article on the most misspelled words: Commonly misspelled English words.
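A baseline for the Damerau-Levenshtein comparison mentioned in this README could look like the sketch below. It implements the optimal string alignment distance, the restricted variant of Damerau-Levenshtein that counts an adjacent transposition as a single edit. The function name osa_distance is hypothetical and the code is not taken from this repository.

```python
def osa_distance(a, b):
    """Optimal string alignment distance between a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        dist[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. 'po' versus 'op'.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]


print(osa_distance('typo', 'tyop'))              # 1
print(osa_distance('Khazakstan', 'Kazakhstan'))  # 2
```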
How to Use Like Operator in Pandas DataFrame
Let's check several examples of Pandas text matching that simulate the SQL Like operator.
To start, here is a sample DataFrame which will be used in the next examples:
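import pandas as pd

# Values taken from the table below; the missing class of crow is None
data = {
    'num_legs': [4, 2, 0, 4, 2, 2],
    'num_wings': [0, 2, 0, 0, 2, 2],
    'class': ['mammal', 'bird', 'fish', 'mammal', None, 'mammal'],
}
df = pd.DataFrame(data, index=['dog', 'hawk', 'shark', 'cat', 'crow', 'human'])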
| | num_legs | num_wings | class |
|---|---|---|---|
| dog | 4 | 0 | mammal |
| hawk | 2 | 2 | bird |
| shark | 0 | 0 | fish |
| cat | 4 | 0 | mammal |
| crow | 2 | 2 | None |
| human | 2 | 2 | mammal |
Example 1: Pandas find rows which contain string
The first example filters DataFrame rows based on cell content: if the cell contains a given pattern the row is kept, otherwise it is skipped. Let's get all rows for which the column class contains the letter i:
df['class'].str.contains('i', na=False)
This results in a Series of True and False values:
dog False
hawk True
shark True
cat False
crow False
human False
If you would like to get the whole row, you can use: df[df['class'].str.contains('i', na=False)]
| | num_legs | num_wings | class |
|---|---|---|---|
| hawk | 2 | 2 | bird |
| shark | 0 | 0 | fish |
Note: na=False treats rows with None values as non-matching, so they are skipped. If you need them, use na=True. If the parameter na is not specified, the missing values stay in the boolean mask and filtering can raise an error:
ValueError: Cannot mask with non-boolean array containing NA / NaN values
Example 2: Pandas simulate Like operator and regex
The second example demonstrates Pandas contains together with a regular expression. Regex matching is enabled with regex=True (which is also the default for str.contains). The pipe in 'sh|rd' works as or:
df[df['class'].str.contains('sh|rd', regex=True, na=True)]
The code above returns all rows whose class contains sh or rd. Because na=True is used, the row with a missing class (crow) is returned as well:
| | num_legs | num_wings | class |
|---|---|---|---|
| hawk | 2 | 2 | bird |
| shark | 0 | 0 | fish |
| crow | 2 | 2 | None |
Other useful patterns:
- match rows which contain digits: df['class'].str.contains(r'\d', regex=True)
- match rows case-insensitively: df['class'].str.contains('bird', flags=re.IGNORECASE, regex=True) (this requires import re)
Note: regular expressions can slow the operation down noticeably on larger DataFrames.
Example 3: Pandas match rows starting with text
Let's find all rows whose index starts with the letter h by using the function str.startswith:
df[df.index.str.startswith('h', na=False)]
| | num_legs | num_wings | class |
|---|---|---|---|
| hawk | 2 | 2 | bird |
| human | 2 | 2 | mammal |
Example 4: Pandas match rows ending with text
The same logic can be applied with the function .str.endswith in order to find rows whose values end with a given string:
df[df.index.str.endswith('k', na=False)]
| | num_legs | num_wings | class |
|---|---|---|---|
| hawk | 2 | 2 | bird |
| shark | 0 | 0 | fish |
Example 5: Pandas Like operator with Query
Pandas queries can simulate the Like operator as well. Two points are worth noting first:
- naming columns with reserved words like class is dangerous and might cause errors
- None values are another common source of errors.
So, in order to use query together with str.contains, we need to rename the column class to classd and fill the None values.
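The preparation step described above could be sketched as follows. This assumes the sample DataFrame from the start of the article and fills missing values with an empty string; the fill value is an assumption made for this sketch.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'num_legs': [4, 2, 0, 4, 2, 2],
        'num_wings': [0, 2, 0, 0, 2, 2],
        'class': ['mammal', 'bird', 'fish', 'mammal', None, 'mammal'],
    },
    index=['dog', 'hawk', 'shark', 'cat', 'crow', 'human'],
)

# Rename the reserved word 'class' and replace None with an empty string
# (the empty string is an assumed fill value).
df = df.rename(columns={'class': 'classd'})
df['classd'] = df['classd'].fillna('')

contains_i = df.query('classd.str.contains("i")', engine='python')
print(list(contains_i.index))  # ['hawk', 'shark']
```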
df.query('classd.str.contains("i")', engine='python')
| | num_legs | num_wings | classd |
|---|---|---|---|
| hawk | 2 | 2 | bird |
| shark | 0 | 0 | fish |
or combination with other conditions:
df.query('classd.str.contains("i") and classd.str.endswith("d") ', engine='python')
| | num_legs | num_wings | classd |
|---|---|---|---|
| hawk | 2 | 2 | bird |
Step 6: Pandas Like operator match numbers only
For this example we are going to use numeric Series like:
s = pd.Series(['20.03', '11', '23.0', '65', '60', 'a', None])
Return all rows with numbers:
[True, True, True, True, True, False, None]
[True, False, True, False, False, False, None]
How do we filter for decimal numbers which have a 0 right after the point, like 20.03 and 23.0? Is the pattern .0 good enough?
No — because 60 is matched too:
[True, False, True, False, True, False, None]
The reason is that in a regular expression the pattern .0 matches any character followed by a 0. To search for floating point numbers with a literal dot followed by 0, escape the dot (\.0):
[True, False, True, False, False, False, None]
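A minimal sketch of these checks, assuming str.contains with the patterns \d, .0 and \.0. Here na=False is passed, so the missing value becomes False instead of None as in the outputs above.

```python
import pandas as pd

s = pd.Series(['20.03', '11', '23.0', '65', '60', 'a', None])

# Any value that contains a digit.
has_digit = s.str.contains(r'\d', regex=True, na=False)

# Unescaped dot: any character followed by 0, so '60' matches too.
loose = s.str.contains('.0', regex=True, na=False)

# Escaped dot: a literal '.' followed by 0.
strict = s.str.contains(r'\.0', regex=True, na=False)

print(s[loose].tolist())   # ['20.03', '23.0', '60']
print(s[strict].tolist())  # ['20.03', '23.0']
```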
Step 7: Pandas SQL Like operator
There is a Python module, pandasql, which allows SQL syntax for Pandas. It can be installed with pip install pandasql and used like this:
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())
sqldf("select * from df where classd like 'h%';", locals())