Counting words with Python’s Counter
Like all things, counting words using Python can be done two different ways: the easy way or the hard way. Using the Counter tool is the easy way!
Counter is generally used for, well, counting things.
Getting started
from collections import Counter

Counter([1, 4, 3, 2, 3, 3, 2, 1, 3, 4, 1, 2])
If you have a list of words, you can use it to count how many times each word appears.
Counter(['hello', 'goodbye', 'goodbye', 'hello', 'hello', 'party'])
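As a small sketch of what comes back (the variable name counts is only for illustration): a Counter behaves like a dictionary that maps each item to its count, and looking up an item that never appeared gives 0 instead of raising an error.

```python
from collections import Counter

counts = Counter(['hello', 'goodbye', 'goodbye', 'hello', 'hello', 'party'])

# Look up individual words just like dictionary keys
print(counts['hello'])    # 3
print(counts['goodbye'])  # 2

# A word that never appeared has a count of 0 (no KeyError)
print(counts['welcome'])  # 0
```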
If we want to use it to count words in a normal piece of text, though, we’ll have to turn our text into a list of words. We also need to do a little bit of cleanup — removing punctuation, making everything lowercase, just making sure the only things left are words.
import re

text = """Yesterday I went fishing. I don't fish that often, so I didn't catch any fish. I was told I'd enjoy myself, but it didn't really seem that fun."""

# Force everything to lowercase because FISH and fish and Fish are the same
text = text.lower()

# Remove anything that isn't a word character or a space
# We could use .replace(".", "") but regex is a lot easier!
# (a raw string keeps \w intact for the regex engine)
text = re.sub(r"[^\w ]", "", text)
print("Cleaned sentence is:", text)

words = text.split(" ")
Counter(words)
Cleaned sentence is: yesterday i went fishing i dont fish that often so i didnt catch any fish i was told id enjoy myself but it didnt really seem that fun
If you have a lot of text, you’re usually only interested in the most common words. If you just want the top words, .most_common is going to be your best friend.
[('i', 4), ('fish', 2), ('that', 2), ('didnt', 2), ('yesterday', 1)]
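The list above is what a call along these lines would give. This is a self-contained sketch that repeats the cleanup steps from earlier, assuming we ask for the top five words; the name top_five is only for illustration.

```python
import re
from collections import Counter

text = """Yesterday I went fishing. I don't fish that often, so I didn't catch any fish. I was told I'd enjoy myself, but it didn't really seem that fun."""

# Same cleanup as before: lowercase, then drop everything except word characters and spaces
text = re.sub(r"[^\w ]", "", text.lower())
words = text.split(" ")

# Ask for only the five most common words, returned as (word, count) pairs
top_five = Counter(words).most_common(5)
print(top_five)
```

Words with the same count come back in the order they were first seen, which is why 'fish' appears before 'that' and 'didnt'.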
Counting words in a book
Now that we know the basics of how to clean text and do text analysis with Counter, let’s try it with an actual book! We’ll use Jane Austen’s Pride and Prejudice.
import requests

response = requests.get('http://www.gutenberg.org/cache/epub/42671/pg42671.txt')
text = response.text
print(text[4100:4500])
d to be any thing extraordinary now. When a woman has five grown up daughters, she ought to give over thinking of her own beauty." "In such cases, a woman has not often much beauty to think of." "But, my dear, you must indeed go and see Mr. Bingley when he comes into the neighbourhood." "It is more than I engage for, I assure you." "But consider your daughters. Only think what an es
The easiest and most boring thing we can do is count the words in it. So, let’s count the words in it.
text = text.lower()
text = re.sub(r"[^\w ]", "", text)
words = text.split(" ")
Counter(words).most_common(20)
[('the', 3751), ('to', 3746), ('of', 3298), ('', 3289), ('and', 3113), ('her', 1811), ('a', 1745), ('in', 1679), ('i', 1655), ('was', 1622), ('she', 1385), ('that', 1325), ('it', 1294), ('not', 1278), ('he', 1148), ('you', 1145), ('be', 1101), ('his', 1061), ('as', 1052), ('had', 1036)]
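Notice the ('', 3289) entry in that list: splitting on a single space produces an empty string wherever two spaces sit side by side. The pattern also deletes newlines, so the last word of one line gets glued to the first word of the next. One possible fix, sketched here on a short made-up sample rather than the full book, is to keep all whitespace with \s and call .split() with no argument. The function name clean_and_count is hypothetical.

```python
import re
from collections import Counter

# A short stand-in for the book text, with a double space and a line break
sample = "It is a truth  universally acknowledged,\nthat a single man in possession"

def clean_and_count(text):
    """Lowercase the text, drop punctuation, and count whitespace-separated words."""
    text = text.lower()
    # \s keeps spaces, tabs and newlines, so words on adjacent lines stay separate
    text = re.sub(r"[^\w\s]", "", text)
    # split() with no argument treats any run of whitespace as a single separator
    return Counter(text.split())

counts = clean_and_count(sample)
print(counts.most_common(3))
```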
Secret tricks with Counter
Counting words is all fine and good, but if you have some regular expression skills you can dig a little deeper!
Only extracting some words with regular expressions
Do men and women do different things in this book? Let’s look at she ____ and he ____ to see what we can find out!
\b marks a word boundary. Without it, the phrase "she talks" would match both she (\w+) and he (\w+), because "she" contains "he".
# Catch every word after 'she'
she_words = re.findall(r"\b[Ss]he (\w+)", text)
she_words[:5]
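To follow the comparison through, the same pattern can be repeated for "he" and both lists fed to Counter. This is a minimal sketch on a made-up sentence, not the book text; sample and he_words are names introduced here for illustration.

```python
import re
from collections import Counter

# A tiny stand-in for the book text
sample = "She said he was late. He said she laughed. She said nothing."

# \b stops the 'he' pattern from matching inside 'she'
she_words = re.findall(r"\b[Ss]he (\w+)", sample)
he_words = re.findall(r"\b[Hh]e (\w+)", sample)

print(Counter(she_words).most_common(2))
print(Counter(he_words).most_common(2))

# Without \b, the 'he' pattern also picks up the words that follow 'she'
print(re.findall(r"[Hh]e (\w+)", sample))
```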
4 Solid Ways To Count Words in a String in Python
Strings are an essential data type in any programming language, including Python. We often need to perform string preprocessing operations such as removing unnecessary spaces, counting the words in a string, or converting the string to a single case (uppercase or lowercase). In this article, we will learn how to count words in a string in Python.
We will learn how to count the number of words in a string. For example, the string "Hello, this is a string." has five words. We will also learn how to count the frequency of a particular word in a string.
Different Ways in Python to count words in a String
- Count Words Using For loop
- Using split() to count words in a string
- Count frequency of words in a string using a dictionary
- Count frequency of words in string Using Count()
1. Count Words Using For loop
Using a for loop is the naïve approach to this problem. We count the spaces between the words: a string whose words are separated by n single spaces contains n + 1 words.
def count_words(string):
    # Strip leading and trailing spaces so they are not counted as gaps between words
    string1 = string.strip()
    # Start at 1 because n separating spaces mean n + 1 words
    count = 1
    # Iterating through the string
    for i in string1:
        # If we encounter space, increment the count with 1.
        if i == " ":
            count += 1
    return count

string = "Python is an interpreted, high-level, general-purpose programming language"
print("'{}'".format(string), "has total words:", count_words(string))
string2 = " Hi. My name is Ashwini "
print("'{}'".format(string2), "has total words:", count_words(string2))
'Python is an interpreted, high-level, general-purpose programming language' has total words: 8
' Hi. My name is Ashwini ' has total words: 5
2. Using split() to count words in a string
We can use the split() function to count the words in a string.
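One detail is worth a quick sketch before the function itself: split(" ") and split() behave differently when words are separated by more than one space. The string messy below is a made-up example.

```python
messy = "  Hi.   My name is  Ashwini  "

# Splitting on a single space leaves an empty string for every extra space
pieces_with_gaps = messy.strip().split(" ")
print(pieces_with_gaps)
print(len(pieces_with_gaps))

# Splitting with no argument treats any run of whitespace as one separator
# and ignores leading and trailing whitespace
pieces = messy.split()
print(pieces)
print(len(pieces))
```

If the input may contain repeated spaces, tabs or newlines, split() with no argument is usually the safer choice for counting words.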
def word_count(string):
    # Here we are removing the spaces from start and end,
    # and breaking every word whenever we encounter a space
    # and storing them in a list. The len of the list is the
    # total count of words.
    return len(string.strip().split(" "))

string = "Python is an interpreted, high-level, general-purpose programming language"
print("'{}'".format(string), "has total words:", word_count(string))
string2 = " Hi. My name is Ashwini "
print("'{}'".format(string2), "has total words:", word_count(string2))
'Python is an interpreted, high-level, general-purpose programming language' has total words: 8
' Hi. My name is Ashwini ' has total words: 5
3. Count the frequency of words in a String in Python using Dictionary
def wordFrequency(string):
    # Assumed preprocessing: lowercase the text and split it into a list of words
    string = string.lower().split()
    word_frequency = {}
    # Iterating through the string
    for i in string:
        # If the word is already in the keys, increment its frequency
        if i in word_frequency:
            word_frequency[i] += 1
        # It means that this is the first occurrence of the word
        else:
            word_frequency[i] = 1
    return word_frequency

string = "Woodchuck How much wood would a woodchuck chuck if a woodchuck could chuck wood ?"
print(wordFrequency(string))
4. Count frequency of words in string in Python Using Count()
The string method count() returns the number of times a substring occurs in a string, so it can be used to find the frequency of a word. We just need to pass the word as the argument.
def return_count(string, word):
    string = string.lower()
    # In string, what is the count that word occurs
    return string.count(word)

string2 = "Peter Piper picked a peck of pickled peppers. How many pickled peppers did Peter Piper pick?"
return_count(string2, 'piper')
If we want to know the number of times every word occurred, we can make a function for that.
set1 = set()
string = "Woodchuck How much wood would a woodchuck chuck if a woodchuck could chuck wood ?"
string = string.lower()
# splitting the string whenever we encounter a space
string = string.split(" ")
# iterate through list-string
for i in string:
    # Storing the word and its frequency in the form of tuple in a set
    # Set is used to avoid repetition
    set1.add((i, string.count(i)))
print(set1)
If we want to know how many times a particular word occurs within a given range of a string, we can use the start and end parameters of count().
string = "Can you can a can as a canner can can a can?"
# if you want to take cases into account remove this line
string = string.lower()
# between index=8 and 17, how many times the word 'can' occurs
print(string.count("can", 8, 17))
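One caveat with count() on a string is that it counts substrings, not whole words, so "can" is also found inside "canner". A rough sketch of the difference, using a regular expression with word boundaries as one possible alternative (the variable names are only for illustration):

```python
import re

sentence = "Can you can a can as a canner can can a can?".lower()

# str.count() counts every occurrence of the substring, including the one in 'canner'
substring_hits = sentence.count("can")
print(substring_hits)

# \b on both sides restricts the match to the standalone word 'can'
whole_word_hits = len(re.findall(r"\bcan\b", sentence))
print(whole_word_hits)
```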
Conclusion
Data matters more than ever, and as data science grows rapidly, much of it in Python, data preprocessing becomes an essential step. Counting the words in a string is a common part of preprocessing textual data, and the methods discussed above cover the main ways to do it.
Try to run the programs on your side and let us know if you have any queries.
Happy Coding!
Word Count in Python
This article is all about word count in Python. In our last article, I explained word count in PIG, but PIG has some limitations when dealing with files, and we may need to write UDFs to work around them.
Python avoids those limitations. I will show you how to do a word count on a file in Python easily. This is a simple program that you can run in any Python editor.
Word count in Python
We assume you have already installed Python on your system and have a sample file on which you want to do a word count.
If you don’t have a sample file, we recommend downloading the file below. We are using it for example purposes.
Python word count example
First, open the file and store it in a variable.
The logic for the word count is as follows: for each word in the file, if the word is not yet in the dictionary, add it with a count of 1; otherwise, increase its count by 1.
Below is the finalized Python word count code, which you can run directly in your Python editor. Just change the path of the file.
file = open('/C:sentimentdata')
wordcount = {}
for word in file.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
for k, v in wordcount.items():
    print(k, v)
This was all about word count in Python and the Python word count code. Hope this will help you. The script prints each word followed by its count.
Now suppose you have to find the top 5 words from this list. What will you do?
Let’s see how to find the top 5 words in Python.
Top 5 Words in a file in Python
In the section above, we found the count of each word; now we just have to find the 5 most frequent words.
All you have to do is sort the result of the first section in descending order of count and take the first five entries. Here is the updated code:
file = open('/C:sentimentdata')
wordcount = {}
for word in file.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
# sorted() returns a list of (word, count) pairs, highest count first
wordcount = sorted(wordcount.items(), key=lambda x: x[1], reverse=True)
for k, v in wordcount[:5]:
    print(k, v)
If you want to simplify this code even further, here is a version that uses Counter:
from collections import Counter

with open('file') as file:
    wordcount = Counter(file.read().split())
for k, v in wordcount.most_common(5):
    print(k, v)
And you are done. This was all about word count in Python and finding the top 5 words in a file.
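To try the Counter approach without a data file on hand, here is a minimal sketch that writes a throwaway temporary file and counts its words. The function name top_words, the parameter n, and the sample text are assumptions made for this example.

```python
import os
import tempfile
from collections import Counter

def top_words(path, n=5):
    """Return the n most frequent whitespace-separated words in a text file."""
    with open(path, encoding="utf-8") as handle:
        return Counter(handle.read().split()).most_common(n)

# Write a small throwaway file so the sketch does not depend on any real path
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("red blue red green blue red\nyellow red blue green")
    sample_path = tmp.name

result = top_words(sample_path, n=3)
print(result)

# Clean up the temporary file
os.remove(sample_path)
```

Opening the file in a with block closes it automatically, which the earlier open() calls do not do.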
Do try these and let us know how they worked. If you run into any issues, do share them.