How to count the number of lines in a text file in Python
To count the number of lines in a text file in Python, you need to understand Python's open() function. In this tutorial, we will learn how to count the lines in a text file using Python.
If you already have a basic understanding of the open() function, let's get started.
Text files are useful in many situations. For example, you may save your data in a text file, or you may need to read the data in a text file from Python. In my previous tutorial, I showed how to create a text file in Python.
Now in this article, I will show you how to count the total number of lines in a text file.
In order to open a file, we need to use the open() function.
Count the number of lines in a text file in Python
There are several techniques. Some of them read the whole file into memory, so they suit only small to medium-sized files; others read one line at a time and can handle large files.
Both kinds are shown below so that you can pick the one that fits your case.
Assume that you have a text file named this_is_file.txt in the same directory as your script, and that it contains the following text:
```
Hello I am first line
I am the 2nd line
I am obviously 3rd line
```
The following Python program prints the number of lines in the file:
```python
with open('this_is_file.txt') as my_file:
    # readlines() returns a list with one item per line
    number_of_lines = len(my_file.readlines())
print(number_of_lines)
```
Note: this approach reads the whole file into memory, so it cannot handle very large files, but it works fine for small to medium-sized files.
Count number of lines in a text file of large size
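To handle a large text file, you can use the following Python program, which reads one line at a time instead of loading the whole file:
```python
with open('this_is_file.txt') as my_file:
    # the generator yields 1 per line; sum() adds them up
    print(sum(1 for _ in my_file))
```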
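As a quick check, here is a self-contained sketch that writes the three-line sample above to a temporary file (standing in for this_is_file.txt) and counts its lines with both programs. The function names are illustrative, not part of the original tutorial.

```python
import os
import tempfile

def count_with_readlines(path):
    # Reads every line into a list; fine for small to medium files
    with open(path) as f:
        return len(f.readlines())

def count_with_generator(path):
    # Iterates lazily, so only one line is held in memory at a time
    with open(path) as f:
        return sum(1 for _ in f)

sample = "Hello I am first line\nI am the 2nd line\nI am obviously 3rd line\n"

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    print(count_with_readlines(path), count_with_generator(path))  # 3 3
finally:
    os.remove(path)
```

Both functions agree; the generator version simply avoids building the list.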
Python Count Number of Lines in a File
If the file is very large (gigabytes in size) and you don't want to read the whole file into memory just to get the line count, this article shows how to count the lines in a file in Python.
Steps to Get Line Count in a File
- Open the file in read mode: pass the file path and the access mode r to the open() function. For example, fp = open(r'File_Path', 'r') opens a file for reading.
- Use a for loop with the enumerate() function to get each line together with its number. enumerate() adds a counter to an iterable and returns an enumerate object; pass it the file object returned by open(), and it numbers each line. When the loop ends, the last counter value gives the line count.
- Close the file after reading. Make sure the file is closed properly once the read operation is complete, either by calling fp.close() or by opening the file in a with statement, which closes it automatically.
Consider a file named “read_demo.txt”.
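```python
# open file in read mode
with open(r"E:\demos\files\read_demo.txt", 'r') as fp:
    count = 0  # stays 0 if the file is empty
    # start=1 numbers the lines 1, 2, 3, ... so the last number is the count
    for count, line in enumerate(fp, start=1):
        pass
print('Total Lines', count)
```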
- The enumerate() function adds a counter to each line.
- enumerate(fp) does not load the entire file into memory; it yields one line at a time, so this is an efficient way to count the lines of a large file.
Generator and Raw Interface to get Line Count
A fast and compact alternative is a generator that reads the file in large chunks. If the file contains a vast number of lines (a file size in GB), this approach is worth considering for speed.
For an even faster solution, use the unbuffered (raw) interface: read raw bytes in fixed-size chunks, doing your own buffering, and count the newline bytes in each chunk.
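```python
def _count_generator(reader):
    # Yield the file's contents in 1 MiB chunks until reader() returns b'' (EOF)
    b = reader(1024 * 1024)
    while b:
        yield b
        b = reader(1024 * 1024)

with open(r'E:\demos\files\read_demo.txt', 'rb') as fp:
    c_generator = _count_generator(fp.raw.read)
    count = 0
    last = b''
    for buffer in c_generator:
        count += buffer.count(b'\n')  # count each \n
        last = buffer
    # A last line without a trailing newline still counts as a line
    if last and not last.endswith(b'\n'):
        count += 1
print('Total lines:', count)
```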
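The description above also mentions byte arrays and doing your own buffering. One reading of that, offered as a sketch rather than the article's own code, is to allocate a single bytearray once and refill it with readinto() on an unbuffered file object (opened with buffering=0), so no new bytes object is created per chunk. The helper name count_lines_readinto and the 1 MiB default buffer size are assumptions.

```python
import os
import tempfile

def count_lines_readinto(path, buf_size=1024 * 1024):
    """Hypothetical helper: count lines by scanning a reused bytearray."""
    buf = bytearray(buf_size)
    count = 0
    last = None  # last byte seen, as an int
    # buffering=0 gives the unbuffered (raw) FileIO object
    with open(path, 'rb', buffering=0) as raw:
        while True:
            n = raw.readinto(buf)
            if not n:  # 0 bytes read means end of file
                break
            # Only scan the n bytes that were actually filled
            count += buf.count(b'\n', 0, n)
            last = buf[n - 1]
    # A final line without a trailing newline still counts as a line
    if last is not None and last != ord('\n'):
        count += 1
    return count

# Usage: a file whose last line has no trailing newline
with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    tmp.write(b"one\ntwo\nthree")
try:
    print(count_lines_readinto(tmp.name, buf_size=4))  # 3
finally:
    os.remove(tmp.name)
```

The tiny buf_size in the usage line only exercises the chunk boundaries; in practice a buffer of around a megabyte keeps the number of read calls low.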
Use readlines() to get Line Count
If your file size is small and you are not concerned with performance, then the readlines() method is best suited.
This is the most straightforward way to count the number of lines in a text file in Python.
- The readlines() method reads all lines from a file and returns them as a list.
- Next, use the len() function to get the length of that list, which equals the number of lines in the file.
Open the file and call the readlines() method on the file object to read all lines.
```python
with open(r"E:\demos\files\read_demo.txt", 'r') as fp:
    x = len(fp.readlines())
print('Total lines:', x)  # 8
```
Note: This isn’t memory-efficient because it loads the entire file into memory. That is its most significant disadvantage if you are working with large files whose size is in GB.
Use Loop and Sum Function to Count Lines
You can iterate over the file object in a generator expression that yields 1 for each line and pass it to the sum() function. The resulting total is the line count.
```python
with open(r"E:\demos\files\read_demo.txt", 'r') as fp:
    num_lines = sum(1 for line in fp)
print('Total lines:', num_lines)  # 8
```
To exclude empty lines from the count, add a condition, as in the example below.
```python
with open(r"E:\demos\files\read_demo.txt", 'r') as fp:
    # rstrip() removes trailing whitespace, so empty lines become '' (falsy)
    num_lines = sum(1 for line in fp if line.rstrip())
print('Total lines:', num_lines)  # 8
```
The in Operator and Loop to get Line Count
Using the in operator in a for loop, we can count the nonempty lines in the file.
- Set a counter to zero.
- Use a for loop to read each line of the file, and if the line is nonempty, increase the counter by 1.
```python
# open file in read mode
with open(r"E:\demos\files_demos\read_demo.txt", 'r') as fp:
    count = 0
    for line in fp:
        # a line consisting only of "\n" is empty
        if line != "\n":
            count += 1
print('Total Lines', count)
```
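The line != "\n" test above and the line.strip() test used in the next section are not equivalent: a line holding only spaces or a tab counts as nonempty under the first test but as blank under the second. The sketch below, with illustrative helper names, makes the difference concrete; io.StringIO stands in for a real file object.

```python
import io

def count_not_newline(fp):
    # Counts every line that is not exactly "\n" (whitespace-only lines count)
    return sum(1 for line in fp if line != "\n")

def count_non_blank(fp):
    # Counts lines that still contain text after stripping whitespace
    return sum(1 for line in fp if line.strip())

text = "first\n\n   \nlast"
print(count_not_newline(io.StringIO(text)))  # 3
print(count_non_blank(io.StringIO(text)))    # 2
```

Pick the test that matches what "empty" means for your data.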
Count number of lines in a file Excluding Blank Lines
For example, the following text file uses blank lines to separate blocks:
```
Jessa = 70
Kelly = 80
Roy = 90

Emma = 25
Nat = 80
Sam = 75
```
The approaches above that don't filter lines also count the blank lines. In this example, we count the number of lines in a file, excluding blank lines.
```python
count = 0
with open('read_demo.txt') as fp:
    for line in fp:
        # strip() turns blank and whitespace-only lines into '' (falsy)
        if line.strip():
            count += 1
print('number of non-blank lines', count)
```
Output:
```
number of non-blank lines 6
```
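If you need both numbers, the total line count and the non-blank count can be collected in a single pass over the file. A minimal sketch, where the helper name is hypothetical and io.StringIO stands in for an open file:

```python
import io

def count_total_and_non_blank(fp):
    """Hypothetical helper: return (all lines, non-blank lines) in one pass."""
    total = non_blank = 0
    for line in fp:
        total += 1
        if line.strip():  # same blank-line test as the example above
            non_blank += 1
    return total, non_blank

demo = io.StringIO("a = 1\nb = 2\n\nc = 3\n")
print(count_total_and_non_blank(demo))  # (4, 3)
```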
Conclusion
- Use readlines() or a loop if the file is small.
- Use the generator and raw interface to get the line count if you are working with large files.
- Use a loop with enumerate() for large files, because it doesn't load the entire file into memory.
About Vishal
I’m Vishal Hule, Founder of PYnative.com. I am a Python developer, and I love to write articles to help students, developers, and learners.
Python Cookbook by
Counting Lines in a File
Problem
You need to compute the number of lines in a file.
Solution
The simplest approach, for reasonably sized files, is to read the file as a list of lines so that the count of lines is the length of the list. If the file’s path is in a string bound to the thefilepath variable, that’s just:
```python
count = len(open(thefilepath).readlines( ))
```
For a truly huge file, this may be very slow or even fail to work. If you have to worry about humongous files, a loop using the xreadlines method always works:
```python
count = 0
for line in open(thefilepath).xreadlines( ): count += 1
```
Here’s a slightly tricky alternative, if the line terminator is ‘\n’ (or has ‘\n’ as a substring, as happens on Windows):
```python
count = 0
thefile = open(thefilepath, 'rb')
while 1:
    buffer = thefile.read(8192*1024)
    if not buffer: break
    count += buffer.count('\n')
thefile.close( )
```
Without the ‘rb’ argument to open, this will work anywhere, but performance may suffer greatly on Windows or Macintosh platforms.
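The recipe's code targets Python 2.2, where file.xreadlines() existed and binary reads returned str. As a hedged adaptation to Python 3 (not the book's code), the chunked approach needs the bytes pattern b'\n', because a file opened with 'rb' returns bytes. The function name linecount_chunks is hypothetical; the 8192*1024 chunk size is taken from the recipe.

```python
import os
import tempfile

def linecount_chunks(thefilepath, chunk_size=8192 * 1024):
    """Python 3 adaptation of the recipe's chunked approach (name is hypothetical)."""
    count = 0
    with open(thefilepath, 'rb') as thefile:  # closed automatically
        while True:
            buffer = thefile.read(chunk_size)
            if not buffer:
                break
            # In Python 3, a file opened with 'rb' returns bytes, so use b'\n'
            count += buffer.count(b'\n')
    return count

with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    tmp.write(b"line 1\nline 2\nline 3\n")
try:
    print(linecount_chunks(tmp.name))  # 3
finally:
    os.remove(tmp.name)
```

Like wc -l, this counts newline characters, so a final line without a trailing newline is not counted.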
Discussion
If you have an external program that counts a file’s lines, such as wc -l on Unix-like platforms, you can of course choose to use that (e.g., via os.popen( ) ). However, it’s generally simpler, faster, and more portable to do the line-counting in your program. You can rely on almost all text files having a reasonable size, so that reading the whole file into memory at once is feasible. For all such normal files, the len of the result of readlines gives you the count of lines in the simplest way.
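os.popen() still exists in Python 3, but subprocess.run() is the usual way to call an external program today. The sketch below assumes a Unix-like system with wc on the PATH; the function name linecount_wc is hypothetical. Note that wc -l counts newline characters, so a final line without a trailing newline is not included.

```python
import os
import shutil
import subprocess
import tempfile

def linecount_wc(path):
    """Hypothetical helper: ask the external `wc -l` program for the count."""
    result = subprocess.run(["wc", "-l", path],
                            capture_output=True, text=True, check=True)
    # wc prints "<count> <path>"; the count is the first field
    return int(result.stdout.split()[0])

if shutil.which("wc"):  # only on systems that provide wc
    with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
        tmp.write("one\ntwo\nthree")  # no trailing newline
    try:
        print(linecount_wc(tmp.name))  # 2: wc -l counts newline characters
    finally:
        os.remove(tmp.name)
```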
If the file is larger than available memory (say, a few hundred megabytes on a typical PC today), the simplest solution can become slow, as the operating system struggles to fit the file’s contents into virtual memory. It may even fail when swap space is exhausted and virtual memory can’t help any more. On a typical PC, with 256 MB of RAM and virtually unlimited disk space, you should still expect serious problems when you try to read into memory files of, say, 1 or 2 GB, depending on your operating system (some operating systems are much more fragile than others in handling virtual-memory issues under such overstressed load conditions). In this case, the xreadlines method of file objects, introduced in Python 2.1, is generally a good way to process text files line by line. In Python 2.2, you can do even better, in terms of both clarity and speed, by looping directly on the file object:
```python
for line in open(thefilepath): count += 1
```
However, xreadlines does not return a sequence, and neither does a loop directly on the file object, so you can’t just use len in these cases to get the number of lines. Rather, you have to loop and count line by line, as shown in the solution.
Counting line-terminator characters while reading the file by bytes, in reasonably sized chunks, is the key idea in the third approach. It’s probably the least immediately intuitive, and it’s not perfectly cross-platform, but you might hope that it’s fastest (for example, by analogy with Recipe 8.2 in the Perl Cookbook).
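To see why counting '\n' is not perfectly cross-platform, compare the three common line-terminator conventions. Windows' \r\n contains \n, so the count comes out right, but the classic Mac OS convention of a bare \r contains none. A minimal illustration on made-up byte strings:

```python
# Made-up sample data: the same three lines with each terminator convention
samples = {
    "unix": b"one\ntwo\nthree\n",
    "windows": b"one\r\ntwo\r\nthree\r\n",
    "classic mac": b"one\rtwo\rthree\r",
}
for name, data in samples.items():
    # Counting b'\n' only works when '\n' is part of the line terminator;
    # bytes.splitlines() also recognizes '\r\n' and a bare '\r'.
    print(name, data.count(b"\n"), len(data.splitlines()))
# unix 3 3
# windows 3 3
# classic mac 0 3
```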
However, remember that, in most cases, performance doesn’t really matter all that much. When it does matter, the time sink might not be what your intuition tells you it is, so you should never trust your intuition in this matter—instead, always benchmark and measure. For example, I took a typical Unix syslog file of middling size, a bit over 18 MB of text in 230,000 lines:
```
[situ@tioni nuc]$ wc nuc
 231581 2312730 18508908 nuc
```
and I set up the following benchmark framework script, bench.py:
```python
import time
def timeo(fun, n=10):
    start = time.clock( )
    for i in range(n): fun( )
    stend = time.clock( )
    thetime = stend-start
    return fun.__name__, thetime

import os
def linecount_wc( ):
    return int(os.popen('wc -l nuc').read().split( )[0])

def linecount_1( ):
    return len(open('nuc').readlines( ))

def linecount_2( ):
    count = 0
    for line in open('nuc').xreadlines( ): count += 1
    return count

def linecount_3( ):
    count = 0
    thefile = open('nuc')
    while 1:
        buffer = thefile.read(65536)
        if not buffer: break
        count += buffer.count('\n')
    return count

for f in linecount_wc, linecount_1, linecount_2, linecount_3:
    print f.__name__, f( )

for f in linecount_1, linecount_2, linecount_3:
    print "%s: %.2f"%timeo(f)
```
First, I print the line counts obtained by all methods, thus ensuring that there is no anomaly or error (counting tasks are notoriously prone to off-by-one errors). Then, I run each alternative 10 times, under the control of the timing function timeo, and look at the results. Here they are:
```
[situ@tioni nuc]$ python -O bench.py
linecount_wc 231581
linecount_1 231581
linecount_2 231581
linecount_3 231581
linecount_1: 4.84
linecount_2: 4.54
linecount_3: 5.02
```
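The benchmark script above is Python 2 code: it uses print statements, xreadlines(), and time.clock(), which was removed in Python 3.8. A rough Python 3 sketch of the same check-then-measure idea follows. It is an adaptation, not the recipe's script: timeit replaces timeo, looping on the file object replaces xreadlines, the third method opens the file in binary mode, and a generated temporary file replaces the recipe's syslog file. Timings depend entirely on your machine, so none are shown.

```python
import os
import tempfile
import timeit

def linecount_1(path):
    # Read all lines into a list and take its length
    with open(path) as f:
        return len(f.readlines())

def linecount_2(path):
    # Loop directly on the file object (replaces xreadlines)
    with open(path) as f:
        return sum(1 for _ in f)

def linecount_3(path):
    # Count b'\n' in 64 KiB binary chunks
    count = 0
    with open(path, 'rb') as f:
        while True:
            buffer = f.read(65536)
            if not buffer:
                break
            count += buffer.count(b'\n')
    return count

def bench(path, n=10):
    funcs = (linecount_1, linecount_2, linecount_3)
    # First make sure every method agrees (off-by-one errors are common)
    counts = {f.__name__: f(path) for f in funcs}
    if len(set(counts.values())) != 1:
        raise ValueError(f"methods disagree: {counts}")
    # Then time n runs of each method
    timings = {f.__name__: timeit.timeit(lambda f=f: f(path), number=n)
               for f in funcs}
    return counts, timings

with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.writelines(f"line {i}\n" for i in range(10_000))
try:
    counts, timings = bench(tmp.name)
    print(counts)
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.2f}")
finally:
    os.remove(tmp.name)
```

As in the recipe, the point is to measure on your own data before choosing a method.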
As you can see, the performance differences hardly matter: a difference of 10% or so in one auxiliary task is something that your users will never even notice. However, the fastest approach (for my particular circumstances, a cheap but very recent PC running a popular Linux distribution, as well as this specific benchmark) is the humble loop-on-every-line technique, while the slowest one is the ambitious technique that counts line terminators by chunks. In practice, unless I had to worry about files of many hundreds of megabytes, I’d always use the simplest approach (i.e., the first one presented in this recipe).
See Also
The Library Reference section on file objects and the time module; Perl Cookbook Recipe 8.2.