Copy a file line by line in Python

You can iterate over lines in a file object in Python by iterating over the file object itself:

for line in f:
    copy.write(line)

Looping over the file object is an alternative to reading lines explicitly. It is memory efficient, fast, and leads to simpler code.

Writing line by line can be slow when working with large data. You can speed up the read/write operations by reading and writing a batch of lines at once.
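A minimal sketch of that batching idea, assuming plain text files. The helper name copy_in_batches and the default batch size of 1000 lines are illustrative choices, not taken from the original answer; itertools.islice pulls up to batch_size lines at a time and writelines() writes them in one call.

```python
from itertools import islice


def copy_in_batches(src_path, dst_path, batch_size=1000):
    """Copy a text file, reading and writing batch_size lines at a time.

    Returns the number of lines copied. The name and the default batch
    size are hypothetical, chosen only for this example.
    """
    copied = 0
    # newline="" disables newline translation, so line endings are copied unchanged
    with open(src_path, "r", encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            batch = list(islice(src, batch_size))  # at most batch_size lines
            if not batch:
                break  # end of file reached
            dst.writelines(batch)  # one call per batch instead of one per line
            copied += len(batch)
    return copied
```

Python file objects are already buffered, so it is worth measuring on your own data before assuming that batching helps.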

with open("input.txt", "r", encoding="utf-8") as input_file:
    with open("output.txt", "w", encoding="utf-8") as output_file:
        for input_line in input_file:
            output_line = f(input_line)  # You can change the line here
            output_file.write(output_line)

Note that input_line includes the end-of-line character(s) (\n or \r\n), if there are any. In text mode with Python 3's default newline handling, \r\n is translated to \n when reading.
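A small illustration of that note, using a throwaway temporary file so the snippet is self-contained (the file name and contents are made up for the example). It shows that each line keeps its trailing newline, that the last line may have none, and that rstrip("\n") removes it.

```python
import os
import tempfile

# Write a small sample file whose last line has no trailing newline.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as fh:
    fh.write("first\nsecond\nlast")

with open(path, "r", encoding="utf-8") as fh:
    raw_lines = list(fh)  # lines exactly as iteration yields them

stripped = [line.rstrip("\n") for line in raw_lines]  # drop the line ending

print(raw_lines)  # ['first\n', 'second\n', 'last']
print(stripped)   # ['first', 'second', 'last']
```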

See the shutil module for better ways of doing this than copying line by line:

shutil.copyfile(src, dst)

Copy the contents (no metadata) of the file named src to a file named dst. dst must be the complete target file name; look at shutil.copy() for a copy that accepts a target directory path. If src and dst are the same file, shutil.Error is raised (in Python 3.4 and later, the more specific SameFileError, a subclass of Error). The destination location must be writable; otherwise, an IOError exception will be raised (IOError is an alias of OSError in Python 3). If dst already exists, it will be replaced. Special files such as character or block devices and pipes cannot be copied with this function. src and dst are path names given as strings.
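A short usage sketch of shutil.copyfile, assuming temporary files created only for the demonstration (the paths and contents are not from the original answer). It copies a file and then shows that copying a file onto itself is rejected.

```python
import os
import shutil
import tempfile

# Create a throwaway source file so the example is self-contained.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "source.txt")
dst = os.path.join(workdir, "copy.txt")
with open(src, "w", encoding="utf-8") as fh:
    fh.write("line 1\nline 2\n")

# Copies the file contents only; permissions and timestamps are not preserved.
shutil.copyfile(src, dst)

with open(dst, "r", encoding="utf-8") as fh:
    copied_text = fh.read()

# Copying a file onto itself is rejected.
try:
    shutil.copyfile(src, src)
    same_file_rejected = False
except shutil.Error:  # SameFileError (Python 3.4+) is a subclass of shutil.Error
    same_file_rejected = True
```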

Edit: Your question says you are copying line-by-line because the source file is volatile. Something smells wrong about your design. Could you share more details regarding the problem you are solving?

Files can be iterated directly, without the need for an explicit call to readline:

f = open(". ", "r")
copy = open(". ", "w")
for line in f:
    copy.write(line)
f.close()
copy.close()

More efficient way to copy file line by line in python?

Let’s say the second column holds an ID. The whole file covers 4,000 people, and each has 50k records.

I can’t run my existing analysis script on a file that big (10 GB; the scripts use pandas and I don’t have enough memory. I know I should refactor them, and I’m working on it), so I need to divide the file into 4. But I can’t split an ID between files; that is, one person’s records can’t end up in separate files.

So I wrote a script that divides the file into 4 based on ID.

file1 = open('file.txt', 'r')
count = 0
list_of_ids = set()

while True:
    if len(list_of_ids) < 1050:
        a = "out1.csv"
    elif (len(list_of_ids)) >= 1049 and (len(list_of_ids)) < 2100:
        a = "out2.csv"
    elif (len(list_of_ids)) >= 2099 and (len(list_of_ids)) < 3200:
        a = "out3.csv"
    else:
        a = "out4.csv"
    line = file1.readline()
    if not line:
        break
    try:
        list_of_ids.add(line.split(',')[1])
        out = open(a, "a")
        out.write(line)
    except IndexError as e:
        print(e)
    count += 1
out.close()

But it’s sooooo slow, and I need to speed it up.
There are many ifs, and I open the output file on every iteration, but I can’t figure out how to get better performance.
Does anyone have any tips?

  1. Instead of a chain of if/elif comparisons on the length of the set in each iteration, we can keep track of the number of unique IDs encountered so far and use integer division to pick the correct file name.
  2. It’s better to use a dictionary to store the file handles so that we don’t need to reopen the output file in each iteration.
  3. We can also iterate over the file object directly instead of calling readline() in a while True loop.

Here’s updated code with these optimizations:

file_handles = {}  # output file name -> open file handle
count = 0
unique_ids = set()

with open("file.txt", "r") as file1:
    next(file1)  # skip header
    for line in file1:
        id_ = line.split(',')[1]
        if id_ not in unique_ids:
            unique_ids.add(id_)
        # Integer division: IDs 1-1050 go to out1.csv, 1051-2100 to out2.csv, and so on.
        # As in the original script, this assumes all rows for one ID are adjacent.
        name = "out{}.csv".format((len(unique_ids) - 1) // 1050 + 1)
        if name not in file_handles:
            file_handles[name] = open(name, "a")  # open each output file only once
        file_handles[name].write(line)
        count += 1

for fh in file_handles.values():
    fh.close()

  • I assumed that the header line should be skipped, so I used next() to consume it before the loop.
  • I used a dictionary lookup to get the file handle instead of the if-else conditions.
  • I used the “with” statement to ensure the input file is closed after processing; the output handles are closed explicitly at the end.
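
If the rows for one ID are not guaranteed to be adjacent in the input, the file chosen for an ID has to be remembered so that later rows follow it. A minimal sketch under that assumption is shown here; the function name split_by_id, its parameters, and the out{}.csv naming inside out_dir are illustrative, and contextlib.ExitStack is used so that every output file is closed even if an error occurs.

```python
import os
from contextlib import ExitStack


def split_by_id(src_path, out_dir, ids_per_file=1050, id_column=1):
    """Split a CSV-like file so that all rows sharing an ID land in one file.

    Hypothetical helper: the name and defaults are assumptions for this sketch.
    Returns a dict mapping each ID to the output file name it was assigned.
    """
    id_to_name = {}  # ID -> output file name, fixed when the ID is first seen
    handles = {}     # output file name -> open file handle
    with ExitStack() as stack:
        src = stack.enter_context(open(src_path, "r", encoding="utf-8"))
        next(src, None)  # skip the header line, if there is one
        for line in src:
            fields = line.rstrip("\n").split(",")
            if len(fields) <= id_column:
                continue  # malformed row without an ID column
            id_ = fields[id_column]
            if id_ not in id_to_name:
                # Integer division assigns IDs to files in order of first appearance.
                number = len(id_to_name) // ids_per_file + 1
                id_to_name[id_] = "out{}.csv".format(number)
            name = id_to_name[id_]
            if name not in handles:
                # Each output file is opened once and closed by the ExitStack.
                handles[name] = stack.enter_context(
                    open(os.path.join(out_dir, name), "w", encoding="utf-8"))
            handles[name].write(line)
    return id_to_name
```

With about 4,000 distinct IDs the lookup dictionary stays small, so memory use is dominated by the current line rather than by the 10 GB input.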


