Copy a file line by line in python
You can iterate over lines in a file object in Python by iterating over the file object itself:
for line in f: copy.write(line)
An alternative approach to reading lines is to loop over the file object. This is memory efficient, fast, and leads to simpler code:
Similar question
Writing line by line can be slow when working with large data. You can accelerate the read/write operations by reading/writing a bunch of lines all at once. Please refer to my answer to a similar question here
with open("input.txt", "r", encoding="utf-8") as input_file: with open("output.txt", "w", encoding="utf-8") as output_file: for input_line in input_file: output_line = f(input_line) # You can change the line here output_file.write(output_line)
Note that input_line contains the end-of-line character(s) ( \n or \r\n ), if there are any.
See shutil module for better ways of doing this than copying line-by-line:
shutil.copyfile(src, dst)
Copy the contents (no metadata) of the file named src to a file named dst. dst must be the complete target file name; look at shutil.copy() for a copy that accepts a target directory path. If src and dst are the same files, Error is raised. The destination location must be writable; otherwise, an IOError exception will be raised. If dst already exists, it will be replaced. Special files such as character or block devices and pipes cannot be copied with this function. src and dst are path names given as strings.
Edit: Your question says you are copying line-by-line because the source file is volatile. Something smells wrong about your design. Could you share more details regarding the problem you are solving?
Steven Rumbalski 43090
Files can be iterated directly, without the need for an explicit call to readline :
f = open(". ", "r") copy = open(". ", "w") for line in f: copy.write(line) f.close() copy.close()
Related Query
- Python CSV error: line contains NULL byte, but no NULL byte found in the file
- Python program to read Excel files and copy their sqls into new text file
- copy attribute from line based on expression in line Python
- Processing file in memory line by line and producing file like object in python
- Python Performing calculation on every x line and place it back into original file
- How to efficiently read lines of large text file in Python in different orders: a) random line each time, b) starting in middle.
- Disable file access to a python program from command line
- Python — loop through subdirectories with textual content on them and copy just a part on a text file
- Python 3.4 Find all file types and copy to directory
- python error ImportError: DLL load failed: The specified procedure could not be found. File «psycopg2\_psycopg.pyc», line 10, in __load
- run copy file script in python 27 get path error errno22 in Windows Task Scheduler
- How to write into nth line in a file with python
- Pick parts from a txt file and copy to another file with python
- Copy contents of one file into other using Python 3
- python dictionary, make every odd number line to key and even number line to value from a file
- Python read file line by line and convert to dictionary
- Extracting a random line in a file without loading the file into RAM in python
- python read single line from file as string
- Add a new line after each new file name in python
- Python program prints an extra empty line when reading a text file
- How to search for the word in each line of a file and print the next value of it in python
- Reading a file in Python won’t read the first line
- Python scan file line by line and remove last line in the same loop
- How to send in command line arguments for a python file execution in command line?
- How to get 2nd thing out of every line using python and file parsing
- Python — regex to remove quotes from entire line in CSV file
- python file copy gives larger file
- Search for a word in file & print the matching line — Python
- Insert a line into the middle of a text file in Python
- Why in python loop for line in file is not going through all lines after using readline before?
- How to read each line in txt file which is in rar file in python
- How to copy a JSON file in another JSON file, with Python
- Problem in Python with function only writing one line to file
- Dividing a string by type of data inside each line of file in python
- python 3 replaces line break \r with \n when writing to a file
- Write to file line by line Python
- calling python interpreter with .py file as command line argument not working
- Python File overwrite each time i added a new line
- How to remove a specific line from a text file with python
- split 10 billion line file into 5,000 files by column value in Perl or Python
More Query from same tag
- Check if mutex exists system-wide in python
- How can I loop a mp4 file in moviepy?
- Extend the functionality of a compiled Python script
- Python program, user friendly query
- How to insert data in a row using list in Python?
- redis py and hgetall — why key values have a b»»?
- How can I tell my function to do something specific thing when it receives no parameters?
- How can I render object containing dictionaries to templates in jinja2?
- Is it possible to declare a keyword with arguments inside the name?
- Allow Python Docker to relay email through host’s Sendmail
- trouble installing pycurl on mac 10.8.5
- Python automatic pc restart
- Dict into List Comprehension
- google app engine get_serving_url() is not defined
- iterating through elements in different index positions
- In Python, how to generate nested dict from dotted json file?
- Find the centre value of the two highest peaks in a histogram Python
- How do I use range breaks and scattergl in the same plot?
- How to assign a value to a variable when writing the value takes multiple lines (python)
- How can I get the position from smaller number to biggest number of list?
- Dictionary with multiple same keys with different values python
- sort list of class instances according to master list
- Trouble getting href from BeautifulSoup resultSet object, but can get «style» and «class»?
- python 3 web scraping code for yahoo finance stock
- Iterating through dictionary list to search through directory to find files and copy them to new location
- Are my permissions set correctly? (python)
- QSlider with icons
- When scraping wind speed from Google Weather, I can see it, but can’t scrape it
- Checking if an element is generated in a particular function
- Input file, modify column, output file
- Python on the Web
- python fastest algorithm to get the most common prefix out of a list of strings
- Disable automatic doubling of «\» character Python 2.7 — re2 bug
- Standard method for returning all elements generated by a generator or iterator
- Python-Multiple writing the same line in txt file
More efficient way to copy file line by line in python?
Let’s say that in second column I have a ID. In whole files I have 4000 person and each have 50k records.
I can’t use my prepared script for analysis on that big file (10GB – scripts in pandas, and I have too low memory. I know I should refactored it, and I working on it), so I need to divided that file to 4. But I can’t split ID between files. I mean I can’t have a part of one person in separate files.
So I write script. It divided file on 4 based on ID.
file1 = open('file.txt', 'r') count = 0 list_of_ids= set() while True: if len(list_of_ids) < 1050: a = "out1.csv" elif (len(list_of_ids)) >= 1049 and (len(list_of_ids)) < 2100: a = "out2.csv" elif (len(list_of_ids)) >= 2099 and (len(list_of_ids)) < 3200: a = "out3.csv" else: a = "out4.csv" line = file1.readline() if not line: break try: list_of_ids.add(line.split(',')[1]) out = open(a, "a") out.write(line) except IndexError as e: print(e) count += 1 out.close()
But it’s sooooo slow, and I need to speed it up.
There is many if, and each time I open file, but I can’t figure it out how to get better performance.
Maybe someone have some tips?
Answer
davy.ai
- Instead of checking the length of the set in each iteration, it’s better to use a counter variable to keep track of the number of unique IDs encountered so far. Then, we can use integer division to assign the correct file name based on the number of unique IDs encountered.
- It’s better to use a dictionary to store the file handles so that we don’t need to open and close the files in each iteration.
- We can also process the file line by line instead of reading the entire file into memory using the readlines() method.
Here’s an updated code with these optimizations:
file_handles = count = 0 unique_ids = set() with open("file.txt", "r") as file1: next(file1) # skip header for line in file1: id_ = line.split(',')[1] if id_ not in unique_ids: unique_ids.add(id_) file_handle = file_handles["out<>".format((len(unique_ids) - 1) // 1050 + 1)] file_handle.write(line) count += 1 for fh in file_handles.values(): fh.close()
- I assumed that the header line should be skipped, so I used the next() method to get the first data line.
- I used tuple unpacking to get the file handle from the dictionary instead of the if-else conditions.
- I used the “with” statement to ensure proper closing of the file handles after the processing is done.