Merge files with python

Contents

Merge two text files into one in Python
Python merging of two text files:
TEXT FILE1:
TEXT FILE2:
CODE TO MERGE:
4 responses to “Merge two text files into one in Python”
How to merge multiple files into a new file using Python?
Using Loops
Example — 1
Output
Example — 2
Output
How to Merge CSV Files with Python (Pandas DataFrame)
Setup
1: Merge CSV files to DataFrame
2: Read CSV files without header
3: Read sorted CSV files
4: Change CSV separator
5: Keep trace of CSV files
6: Merge CSV files to single with Python
7. Full Code
Conclusion
How to merge multiple CSV files with Python
Steps to merge multiple CSV(identical) files with Python
Step 1: Import modules and set the working directory
Step 2: Match CSV files by pattern
Step 3: Combine all files in the list and export as CSV
Full Code
Steps to merge multiple CSV(identical) files with Python with trace
Combine multiple CSV files when the columns are different
Bonus: Merge multiple files with Windows/Linux
Linux
Windows

Merge two text files into one in Python

In this tutorial, we are going to learn how to merge two files in Python with some easy and understandable examples.

When working with files in Python, we often come across situations where we need to merge the contents of two files into one.

Let us see how to solve this problem.

Python merging of two text files:

In order to solve the above problem in Python we have to follow the below-mentioned steps:

Open the two files which we want to merge in the “READ” mode.

Open the third file in the “WRITE” mode.

First, read the data from the first file and store it as a string.

Second, read the data from the second file and concatenate it to that string.

Write the combined string to the third file, close all files, and finally check the output file to confirm that the merge succeeded.

TEXT FILE1:

TEXT FILE2:

CODE TO MERGE:

# Python program to merge two files
data = data2 = ""

# Reading data from first file
with open('file1.txt') as fp:
    data = fp.read()

# Reading data from second file
with open('file2.txt') as fp:
    data2 = fp.read()

# Merging two files into another file
data += "\n"
data += data2

with open('file3.txt', 'w') as fp:
    fp.write(data)

In the above code, we first read the data from both files, “file1” and “file2”, and then write the combined contents into another file, “file3”.

After merging, file3.txt contains the contents of file1.txt, then a newline, then the contents of file2.txt.

Finally, I hope that this tutorial has helped you to understand the topic of “how to merge two files in Python”.

4 responses to “Merge two text files into one in Python”

Good Day, Thanks for the elegant way in which you solved this. I am very new to Python and struggling somewhat. How will you go about merging the two files and sort it in ascending order? Thanks

It does not seem to be a “merge” but an “append” of file2 to file1 into file3 as lines with same contents are duplicated. Any idea to deal with duplicates?

You can use this piece of code:
seen_lines = set()
outfile = open(outfilename, "w")
for line in open(infilename, "r"):
    if line not in seen_lines:
        outfile.write(line)
        seen_lines.add(line)
outfile.close()
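
The two questions above (sorting and duplicates) can be handled together. The following is only a minimal sketch, not the original author's method: the helper name merge_unique_sorted is hypothetical, and io.StringIO objects stand in for file1.txt and file2.txt so that the example runs without any files on disk.

```python
import io


def merge_unique_sorted(*sources):
    """Merge iterables of lines, drop duplicate lines, return them in ascending order."""
    unique = set()
    for source in sources:
        for line in source:
            # Strip the trailing newline so "apple" and "apple\n" count as the same line.
            unique.add(line.rstrip("\n"))
    return sorted(unique)


# Hypothetical in-memory stand-ins for file1.txt and file2.txt
file1 = io.StringIO("pear\napple\n")
file2 = io.StringIO("apple\nbanana\n")

merged_lines = merge_unique_sorted(file1, file2)
merged_text = "\n".join(merged_lines) + "\n"  # what would be written to file3.txt
```

With real files, the open file objects can be passed in place of the StringIO objects. Note that sorting discards the original line order, which may or may not be what you want.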


How to merge multiple files into a new file using Python?

Python makes it simple to create new files, read existing files, append data, or replace data in existing files. With the aid of a few open-source and third-party libraries, it can manage practically all of the file types that are currently supported.

We must iterate through all the necessary files, collect their data, and then add it to a new file in order to concatenate several files into a single file. This article demonstrates how to use Python to concatenate multiple files into a single file.

Using Loops

The Python code below starts with a list of the filenames or file paths of the Python files to merge. Next, advanced_file.py is opened for writing, which creates it if it does not exist.

The list of filenames or file paths is then iterated over. Each file is opened, its contents are read line by line, and every line is written to advanced_file.py.

A newline character, \n, is written to the new file after the contents of each source file.

Example — 1

Following is an example to merge multiple files into a single file using a for loop:

nameOfFiles = ["moving files.py", "mysql_access.py", "stored procedure python-sql.py", "trial.py"]

with open("advanced_file.py", "w") as new_created_file:
    for name in nameOfFiles:
        with open(name) as file:
            # Copy the current file line by line
            for line in file:
                new_created_file.write(line)
        # Separate the contents of consecutive files with a newline
        new_created_file.write("\n")

Output

As an output, a new python file is created with the name “advanced_file” which has all the existing mentioned python files in it.

Example — 2

In the following code we open the existing files in read mode and the newly created file, advanced_file, in write mode. We read the data from both files, join it into a single string, and write that string to the new file. The with statements close the files automatically:

info1 = info2 = ""

# Reading information from the first file
with open('mysql_access.py') as file:
    info1 = file.read()

# Reading information from the second file
with open('trial.py') as file:
    info2 = file.read()

# Merge two files, adding the data of trial.py starting on the next line
info1 += "\n"
info1 += info2

with open('advanced_file.py', 'w') as file:
    file.write(info1)

Output

As an output a new python file is created with the name “advanced_file” which has both the existing mentioned python files in it.


How to Merge CSV Files with Python (Pandas DataFrame)

In this short guide, we’re going to merge multiple CSV files into a single CSV file with Python. We will also see how to read multiple CSV files — by wildcard matching — to a single DataFrame.

The code to merge several CSV files matched by pattern to a file or Pandas DataFrame is:

import glob
import pandas as pd

for f in glob.glob('file_*.csv'):
    df_temp = pd.read_csv(f)

Setup

Suppose we have multiple CSV files whose names match the pattern file_*.csv, and we want to merge them into a single CSV file: merged.csv

1: Merge CSV files to DataFrame

To merge multiple CSV files into a DataFrame we will use the Python module glob. The module allows us to search for files by pattern with the wildcard * .

import pandas as pd
import glob

df_files = []

for f in glob.glob('file_*.csv'):
    df_temp = pd.read_csv(f)
    df_files.append(df_temp)

df = pd.concat(df_files)

All files which match the pattern are iterated in arbitrary (not guaranteed) order
Temporary DataFrame is created for each file
The temporary DataFrame is appended to list
Finally all DataFrames are merged into a single one

2: Read CSV files without header

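To read CSV files that have no header row we can use the parameter header=None. With it, pandas treats the first line of each file as data rather than as column names. Note that read_csv takes a single file, not a wildcard pattern, so it is called once per matched file f:

pd.read_csv(f, header=None)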

To add the headers only for the first file we can:

read the first file with headers
drop duplicates (keep first)
set column names

All depends on the context.
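
One possible reading of the three steps above is sketched here. It is an assumption about what the author intends, not code from the original guide: the helper name concat_single_header is hypothetical, the files are read with header=None and dtype=str, and in-memory buffers stand in for file_1.csv and file_2.csv.

```python
import io

import pandas as pd


def concat_single_header(sources):
    """Read every source with header=None, then keep the header row only once."""
    # dtype=str keeps header text and data comparable as plain strings.
    frames = [pd.read_csv(src, header=None, dtype=str) for src in sources]
    merged = pd.concat(frames, ignore_index=True)

    header = merged.iloc[0]
    # Rows identical to the first row are the repeated header lines.
    is_header = (merged == header).all(axis=1)
    merged = merged[~is_header].reset_index(drop=True)

    # Promote the header values to column names.
    merged.columns = list(header)
    return merged


# Hypothetical in-memory stand-ins for file_1.csv and file_2.csv
file_1 = io.StringIO("col,col2\nA,B\n")
file_2 = io.StringIO("col,col2\nC,D\n")
result = concat_single_header([file_1, file_2])
```

This approach would also drop a genuine data row that happens to equal the header, so it only fits files where that cannot occur.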

3: Read sorted CSV files

The glob module returns files in no guaranteed order. To ensure a predictable order of the CSV files we can wrap the call in sorted, for example sorted(glob.glob('file_*.csv')).

This ensures that the final output CSV file or DataFrame will be loaded in a certain order.
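
A small sketch of the sorting behaviour, under stated assumptions: the file names below are made up for illustration and are created in a temporary directory, and natural_key is a hypothetical helper, not part of glob. Plain sorted orders names as text, so file_10.csv comes before file_2.csv; the helper compares the numeric parts as numbers.

```python
import glob
import os
import re
import tempfile


def natural_key(path):
    """Split digits from text so that file_10 sorts after file_2."""
    name = os.path.basename(path)
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]


with tempfile.TemporaryDirectory() as tmp:
    # Create three empty example files (hypothetical names).
    for name in ("file_2.csv", "file_10.csv", "file_1.csv"):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("col,col2\n")

    pattern = os.path.join(tmp, "file_*.csv")
    plain_order = [os.path.basename(p) for p in sorted(glob.glob(pattern))]
    natural_order = [os.path.basename(p) for p in sorted(glob.glob(pattern), key=natural_key)]
```

Whether text order or numeric order is the "correct" one depends on how the files are named.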

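Separately, the Pandas method concat accepts the parameters ignore_index=True and sort=True. They renumber the rows of the result and sort its columns when the files' columns differ; they do not change the order in which the files are read: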

merged_df = pd.concat(all_df, ignore_index=True, sort=True)

4: Change CSV separator

We can control the separator symbol for the CSV files with the sep parameter of read_csv, for example sep='\t' for tab-separated files.
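
As a rough illustration, the snippet below reads one tab-separated and one semicolon-separated source and concatenates them. The data is invented for the example and held in io.StringIO buffers instead of real files.

```python
import io

import pandas as pd

# Hypothetical in-memory sources with different separators
tab_source = io.StringIO("col\tcol2\nA\tB\n")
semicolon_source = io.StringIO("col;col2\nC;D\n")

df_tab = pd.read_csv(tab_source, sep="\t")
df_semicolon = pd.read_csv(semicolon_source, sep=";")

# Once parsed, the original separators no longer matter.
df_both = pd.concat([df_tab, df_semicolon], ignore_index=True)
```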

5: Keep trace of CSV files

If we want to keep track of which CSV file each row came from, we can use: df_temp['file'] = f.split('/')[-1]

This adds a new column to the data of each file holding the name of the file it came from.
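
A hedged variant of the same idea is sketched below. It swaps f.split('/')[-1] for os.path.basename, which also handles Windows path separators; the helper name read_with_trace and the example files are assumptions made for the demonstration.

```python
import glob
import os
import tempfile

import pandas as pd


def read_with_trace(pattern):
    """Concatenate all CSV files matching pattern, tagging rows with their source file."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        frame = pd.read_csv(path)
        frame["file"] = os.path.basename(path)  # file name without the directory
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


with tempfile.TemporaryDirectory() as tmp:
    # Two small example files (hypothetical contents).
    for name, row in (("file_1.csv", "A,B"), ("file_2.csv", "C,D")):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("col,col2\n" + row + "\n")

    traced = read_with_trace(os.path.join(tmp, "file_*.csv"))
```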

6: Merge CSV files to single with Python

Finally we can save the result from the Pandas DataFrame into a single CSV file with df_merged.to_csv("merged.csv").
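
One detail worth knowing: by default to_csv also writes the DataFrame index as an unnamed first column. The sketch below, with made-up data, compares the default output with index=False; calling to_csv without a path returns the CSV text instead of writing a file.

```python
import pandas as pd

# Hypothetical merged result
df_merged = pd.DataFrame({"col": ["A", "C"], "col2": ["B", "D"]})

# Without a path argument, to_csv returns the CSV text as a string.
with_index = df_merged.to_csv()
without_index = df_merged.to_csv(index=False)

header_with_index = with_index.splitlines()[0]
header_without_index = without_index.splitlines()[0]
```

If the extra index column is not wanted in merged.csv, index=False is probably the simplest way to leave it out.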

7. Full Code

Finally we can find the full example with most options mentioned earlier:

import pandas as pd
import glob

df_files = []

for f in sorted(glob.glob('file_*.txt')):
    df_temp = pd.read_csv(f, header=None, index_col=False, sep='\t')
    df_temp['file'] = f.split('/')[-1]
    df_files.append(df_temp)

df_merged = pd.concat(df_files)
df_merged.to_csv("merged.csv")

Conclusion

We saw how to read multiple CSV files with Pandas and Python. Different options were covered like:

changing separator for read_csv
keeping trace of the source file
sorting files in certain order
skipping headers



How to merge multiple CSV files with Python

In this guide, I'll show you several ways to merge multiple CSV files into a single one using Python (the same approach works for text and other files). As a bonus, there are one-liners for merging multiple CSV files on Linux and Windows. With a few lines of code you will be able to combine hundreds of files with full control over the loaded data: you can read all the CSV files into a Pandas DataFrame and mark each row with the CSV file it came from.

Steps to merge multiple CSV(identical) files with Python

Note: we assume that all files have the same number of columns and the same structure.

Short code example, collecting all CSV files in the Downloads folder:

import glob
import os
import pandas as pd

path = os.path.expanduser('~/Downloads')  # glob does not expand "~" by itself
all_files = glob.glob(path + "/*.csv")
all_files

Step 1: Import modules and set the working directory

First we will start with loading the required modules for the program and selecting working folder:

import os, glob
import pandas as pd

path = "/home/user/data/"

Step 2: Match CSV files by pattern

The next step is to collect all files that need to be combined. This is done by:

all_files = glob.glob(os.path.join(path, "data_*.csv"))

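The pattern data_*.csv matches only files whose names start with data_ and end with .csv, such as data_201901.csv and data_201902.csv.

You can customize the selection for your needs, keeping in mind that glob uses shell-style wildcards (*, ?, [...]) rather than regular expressions.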
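
To try out a pattern without touching the file system, the standard fnmatch module applies the same shell-style wildcard rules to a plain list of names. The names below are invented for the illustration.

```python
import fnmatch

# Hypothetical directory listing
names = ["data_201901.csv", "data_201902.csv", "merged.csv", "data_notes.txt"]

# "*" matches any run of characters
all_data = fnmatch.filter(names, "data_*.csv")

# "[12]" matches exactly one character from the set, "?" would match any single character
first_two_months = fnmatch.filter(names, "data_20190[12].csv")

# A pattern that is too broad also picks up the output file
every_csv = fnmatch.filter(names, "*.csv")
```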

Step 3: Combine all files in the list and export as CSV

The final step is to load all selected files into a single DataFrame and convert it back to CSV if needed:

df_from_each_file = (pd.read_csv(f, sep=',') for f in all_files)
df_merged = pd.concat(df_from_each_file, ignore_index=True)
df_merged.to_csv("merged.csv")

Note that you may change the separator with sep=',' or change which headers and rows are loaded.

You can find more about converting DataFrame to CSV file here: pandas.DataFrame.to_csv

Full Code

Below you can find the full code which can be used for merging multiple CSV files.

import os, glob
import pandas as pd

path = "/home/user/data/"

all_files = glob.glob(os.path.join(path, "data_*.csv"))
df_from_each_file = (pd.read_csv(f, sep=',') for f in all_files)
df_merged = pd.concat(df_from_each_file, ignore_index=True)
df_merged.to_csv("merged.csv")

Steps to merge multiple CSV(identical) files with Python with trace

Now let’s say that you want to merge multiple CSV files into a single DataFrame but also to have a column which represents from which file the row is coming. Something like:

row	col	col2	file
1	A	B	data_201901.csv
2	C	D	data_201902.csv

This can be achieved very easily with a small change to the code above:

import os, glob
import pandas as pd

path = "/home/user/data/"

all_files = glob.glob(os.path.join(path, "*.csv"))

all_df = []
for f in all_files:
    df = pd.read_csv(f, sep=',')
    df['file'] = f.split('/')[-1]
    all_df.append(df)

merged_df = pd.concat(all_df, ignore_index=True, sort=True)

In this example we iterate over all selected files, then we extract the files names and create a column which contains this name.

Combine multiple CSV files when the columns are different

Sometimes the CSV files differ in some columns, or they have the same columns but in a different order. In this example you can find how to combine CSV files without identical structure:

import os, glob
import pandas as pd

path = "/home/user/data/"

all_files = glob.glob(os.path.join(path, "*.csv"))

all_df = []
for f in all_files:
    df = pd.read_csv(f, sep=',')
    # Tag the DataFrame (not the path string f) with the source file name
    df['file'] = f.split('/')[-1]
    all_df.append(df)

merged_df = pd.concat(all_df, ignore_index=True, sort=True)

Pandas will align the data by this method: pd.concat . In case of a missing column the rows for a given CSV file will contain NaN values:

row	col	col2	col_201901	file
1	A	B	AA	data_201901.csv
2	C	D	NaN	data_201902.csv
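
The alignment can be reproduced on a small scale. This is a sketch with made-up frames that mirror the table above; the second frame lacks col_201901 and lists its columns in a different order.

```python
import pandas as pd

# Hypothetical contents of two CSV files with different structure
df_201901 = pd.DataFrame({"col": ["A"], "col2": ["B"], "col_201901": ["AA"]})
df_201902 = pd.DataFrame({"col2": ["D"], "col": ["C"]})  # other order, one column missing

# Columns are matched by name; sort=True orders the combined columns alphabetically.
aligned = pd.concat([df_201901, df_201902], ignore_index=True, sort=True)

missing_value = aligned.loc[1, "col_201901"]  # no such column in the second frame
```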

If you need to compare two csv files for differences with Python and Pandas you can check: Python Pandas Compare Two CSV files based on a Column

Bonus: Merge multiple files with Windows/Linux

Linux

Sometimes the tools that ship with your OS are enough, especially for huge files, which can be challenging to concatenate with Python. On Linux you can use the following command. The -s option belongs to GNU sed and makes it treat each file separately; without it, sed handles all files as one stream and 1d removes only the first line of the first file:

sed -s 1d data_*.csv > merged.csv

In this case we are working in the current folder and matching all files starting with data_ . This is important because if you match every CSV file, for example with *.csv, you will try to merge the newly created output file as well, which may cause issues. Another important note is that this skips the first line (the header) of each file. In order to include the header once you can do:

sed -n 1p data_1.csv > merged.csv
sed -s 1d data_*.csv >> merged.csv

If the commands above do not work for you, you can try the next two. The first one merges all CSV files but has problems if a file ends without a newline:

head -n 1 1.csv > merged.out && tail -n +2 -q *.csv >> merged.out

The second one merges the files and adds a newline at the end of each of them:

head -n 1 1.csv > merged.out
for f in *.csv; do tail -n +2 "$f"; printf "\n"; done >> merged.out

Windows

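The Windows equivalent of this is shown below. Note that copy concatenates the files as they are, so the header line of every file is kept: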

C:\> copy data_*.csv merged.csv
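
Where these shell tools are not available, a similar streaming merge can be written in Python without loading whole files into memory. This is only a sketch under assumptions: merge_csv_streaming is a hypothetical helper, every file is assumed to start with the same one-line header, and the demonstration files are created in a temporary directory.

```python
import os
import tempfile


def merge_csv_streaming(paths, out_path):
    """Concatenate CSV files line by line, writing the header only once."""
    with open(out_path, "w", newline="") as out:
        for index, path in enumerate(paths):
            with open(path, newline="") as src:
                header = src.readline()
                last_written = ""
                if index == 0:
                    out.write(header)  # keep the header of the first file only
                    last_written = header
                for line in src:
                    out.write(line)
                    last_written = line
                # Guard against a source file that ends without a newline.
                if last_written and not last_written.endswith("\n"):
                    out.write("\n")


with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "data_1.csv")
    second = os.path.join(tmp, "data_2.csv")
    with open(first, "w", newline="") as fh:
        fh.write("col,col2\nA,B\n")
    with open(second, "w", newline="") as fh:
        fh.write("col,col2\nC,D")  # deliberately no trailing newline

    target = os.path.join(tmp, "merged.csv")
    merge_csv_streaming([first, second], target)
    with open(target, newline="") as fh:
        merged_text = fh.read()
```

Because only one line is held in memory at a time, the size of the input files should matter much less than with the Pandas approach.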
