While working with a large dataset in the form of .csv files in Pandas
DataFrame, it might be possible that a single file does not contain the complete information for data analysis. In this case, we need to merge multiple files in a single pandas
DataFrame. Python pandas library provides various methods to solve this problem, such as
In this guide, we will learn two different methods for merging the multiple .csv files into a single Pandas
dataframe with the help of different examples.
Combine Multiple CSV Files in a Single Pandas
Dataframe Using Merging by Names
To merge multiple .csv files, first, we import the pandas library and set the file paths. Then, using the
pd.read_csv() method reads all the CSV files. The
pd.concat() takes the mapped CSV files as an argument and then merges them by default along the row axis. The
ignore_index=True argument is used to set the continuous index values for the newly merged
See the following example we have implemented the approach as mentioned above using pandas python:
import pandas as pd # set files path sales1 = 'C:\\Users\\DELL\\OneDrive\\Desktop\\salesdata1.csv' sales2 = 'C:\\Users\DELL\\OneDrive\\Desktop\\salesdata2.csv' print("*** Merging multiple csv files into a single pandas dataframe ***") # merge files dataFrame = pd.concat( map(pd.read_csv, [sales1, sales2]), ignore_index=True) print(dataFrame)
*** Merging multiple csv files into a single pandas dataframe *** Product_Name Quantity Sale_Price 0 Acer laptop 3 500$ 1 Dell Laptop 6 700$ 2 Hp laptop 8 800$ 3 Lenavo laptop 2 600$ 4 Acer laptop 3 500$ 5 Dell Laptop 6 700$ 6 Hp laptop 8 800$ 7 Lenavo laptop 2 600$
Merge Multiple CSV Files in a Single Pandas
Dataframe by Merging All Fields
To merge all .csv files in a pandas
DataFrame, we used the glob module in this approach. First, we had to import all libraries. After that, we set the path for all files that we need to merge.
In the following example, the
os.path.join() takes the file path as the first argument and the path components or .csv files to be joined as the second argument. Here, the
salesdata*.csv will match and return every file that starts with
salesdata in the specified home directory and ends with the .csv extension. The
glob.glob(files_joined) takes an argument of the merged file names and returns a list of all merge files.
See the following example to merge all .csv files using the glob module:
import pandas as pd import glob import os # merging the files files_joined = os.path.join('C:\\Users\\DELL\\OneDrive\\Desktop\\CSV_files', "salesdata*.csv") # Return a list of all joined files list_files = glob.glob(files_joined) print("** Merging multiple csv files into a single pandas dataframe **") # Merge files by joining all files dataframe = pd.concat(map(pd.read_csv, list_files), ignore_index=True) print(dataframe)
** Merging multiple csv files into a single pandas dataframe ** Product_Name Quantity Sale_Price 0 Acer laptop 3 500$ 1 Dell Laptop 6 700$ 2 Hp laptop 8 800$ 3 Lenavo laptop 2 600$ 4 Acer laptop 3 500$ 5 Dell Laptop 6 700$ 6 Hp laptop 8 800$ 7 Lenavo laptop 2 600$
We introduced two approaches to merge multiple CSV files in pandas python in this tutorial. We have seen how we can read .csv files and merge them into a single Pandas
dataframe using the
pd.concat() method. Moreover, we now know how to use the
glob module in Pandas python code.