Python - How To Merge CSV Files in Python

ID : 541

viewed : 170

Tags : PythonPython CSV

vote vote

95

While working with a large dataset in the form of .csv files in Pandas DataFrame, it might be possible that a single file does not contain the complete information for data analysis. In this case, we need to merge multiple files in a single pandas DataFrame. Python pandas library provides various methods to solve this problem, such as concat, merge, and join.

In this guide, we will learn two different methods for merging the multiple .csv files into a single Pandas dataframe with the help of different examples.

Combine Multiple CSV Files in a Single Pandas Dataframe Using Merging by Names

To merge multiple .csv files, first, we import the pandas library and set the file paths. Then, using the pd.read_csv() method reads all the CSV files. The pd.concat() takes the mapped CSV files as an argument and then merges them by default along the row axis. The ignore_index=True argument is used to set the continuous index values for the newly merged dataframe.

See the following example we have implemented the approach as mentioned above using pandas python:

Example Code:

import pandas as pd  # set files path sales1 = 'C:\\Users\\DELL\\OneDrive\\Desktop\\salesdata1.csv' sales2 = 'C:\\Users\DELL\\OneDrive\\Desktop\\salesdata2.csv'  print("*** Merging multiple csv files into a single pandas dataframe ***")  # merge files dataFrame = pd.concat(    map(pd.read_csv, [sales1, sales2]), ignore_index=True) print(dataFrame) 

Output:

*** Merging multiple csv files into a single pandas dataframe ***     Product_Name  Quantity Sale_Price 0    Acer laptop         3       500$ 1    Dell Laptop         6       700$ 2      Hp laptop         8       800$ 3  Lenavo laptop         2       600$ 4    Acer laptop         3       500$ 5    Dell Laptop         6       700$ 6      Hp laptop         8       800$ 7  Lenavo laptop         2       600$ 

Merge Multiple CSV Files in a Single Pandas Dataframe by Merging All Fields

To merge all .csv files in a pandas DataFrame, we used the glob module in this approach. First, we had to import all libraries. After that, we set the path for all files that we need to merge.

In the following example, the os.path.join() takes the file path as the first argument and the path components or .csv files to be joined as the second argument. Here, the salesdata*.csv will match and return every file that starts with salesdata in the specified home directory and ends with the .csv extension. The glob.glob(files_joined) takes an argument of the merged file names and returns a list of all merge files.

See the following example to merge all .csv files using the glob module:

Example Code:

import pandas as pd import glob import os  # merging the files files_joined = os.path.join('C:\\Users\\DELL\\OneDrive\\Desktop\\CSV_files', "salesdata*.csv")  # Return a list of all joined files list_files = glob.glob(files_joined)  print("** Merging multiple csv files into a single pandas dataframe **") # Merge files by joining all files dataframe = pd.concat(map(pd.read_csv, list_files), ignore_index=True) print(dataframe) 

Output:

** Merging multiple csv files into a single pandas dataframe **     Product_Name  Quantity Sale_Price 0    Acer laptop         3       500$ 1    Dell Laptop         6       700$ 2      Hp laptop         8       800$ 3  Lenavo laptop         2       600$ 4    Acer laptop         3       500$ 5    Dell Laptop         6       700$ 6      Hp laptop         8       800$ 7  Lenavo laptop         2       600$ 

Conclusion

We introduced two approaches to merge multiple CSV files in pandas python in this tutorial. We have seen how we can read .csv files and merge them into a single Pandas dataframe using the pd.concat() method. Moreover, we now know how to use the glob module in Pandas python code.

  • Related HOW TO?