Combining CSVs with Python issue

Question

I'm trying to combine a bunch of CSVs in a folder into one file using Python. Each CSV has 9 columns but no header row. When they are combined, the data from some files ends up spread far to the right of the output sheet, so it seems they are not combining properly.

See the code below:

## Merge Multiple 1M Rows CSV files
import os
import pandas as pd

# 1. defines path to csv files
path = "C://halfordsCSV//new//Archive1/"

# 2. creates list with files to merge based on name convention
file_list = [path + f for f in os.listdir(path) if f.startswith('greyville_po-')]

# 3. creates empty list to include the content of each file converted to pandas DF
csv_list = []

# 4. reads each (sorted) file in file_list, converts it to pandas DF and appends it to the csv_list
for file in sorted(file_list):
    csv_list.append(pd.read_csv(file).assign(File_Name = os.path.basename(file)))

# 5. merges single pandas DFs into a single DF, index is refreshed 
csv_merged = pd.concat(csv_list, ignore_index=True)

# 6. Single DF is saved to the path in CSV format, without index column
csv_merged.to_csv(path + 'halfordsOrders.csv', index=False)

The combined file should keep the same number of columns as the input files. Any idea what might be going wrong?

Answer 1

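First, check that the separator used in pandas.read_csv is right for your files. The default is sep=','; delimiter (default None) is only an alias for sep, so pass one or the other, not both (pandas raises a ValueError if you specify both). For example, for semicolon-separated files:

pandas.read_csv("my_file_path", sep=';')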
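If you are not sure which separator a file actually uses, one way to check is the standard library's csv.Sniffer. This is a minimal sketch: the helper name detect_delimiter and the candidate characters are hypothetical choices, and sniffing is a heuristic, so treat the result as a guess to verify rather than a guarantee.

```python
import csv


def detect_delimiter(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the field delimiter of some CSV text (hypothetical helper).

    csv.Sniffer is heuristic: it picks the candidate character that occurs
    a consistent number of times per line. It raises csv.Error if none does.
    """
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


# For a real file, sniff only the first few KB, e.g.:
#     with open(file, newline="") as fh:
#         sep = detect_delimiter(fh.read(4096))
print(detect_delimiter("1;2;3\n4;5;6\n"))
```

pandas can also do this itself: with sep=None and engine='python', read_csv detects the separator using csv.Sniffer.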

If the separator is already correct for your CSV files, try cleaning the DataFrames before concatenating them.

Replace:

for file in sorted(file_list):
    csv_list.append(pd.read_csv(file).assign(File_Name = os.path.basename(file)))

with:

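nan_value = float("NaN")
for file in sorted(file_list):
    my_df = pd.read_csv(file)
    # assign() returns a new DataFrame, so keep the result
    my_df = my_df.assign(File_Name = os.path.basename(file))
    # treat empty strings as missing values
    my_df.replace("", nan_value, inplace=True)
    # drop columns that are entirely empty
    my_df.dropna(how='all', axis=1, inplace=True)
    csv_list.append(my_df)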
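Since the files have no header row, another likely cause is worth checking. By default pd.read_csv uses header="infer", which takes the first row of each file as its column names. Because the first rows differ between files, every DataFrame gets different column labels, and pd.concat aligns by label, so each file's data lands in its own set of columns further to the right (and each file's first row is dropped from the data). A minimal sketch, using small in-memory strings as hypothetical stand-ins for the real files and a hypothetical combine helper:

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for two headerless CSV files.
RAW_FILES = {
    "greyville_po-1.csv": "1,2,3\n4,5,6\n",
    "greyville_po-2.csv": "7,8,9\n10,11,12\n",
}


def combine(header):
    """Read every stand-in file with the given header setting and concat them."""
    frames = [
        pd.read_csv(io.StringIO(text), header=header).assign(File_Name=name)
        for name, text in sorted(RAW_FILES.items())
    ]
    return pd.concat(frames, ignore_index=True)


# header="infer" (the default): the first row of each file becomes its column
# names ("1","2","3" vs "7","8","9"), so concat aligns on names and the second
# file's values land in new columns to the right, padded with NaN.
bad = combine("infer")
print(bad.shape)

# header=None: pandas numbers the columns 0..n-1 in every file, so they line up.
good = combine(None)
print(good.shape)
```

Applied to the question's loop, that would mean pd.read_csv(file, header=None); passing names=[...] with nine labels of your choosing would additionally give the columns readable names.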

solution1
0 2022-08-22 12:32:59
