I have several .csv files in one directory. I would like to iterate over those files and merge them into a single .csv file, subject to the conditions below.
Each file uses the same naming convention (Date, Name, City, Supervisor):
2015-01-01_Steve__Boston_Steven.csv
2015-10-03_Michael_Dallas_Thomas.csv
2015-02-10_John_NewYork_Michael.csv
Each file contains only one column with varying length:
2015-01-01_Steve__Boston_Steven.csv
Sales
100
20
3
100
200
or
2015-10-03_Michael_Dallas_Thomas.csv
Sales
1
2
or
2015-02-10_John_NewYork_Michael.csv
Sales
1
2
3
Because the header "Sales" might be named differently in each file, I would like to skip the first row and always begin with the second row.
I would like to get a final table containing the following information:
Sales Name City Supervisor
100 Steve Boston Steven
20 Steve Boston Steven
3 Steve Boston Steven
100 Steve Boston Steven
200 Steve Boston Steven
1 Michael Dallas Thomas
2 Michael Dallas Thomas
1 John NewYork Michael
2 John NewYork Michael
3 John NewYork Michael
I'm new to Python, so I apologize for any inconvenience.
What I have tried:
import pandas as pd
from os import listdir

source_path, dst_path = '/oldpath', '/newpath'
files = [f for f in listdir(source_path) if f.endswith('.csv')]

def combining_data(files):
    df_list = []
    for filename in files:
        df_list.append(pd.read_csv(filename))

combining_data(files)
Unfortunately, that does not produce the required output.
This requires multiple steps. First, parse the CSV file names to grab the Name, City, and Supervisor; from the looks of it, you can split the name on the underscore to get those values. Then read each file and append its rows to a new CSV. Also, pandas is a bit of overkill here; the csv module is enough.
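import csv
import os

source_path = '/oldpath'  # directory with the input files, as in the question
# list the inputs first; skip the output file in case the script is re-run
files = [f for f in os.listdir(source_path) if f.endswith('.csv') and f != 'new_csv.csv']
# Python 3: open CSV files in text mode with newline='' (Python 2 used 'wb' / 'rb')
with open(os.path.join(source_path, 'new_csv.csv'), 'w', newline='') as new:
    writer = csv.writer(new)
    writer.writerow(['Sales', 'Name', 'City', 'Supervisor'])  # write the header for the new csv
    for f in files:
        # split the filename on _, while removing the .csv; drop empty parts left by a doubled underscore
        split = [part for part in f[:-4].split('_') if part]
        name = split[1]        # extract the name
        city = split[2]        # extract the city
        supervisor = split[3]  # extract the supervisor
        with open(os.path.join(source_path, f), newline='') as readfile:
            reader = csv.reader(readfile)
            next(reader, None)  # skip the header from the file you're reading
            for row in reader:
                if row:  # ignore blank lines
                    writer.writerow([row[0], name, city, supervisor])  # write to the new csv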
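The filename-parsing step described above can also be pulled out into a small helper, which makes it easier to check on its own. This is only a sketch: `parse_filename` is a hypothetical name, and it assumes every file follows the Date_Name_City_Supervisor pattern from the question, tolerating a doubled underscore such as the one in the first example file.

```python
import os

def parse_filename(filename):
    """Return (name, city, supervisor) from a 'Date_Name_City_Supervisor.csv' file name."""
    # Drop any directory part and the extension
    stem, _ext = os.path.splitext(os.path.basename(filename))
    # Split on '_' and discard empty pieces caused by a doubled underscore
    parts = [p for p in stem.split('_') if p]
    if len(parts) != 4:
        raise ValueError('unexpected file name: %r' % filename)
    _date, name, city, supervisor = parts
    return name, city, supervisor

print(parse_filename('2015-01-01_Steve__Boston_Steven.csv'))
# ('Steve', 'Boston', 'Steven')
```

Raising on an unexpected name is one possible choice; silently skipping such files would work as well if the directory may contain unrelated CSVs.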
With pandas:
import pandas as pd
import os

df = pd.DataFrame(columns=['Sales', 'Name', 'City', 'Supervisor'])
files = [f for f in os.listdir('.') if f.startswith('2015')]
for a in files:
    # skip the original header row and name the single column 'Sales'
    df1 = pd.read_csv(a, header=None, skiprows=1, names=['Sales'])
    len1 = len(df1.index)
    # split the file name on _, dropping empty parts from a doubled underscore
    f = [b for b in a.split('_') if b]
    l2, l3 = [f[1], f[2], f[3][:-4]], ['Name', 'City', 'Supervisor']
    for b, c in zip(l2, l3):
        # repeat the value once per row of this file
        ser = pd.Series(data=[b for _ in range(len1)], index=range(len1))
        df1[c] = ser
    df = pd.concat([df, df1], axis=0)
df.index = range(len(df.index))
df.to_csv('new_csv.csv', index=False)
df
Output:
CPU times: user 16 ms, sys: 0 ns, total: 16 ms
Wall time: 22.6 ms
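A variation on the pandas approach, offered as a sketch and not as the answer's own code: collect one frame per file in a list and call `pd.concat` once at the end, which avoids re-copying the growing table on every iteration. The function name `combine_sales` and the rule of skipping files whose names do not have four parts are assumptions made for illustration.

```python
import os
import pandas as pd

def combine_sales(source_dir):
    """Stack the single column of every matching CSV and add Name/City/Supervisor."""
    frames = []
    for fname in sorted(os.listdir(source_dir)):
        if not fname.endswith('.csv'):
            continue
        # Split the name on '_' and drop empty parts from a doubled underscore
        parts = [p for p in os.path.splitext(fname)[0].split('_') if p]
        if len(parts) != 4:
            continue  # assumption: ignore files that do not match the pattern
        _date, name, city, supervisor = parts
        # Skip the original header row and always call the column 'Sales'
        frame = pd.read_csv(os.path.join(source_dir, fname),
                            header=None, skiprows=1, names=['Sales'])
        # Assigning a scalar broadcasts it to every row of the frame
        frame['Name'] = name
        frame['City'] = city
        frame['Supervisor'] = supervisor
        frames.append(frame)
    if not frames:
        return pd.DataFrame(columns=['Sales', 'Name', 'City', 'Supervisor'])
    return pd.concat(frames, ignore_index=True)
```

Calling `combine_sales('/oldpath').to_csv('/newpath/new_csv.csv', index=False)` would then write the result, using the two paths from the question.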