I am trying to merge two CSV files on a common column and write the result to a new file. For example, the product.csv table will have the columns
product_id name
1 Handwash
2 Soap
and subproduct.csv will have columns
product_id subproduct_name volume
1 Dettol 20
1 Lifebuoy 50
2 Lux 100
The output file, sales.csv, should look like this:
product_id name subproduct_name volume
1 Handwash Dettol 20
1 Handwash Lifebuoy 50
2 Soap Lux 100
I have tried to create two dictionaries:
import csv
with open('product.csv', 'r') as f:
    r = csv.reader(f)
    dict1 = {row[0]: row[1:] for row in r}
with open('subproduct.csv', 'r') as f:
    r = csv.reader(f)
    dict2 = {row[0]: row[1:] for row in r}
Use pandas:
import pandas as pd
products_df = pd.read_csv('product.csv')
subproducts_df = pd.read_csv('subproduct.csv')
sales_df = pd.merge(products_df, subproducts_df, on='product_id')
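`on` expects a column *name*, not a position. If you would rather key on the first column without hard-coding its name, a minimal sketch (assuming both files use the same header for that first column) is to look the name up from `columns`. Here `io.StringIO` stands in for the two files so the example runs on its own:

```python
import io

import pandas as pd

# In-memory stand-ins for product.csv and subproduct.csv (example data from the question)
product_csv = io.StringIO("product_id,name\n1,Handwash\n2,Soap\n")
subproduct_csv = io.StringIO(
    "product_id,subproduct_name,volume\n1,Dettol,20\n1,Lifebuoy,50\n2,Lux,100\n"
)

products_df = pd.read_csv(product_csv)
subproducts_df = pd.read_csv(subproduct_csv)

# Look up the name of the first column instead of passing a position
key = products_df.columns[0]  # 'product_id'
sales_df = pd.merge(products_df, subproducts_df, on=key)
print(sales_df)
```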
Stage 1: First, pip install pandas if you haven't already, then import it:
import pandas as pd
Stage 2: Create the data
data1 = {'product_id': [1, 2],
         'name': ['Handwash', 'Soap']}
data2 = {'product_id': [1, 1, 2],
         'subproduct_name': ['Dettol', 'Lifebuoy', 'Lux'],
         'volume': [20, 50, 100]}
Stage 3: Put it into DataFrames
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
Stage 4: Merging the dataframes
output = pd.merge(df1, df2, how="inner")
To use your actual files instead of Stages 2 and 3, load them directly:
df1 = pd.read_csv('product.csv')
df2 = pd.read_csv('subproduct.csv')
Then do Stage 4.
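The stages end with the merged DataFrame in `output`, but the question also asks for sales.csv. A sketch of an extra step (an addition, not one of the stages above) that reruns Stages 2 to 4 on the example data and writes the result with `to_csv`; `index=False` keeps the pandas row index out of the file, which is written to the current directory:

```python
import pandas as pd

# Stage 2: the example data from the question
data1 = {'product_id': [1, 2], 'name': ['Handwash', 'Soap']}
data2 = {'product_id': [1, 1, 2],
         'subproduct_name': ['Dettol', 'Lifebuoy', 'Lux'],
         'volume': [20, 50, 100]}

# Stages 3 and 4
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
output = pd.merge(df1, df2, how="inner")

# Extra step: write the result without the DataFrame index column
output.to_csv('sales.csv', index=False)

with open('sales.csv') as f:
    print(f.read())
```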
You can write the script in pure Python. Its standard library includes a powerful module called csv that should do the trick:
import csv
with open('product.csv') as csv_produto:
    with open('subproduct.csv') as csv_subproduct:
        produto_reader = list(csv.reader(csv_produto, delimiter=','))
        subproduct_reader = list(csv.reader(csv_subproduct, delimiter=','))
        for p in produto_reader:
            for sp in subproduct_reader:
                if p[0] == sp[0]:
                    print('{},{},{},{}'.format(p[0], p[1], sp[1], sp[2]))
That's the main idea; from here you can write the output to a CSV file, add a header, and handle exceptions.
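A sketch of that next step, still using only the standard library. It writes the joined rows with a header built from the two input headers and reports a missing input file instead of crashing. The function name `merge_csv` and the choice to catch only `FileNotFoundError` are assumptions for illustration; the demo writes into a temporary directory so no existing files get overwritten:

```python
import csv
import os
import sys
import tempfile


def merge_csv(product_path, subproduct_path, out_path):
    """Join two CSV files on their first column and write the result with a header.

    Returns False (after printing a message) if an input file is missing.
    """
    try:
        with open(product_path, newline='') as f:
            products = list(csv.reader(f))
        with open(subproduct_path, newline='') as f:
            subproducts = list(csv.reader(f))
    except FileNotFoundError as e:
        print(f'Cannot read input file: {e}', file=sys.stderr)
        return False

    # The first row of each file is its header
    product_header, product_rows = products[0], products[1:]
    sub_header, sub_rows = subproducts[0], subproducts[1:]

    with open(out_path, 'w', newline='') as f:
        writer = csv.writer(f)
        # Key column once, then the remaining columns of each file
        writer.writerow(product_header + sub_header[1:])
        for p in product_rows:
            for sp in sub_rows:
                if p[0] == sp[0]:
                    writer.writerow(p + sp[1:])
    return True


# Demo in a scratch directory, using the example data from the question
workdir = tempfile.mkdtemp()
product_path = os.path.join(workdir, 'product.csv')
subproduct_path = os.path.join(workdir, 'subproduct.csv')
sales_path = os.path.join(workdir, 'sales.csv')
with open(product_path, 'w', newline='') as f:
    f.write('product_id,name\n1,Handwash\n2,Soap\n')
with open(subproduct_path, 'w', newline='') as f:
    f.write('product_id,subproduct_name,volume\n1,Dettol,20\n1,Lifebuoy,50\n2,Lux,100\n')

print(merge_csv(product_path, subproduct_path, sales_path))  # True
with open(sales_path, newline='') as f:
    print(f.read())
```

Unlike the snippet above, this version skips the header rows when matching and writes the header separately, so the output starts with exactly one header line.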
Others have proposed pandas-based approaches. You should consider one if your files are big or if you need to do this operation often. But the csv module is enough here.
You cannot use plain dicts here because the keys are not unique: subproduct.csv has 2 different rows with the same id 1. So I would use dicts of lists instead.
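To see the problem concretely, here is a small sketch with the rows of subproduct.csv typed in as lists (the header row is left out). A dict comprehension silently keeps only the last row for id '1', while a `collections.defaultdict(list)` keeps both:

```python
import collections

# Rows of subproduct.csv (header omitted), as csv.reader would return them
rows = [['1', 'Dettol', '20'], ['1', 'Lifebuoy', '50'], ['2', 'Lux', '100']]

# Plain dict: the second row with key '1' overwrites the first
plain = {row[0]: row[1:] for row in rows}
print(plain)  # {'1': ['Lifebuoy', '50'], '2': ['Lux', '100']}

# Dict of lists: every row is kept under its key
grouped = collections.defaultdict(list)
for row in rows:
    grouped[row[0]].append(row[1:])
print(dict(grouped))  # {'1': [['Dettol', '20'], ['Lifebuoy', '50']], '2': [['Lux', '100']]}
```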
I will assume here that every key is present in product.csv, but that some products may have no associated subproducts (a left outer join, in database terms).
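So I will use a plain dict for product.csv and a dict of lists (collections.defaultdict(list)) for subproduct.csv.
Code could be: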
import collections
import csv
with open('product.csv') as f:
    r = csv.reader(f)
    header1 = next(r)
    dict1 = {row[0]: row[1:] for row in r}
dict2 = collections.defaultdict(list)
with open('subproduct.csv', 'r') as f:
    r = csv.reader(f)
    header2 = next(r)
    for row in r:
        dict2[row[0]].append(row[1:])
with open('merged.csv', 'w', newline='') as f:
    w = csv.writer(f)
    _ = w.writerow(header1 + header2[1:])
    # one row of empty cells, used for products without subproducts
    empty2 = [[''] * (len(header2) - 1)]
    for k in sorted(dict1.keys()):
        for row2 in dict2.get(k, empty2):  # accept no subproducts
            _ = w.writerow([k] + dict1[k] + row2)
Assuming that your CSV files are truly comma-separated values files, this gives:
product_id,name,subproduct_name,volume
1,Handwash,Dettol,20
1,Handwash,Lifebuoy,50
2,Soap,Lux,100
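The `dict2.get(k, empty2)` fallback only matters when a product has no subproducts, which the example data never exercises. A sketch to check that case, packaging the same dict-of-lists logic as a function over in-memory text (the name `left_join` is hypothetical). The fallback row is padded with `''` cells so every output row keeps four columns, and the extra product `3,Shampoo` is made up for the test, not part of the question's data:

```python
import collections
import csv
import io


def left_join(product_text, subproduct_text):
    """Dict-of-lists left join on the first column; returns the merged rows."""
    r = csv.reader(io.StringIO(product_text))
    header1 = next(r)
    dict1 = {row[0]: row[1:] for row in r}

    r = csv.reader(io.StringIO(subproduct_text))
    header2 = next(r)
    dict2 = collections.defaultdict(list)
    for row in r:
        dict2[row[0]].append(row[1:])

    out = [header1 + header2[1:]]
    # One row of empty cells, used when a product has no subproducts
    empty2 = [[''] * (len(header2) - 1)]
    for k in sorted(dict1.keys()):
        # .get does not insert missing keys into the defaultdict
        for row2 in dict2.get(k, empty2):
            out.append([k] + dict1[k] + row2)
    return out


products = 'product_id,name\n1,Handwash\n2,Soap\n3,Shampoo\n'  # product 3 is hypothetical
subproducts = 'product_id,subproduct_name,volume\n1,Dettol,20\n1,Lifebuoy,50\n2,Lux,100\n'
for row in left_join(products, subproducts):
    print(','.join(row))
# The last line printed is: 3,Shampoo,,
```

Note that `sorted` compares the ids as strings, so an id like '10' would sort before '2'; use `sorted(dict1, key=int)` if your ids are always integers.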
Please try this:
import pandas as pd
product = pd.read_csv('product.csv')
sub_product = pd.read_csv('subproduct.csv')
output = pd.merge(product, sub_product, how='outer', on='product_id')
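It joins the two DataFrames (product and sub_product) on the product_id column, which both share. An outer join returns every row from both DataFrames, filling in NaN where a key has no match on the other side; an inner join returns only the rows whose key appears in both. Since every product_id here appears in both files, how='inner' would also work in this case.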
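To see where `'outer'` and `'inner'` differ, here is a sketch with one hypothetical extra product (3, Shampoo) that has no subproducts. The outer join keeps it with NaN in the subproduct columns, while the inner join drops it:

```python
import pandas as pd

# Example data; product 3 is hypothetical and has no subproducts
product = pd.DataFrame({'product_id': [1, 2, 3],
                        'name': ['Handwash', 'Soap', 'Shampoo']})
sub_product = pd.DataFrame({'product_id': [1, 1, 2],
                            'subproduct_name': ['Dettol', 'Lifebuoy', 'Lux'],
                            'volume': [20, 50, 100]})

outer = pd.merge(product, sub_product, how='outer', on='product_id')
inner = pd.merge(product, sub_product, how='inner', on='product_id')

print(len(outer), len(inner))  # 4 3
print(outer[outer['subproduct_name'].isna()])  # only the Shampoo row
```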
You can read the data straight into pandas DataFrames and then merge the two DataFrames:
import pandas as pd
# load data
product = pd.read_csv('product.csv')
subproduct = pd.read_csv('subproduct.csv')
# merge data
merged = pd.merge(product,subproduct)
# write results to csv
merged.to_csv('sales.csv',index=False)
This works for your example. Depending on what your actual data looks like, you might need to tweak some of the additional arguments of pd.merge.
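As an illustration of such tweaks, here is a sketch using three documented `pd.merge` parameters; whether you need any of them depends on your real data. `how='left'` keeps products without subproducts, `validate='one_to_many'` raises a `pandas.errors.MergeError` if product_id is duplicated in product.csv, and `indicator=True` adds a `_merge` column recording where each row came from:

```python
import io

import pandas as pd

# In-memory stand-ins for the two files (example data from the question)
product = pd.read_csv(io.StringIO('product_id,name\n1,Handwash\n2,Soap\n'))
subproduct = pd.read_csv(io.StringIO(
    'product_id,subproduct_name,volume\n1,Dettol,20\n1,Lifebuoy,50\n2,Lux,100\n'))

merged = pd.merge(
    product,
    subproduct,
    on='product_id',          # join key, named explicitly
    how='left',               # keep every product, even without subproducts
    validate='one_to_many',   # product_id must be unique in product.csv
    indicator=True,           # adds a '_merge' column: 'both', 'left_only', ...
)
print(merged)
```

If you use `indicator=True`, drop the helper column (for example `merged.drop(columns='_merge')`) before writing sales.csv.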
Edit: added the write to csv part