I'm trying to create code that will take data form certain columns in a CSV file and combine them into a new CSV file. I was directed to use Pandas but I'm not sure if I'm even on the right track. I'm fairly new to Python so prepare yourselves for potentially awful code.
I want to use data.csv:
Customer_ID,Date,Time,OtherColumns,A,B,C,Cost
1003,January,2:00,Stuff,1,5,2,519
1003,January,2:00,Stuff,1,3,2,530
1003,January,2:00,Stuff,1,3,2,530
1004,Feb,2:00,Stuff,1,1,0,699
and create a new CSV that looks like this:
Customer_ID,ABC
1003,152
1003,132
1003,132
1004,110
What I have so far is:
import csv
import pandas as pd
df = pd.read_csv('test.csv', delimiter = ',')
custID = df.customer_ID
choiceA = df.A
choiceB = df.B
choiceC = df.C
ofile = open('answer.csv', "wb")
writer = csv.writer(ofile, delimiter = ',')
writer.writerow(custID + choiceA + choiceB + choiceC)
Unfortunately all that does is add each row together, then create a CSV of each row summed together as one row. My true end goal would be to find the most occurring values in columns AC and combine each customer into the same row, using the most occurring values. I'm awful at explaining. I'd want something that takes data.csv and makes this:
Customer_ID,ABC
1003,132
1004,110
You can sum the columns your interested in (if their type is string):
In [11]: df = pd.read_csv('data.csv', index_col='Customer_ID')
In [12]: df
Out[12]:
Date Time OtherColumns A B C Cost
Customer_ID
1003 January 2:00 Stuff 1 5 2 519
1003 January 2:00 Stuff 1 3 2 530
1003 January 2:00 Stuff 1 3 2 530
1004 Feb 2:00 Stuff 1 1 0 699
In [13]: res = df[list('ABC')].astype(str).sum(1) # cols = list('ABC')
In [14]: res
Out[14]:
Customer_ID
1003 152
1003 132
1003 132
1004 110
dtype: float64
To get the csv, you can first use to_frame
(to add the desired column name):
In [15]: res.to_frame(name='ABC') # ''.join(cols)
Out[15]:
ABC
Customer_ID
1003 152
1003 132
1003 132
1004 110
In [16]: res.to_frame(name='ABC').to_csv('new.csv')
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.