I have CSV file data as below:
ModelNumber Variables
----------- ----------
208         TotalTerms
208         Children
208         Property
208         isMarried
207         HasLoan
207         Children
How can I generate the output below?
ModelNumber Variable1  Variable2  Variable3  Variable4
----------- ---------- ---------- ---------- ----------
208         TotalTerms Children   Property   isMarried
207         HasLoan    Children
I think a better approach for your problem is to use pivot_table and make each variable its own column instead of Variable1, Variable2, etc., then store a simple 1/0 (True/False) flag for each variable under each model number:
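import pandas as pd

df_1 = pd.DataFrame({'ModelNumber': [208, 208, 208, 208, 207, 207],
                     'Variables': ['TotalTerms', 'Children', 'Property', 'isMarried', 'HasLoan', 'Children']})
# Count rows per (ModelNumber, Variables) pair; pairs that never occur become 0.
df_output = pd.pivot_table(df_1, index='ModelNumber', columns='Variables', aggfunc='size', fill_value=0)
print(df_output)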
Output:
Variables    Children  HasLoan  Property  TotalTerms  isMarried
ModelNumber
207                 1        1         0           0          0
208                 1        0         1           1          1
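If you do need the exact Variable1, Variable2, ... layout from the question, one possible sketch is to number each variable within its model and pivot on that number. The names `df`, `pos` and `wide` are only illustrative, and the frame is built inline here rather than read from your file. Note that the rows come out sorted by ModelNumber, and models with fewer variables get NaN in the unused columns.

```python
import pandas as pd

df = pd.DataFrame({
    "ModelNumber": [208, 208, 208, 208, 207, 207],
    "Variables": ["TotalTerms", "Children", "Property", "isMarried", "HasLoan", "Children"],
})

wide = (
    # pos is the 1-based position of each variable within its model.
    df.assign(pos=df.groupby("ModelNumber").cumcount() + 1)
      # One row per model, one column per position.
      .pivot(index="ModelNumber", columns="pos", values="Variables")
      # Turn the integer column labels 1, 2, ... into Variable1, Variable2, ...
      .add_prefix("Variable")
)
print(wide)
```

Pivoting on the integer position and adding the prefix afterwards keeps the columns in numeric order even if a model has ten or more variables.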
I'll write out the steps so it's easier to follow.
Step 1: Read the CSV file.
Step 2: While reading, collect the data in a dict with ModelNumber as the key and a list of its Variables as the value. If the ModelNumber is already in the dict, append the variable to its list; if not, add the key to the dict with an empty list as its value and then append the variable to that list.
Example data representation based on your data:
{
"208": ["TotalTerms", "Children", "Property", "isMarried"],
"207": ["HasLoan", "Children"]
}
Step 3: Export this data back to CSV.
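The three steps could look roughly like this with only the standard library. This is a sketch under some assumptions: the file is comma-separated with a `ModelNumber,Variables` header row, and the helper names `group_variables` and `write_wide` are made up for illustration. An in-memory `io.StringIO` stands in for the real files so the example runs as-is.

```python
import csv
import io

def group_variables(rows):
    """Step 2: map each ModelNumber to the list of its Variables, in input order."""
    grouped = {}
    for row in rows:
        # setdefault creates the empty list the first time a model is seen.
        grouped.setdefault(row["ModelNumber"], []).append(row["Variables"])
    return grouped

def write_wide(grouped, out):
    """Step 3: write one row per model, padding short rows with empty cells."""
    width = max((len(v) for v in grouped.values()), default=0)
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ModelNumber"] + [f"Variable{i}" for i in range(1, width + 1)])
    for model, variables in grouped.items():
        writer.writerow([model] + variables + [""] * (width - len(variables)))

# Step 1: stand-in for open("input.csv", newline=""); the file name is hypothetical.
source = io.StringIO(
    "ModelNumber,Variables\n"
    "208,TotalTerms\n208,Children\n208,Property\n208,isMarried\n"
    "207,HasLoan\n207,Children\n"
)
grouped = group_variables(csv.DictReader(source))

result = io.StringIO()  # stand-in for open("output.csv", "w", newline="")
write_wide(grouped, result)
print(result.getvalue())
```

Because a dict keeps insertion order, the models are written in the order they first appear in the input (208, then 207).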