I have the following dataframe:
import pandas as pd
data = {'participant_id': [1, 100, 125, 125, 1, 100],
        'test_day': ['Day_1', 'Day_1', 'Day_12', 'Day_14', 'Day_4', 'Day_4'],
        'favorite_color': ['blue', 'red', 'yellow', 'green', 'yellow', 'green'],
        'grade': [88, 92, 95, 70, 80, 30]}
df = pd.DataFrame(data, columns=['participant_id', 'test_day', 'favorite_color', 'grade'])
The full dataframe has 10,000 rows and contains data for 400 test participants, each labelled with a unique, random ID stored in the 'participant_id' column. My task is to create a separate dataframe for each participant (one per 'participant_id') and save each one to its own CSV file (400 files in total).
I've been trying to figure this out for a couple of days now, with no luck.
Can you please help me?
I am still learning to program and am trying to apply what I learned in a data science course. I am using pandas, and I normally access the data for an individual participant with df.loc. I have also created a list of all the participant_id values, but I don't know how to combine the two to produce the files automatically.
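Since a list of IDs and df.loc are already in hand, one way to combine them is a plain for loop over the IDs that filters with a boolean mask and writes each slice. This is a minimal sketch assuming the sample data above; the helper name save_per_participant_loc and its out_dir parameter are hypothetical, chosen only for illustration.

```python
from pathlib import Path

import pandas as pd

data = {'participant_id': [1, 100, 125, 125, 1, 100],
        'test_day': ['Day_1', 'Day_1', 'Day_12', 'Day_14', 'Day_4', 'Day_4'],
        'favorite_color': ['blue', 'red', 'yellow', 'green', 'yellow', 'green'],
        'grade': [88, 92, 95, 70, 80, 30]}
df = pd.DataFrame(data)


def save_per_participant_loc(df, out_dir='.'):
    """Hypothetical helper: write one CSV per participant using df.loc."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # The list of IDs; unique() keeps them in first-seen order
    participant_ids = df['participant_id'].unique()
    for pid in participant_ids:
        # Boolean mask keeps only the rows belonging to this participant
        df_pid = df.loc[df['participant_id'] == pid]
        # File name is the ID itself, e.g. 1.csv, 100.csv, 125.csv
        df_pid.to_csv(out_dir / f'{pid}.csv', index=False)
    # Number of files written, one per distinct ID
    return len(participant_ids)
```

Calling save_per_participant_loc(df, 'participant_csvs') would write 1.csv, 100.csv and 125.csv for the sample data. Note that each iteration re-scans the whole participant_id column, which is why groupby is usually preferred once there are hundreds of IDs.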
The solution by @jpp is great. Here is my adaptation of it:
import pandas as pd
data = {'participant_id': [1, 100, 125, 125, 1, 100],
        'test_day': ['Day_1', 'Day_1', 'Day_12', 'Day_14', 'Day_4', 'Day_4'],
        'favorite_color': ['blue', 'red', 'yellow', 'green', 'yellow', 'green'],
        'grade': [88, 92, 95, 70, 80, 30]
        }
col = list(data.keys())
df = pd.DataFrame(data, columns=col)
# groupby yields one (participant_id, sub-DataFrame) pair per participant
for part_id, df_id in df.groupby('participant_id'):
    df_id.to_csv(f'{part_id}.csv', index=False)  # e.g. 1.csv, 100.csv, 125.csv
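For the real 400-participant file it may be tidier to collect the output in one folder and confirm that exactly one file was produced per ID. A minimal sketch of the same groupby loop, assuming the df built above; the helper name save_per_participant and the default folder name participant_csvs are hypothetical.

```python
from pathlib import Path

import pandas as pd


def save_per_participant(df, out_dir='participant_csvs'):
    """Hypothetical helper: one CSV per participant_id via groupby."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # groupby yields (key, sub-DataFrame) pairs, sorted by participant_id
    for part_id, df_id in df.groupby('participant_id'):
        path = out_dir / f'{part_id}.csv'
        df_id.to_csv(path, index=False)
        written.append(path)
    # Paths of the files that were written, in key order
    return written
```

After paths = save_per_participant(df), checking len(paths) == df['participant_id'].nunique() is a quick way to confirm nothing was skipped. Keep in mind that groupby drops rows whose key is NaN by default (dropna=True), so a missing participant_id would silently not get a file.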