I've got a dataframe that I'd like to split by a column variable like the example below:
gender height weight
male 42.8 157.5
male 41.3 165.6
female 48.4 144.2
My desired outcome is:
df_male
gender height weight
male 42.8 157.5
male 41.3 165.6
df_female
gender height weight
female 48.4 144.2
The catch is that I'd like to be able to do this with a variable that has anywhere from 5-25 categories.
My thought is that there should be a way to loop over the original dataframe and spit out multiple dataframes but I'm open to all possible solutions
The following will produce a list containing one dataframe for each value of the gender
column:
import io
import pandas as pd
data = io.StringIO('''\
gender height weight
male 42.8 157.5
male 41.3 165.6
female 48.4 144.2
''')
df = pd.read_csv(data, delim_whitespace=True)
dfs = [rows for _, rows in df.groupby('gender')]
dfs
is a list of length 2, with the following elements:
print(dfs[0])
# gender height weight
# 2 female 48.4 144.2
print(dfs[1])
# gender height weight
# 0 male 42.8 157.5
# 1 male 41.3 165.6
It might be even better to create a dictionary with keys the distinct values in the gender
column and values the dataframes:
dfs = [{gender: rows for gender, rows in df.groupby('gender')}
results in the following dictionary:
{'female': gender height weight
2 female 48.4 144.2,
'male': gender height weight
0 male 42.8 157.5
1 male 41.3 165.6}
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.