Split pandas dataframe by column variable

Question

I've got a dataframe that I'd like to split by a column variable like the example below:

gender height weight
male     42.8  157.5
male     41.3  165.6
female   48.4  144.2

My desired outcome is:

df_male

gender height weight
male     42.8  157.5
male     41.3  165.6

df_female

gender  height  weight
female    48.4   144.2

The catch is that I'd like to be able to do this with a variable that has anywhere from 5-25 categories.

My thought is that there should be a way to loop over the original dataframe and spit out multiple dataframes but I'm open to all possible solutions

Answer 1

The following will produce a list containing one dataframe for each value of the gender column:

import io
import pandas as pd

data = io.StringIO('''\
gender height weight
male     42.8  157.5
male     41.3  165.6
female   48.4  144.2
''')
df = pd.read_csv(data, delim_whitespace=True)

dfs = [rows for _, rows in df.groupby('gender')]

dfs is a list of length 2, with the following elements:

print(dfs[0])

#    gender  height  weight
# 2  female    48.4   144.2

print(dfs[1])

#    gender  height  weight
# 0    male    42.8   157.5
# 1    male    41.3   165.6

It might be even better to create a dictionary with keys the distinct values in the gender column and values the dataframes:

dfs = [{gender: rows for gender, rows in df.groupby('gender')}

results in the following dictionary:

{'female':     gender  height  weight
            2  female    48.4   144.2,
 'male':       gender  height  weight
            0    male    42.8   157.5
            1    male    41.3   165.6}

Split pandas dataframe by column variable

Question

1 answers

solution1
5 ACCPTED 2016-08-15 20:46:26

Split pandas dataframe by column variable

Question

1 answers

solution1 5 ACCPTED 2016-08-15 20:46:26

solution1
5 ACCPTED 2016-08-15 20:46:26