
Split each row into multiple groups using Pandas groupby?

So I have a DataFrame that looks like the following:

import numpy as np
import pandas as pd

np.random.seed(seed=43525)
descriptors = 'abcdefghi'
df = pd.DataFrame([{'Value': np.random.randint(0, 100),
                    'Group': descriptors[np.random.randint(0, len(descriptors)):
                                         np.random.randint(0, len(descriptors))]}
                   for i in range(0, 10)])
print(df)

  Group  Value
0            4
1   abc     37
2  efgh     99
3     a     67
4           37
5           52
6           46
7     b     41
8     d     17
9           36

Each character in the descriptors string should become its own group (along with a group for rows that have no descriptor at all). How could I perform a groupby to accomplish this?

So group 'a' would contain indices 1 and 3, group 'b' would contain indices 1 and 7, and so on. This is a fairly non-standard use of groupby (if it can be accomplished with it at all), so I'm not sure how to proceed.
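For reference, here is a minimal sketch of that grouping done with an ordinary groupby, assuming pandas 0.25 or later (for DataFrame.explode). The idea is to give every character its own row first and then group on the character. The data is copied from the printed frame above rather than regenerated with np.random, and the Char column name is just an illustrative choice.

```python
import pandas as pd

# Data copied from the printed DataFrame above (not regenerated randomly)
df = pd.DataFrame({
    'Group': ['', 'abc', 'efgh', 'a', '', '', '', 'b', 'd', ''],
    'Value': [4, 37, 99, 67, 37, 52, 46, 41, 17, 36],
})

# Split every descriptor into a list of characters ('' becomes []),
# then give each character its own row
exploded = df.assign(Char=df['Group'].apply(list)).explode('Char')

# explode() turns an empty list into NaN; map it to '' so the null group
# survives (groupby drops NaN keys by default)
exploded['Char'] = exploded['Char'].fillna('')

# Exploded rows keep their original labels, so one row can land in
# several groups (row 1, 'abc', is in 'a', 'b' and 'c')
grouped = exploded.groupby('Char')

print(grouped.groups)          # group key -> original row labels
print(grouped['Value'].sum())  # any ordinary groupby aggregation works
```

Because the original row labels are preserved, grouped.groups maps each character straight back to rows of the original df.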

Building off EdChum's answer, I came up with the following. The structure resembles that of a groupby object too:

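indices = {}
# each distinct character that appears in any descriptor
for val in sorted(set(''.join(df.Group.values))):
    # regex=False: match the character as a literal substring
    indices[val] = df[df.Group.str.contains(val, regex=False)]
print(indices)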

This gives the following badly formatted, but correct, result:

{'a':   Group  Value
1   abc     37
3     a     67, 'c':   Group  Value
1   abc     37, 'b':   Group  Value
1   abc     37
7     b     41, 'e':   Group  Value
2  efgh     99, 'd':   Group  Value
8     d     17, 'g':   Group  Value
2  efgh     99, 'f':   Group  Value
2  efgh     99, 'h':   Group  Value
2  efgh     99}
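If only the row labels are needed, the same loop can be written as a dict comprehension that returns plain index lists and also covers the null group the question asks for. This is a sketch on the data copied from the printed frame; note that str.contains('') would match every row, so the empty-descriptor group needs an explicit equality test.

```python
import pandas as pd

# Same data as printed in the question
df = pd.DataFrame({
    'Group': ['', 'abc', 'efgh', 'a', '', '', '', 'b', 'd', ''],
    'Value': [4, 37, 99, 67, 37, 52, 46, 41, 17, 36],
})

# Every distinct character that appears in some descriptor
chars = sorted(set(''.join(df['Group'])))

# Map each character to the labels of rows whose descriptor contains it;
# regex=False makes contains() a literal substring test
indices = {c: df[df['Group'].str.contains(c, regex=False)].index.tolist()
           for c in chars}

# The null group: rows whose descriptor is the empty string
indices[''] = df[df['Group'] == ''].index.tolist()

print(indices)
```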

It sounds like what you really want is a MultiIndex. groupby will give you unique groups, which is essentially what you already have in your Group column, but a MultiIndex gets you closer to what you seem to want.

For example,

import numpy as np
import pandas as pd

descriptors = 'abcdefghi'
df = pd.DataFrame([{'Value': np.random.randint(0, 100),
                    'Group': descriptors[np.random.randint(0, len(descriptors)):
                                         np.random.randint(0, len(descriptors))]}
                   for i in range(0, 10)])

# For each row, one index entry per descriptor: the character itself if it
# appears in the row's Group string, otherwise the '-' placeholder
groups = df.Group.map(lambda x: tuple(desc if desc in x else '-' for desc in descriptors))
df.index = pd.MultiIndex.from_tuples(groups.values, names=list(descriptors))
df

Out[4]: 
                  Group  Value
a b c d e f g h i             
- - - - - - - - -            4
a b c - - - - - -   abc     37
- - - - e f g h -  efgh     99
a - - - - - - - -     a     67
- - - - - - - - -           37
                -           52
                -           46
  b - - - - - - -     b     41
  - - d - - - - -     d     17
      - - - - - -           36
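A lighter-weight alternative to the MultiIndex (a different technique, not what this answer uses) is to keep one boolean column per descriptor and select with ordinary boolean masks. The sketch below uses the data copied from the question's printed frame; the name flags is illustrative.

```python
import pandas as pd

descriptors = 'abcdefghi'
# Same data as printed in the question
df = pd.DataFrame({
    'Group': ['', 'abc', 'efgh', 'a', '', '', '', 'b', 'd', ''],
    'Value': [4, 37, 99, 67, 37, 52, 46, 41, 17, 36],
})

# One boolean column per descriptor character: True where it occurs
flags = pd.DataFrame({c: df['Group'].str.contains(c, regex=False)
                      for c in descriptors})

# Rows whose group contains both 'a' and 'c'
print(df[flags['a'] & flags['c']])

# Rows whose group contains 'b'
print(df[flags['b']])

# Rows with no descriptor at all (the null group)
print(df[~flags.any(axis=1)])
```

This keeps the original integer index, at the cost of carrying the flags frame alongside df.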

Now you can select data using df.xs or df.loc (df.ix, available in older pandas, has since been removed). For example, if you want all groups with both 'a' and 'c' in them, you can use:

df.xs(('a', 'c'), level=('a', 'c'))
Out[5]: 
              Group  Value
b d e f g h i             
b - - - - - -   abc     37

Similarly, you could select all groups that contain 'b':

df.xs('b', level='b')
Out[7]: 
                Group  Value
a c d e f g h i             
a c - - - - - -   abc     37
- - - - - - - -     b     41
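Since the frames above come from unseeded random data, here is a reproducible sketch of the same MultiIndex construction and the two xs selections, using the data copied from the question's printed frame. The level list is passed as a list here, which DataFrame.xs accepts alongside a tuple key.

```python
import pandas as pd

descriptors = 'abcdefghi'
# Same data as printed in the question
df = pd.DataFrame({
    'Group': ['', 'abc', 'efgh', 'a', '', '', '', 'b', 'd', ''],
    'Value': [4, 37, 99, 67, 37, 52, 46, 41, 17, 36],
})

# One index level per descriptor character: the character if present, else '-'
groups = df['Group'].map(lambda x: tuple(d if d in x else '-' for d in descriptors))
df.index = pd.MultiIndex.from_tuples(groups.tolist(), names=list(descriptors))

# Cross-sections: match on the named levels and drop them from the result
with_a_and_c = df.xs(('a', 'c'), level=['a', 'c'])
with_b = df.xs('b', level='b')

print(with_a_and_c)
print(with_b)
```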

To select the non-grouped rows (those with an empty Group), you could use:

df.sort_index(inplace=True)  # index must be sorted
df.loc[('-',) * len(descriptors)]
Out[10]: 
                  Group  Value
a b c d e f g h i             
- - - - - - - - -            4
                -           37
                -           52
                -           46
                -           36
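If sorting the index is undesirable, a sketch that does not depend on sort order is to turn the index levels into a frame and keep the rows where every level holds the '-' filler. This assumes pandas 0.24 or later for MultiIndex.to_frame and uses the data copied from the question's printed frame.

```python
import pandas as pd

descriptors = 'abcdefghi'
# Same data as printed in the question
df = pd.DataFrame({
    'Group': ['', 'abc', 'efgh', 'a', '', '', '', 'b', 'd', ''],
    'Value': [4, 37, 99, 67, 37, 52, 46, 41, 17, 36],
})
groups = df['Group'].map(lambda x: tuple(d if d in x else '-' for d in descriptors))
df.index = pd.MultiIndex.from_tuples(groups.tolist(), names=list(descriptors))

# One column per index level; all '-' means the row has no descriptor
levels = df.index.to_frame(index=False)
ungrouped = df[(levels == '-').all(axis=1).to_numpy()]

print(ungrouped)
```

The rows come back in their original order, since nothing is re-sorted.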

Note: I've used '-' as a fill-character, but this isn't really necessary.
