So I have a DataFrame that looks like the following:
In [5]: import pandas as pd, numpy as np
np.random.seed(seed=43525)
descriptors = 'abcdefghi'
df = pd.DataFrame([{'Value':np.random.randint(0,100),
'Group':descriptors[np.random.randint(0, len(descriptors)):
np.random.randint(0, len(descriptors))]} for i in range(0,10)])
print(df)
Group Value
0 4
1 abc 37
2 efgh 99
3 a 67
4 37
5 52
6 46
7 b 41
8 d 17
9 36
Each character in the descriptor list should become its own group (along with the null group). How could I perform a groupby to accomplish this?
So group 'a' would contain indices 1 and 3, group 'b' would contain indices 1 and 7, etc. This is a fairly non-standard use of groupby (if it can be accomplished with it at all), so I'm not sure how to proceed.
Building off Edchum's answer, I came up with the following. The structure resembles that of a groupby object too:
indices = {}
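# iterate over each distinct letter; current NumPy's np.unique treats a joined str as one element
for val in sorted(set(''.join(df.Group))):
    indices[val] = df[df.Group.str.contains(val, regex=False)]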
print(indices)
Giving the following badly-formatted, but correct answer:
{'a': Group Value
1 abc 37
3 a 67, 'c': Group Value
1 abc 37, 'b': Group Value
1 abc 37
7 b 41, 'e': Group Value
2 efgh 99, 'd': Group Value
8 d 17, 'g': Group Value
2 efgh 99, 'f': Group Value
2 efgh 99, 'h': Group Value
2 efgh 99}
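For an actual groupby object rather than a dict, one option is to split each Group string into letters and explode them into one row per letter before grouping. This is only a sketch: it assumes pandas 0.25 or later (for `explode`), rebuilds the frame from the printed output above instead of the random generator, and the `Letter` column and `long` name are introduced here purely for illustration. Rows with an empty Group are kept under the key `''`.

```python
import pandas as pd

# Same rows as the printed frame in the question.
df = pd.DataFrame({
    'Group': ['', 'abc', 'efgh', 'a', '', '', '', 'b', 'd', ''],
    'Value': [4, 37, 99, 67, 37, 52, 46, 41, 17, 36],
})

# One list of letters per row; an empty Group becomes [''] so the row survives.
long = df.assign(Letter=df['Group'].map(lambda s: list(s) or ['']))

# One row per (original row, letter); the original index is repeated.
long = long.explode('Letter')

grouped = long.groupby('Letter')
indices = {letter: list(frame.index) for letter, frame in grouped}
print(indices)
```

Because `grouped` is a regular groupby object, the usual aggregations (for example `grouped['Value'].sum()`) apply, with a row counted once for every letter it contains.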
It sounds like what you really want is a MultiIndex. groupby will give you unique groups, essentially what you have in your Group column, but a MultiIndex will get you closer to what it seems you want.
For example,
descriptors = 'abcdefghi'
df = pd.DataFrame([{'Value':np.random.randint(0,100),
'Group':descriptors[np.random.randint(0, len(descriptors)):
np.random.randint(0, len(descriptors))]} for i in range(0,10)])
groups = df.Group.map(lambda x : tuple(desc if desc in x else '-' for desc in descriptors))
df.index = pd.MultiIndex.from_tuples(groups.values, names=list(descriptors))
df
Out[4]:
Group Value
a b c d e f g h i
- - - - - - - - - 4
a b c - - - - - - abc 37
- - - - e f g h - efgh 99
a - - - - - - - - a 67
- - - - - - - - - 37
- 52
- 46
b - - - - - - - b 41
- - d - - - - - d 17
- - - - - - 36
Now you can select data using df.xs or df.loc (df.ix in older pandas; .ix was removed in pandas 1.0). For example, if you want all groups with 'a' and 'c' in them, you can use:
df.xs(('a', 'c'), level=('a', 'c'))
Out[5]:
Group Value
b d e f g h i
b - - - - - - abc 37
Similarly, you could select all groups that contain 'b':
df.xs('b', level='b')
Out[7]:
Group Value
a c d e f g h i
a c - - - - - - abc 37
- - - - - - - - b 41
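Since the example above regenerates the frame without reseeding, here is a self-contained sketch of the same selections on fixed data. It assumes the rows printed in the question; the names `with_b` and `with_a_and_c` are only illustrative.

```python
import pandas as pd

descriptors = 'abcdefghi'
df = pd.DataFrame({
    'Group': ['', 'abc', 'efgh', 'a', '', '', '', 'b', 'd', ''],
    'Value': [4, 37, 99, 67, 37, 52, 46, 41, 17, 36],
})

# One index level per descriptor: the letter if present in Group, '-' otherwise.
groups = df['Group'].map(
    lambda x: tuple(d if d in x else '-' for d in descriptors))
df.index = pd.MultiIndex.from_tuples(list(groups), names=list(descriptors))

# Rows whose 'b' level equals 'b', i.e. every row containing 'b'.
with_b = df.xs('b', level='b')

# Rows containing both 'a' and 'c'.
with_a_and_c = df.xs(('a', 'c'), level=['a', 'c'])

print(with_b['Value'].tolist())
print(with_a_and_c['Group'].tolist())
```

With this data, the first selection should pick up the 'abc' and 'b' rows, and the second only the 'abc' row.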
To select non-grouped rows, you could use
df = df.sort_index()  # index must be sorted
df.loc[('-',) * len(descriptors), :]  # .ix was removed in pandas 1.0; .loc is the label-based replacement
Out[10]:
Group Value
a b c d e f g h i
- - - - - - - - - 4
- 37
- 52
- 46
- 36
Note: I've used '-' as a fill-character, but this isn't really necessary.
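As one way to avoid a fill character altogether, membership can be stored as boolean columns and combined with ordinary masks. This swaps the MultiIndex for a plain indicator frame, so treat it as an alternative sketch rather than the approach above; the `member` frame and the fixed data (copied from the printed output in the question) are assumptions made for illustration.

```python
import pandas as pd

descriptors = 'abcdefghi'
df = pd.DataFrame({
    'Group': ['', 'abc', 'efgh', 'a', '', '', '', 'b', 'd', ''],
    'Value': [4, 37, 99, 67, 37, 52, 46, 41, 17, 36],
})

# One boolean column per descriptor: True where the letter occurs in Group.
member = pd.DataFrame(
    {d: df['Group'].str.contains(d, regex=False) for d in descriptors},
    index=df.index,
)

# Rows containing both 'a' and 'c'.
has_a_and_c = df[member['a'] & member['c']]

# Rows that belong to no group at all.
ungrouped = df[~member.any(axis=1)]

print(has_a_and_c)
print(ungrouped)
```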