I have a dataframe like this:
>>> df = pd.DataFrame([
... ['a1', None, 1],
... ['a2', 'a1', 2],
... ['a3', 'a2', 3],
... ['b1', None, 9],
... ['b2', 'b1', 8],
... ['b3', 'b2', 7],
... ], columns=['key', 'key_prev', 'val'])
>>> df
key key_prev val
0 a1 None 1
1 a2 a1 2
2 a3 a2 3
3 b1 None 9
4 b2 b1 8
5 b3 b2 7
Here, key and key_prev are chained. In the above, there are two chains:
a1 -> a2 -> a3
b1 -> b2 -> b3
I'd like to group rows by the chain they belong to. In the above example, I'd like something like:
>>> df.groupby(lambda i: df.iloc[i]['key'][0]).sum()
val
a 6
b 24
However, key and key_prev can be arbitrary strings, e.g.:
>>> df = pd.DataFrame([
... ['a', None, 1],
... ['c', 'a', 2],
... ['b', 'c', 3],
... ['p', 'b', 4],
... ['r', 'p', 5],
... ['x', None, 9],
... ['q', 'x', 8],
... ['e', 'q', 7],
... ], columns=['key', 'key_prev', 'val'])
>>> df
key key_prev val
0 a None 1
1 c a 2
2 b c 3
3 p b 4
4 r p 5
5 x None 9
6 q x 8
7 e q 7
In the above, the chains are:
a -> c -> b -> p -> r
x -> q -> e
so the approach above, which takes the first letter as the grouping criterion, doesn't work.
I can manually iterate the rows and assign a group to each row, then group:
>>> km = dict()
>>> for i, r in df.iterrows():
... df.at[i, 'grp'] = km[r['key']] = km.get(r['key_prev'], r['key'])
...
>>> df.groupby('grp').sum()
val
grp
a 15
x 24
but I was wondering if there's a better approach.
EDIT: Note that the rows are not necessarily consecutive, i.e. groups can be intertwined, for example:
df = pd.DataFrame([
['a', None, 1], # group a
['x', None, 9], # group x
['c', 'a', 2], # group a
['q', 'x', 8], # group x
['b', 'c', 3], # group a
['p', 'b', 4], # group a
['e', 'q', 7], # group x
['r', 'p', 5], # group a
], columns=['key', 'key_prev', 'val'])
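We can try using isnull with cumsum to create the group key. Note that this assumes the rows of each chain are contiguous and that each chain starts with the row whose key_prev is null, so it fits the second example but not the intertwined one from the EDIT: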
out = df.groupby(df.key_prev.isnull().cumsum()).agg({'key':'first','val':'sum'})
Out[309]:
key val
key_prev
1 a 15
2 x 24
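For the intertwined case from the EDIT, the cumulative count no longer lines up with the chains: every row after 'x' would get label 2. One possible sketch, not necessarily the fastest, is to resolve each key to the root of its chain by following the key_prev links. The names prev, chain_root and grp are made up for this illustration, and the code assumes that keys are unique and that the links contain no cycles:

```python
import pandas as pd

df = pd.DataFrame([
    ['a', None, 1],  # group a
    ['x', None, 9],  # group x
    ['c', 'a', 2],   # group a
    ['q', 'x', 8],   # group x
    ['b', 'c', 3],   # group a
    ['p', 'b', 4],   # group a
    ['e', 'q', 7],   # group x
    ['r', 'p', 5],   # group a
], columns=['key', 'key_prev', 'val'])

# Map each key to its predecessor; rows that start a chain are left out,
# so a key missing from this dict is a chain root.
linked = df.dropna(subset=['key_prev'])
prev = dict(zip(linked['key'], linked['key_prev']))

def chain_root(key):
    """Follow key_prev links until a key with no predecessor is reached.

    Assumes the links are acyclic; a cycle would loop forever.
    """
    while key in prev:
        key = prev[key]
    return key

# Label every row with the root of its chain, then aggregate per label.
grp = df['key'].map(chain_root).rename('grp')
out = df.groupby(grp)['val'].sum()
print(out)
```

On this frame the result is 15 for group a and 24 for group x, matching the manual loop from the question. Unlike that loop, this does not need a row's predecessor to appear earlier in the frame, at the cost of walking the whole chain for each key.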