How can I combine chronologically consecutive rows based on a condition in pandas?
My question is similar to this question, but the answers there do not seem to work completely.
Given the following pandas DataFrame:
+---------+-----------------+-----------------+
| SECTION | TEXT | NUMBER_OF_WORDS |
+---------+-----------------+-----------------+
| ONE | lots of text… | 55 |
+---------+-----------------+-----------------+
| ONE | word1 | 1 |
+---------+-----------------+-----------------+
| ONE | lots of text… | 151 |
+---------+-----------------+-----------------+
| ONE | word2 | 1 |
+---------+-----------------+-----------------+
| ONE | word3 | 1 |
+---------+-----------------+-----------------+
| ONE | word4 | 1 |
+---------+-----------------+-----------------+
| TWO | lots of text… | 523 |
+---------+-----------------+-----------------+
| TWO | lots of text… | 123 |
+---------+-----------------+-----------------+
| TWO | word4 | 1 |
+---------+-----------------+-----------------+
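If the value in the NUMBER_OF_WORDS column is 1, the row must be merged into the previous row, as long as both rows have the same SECTION value.
So the final result should look like this: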
+---------+--------------------------------------+-----------------+
| SECTION | TEXT | NUMBER_OF_WORDS |
+---------+--------------------------------------+-----------------+
| ONE | lots of text…, word1 | 56 |
+---------+--------------------------------------+-----------------+
| ONE | lots of text…, word2, word3, word4 | 154 |
+---------+--------------------------------------+-----------------+
| TWO | lots of text… | 523 |
+---------+--------------------------------------+-----------------+
| TWO | lots of text…, word4 | 124 |
+---------+--------------------------------------+-----------------+
Here is my code. It runs, but it does not produce the result I want.
df.groupby(['SECTION', (df.NUMBER_OF_WORDS.shift(1) == 1)], as_index=False, sort=False).agg({'TEXT': lambda x: ', '.join(x), 'NUMBER_OF_WORDS': lambda x: sum(x)})
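To see why this attempt misses, it can help to look at the groups it builds. The sketch below rebuilds the sample frame (the `…` in TEXT is kept as a literal placeholder) and counts the rows per group; the name `prev_is_one_word` is only an illustrative label. Because the second key is a plain boolean, every row in a section with the same True/False value lands in the same group, whether or not the rows are adjacent.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "SECTION": ["ONE"] * 6 + ["TWO"] * 3,
    "TEXT": ["lots of text…", "word1", "lots of text…", "word2", "word3",
             "word4", "lots of text…", "lots of text…", "word4"],
    "NUMBER_OF_WORDS": [55, 1, 151, 1, 1, 1, 523, 123, 1],
})

# Same key as in the attempt: True where the PREVIOUS row has exactly one word
key = df["NUMBER_OF_WORDS"].shift(1).eq(1).rename("prev_is_one_word")

# Number of rows that end up in each (SECTION, key) group
sizes = df.groupby(["SECTION", key], sort=False).size()
print(sizes)
```

There are only four possible groups here, and the group (ONE, False) collects rows 0, 1 and 3, which are not all consecutive. A boolean key cannot tell the first run of rows from the second one.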
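Update
This is the answer from BEN_YO, but it seems to contain a small typo. For the benefit of future users who find this question, I have modified his answer slightly.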
s = df['NUMBER_OF_WORDS'].ne(1).cumsum()
out = df.groupby(s).agg({'SECTION': 'first','TEXT': lambda x: ', '.join(x),'NUMBER_OF_WORDS': lambda x: sum(x)})
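For anyone who wants to run this end to end, here is a minimal self-contained sketch using the sample data from the question. The name `block` and the final `reset_index(drop=True)` are my own additions for readability. The idea: `ne(1)` is True on every row that is not a single word, so its running sum increases exactly where a new block starts, and the one-word rows that follow inherit the same block id.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "SECTION": ["ONE"] * 6 + ["TWO"] * 3,
    "TEXT": ["lots of text…", "word1", "lots of text…", "word2", "word3",
             "word4", "lots of text…", "lots of text…", "word4"],
    "NUMBER_OF_WORDS": [55, 1, 151, 1, 1, 1, 523, 123, 1],
})

# True on rows that start a new block; cumsum turns that into a block id
block = df["NUMBER_OF_WORDS"].ne(1).cumsum().rename("block")
# block is 1, 1, 2, 2, 2, 2, 3, 4, 4 for the rows above

out = (
    df.groupby(block)
      .agg({"SECTION": "first", "TEXT": ", ".join, "NUMBER_OF_WORDS": "sum"})
      .reset_index(drop=True)  # drop the block id, keep a plain 0..n-1 index
)
print(out)
```

With this data, `out` has four rows with NUMBER_OF_WORDS of 56, 154, 523 and 124, matching the expected table.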
Let's try groupby with cumsum:
s = df['NUMBER_OF_WORDS'].ne(1).cumsum()
out = df.groupby(s).agg({'SECTION':'first','TEXT':','.join,'NUMBER_OF_WORDS':'sum'})
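One caveat: grouping on `s` alone never looks at SECTION. With the sample data that does not matter, because every section begins with a multi-word row. If a section could begin with a one-word row, that row would be merged into the last block of the previous section. A possible sketch for that case, assuming it can occur in your data, is to add SECTION as a second grouping key. The frame `df2` and the name `block` below are hypothetical and only serve to illustrate the edge case.

```python
import pandas as pd

# Hypothetical data: section TWO starts with a one-word row
df2 = pd.DataFrame({
    "SECTION": ["ONE", "ONE", "TWO", "TWO"],
    "TEXT": ["lots of text…", "word1", "word2", "lots of text…"],
    "NUMBER_OF_WORDS": [55, 1, 1, 123],
})

# Block id as before: increases on every row that is not a single word
block = df2["NUMBER_OF_WORDS"].ne(1).cumsum().rename("block")

# Grouping on SECTION as well keeps "word2" out of section ONE's block
out2 = (
    df2.groupby(["SECTION", block], sort=False)
       .agg({"TEXT": ", ".join, "NUMBER_OF_WORDS": "sum"})
       .reset_index()
       .drop(columns="block")
)
print(out2)
```

Here "word2" stays in its own row under TWO instead of being appended to the text of section ONE.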