How to combine a row with the previous row based on a condition in a dataframe

Question

I have a dataframe where every row is a word or a punctuation mark. I want to iterate through the dataframe and, whenever a row contains punctuation, combine it with the previous row.

For example, I want to convert:

    word
0  hello
1      ,
2    how
3    are
4    you
5      ?

Into:

     word
0  hello,
2     how
3     are
4    you?

Thanks.
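Since the question asks about iterating, here is a minimal plain-loop sketch for comparison. It assumes a single `word` column and uses a hypothetical helper `is_punct` that treats a token as punctuation when it is non-empty and made only of characters from `string.punctuation` (an assumption; the answers below use slightly different tests). It keeps the index label of the row that absorbs the punctuation, matching the desired output above, but it will be slower than the vectorized answers on large frames.

```python
import string

import pandas as pd

# Example frame from the question: one token per row.
df = pd.DataFrame({"word": ["hello", ",", "how", "are", "you", "?"]})


def is_punct(token: str) -> bool:
    # Hypothetical helper (assumption): a token counts as punctuation if it
    # is non-empty and consists only of ASCII punctuation characters.
    return bool(token) and all(ch in string.punctuation for ch in token)


merged_words = []
merged_index = []
for idx, token in df["word"].items():
    if is_punct(token) and merged_words:
        # Append the punctuation to the most recently kept row.
        merged_words[-1] += token
    else:
        # Start a new output row, keeping the original index label.
        merged_words.append(token)
        merged_index.append(idx)

result = pd.DataFrame({"word": merged_words}, index=merged_index)
print(result)
```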

Answer 1

`match` and `cumsum`

df.groupby((~df.word.str.match(r'\W')).cumsum(), as_index=False).sum()

     word
0  hello,
1     how
2     are
3    you?
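To see why this works, it can help to print the intermediate grouping key. The sketch below rebuilds the example frame (an assumption, since the answer does not show its setup) and groups the `word` column directly instead of using `as_index=False`. Every row that is a real word starts a new group, so each punctuation row inherits the group id of the word before it, and summing strings within a group concatenates them.

```python
import pandas as pd

df = pd.DataFrame({"word": ["hello", ",", "how", "are", "you", "?"]})

# True for rows that do NOT start with a non-word character (i.e. real words).
is_word = ~df["word"].str.match(r"\W")

# Running count of words: each punctuation row shares the id of the preceding word.
group_id = is_word.cumsum()
print(group_id.tolist())  # [1, 1, 2, 3, 4, 4]

# Summing an object (string) column per group concatenates the strings.
result = df.groupby(group_id)["word"].sum()
print(result.tolist())  # ['hello,', 'how', 'are', 'you?']
```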

`isin`

Alternatively, with `isin` and without `as_index=False` (so the group number becomes the index):

from string import punctuation

df.groupby((~df.word.isin(list(punctuation))).cumsum()).sum()

        word
word        
1     hello,
2        how
3        are
4       you?
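The two tests above are not equivalent. `str.match(r'\W')` flags any token that *starts* with a non-word character, while `isin(list(punctuation))` only flags tokens that are exactly one ASCII punctuation character. The sketch below uses hypothetical tokens to show the difference, and adds a stricter variant that is an assumption of this example (not part of either answer): `str.fullmatch(r'[^\w\s]+')`, which requires every character to be punctuation-like and needs pandas 1.1 or later.

```python
from string import punctuation

import pandas as pd

# Hypothetical tokens, including multi-character punctuation.
df = pd.DataFrame({"word": ["wait", "...", "really", "?!"]})

# Answer 1's test: the token starts with a non-word character.
by_regex = df["word"].str.match(r"\W")
# The isin test: the token is exactly one ASCII punctuation character.
by_isin = df["word"].isin(list(punctuation))
# Stricter variant (assumption, not from the answers): every character is
# neither a word character nor whitespace. Requires pandas >= 1.1.
by_full = df["word"].str.fullmatch(r"[^\w\s]+")

print(by_regex.tolist())  # [False, True, False, True]
print(by_isin.tolist())   # [False, False, False, False]

# Group with the stricter test so multi-character punctuation is merged too.
merged = df.groupby((~by_full).cumsum())["word"].sum().tolist()
print(merged)  # ['wait...', 'really?!']
```

Note that the regex test would also treat a token such as `'s` as punctuation, because it starts with an apostrophe.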

Answer 2

You can use `isin` and `cumsum`:

# set of punctuation marks
punctuations = {',', '?'}

# block id: increments at every non-punctuation row
blocks = (~df['word'].isin(punctuations)).cumsum()

# groupby
df['word'].groupby(blocks).sum()

Output:

word
1    hello,
2       how
3       are
4      you?
Name: word, dtype: object
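A possible variation, sketched under the assumption that the example frame is rebuilt as below: `agg("".join)` spells out the string concatenation explicitly instead of relying on `.sum()` for strings, and `reset_index(drop=True).to_frame()` turns the resulting Series back into a one-column frame with a fresh 0..n-1 index. Both are standard pandas calls; the choice between `.sum()` and `"".join` here is mostly a matter of readability.

```python
import pandas as pd

df = pd.DataFrame({"word": ["hello", ",", "how", "are", "you", "?"]})

# Tokens treated as punctuation (assumption: extend the set as needed).
punctuations = {",", "?"}

# Block id increases at every non-punctuation row.
blocks = (~df["word"].isin(punctuations)).cumsum()

# Join the strings in each block explicitly instead of using .sum().
merged = df["word"].groupby(blocks).agg("".join)

# Back to a one-column DataFrame with a fresh RangeIndex.
merged_df = merged.reset_index(drop=True).to_frame()
print(merged_df)
```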

Answer 3

Yet another approach: concatenate each punctuation mark onto the previous row using `.shift(-1)`, then drop the punctuation rows:

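from string import punctuation

# where the next row is punctuation, append it to the current row
df.loc[df["word"].shift(-1).isin(list(punctuation)), "word"] = df["word"] + df["word"].shift(-1)
# then drop the punctuation rows themselves
df = df[~df["word"].isin(list(punctuation))][["word"]]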

Resulting `df`:

     word
0  hello,
2     how
3     are
4    you?
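One difference worth knowing: the `shift(-1)` approach only looks one row ahead, so a run of consecutive punctuation rows is not fully merged. The sketch below uses a hypothetical input with `"?"` followed by `"!"`; the shift approach leaves a stray `"?!"` row, while the `cumsum` grouping from the other answers folds the whole run into the preceding word.

```python
from string import punctuation

import pandas as pd

# Hypothetical input with two punctuation marks in a row.
words = ["really", "?", "!"]
punct = list(punctuation)

# Answer 3's shift(-1) approach.
df = pd.DataFrame({"word": words})
df.loc[df["word"].shift(-1).isin(punct), "word"] = df["word"] + df["word"].shift(-1)
shifted = df[~df["word"].isin(punct)]["word"].tolist()
print(shifted)  # ['really?', '?!']

# The cumsum grouping from Answers 1 and 2 merges the whole run.
df2 = pd.DataFrame({"word": words})
grouped = df2.groupby((~df2["word"].isin(punct)).cumsum())["word"].sum().tolist()
print(grouped)  # ['really?!']
```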

How to combine a row with the previous row based on a condition in a dataframe

Question

3 answers

Answer 1
4 votes, accepted, 2019-07-22 19:02:41

`match` and `cumsum`

`isin`

Answer 2
0 votes, 2019-07-22 19:06:15

Answer 3
0 votes, 2019-07-22 19:12:50