How to combine row with previous row based on condition in dataframe

I have a dataframe where every row is a word or a punctuation mark. I want to iterate through the dataframe and, whenever a row contains punctuation, combine it with the previous row.

For example, I want to convert:

    word
0  hello
1      ,
2    how
3    are
4    you
5      ?

Into:

     word
0  hello,
2     how
3     are
4    you?

Thanks.

match and cumsum

df.groupby((~df.word.str.match(r'\W')).cumsum(), as_index=False).sum()

     word
0  hello,
1     how
2     are
3    you?
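To make the grouping key easier to follow, here is a minimal sketch that builds it step by step on the question's sample data. The names is_punct, group_id and merged are illustrative and not part of the original answer, and "".join is used in place of sum() only to make the string concatenation explicit.

```python
import pandas as pd

df = pd.DataFrame({"word": ["hello", ",", "how", "are", "you", "?"]})

# True where the row starts with a non-word character (punctuation here)
is_punct = df["word"].str.match(r"\W")

# Every non-punctuation row starts a new group; a punctuation row
# inherits the group number of the word before it.
group_id = (~is_punct).cumsum()
print(group_id.tolist())  # [1, 1, 2, 3, 4, 4]

# Concatenate the strings inside each group
merged = df["word"].groupby(group_id).agg("".join).reset_index(drop=True)
print(merged.tolist())  # ['hello,', 'how', 'are', 'you?']
```

Because the counter only advances on word rows, any punctuation row lands in the same group as the nearest word above it.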

isin

Also, without as_index=False (the group labels then become the index):

from string import punctuation

df.groupby((~df.word.isin(list(punctuation))).cumsum()).sum()

        word
word        
1     hello,
2        how
3        are
4       you?
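The two tests are not equivalent: isin compares the whole cell against single punctuation characters, while str.match(r'\W') only looks at the first character. The sketch below compares them on a few made-up tokens (the sample values and the fullmatch pattern are assumptions for illustration, not taken from the answers).

```python
import pandas as pd
from string import punctuation

words = pd.Series(["wait", "...", "don't", "'s"])

# Whole-cell equality with one punctuation character
by_isin = words.isin(list(punctuation))

# First character is a non-word character
by_regex = words.str.match(r"\W")

# Entire cell consists of non-word, non-space characters
by_fullmatch = words.str.fullmatch(r"[^\w\s]+")

print(by_isin.tolist())       # [False, False, False, False]
print(by_regex.tolist())      # [False, True, False, True]
print(by_fullmatch.tolist())  # [False, True, False, False]
```

Which test fits depends on the tokenizer that produced the rows: isin is enough when every punctuation mark sits alone in its own row, as in the question.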

You can use isin and cumsum:

# set of punctuation marks to merge into the previous row
punctuations = {',', '?'}

# blocks: each non-punctuation row starts a new block
blocks = (~df['word'].isin(punctuations)).cumsum()

# groupby
df['word'].groupby(blocks).sum()

Output:

word
1    hello,
2       how
3       are
4      you?
Name: word, dtype: object

Yet another approach: append each punctuation row to the row before it, using .shift(-1) to look at the next row:

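from string import punctuation

# where the next row is punctuation, append it to the current row
df.loc[df["word"].shift(-1).isin(list(punctuation)), "word"] = df["word"] + df["word"].shift(-1)
# drop the rows that hold only a punctuation mark
df = df[~df["word"].isin(list(punctuation))][["word"]]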

df:

     word
0  hello,
2     how
3     are
4    you?
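One difference worth checking before choosing between the approaches: shift(-1) only sees a single following row, so a run of two punctuation marks is handled differently than with the cumsum grouping. The sketch below uses made-up data (not from the question) to compare the two; the variable names are illustrative.

```python
import pandas as pd
from string import punctuation

df = pd.DataFrame({"word": ["wait", "?", "!", "ok"]})
punct = list(punctuation)

# shift(-1) approach: each row is joined with at most one following row
shifted = df.copy()
next_word = shifted["word"].shift(-1)
follows = next_word.isin(punct)
shifted.loc[follows, "word"] = shifted["word"] + next_word
shifted = shifted[~shifted["word"].isin(punct)]
print(shifted["word"].tolist())  # ['wait?', '?!', 'ok']

# cumsum grouping: a whole run of punctuation joins the preceding word
groups = (~df["word"].isin(punct)).cumsum()
grouped = df["word"].groupby(groups).agg("".join)
print(grouped.tolist())  # ['wait?!', 'ok']
```

On the question's data, where punctuation never appears twice in a row, both approaches give the same words; the shift version also keeps the original index labels.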
