How to combine row with previous row based on condition in dataframe
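I have a DataFrame in which each row is either a word or a punctuation mark. I want to iterate over the DataFrame and, whenever a row contains a punctuation mark, combine it with the previous row.
For example, I want to convert:
    word
0  hello
1      ,
2    how
3    are
4    you
5      ?
into:
     word
0  hello,
2     how
3     are
4    you?
Thanks.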
Using match and cumsum:
df.groupby((~df.word.str.match(r'\W')).cumsum(), as_index=False).sum()
word
0 hello,
1 how
2 are
3 you?
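To make the grouping key in the one-liner above easier to follow, here is a minimal sketch that builds it step by step on the sample data from the question. The variable names `starts_group`, `group_id` and `merged` are made up for illustration, and `"".join` is used in place of `sum()` only to make the string concatenation explicit.

```python
import pandas as pd

df = pd.DataFrame({"word": ["hello", ",", "how", "are", "you", "?"]})

# True for rows that do NOT start with a non-word character,
# i.e. rows that begin a new group
starts_group = ~df["word"].str.match(r"\W")

# Running count of group starts: a punctuation row adds 0,
# so it inherits the id of the row before it
group_id = starts_group.cumsum()

# Concatenate the strings inside each group
merged = df["word"].groupby(group_id).agg("".join).reset_index(drop=True)

print(group_id.tolist())  # [1, 1, 2, 3, 4, 4]
print(merged.tolist())    # ['hello,', 'how', 'are', 'you?']
```

Because the key only increases on non-punctuation rows, any run of punctuation rows ends up in the same group as the word that precedes it.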
Using isin. This version also omits as_index=False, so the group key becomes the index:
from string import punctuation
df.groupby((~df.word.isin(list(punctuation))).cumsum()).sum()
word
word
1 hello,
2 how
3 are
4 you?
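The two tests used so far, `str.match(r'\W')` and `isin(list(punctuation))`, do not always agree. The sketch below compares them on a few made-up tokens (the sample values are assumptions, not from the question): the regex flags any string that starts with a non-word character, while `isin` only flags exact single-character matches from `string.punctuation`.

```python
import pandas as pd
from string import punctuation

# Hypothetical tokens chosen to show where the two tests differ
words = pd.Series(["snake", "_", "...", "-x", "?"])

# Regex test: does the string START with a non-word character?
by_regex = words.str.match(r"\W")

# Membership test: is the whole string one of string.punctuation's characters?
by_isin = words.isin(list(punctuation))

print(by_regex.tolist())  # [False, False, True, True, True]
print(by_isin.tolist())   # [False, True, False, False, True]
```

The underscore counts as a word character for the regex but is part of `string.punctuation`, and multi-character tokens such as `...` are matched only by the regex. Which test is appropriate depends on how the rows were tokenized.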
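You can use isin and cumsum:
# set of punctuation marks
punctuations = {',', '?'}
# blocks: the counter increases only on non-punctuation rows
blocks = (~df['word'].isin(punctuations)).cumsum()
# groupby: concatenate the strings within each block
df['word'].groupby(blocks).sum()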
Output:
word
1 hello,
2 how
3 are
4 you?
Name: word, dtype: object
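Another approach, using .shift(-1) to append each punctuation mark to the previous row:
from string import punctuation
# where the next row is punctuation, append it to the current word
df.loc[df["word"].shift(-1).isin(list(punctuation)), "word"] = df["word"] + df["word"].shift(-1)
# drop the rows that are punctuation only
df = df[~df["word"].isin(list(punctuation))][["word"]]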
DF:
word
0 hello,
2 how
3 are
4 you?
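One difference between the shift-based approach and the cumsum-based grouping shows up when several punctuation rows follow each other. The sketch below uses a made-up three-row input (not from the question) and hypothetical variable names to compare the two; it is an illustration under those assumptions rather than a statement about what any of the answers intended.

```python
import pandas as pd
from string import punctuation

# Hypothetical input with two consecutive punctuation rows
df = pd.DataFrame({"word": ["wait", "?", "!"]})
marks = list(punctuation)

# Shift-based approach: each row absorbs at most the ONE row after it
shifted = df.copy()
nxt = shifted["word"].shift(-1)
shifted.loc[nxt.isin(marks), "word"] = shifted["word"] + nxt
shifted = shifted[~shifted["word"].isin(marks)]

# cumsum-based approach: every punctuation row joins the preceding word's group
group_id = (~df["word"].isin(marks)).cumsum()
grouped = df["word"].groupby(group_id).agg("".join)

print(shifted["word"].tolist())  # ['wait?', '?!']
print(grouped.tolist())          # ['wait?!']
```

With a single punctuation mark after each word, as in the question, both approaches give the same strings; the shift version additionally keeps the original index labels.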