
How to combine row with previous row based on condition in dataframe

I have a dataframe in which every row holds either a word or a punctuation mark. I want to iterate through the dataframe and, whenever a row contains punctuation, combine it with the previous row.

For example, I want to convert:

    word
0  hello
1      ,
2    how
3    are
4    you
5      ?

Into:

     word
0  hello,
2     how
3     are
4    you?

Thanks.

match and cumsum

# r'\W' is a raw string so the backslash reaches the regex engine unchanged
df.groupby((~df.word.str.match(r'\W')).cumsum(), as_index=False).sum()

     word
0  hello,
1     how
2     are
3    you?
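
To make the grouping key visible, here is a minimal self-contained sketch of the same idea. The sample frame is rebuilt from the question, the intermediate names `is_punct` and `key` are introduced only for illustration, and `.agg("".join)` is used in place of `.sum()` to state the string concatenation explicitly.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"word": ["hello", ",", "how", "are", "you", "?"]})

# True where the token starts with a non-word character, i.e. punctuation
is_punct = df["word"].str.match(r"\W")

# Every non-punctuation row opens a new group; a punctuation row adds 0
# to the running total, so it inherits the group of the word before it.
key = (~is_punct).cumsum()
print(key.tolist())

# Concatenate the tokens inside each group
merged = df["word"].groupby(key).agg("".join).reset_index(drop=True)
print(merged.tolist())
```

The key comes out as 1, 1, 2, 3, 4, 4, which is why "," lands in the same group as "hello" and "?" in the same group as "you".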

isin

The same idea with isin, this time without as_index=False, so the group key is kept as the index:

from string import punctuation

df.groupby((~df.word.isin(list(punctuation))).cumsum()).sum()

        word
word        
1     hello,
2        how
3        are
4       you?
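
One thing to keep in mind: isin against the characters of string.punctuation only matches tokens that are exactly one punctuation character. If the data may contain tokens such as "..." or "?!", a possible variant (a sketch under that assumption, with made-up sample data and an illustrative `is_punct_token` helper) tests every character of the token instead:

```python
from string import punctuation

import pandas as pd

# Made-up sample data with multi-character punctuation tokens
df = pd.DataFrame({"word": ["wait", "...", "what", "?!"]})

punct_chars = set(punctuation)

def is_punct_token(tok):
    # Illustrative helper: non-empty and made up only of punctuation characters
    return len(tok) > 0 and all(ch in punct_chars for ch in tok)

is_punct = df["word"].map(is_punct_token).astype(bool)

# Same cumsum trick as above: punctuation rows join the preceding word
key = (~is_punct).cumsum()
merged = df["word"].groupby(key).agg("".join).tolist()
print(merged)
```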

You can use isin and cumsum:

# set of punctuation marks
punctuations = {',', '?'}

# blocks: each non-punctuation row starts a new block
blocks = (~df['word'].isin(punctuations)).cumsum()

# groupby
df['word'].groupby(blocks).sum()

Output:

word
1    hello,
2       how
3       are
4      you?
Name: word, dtype: object
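
If this is needed in more than one place, the steps can be wrapped in a small function. The sketch below is only an illustration: `merge_punctuation` and its `col` parameter are hypothetical names, the sample frame is rebuilt from the question, and `.agg("".join)` stands in for `.sum()`.

```python
import pandas as pd

def merge_punctuation(df, punctuations, col="word"):
    """Hypothetical helper: append each punctuation row to the row before it."""
    # Non-punctuation rows start a new block; punctuation rows stay in it
    blocks = (~df[col].isin(punctuations)).cumsum()
    merged = df[col].groupby(blocks).agg("".join)
    # Drop the block numbers and return a one-column frame with a fresh index
    return merged.reset_index(drop=True).to_frame(col)

df = pd.DataFrame({"word": ["hello", ",", "how", "are", "you", "?"]})
result = merge_punctuation(df, {",", "?"})
print(result)
```

Because the blocks come from a running count, several punctuation rows in a row all end up attached to the same preceding word.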

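Yet another approach: concatenate each punctuation row onto the previous row using .shift(-1):

from string import punctuation

# rows followed by a punctuation row get that punctuation appended
df.loc[df["word"].shift(-1).isin(list(punctuation)), "word"] = df["word"] + df["word"].shift(-1)
# then drop the rows that are punctuation only
df = df[~df["word"].isin(list(punctuation))][["word"]]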

df:

     word
0  hello,
2     how
3     are
4    you?
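
For reference, a self-contained sketch of the shift approach follows, with the sample frame rebuilt from the question and the names `is_punct`, `next_is_punct` and `out` introduced only for illustration. It computes both masks before modifying anything and assumes a word is never followed by two punctuation rows in a row, since only the immediately following row is appended.

```python
from string import punctuation

import pandas as pd

df = pd.DataFrame({"word": ["hello", ",", "how", "are", "you", "?"]})

# Rows that consist of a single punctuation character
is_punct = df["word"].isin(list(punctuation))

# Rows whose *next* row is punctuation; the last row has no successor
next_is_punct = is_punct.shift(-1, fill_value=False)

out = df.copy()
# Append the following punctuation to the rows flagged above
out.loc[next_is_punct, "word"] = df["word"] + df["word"].shift(-1)
# Drop the punctuation-only rows; the original index labels are kept
out = out[~is_punct]
print(out)
```

Unlike the groupby variants, this keeps the original index labels (0, 2, 3, 4), which matches the output requested in the question.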
