
Pandas dataframe filter n subsequent rows after certain column value

I have a pandas dataframe like this:

       Sentence #           Word     Tag
0     Sentence: 1           This       O
1             NaN             is       O
2             NaN              a       x
3             NaN           test       O
4     Sentence: 2           This       O
5             NaN             is       x
6             NaN        another       x
7             NaN           test       O
...

I would like to group it by sentence, e.g. return:

[['This is a test'], ['This is another test'], ...]

I would also like to get a list of all words tagged with 'x' for every sentence, e.g.:

[['a'], ['is', 'another'], ...]

I have been trying to do this with groupby(), with no success. What is the best way to solve it? Thanks

Solved! I used:

df = df.ffill()  # fillna(method='ffill') is deprecated in recent pandas versions

to propagate non-null values forward, so that every row carries its sentence label. Then I computed the sentences with:

df1 = df.groupby('Sentence #')['Word'].apply(lambda x: ' '.join(x))
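
For anyone who wants to try this end to end, here is a minimal sketch of the same idea. The sample frame is rebuilt by hand from the first two sentences in the question, and the names `sentences` and `nested` are mine, not from the original post. It forward-fills only the 'Sentence #' column (an assumption on my part, so that real gaps in other columns are not silently filled) and passes `sort=False`, since otherwise a label like 'Sentence: 10' would sort before 'Sentence: 2'.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (first two sentences only).
df = pd.DataFrame({
    "Sentence #": ["Sentence: 1", np.nan, np.nan, np.nan,
                   "Sentence: 2", np.nan, np.nan, np.nan],
    "Word": ["This", "is", "a", "test", "This", "is", "another", "test"],
    "Tag": ["O", "O", "x", "O", "O", "x", "x", "O"],
})

# Forward-fill only the sentence label so every row knows its sentence.
df["Sentence #"] = df["Sentence #"].ffill()

# sort=False keeps the sentences in order of first appearance.
sentences = df.groupby("Sentence #", sort=False)["Word"].agg(" ".join)

# Wrap each sentence in its own list to match the shape asked for.
nested = [[s] for s in sentences]
print(nested)  # [['This is a test'], ['This is another test']]
```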

To get tagged words:

df2 = df[df['Tag'] == 'x']  # keep only the tagged rows; no groupby needed for the filter
df2 = df2.groupby('Sentence #')['Word'].apply(lambda x: ','.join(x))
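
The line above gives comma-joined strings rather than the lists shown in the question. A possible variant that returns actual lists is sketched below, again on a hand-built copy of the sample data. The names `tagged`, `lookup` and `tagged_lists` are hypothetical, and the empty-list fallback for sentences without any 'x' tag is my assumption about the desired behaviour (filtering first would otherwise drop those sentences entirely).

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (first two sentences only).
df = pd.DataFrame({
    "Sentence #": ["Sentence: 1", np.nan, np.nan, np.nan,
                   "Sentence: 2", np.nan, np.nan, np.nan],
    "Word": ["This", "is", "a", "test", "This", "is", "another", "test"],
    "Tag": ["O", "O", "x", "O", "O", "x", "x", "O"],
})
df["Sentence #"] = df["Sentence #"].ffill()

# Filter with a boolean mask first, then collect the words per sentence.
tagged = (
    df[df["Tag"] == "x"]
    .groupby("Sentence #", sort=False)["Word"]
    .agg(list)
)

# Sentences with no 'x' tag vanish after filtering, so look each
# sentence up and fall back to an empty list (assumed behaviour).
lookup = tagged.to_dict()
tagged_lists = [lookup.get(sid, []) for sid in df["Sentence #"].unique()]
print(tagged_lists)  # [['a'], ['is', 'another']]
```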

Not sure if it's the most efficient solution though.

