Pandas dataframe filter n subsequent rows after certain column value

Question

I have a pandas dataframe like this:

       Sentence #           Word     Tag
0     Sentence: 1           This       O
1             NaN             is       O
2             NaN              a       x
3             NaN           test       O
4     Sentence: 2           This       O
5             NaN             is       x
6             NaN        another       x
7             NaN           test       O
...

I would like to group it by sentence, e.g. to return:

[['This is a test'], ['This is another test'], ...]

I would also like to get a list of all words tagged with 'x' for every sentence, e.g.:

[['a'], ['is', 'another'], ...]

I have been trying to do this with groupby(), with no success. What is the best way to solve it? Thanks

Answer 1

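Solved! I used:

df = df.ffill()

to propagate non-null values forward, so that every row carries its sentence label (this was originally written as df.fillna(method='ffill'), which recent pandas versions deprecate). Then I computed the sentences with: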

df1 = df.groupby('Sentence #')['Word'].apply(lambda x: ' '.join(x))

To get tagged words:

df2 = df[df['Tag'] == 'x']
df2 = df2.groupby('Sentence #')['Word'].apply(lambda x: ','.join(x))

Not sure if it's the most efficient solution though.
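
For reference, here is a minimal end-to-end sketch of the same idea that returns plain Python lists. It rebuilds the sample frame from the question, so the data and the variable names (sentences, tagged) are only illustrative. It assumes the rows of each sentence are contiguous and that only the first row of a sentence has a non-null 'Sentence #'.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (illustrative only).
df = pd.DataFrame({
    "Sentence #": ["Sentence: 1", np.nan, np.nan, np.nan,
                   "Sentence: 2", np.nan, np.nan, np.nan],
    "Word": ["This", "is", "a", "test", "This", "is", "another", "test"],
    "Tag": ["O", "O", "x", "O", "O", "x", "x", "O"],
})

# Propagate each sentence label down to the rows that belong to it.
df["Sentence #"] = df["Sentence #"].ffill()

# sort=False keeps sentences in order of first appearance; the default
# lexicographic sort would place "Sentence: 10" before "Sentence: 2".
grouped = df.groupby("Sentence #", sort=False)

# One joined string per sentence.
sentences = grouped["Word"].agg(" ".join).tolist()

# One list of 'x'-tagged words per sentence; a sentence with no tagged
# word contributes an empty list, so both lists stay aligned.
tagged = [
    group.loc[group["Tag"] == "x", "Word"].tolist()
    for _, group in grouped
]

print(sentences)
print(tagged)
```

Note that sentences comes out as a flat list of strings here; wrap each one with [[s] for s in sentences] if the nested shape from the question is required. Filtering inside each group, rather than before grouping, is what keeps sentences without any tagged word in the result.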
