I have a pandas dataframe like this:
   Sentence #    Word     Tag
0  Sentence: 1   This     O
1  NaN           is       O
2  NaN           a        x
3  NaN           test     O
4  Sentence: 2   This     O
5  NaN           is       x
6  NaN           another  x
7  NaN           test     O
...
I would like to group it by sentence, e.g. return:
[['This is a test'], ['This is another test'], ...]
I would also like to get a list of all words tagged with 'x' for every sentence, e.g.:
[['a'], ['is', 'another'], ...]
I have tried to do this with groupby(), without success. What is the best way to solve it? Thanks
Solved! I used:
df = df.ffill()
to propagate non-null values forward (fillna(method='ffill') does the same but is deprecated in recent pandas). Then I computed the sentences with:
df1 = df.groupby('Sentence #')['Word'].apply(lambda x: ' '.join(x))
To get tagged words:
df2 = df[df['Tag'] == 'x']
df2 = df2.groupby('Sentence #')['Word'].apply(lambda x: ','.join(x))
Not sure if it's the most efficient solution though.
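For reference, here is a minimal sketch that returns plain Python lists in the format asked for in the question, rather than pandas Series. The sample frame is rebuilt by hand from the rows shown above, and the names `sentences` and `tagged` are only illustrative. It assumes the sentence labels should stay in order of appearance and that a sentence with no 'x' tags should yield an empty list instead of being dropped.

```python
import pandas as pd

# Sample data rebuilt from the question; None marks the missing sentence labels.
df = pd.DataFrame({
    "Sentence #": ["Sentence: 1", None, None, None,
                   "Sentence: 2", None, None, None],
    "Word": ["This", "is", "a", "test", "This", "is", "another", "test"],
    "Tag": ["O", "O", "x", "O", "O", "x", "x", "O"],
})

# Forward-fill the label so every row carries the sentence it belongs to.
df["Sentence #"] = df["Sentence #"].ffill()

# sort=False keeps groups in order of first appearance instead of
# sorting the labels as strings.
grouped = df.groupby("Sentence #", sort=False)

# One single-element list per sentence, with the words joined by spaces.
sentences = [[s] for s in grouped["Word"].agg(" ".join)]

# Words tagged 'x' per sentence; a sentence without any 'x' gives [].
tagged = [g.loc[g["Tag"] == "x", "Word"].tolist() for _, g in grouped]

print(sentences)
print(tagged)
```

With the default sort=True the labels are sorted as strings, so "Sentence: 10" would come before "Sentence: 2"; that is why the sketch passes sort=False.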