Pandas groupby on text: get sentence numbers for multiple sentences per group

Question

My dataframe looks like this:

    id      sentence                                            ind
    747     A simple and convenient colorimetric method is...   NaN
    747     A simple and convenient colorimetric method is...   NaN
    747     A simple and convenient colorimetric method is...   ulcerative 
    749     Of special significance was the increased acti...   NaN
    749     Of special significance was the increased acti...   NaN
    749     Of special significance was the increased acti...   head injuries
    749     Of special significance was the increased acti...   NaN
    858     Some patients with acute viral hepatitis or pr...   acute viral 
    858     Some patients with acute viral hepatitis or pr...   NaN
    858     Some patients with acute viral hepatitis or pr...   NaN
    948     The other ALP isozyme of FL cells had properti...   NaN
    948     The other ALP isozyme of FL cells had properti...   NaN
    948     The other ALP isozyme of FL cells had properti...   NaN
    948     It was found that a human hepatoma-associated ...   NaN
    948     It was found that a human hepatoma-associated ...   hepatoma
    948     It was found that a human hepatoma-associated ...   NaN
    948     It was more heat stable and more sensitive to ...   virus
    948     It was more heat stable and more sensitive to ...   NaN
    948     It was more heat stable and more sensitive to ...   NaN

I'm using df.groupby(['id', 'sentence']).first().head(20) and I get this:

id      sentence                                            ind
747     A simple and convenient colorimetric method is...   NaN
749     Of special significance was the increased acti...   NaN
858     Some patients with acute viral hepatitis or pr...   acute viral 
948     It was found that a human hepatoma-associated ...   hepatoma
        It was more heat stable and more sensitive to ...   virus
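For reference, here is a minimal sketch of how groupby([...]).first() behaves, on a tiny hypothetical frame (the ids and the values 's1', 's2', 'x', 'y' are made up): each (id, sentence) pair collapses to one row, first() takes the first non-null value of every other column within that group, and the group keys come out sorted because sort=True is the groupby default.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: id 1 has two distinct sentences, id 2 has one
toy = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'sentence': ['s2', 's2', 's1', 's1', 's1'],
    'ind': [np.nan, 'x', np.nan, np.nan, 'y'],
})

# One row per (id, sentence); keys are sorted and 'ind' is the first non-null value per group
collapsed = toy.groupby(['id', 'sentence']).first()
print(collapsed)
```

Here (1, 's1') sorts ahead of (1, 's2') even though 's2' appears first in the data, and (1, 's2') picks up 'x' from its second row.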

As we can see, id=948 has more than one (id, sentence) pair.

My question is: is there a way to get a sentence number for every id in my dataframe, given that a single id can have more than one (id, sentence) pair?

For example, something like this:

id   sentence_nr   sentence                                           ind
747  01            A simple and convenient colorimetric method is...  NaN
749  01            Of special significance was the increased acti...  NaN
858  01            Some patients with acute viral hepatitis or pr...  acute viral 
948  01            It was found that a human hepatoma-associated ...  hepatoma 
948  02            It was more heat stable and more sensitive to ...  virus

Answer 1

You could use GroupBy.cumcount: first collapse the frame to one row per (id, sentence) pair, then number the rows within each id.

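import pandas as pd

# One row per (id, sentence); first() keeps the first non-null value of each other column
df_grouped = df.groupby(['id', 'sentence'], as_index=False).first()
# cumcount() numbers the rows within each id 0, 1, 2, ...; +1 makes it start at 1
df_grouped['sentence_nr'] = df_grouped.groupby('id').cumcount() + 1

print(df_grouped)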

    id                                           sentence            ind  sentence_nr
0  747  A simple and convenient colorimetric method is...     ulcerative            1
1  749  Of special significance was the increased acti...  head injuries            1
2  858  Some patients with acute viral hepatitis or pr...    acute viral            1
3  948  It was found that a human hepatoma-associated ...       hepatoma            1
4  948  It was more heat stable and more sensitive to ...          virus            2
5  948  The other ALP isozyme of FL cells had properti...           None            3
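Because groupby sorts the keys by default, the numbers above follow the alphabetical order of the sentence text: for id 948, "The other ALP isozyme..." gets 3 even though it is the first of that id's sentences in the original frame. If the numbering should follow the order in which sentences appear, one option is to pass sort=False to the first groupby. The sketch below compares both on a cut-down copy of the data; the shortened sentence labels and the helper number_sentences are hypothetical, not part of the question or the answer.

```python
import numpy as np
import pandas as pd

# Cut-down copy of the question's data; sentences shortened to hypothetical labels
df = pd.DataFrame({
    'id': [747] + [948] * 6,
    'sentence': ['A simple'] + ['The other'] * 2 + ['It was found'] * 2 + ['It was more'] * 2,
    'ind': ['ulcerative', np.nan, np.nan, 'hepatoma', np.nan, 'virus', np.nan],
})


def number_sentences(frame, sort):
    """Hypothetical helper: one row per (id, sentence) plus a 1-based sentence_nr per id."""
    out = frame.groupby(['id', 'sentence'], sort=sort, as_index=False).first()
    # cumcount() counts rows within each id in their current row order: 0, 1, 2, ...
    out['sentence_nr'] = out.groupby('id').cumcount() + 1
    return out


by_text = number_sentences(df, sort=True)       # groupby default: keys sorted
by_position = number_sentences(df, sort=False)  # keys in order of first appearance
print(by_text)
print(by_position)
```

cumcount itself always counts in row order, so only the first groupby needs sort=False; the second one just has to see the rows in the order you want them numbered.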

Answer 1 (accepted, 2022-05-19 19:29:33)