Pandas groupby on text: get sentence numbering for multiple sentences per group
My dataframe looks like this:
id sentence ind
747 A simple and convenient colorimetric method is... NaN
747 A simple and convenient colorimetric method is... NaN
747 A simple and convenient colorimetric method is... ulcerative
749 Of special significance was the increased acti... NaN
749 Of special significance was the increased acti... NaN
749 Of special significance was the increased acti... head injuries
749 Of special significance was the increased acti... NaN
858 Some patients with acute viral hepatitis or pr... acute viral
858 Some patients with acute viral hepatitis or pr... NaN
858 Some patients with acute viral hepatitis or pr... NaN
948 The other ALP isozyme of FL cells had properti... NaN
948 The other ALP isozyme of FL cells had properti... NaN
948 The other ALP isozyme of FL cells had properti... NaN
948 It was found that a human hepatoma-associated ... NaN
948 It was found that a human hepatoma-associated ... hepatoma
948 It was found that a human hepatoma-associated ... NaN
948 It was more heat stable and more sensitive to ... virus
948 It was more heat stable and more sensitive to ... NaN
948 It was more heat stable and more sensitive to ... NaN
I'm using df.groupby(['id', 'sentence']).first().head(20) and I get this:
id sentence ind
747 A simple and convenient colorimetric method is... NaN
749 Of special significance was the increased acti... NaN
858 Some patients with acute viral hepatitis or pr... acute viral
948 It was found that a human hepatoma-associated... hepatoma
It was more heat stable and more sensitive to... virus
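GroupBy.first() returns the first non-null value of each column within a group, not the literal first row, so a collapsed (id, sentence) group whose first row has NaN in ind can still come back with a value. A minimal sketch on a hypothetical toy frame (the strings "s1" and "s2" are placeholders for the truncated sentences above):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "s1" and "s2" stand in for the real sentences.
df = pd.DataFrame({
    "id": [747, 747, 747, 948, 948],
    "sentence": ["s1", "s1", "s1", "s2", "s2"],
    "ind": [np.nan, np.nan, "ulcerative", np.nan, np.nan],
})

# first() skips NaN per column, so (747, "s1") gets "ulcerative" even though
# its first row has NaN; (948, "s2") has no non-null value and stays missing.
collapsed = df.groupby(["id", "sentence"]).first()
print(collapsed)
```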
As we can see, for id=948 there is more than one (id, sentence) pair.
My question is: is there a way to get a sentence number for every id in my dataframe, since some ids have more than one (id, sentence) pair?
For example, something like:
id sentence_nr sentence ind
747 01 A simple and convenient colorimetric method is... NaN
749 01 Of special significance was the increased acti... NaN
858 01 Some patients with acute viral hepatitis or pr... acute viral
948 01 It was found that a human hepatoma-associated ... hepatoma
948 02 It was more heat stable and more sensitive to ... virus
You could use GroupBy.cumcount:
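# Collapse duplicate (id, sentence) rows; first() takes the first non-null value per column
df_grouped = df.groupby(['id', 'sentence'], as_index=False).first()
# Number the sentences within each id: cumcount() starts at 0, so add 1
df_grouped['sentence_nr'] = df_grouped.groupby('id').cumcount() + 1
print(df_grouped)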
id sentence ind sentence_nr
0 747 A simple and convenient colorimetric method is... ulcerative 1
1 749 Of special significance was the increased acti... head injuries 1
2 858 Some patients with acute viral hepatitis or pr... acute viral 1
3 948 It was found that a human hepatoma-associated ... hepatoma 1
4 948 It was more heat stable and more sensitive to ... virus 2
5 948 The other ALP isozyme of FL cells had properti... None 3
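Because groupby sorts the group keys by default, the sentences above are numbered alphabetically: for id=948, "The other ALP isozyme..." gets 3 even though it appears first in the original frame. A sketch, assuming you want numbering in order of first appearance and the zero-padded "01" style shown in the question: pass sort=False to the collapsing groupby and format the counter with str.zfill. The shortened sentence strings are hypothetical placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data mirroring id=948: three sentences, in this order.
df = pd.DataFrame({
    "id": [948] * 6 + [747] * 2,
    "sentence": ["The other", "The other", "It was found", "It was found",
                 "It was more", "It was more", "A simple", "A simple"],
    "ind": [np.nan, np.nan, "hepatoma", np.nan, "virus", np.nan,
            np.nan, "ulcerative"],
})

# sort=False keeps the (id, sentence) groups in order of first appearance
# instead of sorting the keys alphabetically.
df_grouped = df.groupby(["id", "sentence"], as_index=False, sort=False).first()

# cumcount numbers rows within each id in their current order, starting at 0;
# add(1) makes it 1-based and zfill(2) gives the "01", "02" style.
df_grouped["sentence_nr"] = (
    df_grouped.groupby("id").cumcount().add(1).astype(str).str.zfill(2)
)
print(df_grouped)
```

If the alphabetical numbering from the answer is acceptable, only the zfill step is needed to match the desired "01" format.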