Create new column by counting distinct values in another column in pandas
Hello, I have a dataframe such as:
COL1_1 COL1_3 COL2
Chr1_0 Canis_lupus A
Chr1_0 Canis_lupus A
Chr1_0 Canis_lupus B
Chr1_0 Canis_lupus B
Chr1_0 Canis_lupus B
Chr1_0 Felis_cattus B
Chr1_0 Felis_cattus B
Chr2_0 Felis_cattus A
Chr2_0 Felis_cattus B
Chr2_1 Felis_cattus C
Chr2_1 Felis_cattus D
Chr2_1 Felis_cattus E
The idea is, within each combination of COL1_1 and COL1_3, to count the number of distinct COL2 values.
For example, for Chr1_0 and Canis_lupus there are 2 distinct COL2 values (A and B), so I put 2 into the new COL3.
If there is only one distinct value, I put a 0.
Here I should then get:
COL1_1 COL1_3 COL2 COL3
Chr1_0 Canis_lupus A 2
Chr1_0 Canis_lupus A 2
Chr1_0 Canis_lupus B 2
Chr1_0 Canis_lupus B 2
Chr1_0 Canis_lupus B 2
Chr1_0 Felis_cattus B 0
Chr1_0 Felis_cattus B 0
Chr2_0 Felis_cattus A 2
Chr2_0 Felis_cattus B 2
Chr2_1 Felis_cattus C 3
Chr2_1 Felis_cattus D 3
Chr2_1 Felis_cattus E 3
Maybe an idea would be to group by COL1_1 and COL1_3 and count the number of distinct COL2 values.
Use GroupBy.transform with SeriesGroupBy.nunique to get the distinct count per group on every row, then Series.mask to replace 1 with 0:
df['COL3'] = (df.groupby(['COL1_1', 'COL1_3']).COL2.transform('nunique')
                .mask(lambda x: x == 1, 0))
Or use Series.replace:
df['COL3'] = df.groupby(['COL1_1', 'COL1_3']).COL2.transform('nunique').replace({1:0})
print(df)
COL1_1 COL1_3 COL2 COL3
0 Chr1_0 Canis_lupus A 2
1 Chr1_0 Canis_lupus A 2
2 Chr1_0 Canis_lupus B 2
3 Chr1_0 Canis_lupus B 2
4 Chr1_0 Canis_lupus B 2
5 Chr1_0 Felis_cattus B 0
6 Chr1_0 Felis_cattus B 0
7 Chr2_0 Felis_cattus A 2
8 Chr2_0 Felis_cattus B 2
9 Chr2_1 Felis_cattus C 3
10 Chr2_1 Felis_cattus D 3
11 Chr2_1 Felis_cattus E 3
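For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample dataframe from the question by hand and, as a variant of the snippets above, uses Series.where instead of mask or replace. The intermediate name `distinct` is only illustrative and is not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "COL1_1": ["Chr1_0"] * 7 + ["Chr2_0"] * 2 + ["Chr2_1"] * 3,
    "COL1_3": ["Canis_lupus"] * 5 + ["Felis_cattus"] * 7,
    "COL2": list("AABBBBBABCDE"),
})

# Number of distinct COL2 values per (COL1_1, COL1_3) group,
# broadcast back onto every row of that group.
distinct = df.groupby(["COL1_1", "COL1_3"])["COL2"].transform("nunique")

# Keep counts greater than 1; groups with a single distinct value get 0.
df["COL3"] = distinct.where(distinct > 1, 0)

print(df)
```

Because transform returns a Series aligned with the original index, the result can be assigned directly as a new column without a merge.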