How to create a new column with the percentage of occurrences of a particular value in another column of a DataFrame?
I have a column that holds either 'Y' or 'N' for yes or no. I want to calculate the percentage of occurrences of 'Y' and then include this as the value of a new column called "Percentage".
This is what I have come up with so far. Although it contains the information I need, I don't know how to get it in the form I describe:
port_merge_lic_df.groupby(['Port'])['Shellfish Licence licence (Y/N)'].value_counts(normalize=True) * 100
Port Shellfish Licence licence (Y/N)
ABERDEEN Y 80.731789
N 19.268211
AYR N 94.736842
Y 5.263158
BELFAST N 81.654676
...
STORNOWAY N 23.362692
0.383857
ULLAPOOL N 56.936826
Y 43.063174
WICK N 100.000000
Name: Shellfish Licence licence (Y/N), Length: 87, dtype: float64
The dataframe is in the form:
import pandas as pd

df1 = pd.DataFrame({
    'Port': {0: 'NORTH SHIELDS', 1: 'NORTH SHIELDS', 2: 'NORTH SHIELDS',
             3: 'NORTH SHIELDS', 4: 'NORTH SHIELDS'},
    'Shellfish Licence licence (Y/N)': {0: 'Y', 1: 'N', 2: 'N', 3: 'N', 4: 'N'},
    'Scallop Licence (Y/N)': {0: 'N', 1: 'N', 2: 'N', 3: 'N', 4: 'N'},
    'Length Group': {0: 'Over10m', 1: 'Over10m', 2: 'Over10m',
                     3: 'Over10m', 4: 'Over10m'},
})
df1
For the percentage, you can compare the column with 'Y' and aggregate with mean:
out = (df1['Shellfish Licence licence (Y/N)'].eq('Y')
.groupby(df1['Port'])
.mean()
.mul(100)
.reset_index(name='meanY'))
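As a minimal sketch of why this works: `eq('Y')` yields a boolean Series, and the mean of booleans per group is the share of `True` values. The sample rows below are hypothetical (only the column names come from the question), chosen so that one port has no 'Y' at all.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
df = pd.DataFrame({
    'Port': ['ABERDEEN', 'ABERDEEN', 'ABERDEEN', 'ABERDEEN', 'WICK', 'WICK'],
    'Shellfish Licence licence (Y/N)': ['Y', 'Y', 'Y', 'N', 'N', 'N'],
})
col = 'Shellfish Licence licence (Y/N)'

# Boolean mask -> group by port -> mean of booleans = fraction of 'Y' -> percent.
pct_yes = (df[col].eq('Y')
                  .groupby(df['Port'])
                  .mean()
                  .mul(100)
                  .reset_index(name='meanY'))
print(pct_yes)
```

A port with no 'Y' rows still appears in the result, with a value of 0.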
Your solution can be adapted with Series.unstack to get both Y and N as columns:
df2 = (df1.groupby(['Port'])['Shellfish Licence licence (Y/N)']
.value_counts(normalize=True)
.unstack() * 100)
An alternative with crosstab:
df2 = pd.crosstab(df1['Port'], df1['Shellfish Licence licence (Y/N)'], normalize=0).mul(100)
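One difference between the two approaches is worth checking on your own data: a port where only one value occurs (like WICK in the question's output) has no entry for the other value after `value_counts`, so `unstack` would normally leave a NaN there, while `crosstab` counts it as zero. The sketch below uses hypothetical sample rows and passes `fill_value=0` to `unstack` so that both results line up.

```python
import pandas as pd

# Hypothetical sample data; WICK deliberately has no 'Y' rows.
df = pd.DataFrame({
    'Port': ['ABERDEEN', 'ABERDEEN', 'ABERDEEN', 'ABERDEEN', 'WICK', 'WICK'],
    'Shellfish Licence licence (Y/N)': ['Y', 'Y', 'Y', 'N', 'N', 'N'],
})
col = 'Shellfish Licence licence (Y/N)'

# value_counts + unstack; fill_value=0 replaces the missing (WICK, 'Y') cell.
via_unstack = (df.groupby('Port')[col]
                 .value_counts(normalize=True)
                 .unstack(fill_value=0) * 100)

# crosstab normalised per row (per port); missing combinations count as 0.
via_crosstab = pd.crosstab(df['Port'], df[col], normalize='index').mul(100)

print(via_unstack)
print(via_crosstab)
```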
IIUC, you can use the following (it returns a fraction between 0 and 1; multiply by 100 for a percentage):
df1['Shellfish Licence licence (Y/N)'].eq('Y').groupby(df1['Port']).mean()
output:
Port
NORTH SHIELDS 0.2
Name: Shellfish Licence licence (Y/N), dtype: float64
For all columns:
df1.filter(like='Y/N').apply(lambda c: c.eq('Y').groupby(df1['Port']).mean())
output:
Shellfish Licence licence (Y/N) Scallop Licence (Y/N)
Port
NORTH SHIELDS 0.2 0.0
To have the data in the original dataframe:
df1['Shellfish percent'] = df1['Shellfish Licence licence (Y/N)'].eq('Y').groupby(df1['Port']).transform('mean')
output:
Port Shellfish Licence licence (Y/N) Scallop Licence (Y/N) \
0 NORTH SHIELDS Y N
1 NORTH SHIELDS N N
2 NORTH SHIELDS N N
3 NORTH SHIELDS N N
4 NORTH SHIELDS N N
Length Group Shellfish percent
0 Over10m 0.2
1 Over10m 0.2
2 Over10m 0.2
3 Over10m 0.2
4 Over10m 0.2
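Since the question asks for a column named "Percentage" holding a percentage rather than a fraction, the same `transform('mean')` idea can be scaled by 100. This is only a sketch on hypothetical sample rows; the column name `Percentage` follows the question's wording.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
df = pd.DataFrame({
    'Port': ['ABERDEEN', 'ABERDEEN', 'ABERDEEN', 'ABERDEEN', 'WICK', 'WICK'],
    'Shellfish Licence licence (Y/N)': ['Y', 'Y', 'Y', 'N', 'N', 'N'],
})
col = 'Shellfish Licence licence (Y/N)'

# transform('mean') broadcasts each port's share of 'Y' back onto every row,
# so the result aligns with the original index and can be assigned directly.
df['Percentage'] = (df[col].eq('Y')
                           .groupby(df['Port'])
                           .transform('mean')
                           .mul(100))
print(df)
```

Every row of a port carries that port's percentage, so the frame keeps its original shape.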