How do I create a new column in a DataFrame holding the percentage of occurrences of a specific value in another column?

Question

I have a column whose values are either 'Y' or 'N', for yes or no. I want to calculate the percentage of occurrences of 'Y' and then include this as the value of a new column called "Percentage".

I have come up with this so far. Although it produces the numbers I need, I don't know how to get the information into the form I describe:

port_merge_lic_df.groupby(['Port'])['Shellfish Licence licence (Y/N)'].value_counts(normalize=True) * 100

Port       Shellfish Licence licence (Y/N)
ABERDEEN   Y                                   80.731789
           N                                   19.268211
AYR        N                                   94.736842
           Y                                    5.263158
BELFAST    N                                   81.654676
                                         ...    
STORNOWAY  N                                   23.362692
                                        0.383857
ULLAPOOL   N                                   56.936826
           Y                                   43.063174
WICK       N                                  100.000000
Name: Shellfish Licence licence (Y/N), Length: 87, dtype: float64

The dataframe is in the form:

import pandas as pd

df1 = pd.DataFrame({
    'Port': {0: 'NORTH SHIELDS', 1: 'NORTH SHIELDS', 2: 'NORTH SHIELDS',
             3: 'NORTH SHIELDS', 4: 'NORTH SHIELDS'},
    'Shellfish Licence licence (Y/N)': {0: 'Y', 1: 'N', 2: 'N', 3: 'N', 4: 'N'},
    'Scallop Licence (Y/N)': {0: 'N', 1: 'N', 2: 'N', 3: 'N', 4: 'N'},
    'Length Group': {0: 'Over10m', 1: 'Over10m', 2: 'Over10m',
                     3: 'Over10m', 4: 'Over10m'},
})

df1

Answer 1

To get the percentage, compare the column with 'Y' and aggregate with mean:

out = (df1['Shellfish Licence licence (Y/N)'].eq('Y')
             .groupby(df1['Port'])
             .mean()
             .mul(100)
             .reset_index(name='meanY'))
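
The reason this works is that eq('Y') produces booleans, and the mean of booleans is the fraction of True values. A minimal sketch on made-up data with two ports (the rows below are hypothetical, not taken from the question's dataset):

```python
import pandas as pd

# Hypothetical sample data: two ports, invented for illustration only.
df = pd.DataFrame({
    "Port": ["ABERDEEN", "ABERDEEN", "ABERDEEN", "ABERDEEN", "WICK", "WICK"],
    "Shellfish Licence licence (Y/N)": ["Y", "Y", "Y", "N", "N", "N"],
})

# True where the licence flag is 'Y', False otherwise.
is_yes = df["Shellfish Licence licence (Y/N)"].eq("Y")

# Mean of booleans per port = share of 'Y'; multiply by 100 for a percentage.
pct_yes = (is_yes.groupby(df["Port"])
                 .mean()
                 .mul(100)
                 .reset_index(name="meanY"))
print(pct_yes)
```

With this sample, ABERDEEN has three 'Y' out of four rows and WICK has none, so a port with no 'Y' at all still gets a row (with 0) rather than being dropped.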

Your own solution can be adapted with Series.unstack to get both Y and N as columns:

df2 = (df1.groupby(['Port'])['Shellfish Licence licence (Y/N)']
         .value_counts(normalize=True)
         .unstack() * 100)

Alternative with crosstab:

df2 = pd.crosstab(df1['Port'], df1['Shellfish Licence licence (Y/N)'], normalize=0).mul(100)
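
One difference worth knowing: when a port has only one of the two values (as WICK does in the question's output), unstack leaves the missing combination as NaN, whereas crosstab reports 0. The sketch below, again on hypothetical sample rows, passes fill_value=0 to unstack so both approaches agree:

```python
import pandas as pd

# Hypothetical sample data: WICK has no 'Y' rows at all.
df = pd.DataFrame({
    "Port": ["ABERDEEN", "ABERDEEN", "ABERDEEN", "ABERDEEN", "WICK", "WICK"],
    "Shellfish Licence licence (Y/N)": ["Y", "Y", "Y", "N", "N", "N"],
})
col = "Shellfish Licence licence (Y/N)"

# value_counts + unstack: fill_value=0 replaces the missing (WICK, 'Y') cell.
via_unstack = (df.groupby("Port")[col]
                 .value_counts(normalize=True)
                 .unstack(fill_value=0)
                 .mul(100))

# crosstab normalised over each row (normalize=0 is the same as 'index').
via_crosstab = pd.crosstab(df["Port"], df[col], normalize="index").mul(100)

print(via_unstack)
print(via_crosstab)
```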

Answer 2

IIUC, you can use:

df1['Shellfish Licence licence (Y/N)'].eq('Y').groupby(df1['Port']).mean()

output:

Port
NORTH SHIELDS    0.2
Name: Shellfish Licence licence (Y/N), dtype: float64

For all columns:

df1.filter(like='Y/N').apply(lambda c: c.eq('Y').groupby(df1['Port']).mean())

output:

               Shellfish Licence licence (Y/N)  Scallop Licence (Y/N)
Port                                                                 
NORTH SHIELDS                              0.2                    0.0

To have the data in the original dataframe:

df1['Shellfish percent'] = df1['Shellfish Licence licence (Y/N)'].eq('Y').groupby(df1['Port']).transform('mean')

output:

            Port Shellfish Licence licence (Y/N) Scallop Licence (Y/N)  \
0  NORTH SHIELDS                               Y                     N   
1  NORTH SHIELDS                               N                     N   
2  NORTH SHIELDS                               N                     N   
3  NORTH SHIELDS                               N                     N   
4  NORTH SHIELDS                               N                     N   

  Length Group  Shellfish percent  
0      Over10m                0.2  
1      Over10m                0.2  
2      Over10m                0.2  
3      Over10m                0.2  
4      Over10m                0.2
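
Note that the values above are fractions (0.2), not percentages. To get the "Percentage" column the question asks for, one option is to multiply the transformed result by 100. A minimal sketch on hypothetical sample rows:

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "Port": ["ABERDEEN", "ABERDEEN", "ABERDEEN", "ABERDEEN", "WICK", "WICK"],
    "Shellfish Licence licence (Y/N)": ["Y", "Y", "Y", "N", "N", "N"],
})
col = "Shellfish Licence licence (Y/N)"

# transform('mean') broadcasts each port's share of 'Y' back to every row
# of that port, so the result lines up with the original index.
df["Percentage"] = (df[col].eq("Y")
                           .groupby(df["Port"])
                           .transform("mean")
                           .mul(100))
print(df)
```

Every row of a given port carries the same value, which is what makes it usable as an ordinary column in the original frame.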
