
How to create a new column with the percentage of occurrences of a particular value in another column of a DataFrame?

I have a column whose values are either 'Y' or 'N', for yes or no. I want to calculate the percentage of occurrences of 'Y' and then include this as the value of a new column called "Percentage".

I have come up with this so far. It contains the numbers I need, but I don't know how to get them into the form I describe:

port_merge_lic_df.groupby(['Port'])['Shellfish Licence licence (Y/N)'].value_counts(normalize=True) * 100

Port       Shellfish Licence licence (Y/N)
ABERDEEN   Y                                   80.731789
           N                                   19.268211
AYR        N                                   94.736842
           Y                                    5.263158
BELFAST    N                                   81.654676
                                         ...    
STORNOWAY  N                                   23.362692
                                        0.383857
ULLAPOOL   N                                   56.936826
           Y                                   43.063174
WICK       N                                  100.000000
Name: Shellfish Licence licence (Y/N), Length: 87, dtype: float64

The dataframe is in the form:

import pandas as pd

df1 = pd.DataFrame({
    'Port': {0: 'NORTH SHIELDS', 1: 'NORTH SHIELDS', 2: 'NORTH SHIELDS', 3: 'NORTH SHIELDS', 4: 'NORTH SHIELDS'},
    'Shellfish Licence licence (Y/N)': {0: 'Y', 1: 'N', 2: 'N', 3: 'N', 4: 'N'},
    'Scallop Licence (Y/N)': {0: 'N', 1: 'N', 2: 'N', 3: 'N', 4: 'N'},
    'Length Group': {0: 'Over10m', 1: 'Over10m', 2: 'Over10m', 3: 'Over10m', 4: 'Over10m'},
})

df1

To get the percentage, compare the column with 'Y' and aggregate the resulting booleans with mean:

out = (df1['Shellfish Licence licence (Y/N)'].eq('Y')
             .groupby(df1['Port'])
             .mean()
             .mul(100)
             .reset_index(name='meanY'))
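Why this works: eq('Y') yields a boolean Series, and the mean of booleans within a group is the share of True values. A minimal sketch on made-up data (the frame toy and its column names Port and Licence are hypothetical, not taken from the question):

```python
import pandas as pd

# Hypothetical toy data with two ports (not from the question).
toy = pd.DataFrame({
    "Port": ["A", "A", "A", "A", "B", "B"],
    "Licence": ["Y", "N", "N", "N", "Y", "Y"],
})

# Boolean Series: True where the value is 'Y'.
is_yes = toy["Licence"].eq("Y")

# The mean of booleans per group is the fraction of True values;
# multiplying by 100 turns it into a percentage.
pct_yes = (is_yes.groupby(toy["Port"])
                 .mean()
                 .mul(100)
                 .reset_index(name="meanY"))

print(pct_yes)
```

Port A has one 'Y' out of four rows and port B has two out of two, so the meanY column should hold 25 and 100 respectively.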

Your solution can also be adapted with Series.unstack to get both Y and N as columns:

df2 = (df1.groupby(['Port'])['Shellfish Licence licence (Y/N)']
         .value_counts(normalize=True)
         .unstack() * 100)

Alternative with crosstab:

df2 = pd.crosstab(df1['Port'], df1['Shellfish Licence licence (Y/N)'], normalize=0).mul(100)
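The two variants can differ when a port has no rows for one of the values: value_counts only lists observed combinations, so unstack leaves NaN there, while crosstab counts zero. The sketch below, on hypothetical data (toy, Port and Licence are illustrative names), passes fill_value=0 to unstack so both results line up; normalize="index" is used here as the spelled-out form of normalize=0.

```python
import pandas as pd

# Hypothetical toy data: port B has no 'N' rows at all.
toy = pd.DataFrame({
    "Port": ["A", "A", "A", "A", "B", "B"],
    "Licence": ["Y", "N", "N", "N", "Y", "Y"],
})

# value_counts + unstack: fill_value=0 replaces the NaN that would
# otherwise appear for the missing (B, N) combination.
via_unstack = (toy.groupby("Port")["Licence"]
                  .value_counts(normalize=True)
                  .unstack(fill_value=0) * 100)

# crosstab normalised over each row (each port).
via_crosstab = pd.crosstab(toy["Port"], toy["Licence"], normalize="index").mul(100)

print(via_unstack)
print(via_crosstab)
```

Either frame can then be reduced to the 'Y' column alone with via_crosstab["Y"] if only the yes-percentage is needed.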

IIUC, you can use:

df1['Shellfish Licence licence (Y/N)'].eq('Y').groupby(df1['Port']).mean()

output:

Port
NORTH SHIELDS    0.2
Name: Shellfish Licence licence (Y/N), dtype: float64

For all columns:

df1.filter(like='Y/N').apply(lambda c: c.eq('Y').groupby(df1['Port']).mean())

output:

               Shellfish Licence licence (Y/N)  Scallop Licence (Y/N)
Port                                                                 
NORTH SHIELDS                              0.2                    0.0

To have the data in the original dataframe:

df1['Shellfish percent'] = df1['Shellfish Licence licence (Y/N)'].eq('Y').groupby(df1['Port']).transform('mean')

output:

            Port Shellfish Licence licence (Y/N) Scallop Licence (Y/N)  \
0  NORTH SHIELDS                               Y                     N   
1  NORTH SHIELDS                               N                     N   
2  NORTH SHIELDS                               N                     N   
3  NORTH SHIELDS                               N                     N   
4  NORTH SHIELDS                               N                     N   

  Length Group  Shellfish percent  
0      Over10m                0.2  
1      Over10m                0.2  
2      Over10m                0.2  
3      Over10m                0.2  
4      Over10m                0.2  
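Note that the values above are fractions (0.2), not percentages. If the new column should be called "Percentage" and hold values on a 0 to 100 scale, as the question asks, one possible variant is to multiply the transformed result by 100. This is a sketch on made-up rows (the second port and its values are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical data: five rows for one port, two for another.
df = pd.DataFrame({
    "Port": ["NORTH SHIELDS"] * 5 + ["WICK"] * 2,
    "Shellfish Licence licence (Y/N)": ["Y", "N", "N", "N", "N", "N", "N"],
})

col = "Shellfish Licence licence (Y/N)"

# transform('mean') broadcasts each port's share of 'Y' back onto
# every row of that port, so the result aligns with the original frame.
df["Percentage"] = (df[col].eq("Y")
                           .groupby(df["Port"])
                           .transform("mean")
                           .mul(100))

print(df)
```

Every NORTH SHIELDS row should carry the same value (one 'Y' out of five rows), and every WICK row should carry zero.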


