Adding a new column in python based on the value of another column pandas python

I am trying to do a couple of simple operations with this data set.


I am trying to:

  1. Calculate the total of counts attributed to each cluster. For example, for cluster 0, I would have to sum 7+4+61+7+12 = 91.
  2. Add a new column 'total of counts' where the total of counts appears paired up with the corresponding cluster (i.e. rows with a value of '0' in the 'clusters' column will have a value of 91 in the 'total of counts' column).
  3. Divide column 'counts' by 'total of counts' and multiply by 100 (calculate the percentage of counts). The result should be added into a new column.

Can someone help me write code for this, please?

  1. To calculate the total of counts attributed to each cluster, use this code:

    total = df.groupby('clusters')['count'].sum().rename('total of counts')

  2. To add a new column 'total of counts' where the total of counts appears paired up with the corresponding cluster, use this code:

    df = df.join(total, on='clusters', lsuffix='')

  3. To divide column 'counts' by 'total of counts' and multiply by 100, use this code:

    df['counts by total of counts'] = df['count']/df['total of counts']*100
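The three steps above can be put together into one runnable sketch. The sample data is an assumption: the cluster 0 rows reuse the counts quoted in the question, the cluster 1 rows are invented, and the column is assumed to be named 'count' as in the answer's code.

```python
import pandas as pd

# Hypothetical sample data: cluster 0 uses the counts quoted in the question
# (7 + 4 + 61 + 7 + 12 = 91); the cluster 1 rows are made up for illustration.
df = pd.DataFrame({
    "clusters": [0, 0, 0, 0, 0, 1, 1],
    "count": [7, 4, 61, 7, 12, 30, 10],
})

# Step 1: total of counts per cluster, as a named Series indexed by cluster.
total = df.groupby("clusters")["count"].sum().rename("total of counts")

# Step 2: attach each cluster's total to every row of that cluster.
df = df.join(total, on="clusters")

# Step 3: each row's share of its cluster total, in percent.
df["counts by total of counts"] = df["count"] / df["total of counts"] * 100

print(df)
```

Within each cluster the percentages should add up to 100, which is a quick way to check the result.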

Assuming you've called your dataframe df, you can do the following:

Point 1: Use the groupby() method on the clusters column and calculate the sum using the sum() aggregation method, like:

# select only 'count' so that other columns are not summed as well
df_grouped = df.groupby('clusters')[['count']].sum()

Once done, you might want to rename the column in that dataframe to something more useful, like:

df_grouped = df_grouped.rename(columns={'count': 'cluster_count'})

Point 2: To get the summed totals back into your dataframe, you can merge df_grouped with your original dataframe, like:

df_merged = pd.merge(left=df, 
                     right=df_grouped, 
                     left_on='clusters', 
                     right_index=True)

Here the 'clusters' column is the key for the left dataframe and the index of df_grouped is the key for the right one (the cluster values end up in the index after the groupby() operation in point 1).

Point 3: The last step is now trivial. Just use your final dataframe and add a new column that contains the result of the required calculation:

df_merged['count_pct_cluster'] = df_merged['count'] / df_merged['cluster_count'] * 100
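As a rough end-to-end sketch of the merge approach, the snippet below runs points 1 to 3 on made-up data. The 'label' column and the cluster 1 rows are hypothetical. Selecting only 'count' before summing is an assumption made here so that the non-numeric column is not aggregated and no duplicate column names appear after the merge.

```python
import pandas as pd

# Hypothetical data; the "label" column is invented to stand in for any
# non-numeric column the real data set might contain.
df = pd.DataFrame({
    "label": ["a", "b", "c", "d", "e", "f", "g"],
    "clusters": [0, 0, 0, 0, 0, 1, 1],
    "count": [7, 4, 61, 7, 12, 30, 10],
})

# Point 1: per-cluster totals; the cluster values end up in the index.
df_grouped = (
    df.groupby("clusters")[["count"]]
    .sum()
    .rename(columns={"count": "cluster_count"})
)

# Point 2: match the left "clusters" column against the right index.
df_merged = pd.merge(
    left=df,
    right=df_grouped,
    left_on="clusters",
    right_index=True,
)

# Point 3: percentage of each row's count within its cluster.
df_merged["count_pct_cluster"] = (
    df_merged["count"] / df_merged["cluster_count"] * 100
)

print(df_merged)
```

Every row of cluster 0 should carry a cluster_count of 91, matching the sum given in the question.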

You can do this with the line of code below. It gives you a new column called total whose value in each row is the mean of that row's values in columns 0 to 11; you can replace the mean with any other operation you need.

df['total'] = df.iloc[:, :12].mean(axis=1)
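A small sketch of the row-wise idea, on a hypothetical three-column frame rather than the twelve columns mentioned above. Note that a per-row result needs axis=1; without it, mean() averages down each column and the result does not line up with the rows.

```python
import pandas as pd

# Hypothetical frame with three numeric columns.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# axis=1 averages across the columns of each row, giving one value per row.
df["total"] = df.iloc[:, :3].mean(axis=1)

# Swapping in another operation: row-wise sum over the same three columns.
df["row_sum"] = df.iloc[:, :3].sum(axis=1)

print(df)
```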
