How to calculate the sum of conditional cells in Excel and populate another column with the results

EDIT: Using the Advanced Filter in Excel (under the Data tab), I have been able to create a list of unique company names, and I am now able to SUMIF based on the cell containing the company's name!

Disclaimer: Any Python solutions would be greatly appreciated as well, pandas specifically!

I have 60,000 rows of data containing information about grants awarded to companies.

[Screenshot of the spreadsheet data]

I am planning to create a Python dictionary that stores each unique company name along with its total grant amount in dollars (agreemen_2) and its location coordinates. Then I want to display this with Dash (Plotly) on a live MapBox map of Canada.

First things first: how do I calculate and store the total value that was awarded to each company?

I have seen SUMIF in other solutions, but I am unsure how to output the result to a new column, if that makes sense.

One potential solution I thought of was to create a new column of unique company names and, next to it, SUMIF all the matching cells in column D.
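For comparison, the same two-step idea (a list of unique names with a SUMIF beside it) can be sketched in pandas roughly as follows. The sample rows and the company_total column name are made up for illustration; only the column names recipien_2 and agreemen_2 come from the question.

```python
import pandas as pd

# Made-up sample rows; only the column names come from the question.
df = pd.DataFrame({
    "recipien_2": ["Acme", "Beta", "Acme", "Beta", "Gamma"],
    "agreemen_2": [100.0, 50.0, 25.0, 10.0, 5.0],
})

# Like a SUMIF written next to every row: each row receives the total
# of all grants that share its company name (hypothetical column name).
df["company_total"] = df.groupby("recipien_2")["agreemen_2"].transform("sum")

# Like the "unique names plus SUMIF" table: one row per company.
totals = df.groupby("recipien_2", as_index=False)["agreemen_2"].sum()
```

The transform version keeps all original rows, while the second version collapses the data to one row per company, which is probably closer to what the map needs.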

PYTHON STUFF SO FAR

With the code below, I take a much messier-looking spreadsheet, drop duplicates, sort by company name, and create a new pandas DataFrame with the relevant data columns:

corp_df is the cleaned-up new DataFrame that I want to work with.

recipien_4 is the company's unique ID number; as you can see, it repeats with each grant awarded. Folia Biotech in the screenshot shows a duplicate grant, as confirmed by a column I did not include in the screenshot. There are quite a few duplicates, as seen in the screenshot.

import pandas as pd

in_file = '2019-20 Grants and Contributions.csv'

# create dataframe 
df = pd.read_csv(in_file)

# sort by company name (recipien_2)
df.sort_values("recipien_2", inplace=True)

# remove duplicates (rows sharing the same agreemen_1 value)
df.drop_duplicates(subset='agreemen_1', keep='first', inplace=True)

# full name, id, grant $, longitude, latitude
corp_df = df[['recipien_2', 'recipien_4', 'agreemen_2', 'longitude', 'latitude']]

corp_dict = {}

# create a dict with one entry per corporation name, all values 0
for name in corp_df['recipien_2']:
    if name not in corp_dict:
        corp_dict[name] = 0

Any tips or tricks would be greatly appreciated. .itertuples() didn't seem like a good solution, as I am unsure how to filter and compare data, or whether datatypes are preserved. But feel free to prove me wrong, haha.

I thought perhaps there was a better way to tackle this problem directly in Excel, rather than iterating through the rows of a pandas DataFrame. This is a pretty open question, so thank you for any help or direction you think is best!
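On the datatype question: as far as I know, itertuples() hands back each row as a namedtuple and keeps the per-column value types, whereas iterrows() builds a Series per row and may upcast mixed columns. A minimal sketch of accumulating totals this way, using made-up sample rows (only the column names come from the question):

```python
import pandas as pd

# Made-up sample rows; only the column names come from the question.
corp_df = pd.DataFrame({
    "recipien_2": ["Acme", "Beta", "Acme"],
    "agreemen_2": [100.0, 50.0, 25.0],
})

corp_dict = {}

# itertuples() yields one namedtuple per row; columns are read as attributes.
for row in corp_df.itertuples(index=False):
    # Add this grant to the running total for the company (starting from 0).
    corp_dict[row.recipien_2] = corp_dict.get(row.recipien_2, 0) + row.agreemen_2
```

This works, but for 60,000 rows a grouped sum is usually shorter and faster than an explicit Python loop.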

Using groupby followed by sum may be the best option for you:

# sum the grant amounts within each (company, longitude, latitude) group
corp_df = df.groupby(by=['recipien_2', 'longitude', 'latitude'])['agreemen_2'].sum()

# if you want to turn the index back into columns, you can add this afterwards:
corp_df = corp_df.reset_index()
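A small self-contained sketch of this groupby-then-sum idea, run on made-up sample rows (the coordinates and amounts are invented; only the column names come from the question):

```python
import pandas as pd

# Made-up sample rows; only the column names come from the question.
df = pd.DataFrame({
    "recipien_2": ["Acme", "Acme", "Beta"],
    "longitude": [-75.7, -75.7, -123.1],
    "latitude": [45.4, 45.4, 49.3],
    "agreemen_2": [100.0, 25.0, 50.0],
})

# Sum only the grant column within each (company, longitude, latitude) group.
corp_df = df.groupby(["recipien_2", "longitude", "latitude"])["agreemen_2"].sum()

# Turn the group keys back into ordinary columns.
corp_df = corp_df.reset_index()
```

One caveat worth checking against the real data: by default groupby leaves out rows whose grouping key is missing, so companies without coordinates would drop out of the result. Recent pandas versions accept dropna=False in groupby to keep them.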

I can see that you are using pandas to read the CSV file, so you can use the groupby method.

You can create a new DataFrame grouped by company name, like this:

dfnew = df.groupby('recipien_2')['agreemen_2'].sum()

Then dfnew holds the total grant amount for each company.
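To get from grouped totals to the dictionary described in the question, something along these lines might work. This is only a sketch: the sample rows are made up, the result names summary and total are hypothetical, and it assumes each company has a single location, so it simply keeps the first coordinates it sees.

```python
import pandas as pd

# Made-up sample rows; only the column names come from the question.
df = pd.DataFrame({
    "recipien_2": ["Acme", "Acme", "Beta"],
    "agreemen_2": [100.0, 25.0, 50.0],
    "longitude": [-75.7, -75.7, -123.1],
    "latitude": [45.4, 45.4, 49.3],
})

# One row per company: total grant amount plus the first coordinates seen
# (assumption: a company's coordinates do not vary between its rows).
summary = df.groupby("recipien_2").agg(
    total=("agreemen_2", "sum"),
    longitude=("longitude", "first"),
    latitude=("latitude", "first"),
)

# Shape: {company: {"total": ..., "longitude": ..., "latitude": ...}}
corp_dict = summary.to_dict(orient="index")
```

A lookup such as corp_dict["Acme"]["total"] then gives the summed grant amount for that company.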

Pandas groupby documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
