Grouping data by multiple criteria in Python
I think this is a quick question, but I couldn't find a simple way to phrase it for Google.
I've got a raw dataset like this:
Number of account Value
123 100
456 300
789 400
910 100
674 250
I've also got a methodological table that consolidates this raw data into something useful. It looks like:
Variable Number of account
"a" 123, 456, 910
"b" 789,674
So, in the end I would like to get a table like this:
Variable Number of account
"a" Sum of values for(123, 456, 910)
"b" Sum of values for(789,674)
My initial idea is something like: for each row in the methodological table, for each account number in that row, sum the matching values from the raw data.
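The loop described above can be sketched in plain Python. This is only an illustration of the idea, assuming both tables have already been loaded into dictionaries; the names `raw_values`, `method_table`, and `totals` are made up for this example and do not come from the question.

```python
# Raw data from the question: account number -> value
raw_values = {123: 100, 456: 300, 789: 400, 910: 100, 674: 250}

# Methodological table from the question: variable -> account numbers
method_table = {"a": [123, 456, 910], "b": [789, 674]}

totals = {}
for variable, accounts in method_table.items():
    # Inner loop: add up the raw values of every account in this row
    totals[variable] = sum(raw_values[account] for account in accounts)

print(totals)  # {'a': 500, 'b': 650}
```

This works for small tables, but it does one dictionary lookup per account in Python code, so a vectorized approach is usually preferable once the data lives in dataframes.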
Two questions:
Assuming I have the data in two dataframes, df is:
Number_of_account Value
123 100
456 300
789 400
910 100
674 250
and table_2 is:
Variable Number_of_account
"a" 123,456,910
"b" 789,674
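For anyone who wants to reproduce the steps, the two dataframes shown above could be built as follows. This is a sketch that assumes the account lists in table_2 are stored as comma-separated strings, which is what the later `split(',')` call implies; how the real data is loaded is not stated in the post.

```python
import pandas as pd

# Raw data: one row per account
df = pd.DataFrame({
    "Number_of_account": [123, 456, 789, 910, 674],
    "Value": [100, 300, 400, 100, 250],
})

# Methodological table: one row per variable, accounts as a
# comma-separated string (assumed storage format)
table_2 = pd.DataFrame({
    "Variable": ["a", "b"],
    "Number_of_account": ["123,456,910", "789,674"],
})
```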
First, I'll create a lookup table out of table_2:
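import pandas as pd

# Expand each comma-separated account list into one (account, variable) row
lookup_table = pd.concat([pd.Series(row['Variable'], row['Number_of_account'].split(','))
                          for _, row in table_2.iterrows()]).reset_index()
lookup_table.columns = ["Number_of_account", "variable"]
# split() yields strings, so convert the accounts to numbers to match df
lookup_table.Number_of_account = pd.to_numeric(lookup_table.Number_of_account)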
The result is:
Number_of_account variable
0 123 a
1 456 a
2 910 a
3 789 b
4 674 b
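The same lookup table can probably be built without iterating over rows, using `str.split` together with `DataFrame.explode`. This is an alternative sketch, not the method used above, and it assumes a pandas version that provides `explode` (0.25 or later). The `str.strip()` call is an added precaution for lists written with spaces after the commas.

```python
import pandas as pd

table_2 = pd.DataFrame({
    "Variable": ["a", "b"],
    "Number_of_account": ["123,456,910", "789,674"],
})

lookup_table = (
    table_2
    # Turn each comma-separated string into a list of strings
    .assign(Number_of_account=table_2["Number_of_account"].str.split(","))
    # One output row per list element
    .explode("Number_of_account")
    .rename(columns={"Variable": "variable"})
    .reset_index(drop=True)
)

# Strip stray whitespace, then convert the account strings to integers
lookup_table["Number_of_account"] = pd.to_numeric(
    lookup_table["Number_of_account"].str.strip()
)
```

The resulting rows match the lookup table shown above, although the column order here is variable first, then Number_of_account.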
Then, I merge the main dataframe (df) with the lookup table, and use groupby to calculate the sum of the values.
df = pd.merge(df, lookup_table, on="Number_of_account")
df.groupby("variable")["Value"].sum()
The result is:
variable
a 500
b 650
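One thing to keep in mind is that `pd.merge` performs an inner join by default, so any account missing from the lookup table is silently dropped before the sum. The sketch below is a variation on the merge step that uses `how="left"` to surface such accounts first. The names `merged`, `unmapped`, and `totals` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Number_of_account": [123, 456, 789, 910, 674],
    "Value": [100, 300, 400, 100, 250],
})
lookup_table = pd.DataFrame({
    "Number_of_account": [123, 456, 910, 789, 674],
    "variable": ["a", "a", "a", "b", "b"],
})

# A left join keeps every row of df, even without a lookup match
merged = df.merge(lookup_table, on="Number_of_account", how="left")

# Accounts with no variable assigned; empty for the data in this post
unmapped = merged.loc[merged["variable"].isna(), "Number_of_account"].tolist()

# groupby skips rows whose key is NaN, so unmapped accounts are not summed
totals = merged.groupby("variable", as_index=False)["Value"].sum()
print(totals)
```

With `as_index=False` the result is a regular dataframe with variable and Value columns, which may be more convenient than a Series if the table is going to be exported.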