Grouping data by multiple criteria in Python

Question

I think I have a quick question, but I couldn't find a way to phrase it simply enough to google it.

I've got a raw dataset like this:

 Number of account     Value
      123               100
      456               300
      789               400
      910               100
      674               250

And I've got a methodological table for consolidating this raw data into something useful. It looks like this:

 Variable              Number of account
    "a"                  123, 456, 910
    "b"                    789,674

So, in the end, I would like to get a table like this:

 Variable              Number of account
    "a"                  Sum of values for(123, 456, 910)
    "b"                  Sum of values for(789,674)

My initial idea is to do something like: for each row in the methodological table, for each account number in that row, sum the matching values in the raw data.
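The loop described above can be sketched directly in pandas. This is only a minimal illustration: the column names and the dict `method` (a stand-in for the methodological table) are assumptions, not part of the original data layout.

```python
import pandas as pd

# Sample data copied from the question; the column names are assumptions.
raw = pd.DataFrame({
    "Number_of_account": [123, 456, 789, 910, 674],
    "Value": [100, 300, 400, 100, 250],
})

# Hypothetical representation of the methodological table: one list per variable.
method = {"a": [123, 456, 910], "b": [789, 674]}

totals = {}
for variable, accounts in method.items():
    # Keep only the raw rows whose account number belongs to this variable.
    mask = raw["Number_of_account"].isin(accounts)
    totals[variable] = int(raw.loc[mask, "Value"].sum())

print(totals)  # {'a': 500, 'b': 650}
```

This scans the raw data once per variable, which is fine for a handful of variables but may be slower than a merge or mapping when the methodological table is large.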

Two questions:

What is the best way to consolidate it?
What if the account numbers in the methodological table are comma-delimited strings ("123,456,910")? Can I store multiple numbers in one cell of a pandas DataFrame?
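On the second question, a pandas cell can hold a Python list, although most operations are easier once each value has its own row. The sketch below assumes a hypothetical `table_2` with comma-delimited strings and uses `Series.str.split` followed by `DataFrame.explode` (available in pandas 0.25 and later); the name `long_form` is made up for illustration.

```python
import pandas as pd

# Hypothetical methodological table with comma-delimited account strings.
table_2 = pd.DataFrame({
    "Variable": ["a", "b"],
    "Number_of_account": ["123, 456, 910", "789,674"],
})

# str.split turns each cell into a Python list, so one cell holds several values.
table_2["Number_of_account"] = table_2["Number_of_account"].str.split(",")

# explode gives each list element its own row; the index is rebuilt afterwards.
long_form = table_2.explode("Number_of_account").reset_index(drop=True)

# Strip stray spaces left over from the split, then convert to integers.
long_form["Number_of_account"] = (
    long_form["Number_of_account"].str.strip().astype(int)
)

print(long_form)
```

The long form (one account per row) is usually the more convenient shape, since it can be merged or mapped against the raw data without further parsing.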

Answer 1

Assuming I have the data in two dataframes:

df is:

Number_of_account     Value
      123               100
      456               300
      789               400
      910               100
      674               250

and table_2 is:

Variable              Number_of_account
    "a"                  123,456,910
    "b"                    789,674

First, I'll create a lookup table out of table_2:

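import pandas as pd
# Build one (account, variable) row for every account listed in table_2
lookup_table = pd.concat([pd.Series(row['Variable'], row['Number_of_account'].split(','))
                          for _, row in table_2.iterrows()]).reset_index()
lookup_table.columns = ["Number_of_account", "variable"]
# The split yields strings; convert them to numbers so they match df
lookup_table.Number_of_account = pd.to_numeric(lookup_table.Number_of_account)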

The result is:

   Number_of_account variable
0                123        a
1                456        a
2                910        a
3                789        b
4                674        b

Then I merge the main dataframe (df) with the lookup table and use groupby to calculate the sum of the values.

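# Attach each account's variable label (inner join on the account number)
df = pd.merge(df, lookup_table, on="Number_of_account")
# Sum Value within each variable
df.groupby("variable")["Value"].sum()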

The result is:

variable
a    500
b    650
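As an alternative to the merge, the same totals can be obtained by mapping each account number to its variable and grouping on that. This is a self-contained sketch rather than the answer's own method; the dict `account_to_variable` and the names `labels` and `result` are hypothetical, introduced only for illustration.

```python
import pandas as pd

# Data from the question and answer.
df = pd.DataFrame({
    "Number_of_account": [123, 456, 789, 910, 674],
    "Value": [100, 300, 400, 100, 250],
})
table_2 = pd.DataFrame({
    "Variable": ["a", "b"],
    "Number_of_account": ["123,456,910", "789,674"],
})

# Hypothetical helper dict: account number -> variable label.
account_to_variable = {
    int(account): row.Variable
    for row in table_2.itertuples(index=False)
    for account in row.Number_of_account.split(",")
}

# Accounts missing from the mapping become NaN and are skipped by groupby.
labels = df["Number_of_account"].map(account_to_variable).rename("variable")
result = df.groupby(labels)["Value"].sum()

print(result)
```

With the sample data this gives 500 for "a" and 650 for "b", matching the merge-based result. Unlike the merge, it leaves df itself unchanged.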

Answer 1: accepted (1), 2020-05-24 09:49:56
