Can I pivot a column in a pandas dataframe based on a list of tuples?

I am trying to change the structure of my dataframe as follows. I have a dataset with historical KPI information: each record contains a date, the KPI ID, one or more dimensions, and the KPI value.

Based on a list of 3-tuples, I want to transform this dataframe so that each row of the result combines two records of the existing dataframe: one supplies the numerator and the other the denominator, and both share the same date and dimensions.

Current dataframe:

Date  | KPI_ID | Dimension | Value
Apr 5 | KPI_1  | Lorem     | 1
Apr 5 | KPI_2  | Lorem     | 3
Apr 5 | KPI_1  | Ipsum     | 4
Apr 5 | KPI_2  | Ipsum     | 8
Apr 5 | KPI_3  | Dolor     | 2
Apr 5 | KPI_4  | Dolor     | 2

List of 3-tuples giving the combinations of KPI_IDs, each in the form [Result_ID, KPI_Numerator, KPI_Denominator]:

[['Result_1', 'KPI_1', 'KPI_2'], ['Result_2', 'KPI_3', 'KPI_4']]

Desired result:

Date  | Result_ID | Dimension | Numerator | Denominator
Apr 5 | Result_1  | Lorem     | 1         | 3
Apr 5 | Result_1  | Ipsum     | 4         | 8
Apr 5 | Result_2  | Dolor     | 2         | 2
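
For anyone who wants to reproduce this, the tables above can be built roughly as follows. This is only a sketch: the variable names `df`, `mapping` and `expected` are assumptions chosen for illustration, and the dates are kept as plain strings exactly as shown.

```python
import pandas as pd

# Sample data from the tables above; variable names are assumptions.
df = pd.DataFrame({
    "Date": ["Apr 5"] * 6,
    "KPI_ID": ["KPI_1", "KPI_2", "KPI_1", "KPI_2", "KPI_3", "KPI_4"],
    "Dimension": ["Lorem", "Lorem", "Ipsum", "Ipsum", "Dolor", "Dolor"],
    "Value": [1, 3, 4, 8, 2, 2],
})

# Each entry is [Result_ID, KPI_Numerator, KPI_Denominator].
mapping = [["Result_1", "KPI_1", "KPI_2"], ["Result_2", "KPI_3", "KPI_4"]]

# The desired result, handy for checking a candidate solution.
expected = pd.DataFrame({
    "Date": ["Apr 5"] * 3,
    "Result_ID": ["Result_1", "Result_1", "Result_2"],
    "Dimension": ["Lorem", "Ipsum", "Dolor"],
    "Numerator": [1, 4, 2],
    "Denominator": [3, 8, 2],
})
```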

I have tried df.merge and df.groupby with an aggregation function, but I am struggling to see how best to bring the list of tuples into the transformation. Looping over the dataframe does not seem to be the answer, because for every record I would have to search manually for the record with exactly the same dimensions, which I do not expect to perform well.

You can create a dataframe of the combinations (Result_ID, KPI_1, KPI_2), then merge it with the original dataframe twice: first on KPI_1 to pick up the numerator, then on KPI_2 to pick up the denominator (this second merge also matches on Date and Dimension):

import pandas as pd

# df is the dataframe from the question (Date, KPI_ID, Dimension, Value)

# Create combinations dataframe
# (here 'KPI_1' and 'KPI_2' are column names: numerator and denominator KPI)
cs = [['Result_1', 'KPI_1', 'KPI_2'], ['Result_2', 'KPI_3', 'KPI_4']]
df_cs = pd.DataFrame(cs, columns=['Result_ID', 'KPI_1', 'KPI_2'])

# Merge combinations dataframe to original data:
# 1. So that 'KPI_1' in combinations = 'KPI_ID' in data (numerator rows)
# 2. So that 'KPI_2' in combinations = 'KPI_ID' in data (denominator rows),
#    and we get a match on ['Date', 'Dimension']
cols = ['Date', 'Result_ID', 'Dimension', 'Numerator', 'Denominator']
df_out = (df_cs
    .merge(df.rename(columns={'Value': 'Numerator'}),
           left_on='KPI_1', right_on='KPI_ID')
    .drop(columns='KPI_ID')
    .merge(df.rename(columns={'Value': 'Denominator'}),
           left_on=['Date', 'Dimension', 'KPI_2'],
           right_on=['Date', 'Dimension', 'KPI_ID'])
    .drop(columns=['KPI_ID', 'KPI_1', 'KPI_2'])
)[cols]

Output:

    Date Result_ID Dimension  Numerator  Denominator
0  Apr 5  Result_1     Lorem          1            3
1  Apr 5  Result_1     Ipsum          4            8
2  Apr 5  Result_2     Dolor          2            2
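
Since the question mentions multiple dimensions, the double merge can be wrapped in a small function that takes the matching columns as a parameter. The following is a minimal sketch, not a definitive implementation: `pair_kpis`, its `keys` parameter, and the intermediate column names `KPI_Num` and `KPI_Den` are hypothetical names introduced here, and the sample data is rebuilt from the question so that the block runs on its own.

```python
import pandas as pd

def pair_kpis(df, combos, keys=("Date", "Dimension")):
    """Pair numerator and denominator rows listed in `combos` (hypothetical helper).

    combos: list of [Result_ID, numerator KPI_ID, denominator KPI_ID].
    keys:   columns that must match between the two paired rows.
    """
    keys = list(keys)
    df_cs = pd.DataFrame(combos, columns=["Result_ID", "KPI_Num", "KPI_Den"])
    # Renaming KPI_ID up front lets both merges use plain `on=` keys.
    num = df.rename(columns={"KPI_ID": "KPI_Num", "Value": "Numerator"})
    den = df.rename(columns={"KPI_ID": "KPI_Den", "Value": "Denominator"})
    out = (
        df_cs
        .merge(num, on="KPI_Num")           # attach numerator rows
        .merge(den, on=keys + ["KPI_Den"])  # attach the matching denominator
    )
    cols = keys[:1] + ["Result_ID"] + keys[1:] + ["Numerator", "Denominator"]
    return out[cols]

# Sample data from the question.
df = pd.DataFrame({
    "Date": ["Apr 5"] * 6,
    "KPI_ID": ["KPI_1", "KPI_2", "KPI_1", "KPI_2", "KPI_3", "KPI_4"],
    "Dimension": ["Lorem", "Lorem", "Ipsum", "Ipsum", "Dolor", "Dolor"],
    "Value": [1, 3, 4, 8, 2, 2],
})
combos = [["Result_1", "KPI_1", "KPI_2"], ["Result_2", "KPI_3", "KPI_4"]]

result = pair_kpis(df, combos)
```

Both merges are inner joins, so a numerator without a matching denominator (or the reverse) is dropped silently; passing `how="left"` to the second merge would keep such rows with a missing Denominator instead.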

Let's expand each mapping into (Result_ID, KPI_ID) pairs, so that every KPI is paired with the result it belongs to. Build a dataframe from these pairs, merge it with the original dataframe, pivot, and finish with some reshaping (sorting, dropping the helper level, reordering the columns) to get the data into the final form the OP wants.

The caveat is that pivot requires each combination of index and columns to be unique; for the data shared, there is no need to worry about this.
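
To illustrate the caveat, here is a small sketch on made-up data in which one index/columns combination occurs twice. The frame `long_df` and its `Role` column are hypothetical and not part of the question's data. `duplicated` detects the problem up front, `pivot` raises a `ValueError`, and `pivot_table` is one possible fallback if aggregating the duplicates is acceptable.

```python
import pandas as pd

# Hypothetical long-format data: the Numerator for (Apr 5, Lorem) appears twice.
long_df = pd.DataFrame({
    "Date": ["Apr 5", "Apr 5", "Apr 5"],
    "Dimension": ["Lorem", "Lorem", "Lorem"],
    "Role": ["Numerator", "Numerator", "Denominator"],
    "Value": [1, 2, 3],
})

# Check for repeated index/columns combinations before pivoting.
key = ["Date", "Dimension", "Role"]
has_duplicates = long_df.duplicated(subset=key).any()

try:
    long_df.pivot(index=["Date", "Dimension"], columns="Role", values="Value")
    pivot_failed = False
except ValueError:
    # pivot cannot decide which of the duplicated values to keep.
    pivot_failed = True

# pivot_table aggregates duplicates instead; summing is an arbitrary choice here.
wide = long_df.pivot_table(index=["Date", "Dimension"], columns="Role",
                           values="Value", aggfunc="sum")
```

Whether summing, averaging, or rejecting duplicates is right depends on what the KPI values mean, so the aggregation function should be chosen deliberately.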

from itertools import product, chain
import pandas as pd

mapping = [['Result_1', 'KPI_1', 'KPI_2'], ['Result_2', 'KPI_3', 'KPI_4']]
# pair each Result_ID with every KPI_ID listed after it
maps = (product([left], right) for left, *right in mapping)
maps = chain.from_iterable(maps)
maps = pd.DataFrame(maps, columns=['Result_ID', 'KPI_ID'])

maps
  Result_ID KPI_ID
0  Result_1  KPI_1
1  Result_1  KPI_2
2  Result_2  KPI_3
3  Result_2  KPI_4

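(df
 .merge(maps, how='left', on='KPI_ID')
 # relabel each KPI by its role; sorter remembers the original Dimension order
 .assign(KPI_ID = lambda df: df.KPI_ID.map({"KPI_1":"Numerator",
                                            "KPI_2":"Denominator",
                                            "KPI_3":"Numerator",
                                            "KPI_4":"Denominator"}),
         sorter = lambda df: df.Dimension.factorize()[0])
 # keyword arguments: pandas 2.0 no longer accepts positional arguments here
 .pivot(index=['Date','Result_ID','Dimension', 'sorter'],
        columns='KPI_ID',
        values='Value')
 .rename_axis(columns=None)
 .sort_values('sorter')
 .droplevel('sorter')
 .iloc[:, ::-1]   # put Numerator before Denominator
 .reset_index()
 )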

    Date Result_ID Dimension  Numerator  Denominator
0  Apr 5  Result_1     Lorem          1            3
1  Apr 5  Result_1     Ipsum          4            8
2  Apr 5  Result_2     Dolor          2            2
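
The hard-coded KPI-to-role dictionary above has to be kept in sync with the mapping by hand. One way around that, sketched here under assumptions, is to derive the role from each KPI's position in its tuple. The `roles` frame and its `Role` column are names introduced for illustration, and the sample data is rebuilt from the question so that the block is self-contained.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Date": ["Apr 5"] * 6,
    "KPI_ID": ["KPI_1", "KPI_2", "KPI_1", "KPI_2", "KPI_3", "KPI_4"],
    "Dimension": ["Lorem", "Lorem", "Ipsum", "Ipsum", "Dolor", "Dolor"],
    "Value": [1, 3, 4, 8, 2, 2],
})
mapping = [["Result_1", "KPI_1", "KPI_2"], ["Result_2", "KPI_3", "KPI_4"]]

# Long-form lookup: one row per (Result_ID, KPI_ID, Role),
# with the role taken from the KPI's position in the tuple.
roles = pd.DataFrame(
    [(result, kpi, role)
     for result, num, den in mapping
     for kpi, role in ((num, "Numerator"), (den, "Denominator"))],
    columns=["Result_ID", "KPI_ID", "Role"],
)

out = (
    df.merge(roles, on="KPI_ID")
      .pivot(index=["Date", "Result_ID", "Dimension"],
             columns="Role", values="Value")
      .rename_axis(columns=None)
      .reset_index()
      .loc[:, ["Date", "Result_ID", "Dimension", "Numerator", "Denominator"]]
)
```

Unlike the version above, this sketch does not restore the original row order: `pivot` sorts by the index, so Ipsum comes before Lorem in `out`. The uniqueness caveat for `pivot` applies here as well.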
