
Advanced Pivot Table in Pandas

I am trying to optimize some table transformation scripts in Python pandas that I want to run on large data sets (more than 50k rows). I wrote a script that iterates over every index and writes the values into a new data frame (see the example below), but I am running into performance issues. Is there a pandas function that could produce the same result without iterating?

Example code:

from datetime import datetime
import pandas as pd

date1 = datetime(2019,1,1)
date2 = datetime(2019,1,2)

df = pd.DataFrame({"ID": [1,1,2,2,3,3],
                  "date": [date1,date2,date1,date2,date1,date2],
                  "x": [1,2,3,4,5,6],
                  "y": ["a","a","b","b","c","c"]})


new_df = pd.DataFrame()
for i in df.index:

    new_df.at[df.at[i, "ID"], "y"] = df.at[i, "y"]

    if df.at[i, "date"] == datetime(2019,1,1):
        new_df.at[df.at[i, "ID"], "x1"] = df.at[i, "x"]
    elif df.at[i, "date"] == datetime(2019,1,2):
        new_df.at[df.at[i, "ID"], "x2"] = df.at[i, "x"]

Output (df, then new_df):

   ID       date  x  y
0   1 2019-01-01  1  a
1   1 2019-01-02  2  a
2   2 2019-01-01  3  b
3   2 2019-01-02  4  b
4   3 2019-01-01  5  c
5   3 2019-01-02  6  c

   y   x1   x2
1  a  1.0  2.0
2  b  3.0  4.0
3  c  5.0  6.0

The transformation basically groups the rows by the "ID" column, taking the "x1" values from the rows dated 2019-01-01 and the "x2" values from the rows dated 2019-01-02. The "y" value is the same within a given "ID". The "ID" values become the new index.

I'd appreciate any advice on this matter.

Using pivot_table will get you what you are looking for:

result = df.pivot_table(index=['ID', 'y'], columns='date', values='x')
result.rename(columns={date1: 'x1', date2: 'x2'}).reset_index('y')
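For reference, the same idea as a self-contained sketch built on the question's sample frame. Two details here are assumptions that go beyond the snippet above: aggfunc="first" is passed so the original x values are kept as they are rather than averaged (pivot_table uses the mean by default), and rename_axis is used to drop the leftover "date" label from the column index.

```python
from datetime import datetime

import pandas as pd

date1 = datetime(2019, 1, 1)
date2 = datetime(2019, 1, 2)

df = pd.DataFrame({"ID": [1, 1, 2, 2, 3, 3],
                   "date": [date1, date2, date1, date2, date1, date2],
                   "x": [1, 2, 3, 4, 5, 6],
                   "y": ["a", "a", "b", "b", "c", "c"]})

# Spread x across one column per date, keyed by ID and y.
# aggfunc="first" is an assumption: it suits data where each (ID, date)
# pair occurs only once, and avoids the default mean aggregation.
wide = df.pivot_table(index=["ID", "y"], columns="date", values="x",
                      aggfunc="first")

new_df = (
    wide.rename(columns={pd.Timestamp(date1): "x1", pd.Timestamp(date2): "x2"})
        .rename_axis(columns=None)  # drop the leftover "date" label on the columns
        .reset_index("y")           # turn y back into a column, keep ID as the index
)
print(new_df)
```

Because rename and reset_index return new objects, the result has to be assigned (as with new_df here) if it is needed later. If an (ID, date) pair can occur more than once in the real data, a different aggfunc may be more appropriate.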


 