
Importing cross-tab data from Excel into a Pandas DataFrame

I have data in Excel, extracted from an IBM Cube in cross-tab form.

Crosstab example:
Account  Entity  Functions  JAN      FEB      MAR      JAN       Feb       Mar
                            Actuals  Actuals  Actuals  Forecast  Forecast  Forecast
A2100    10021   ABS        $200     $300     $270     $230      $270      $250
A2200    20023   GBS        $320     $285     $360     $350      $300      $400

How can I read this cross tab into a pandas DataFrame and convert it to columnar (long) format? Eventually, I want to write functions that show differences, such as actuals minus forecast, for a selected Month and Functions value.

Disclaimer: I am new to Python, so any direction will be helpful. I am trying to understand whether there is any way to achieve this. I only know the simple Excel and CSV reads, which require the data to be in columnar form.

df = pd.read_excel("<path to your file>.xlsx")

The final output should look like the one suggested by Stef; in addition, there should be a column showing the variance (Forecast - Actual).

Assuming that you have already read your data from Excel into a DataFrame df, you can use melt and merge to unpivot your data like this:

import pandas as pd

data = {'Account': [None, 'A2100', 'A2200'],
 'Entity': [None, '10021', '20023'],
 'Functions': [None, 'ABS', 'GBS'],
 'JAN': ['Actuals', '$200', '$320'],
 'FEB': ['Actuals', '$300', '$285'],
 'MAR': ['Actuals', '$270', '$360'],
 'JAN.1': ['Forecast', '$230', '$350'],
 'Feb': ['Forecast', '$270', '$300'],
 'Mar': ['Forecast', '$250', '$400']}
df = pd.DataFrame(data)

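# Row 0 holds the second header line ('Actuals' / 'Forecast'; None for the id columns).
ma = df.iloc[0].ne('Forecast')  # mask: id columns + Actuals columns
dfa = df.loc[1:, ma.index[ma]]

mf = df.iloc[0].ne('Actuals')  # mask: id columns + Forecast columns
dff = df.loc[1:, mf.index[mf]].copy()
dff.columns = dfa.columns  # rename JAN.1/Feb/Mar to JAN/FEB/MAR so both frames line up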

res = pd.melt(dfa, ['Account','Entity','Functions'], 
              var_name='Month', 
              value_name='Actuals').merge(
                  pd.melt(dff, ['Account','Entity','Functions'], 
                          var_name='Month', 
                          value_name='Forecast'))

Result:

  Account Entity Functions Month Actuals Forecast
0   A2100  10021       ABS   JAN    $200     $230
1   A2200  20023       GBS   JAN    $320     $350
2   A2100  10021       ABS   FEB    $300     $270
3   A2200  20023       GBS   FEB    $285     $300
4   A2100  10021       ABS   MAR    $270     $250
5   A2200  20023       GBS   MAR    $360     $400
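
The question also asks for a variance column and a way to pick rows by Month and Functions. A minimal sketch of that step follows, assuming the amounts arrive as strings such as "$200" (if Excel stores them as real numbers with currency formatting, pandas already reads them as numbers and the cleaning loop should be skipped). The helper name variance_for is hypothetical, and the variance is taken as Forecast - Actual as stated in the question; swap the operands for actuals minus forecast.

```python
import pandas as pd

# The unpivoted result shown above, rebuilt here so the sketch runs on its own.
res = pd.DataFrame({
    'Account':   ['A2100', 'A2200', 'A2100', 'A2200', 'A2100', 'A2200'],
    'Entity':    ['10021', '20023', '10021', '20023', '10021', '20023'],
    'Functions': ['ABS', 'GBS', 'ABS', 'GBS', 'ABS', 'GBS'],
    'Month':     ['JAN', 'JAN', 'FEB', 'FEB', 'MAR', 'MAR'],
    'Actuals':   ['$200', '$320', '$300', '$285', '$270', '$360'],
    'Forecast':  ['$230', '$350', '$270', '$300', '$250', '$400'],
})

# Assumption: amounts are strings like "$1,234"; drop "$" and "," before converting.
for col in ['Actuals', 'Forecast']:
    res[col] = pd.to_numeric(res[col].str.replace(r'[$,]', '', regex=True))

# Variance as defined in the question: Forecast - Actual.
res['Variance'] = res['Forecast'] - res['Actuals']


def variance_for(df, month, function):
    """Hypothetical helper: rows for one month and one Functions value.

    The month comparison ignores case, so 'Jan' and 'JAN' both match.
    """
    mask = (df['Month'].str.upper() == month.upper()) & (df['Functions'] == function)
    return df.loc[mask]


print(variance_for(res, 'Jan', 'ABS'))
```

Keeping the data in long format means the selection is a plain boolean filter; the same frame can also be grouped by Month or Functions if totals are needed later.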


 