
Reshaping non-numeric values in pandas dataframe

I've searched Google for an answer but haven't had any luck. I need to reshape a pandas dataframe so that non-numeric values (comp_url) are the "value" in a multi-index dataframe. Below is a sample of the data:

    store_name  sku    comp   price  ship  comp_url
    CSE         A1025  compA  30.99  9.99  some url
    CSE         A1025  compB  30.99  9.99  some url
    CSE         A1025  compC  30.99  9.99  some url

I have several store_names, so I need it to look like this:

SKU    CSE                      store_name2
       comp_url  price  ship    comp_url  price  ship
A1025  some url  30.99  9.99    some url  30.99  9.99

Any ideas or guidance would be appreciated!

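Perhaps a pandas.Panel is more appropriate. Panels are for 3-dimensional data; DataFrames are 2D. (Note that Panel was deprecated and then removed in pandas 0.25, so on current versions a DataFrame with a MultiIndex is the way to hold this kind of data.)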

Assuming each SKU/store_name combination is unique, here is a working example:

# imports
import pandas as pd

# Create a sample DataFrame.
cols = ['store_name', 'sku', 'comp', 'price', 'ship', 'comp_url']
records = [['CSA', 'A1025', 'compA', 30.99, 9.99, 'some url'],
           ['CSB', 'A1025', 'compB', 32.99, 9.99, 'some url2'],
           ['CSA', 'A1026', 'compC', 30.99, 19.99, 'some url'],
           ['CSB', 'A1026', 'compD', 30.99, 9.99, 'some url3']]
df = pd.DataFrame.from_records(records, columns=cols)

# Move both 'sku' and 'store_name' to the row index; the combination
# of these two columns provides a unique identifier for each row.
df.set_index(['sku', 'store_name'], inplace=True)
# Move 'store_name' from the row index to the column index. Each
# unique value in the 'store_name' index gets its own set of columns.
# In the multiindex, 'store_name' will be below the existing column
# labels.
df = df.unstack(1)
# To get the 'store_name' above the other column labels, we simply
# reorder the levels in the MultiIndex and sort it.
df.columns = df.columns.reorder_levels([1, 0])
df.sort_index(axis=1, inplace=True)

# Show the result.
print(df)

This works because each sku/store_name label combination is unique. When we use unstack(), we are just moving labels and cells around; we are not doing any aggregation. If the labels were not unique and aggregation were required, pivot_table() would probably be a better option.
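
For the non-unique case, here is a minimal sketch, assuming it is acceptable to keep the first row seen for each sku/store_name pair. The duplicated sample rows and the name `wide` are made up for illustration. Using aggfunc='first' avoids the default mean aggregation, which would not make sense for a string column such as comp_url:

```python
import pandas as pd

# Hypothetical sample data: the ('A1025', 'CSA') pair appears twice,
# so set_index(...).unstack() would raise a ValueError about
# duplicate index entries.
cols = ['store_name', 'sku', 'comp', 'price', 'ship', 'comp_url']
records = [['CSA', 'A1025', 'compA', 30.99, 9.99, 'url a'],
           ['CSA', 'A1025', 'compB', 31.99, 9.99, 'url b'],
           ['CSB', 'A1025', 'compC', 32.99, 9.99, 'url c']]
df = pd.DataFrame.from_records(records, columns=cols)

# aggfunc='first' keeps the first value per sku/store_name group,
# and it works on strings as well as numbers.
wide = df.pivot_table(index='sku', columns='store_name',
                      values=['comp_url', 'price', 'ship'],
                      aggfunc='first')

# Put 'store_name' on the top column level, as in the unstack() example.
wide = wide.swaplevel(0, 1, axis=1).sort_index(axis=1)
print(wide)
```

Whether 'first' is the right choice depends on the data; if every competitor row needs to be kept, adding 'comp' to the index instead of aggregating may be a better fit.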
