Reshaping non-numeric values in pandas dataframe
I've searched through Google for an answer but haven't had any luck. I need to reshape a pandas DataFrame so that non-numeric values (comp_url) appear as the "value" in a multi-index DataFrame. Below is a sample of the data:
store_name sku comp price ship comp_url
CSE A1025 compA 30.99 9.99 some url
CSE A1025 compB 30.99 9.99 some url
CSE A1025 compC 30.99 9.99 some url
I have several store_names, so I need it to look like this:
SKU    CSE                      store_name2
       comp_url  price  ship    comp_url  price  ship
A1025  some url  30.99  9.99    some url  30.99  9.99
Any ideas or guidance would be appreciated!
Perhaps a pandas.Panel is more appropriate. Panels are for 3-dimensional data, whereas DataFrames are 2-dimensional. (Note that Panel was deprecated in pandas 0.20 and has since been removed from the library.)
Assuming each SKU/store_name combination is unique, here is a working example:
# imports
import pandas as pd
# Create a sample DataFrame.
cols = ['store_name', 'sku', 'comp', 'price', 'ship', 'comp_url']
records = [['CSA', 'A1025', 'compA', 30.99, 9.99, 'some url'],
['CSB', 'A1025', 'compB', 32.99, 9.99, 'some url2'],
['CSA', 'A1026', 'compC', 30.99, 19.99, 'some url'],
['CSB', 'A1026', 'compD', 30.99, 9.99, 'some url3']]
df = pd.DataFrame.from_records(records, columns=cols)
# Move both 'sku' and 'store_name' to the row index; the combination
# of these two columns provides a unique identifier for each row.
df.set_index(['sku', 'store_name'], inplace=True)
# Move 'store_name' from the row index to the column index. Each
# unique value in the 'store_name' index gets its own set of columns.
# In the multiindex, 'store_name' will be below the existing column
# labels.
df = df.unstack(1)
# To get the 'store_name' above the other column labels, we simply
# reorder the levels in the MultiIndex and sort it.
df.columns = df.columns.reorder_levels([1, 0])
df.sort_index(axis=1, inplace=True)
# Show the result.
df
This works because the sku/store_name label combination is unique. When we use unstack(), we are just moving labels and cells around. We are not doing any aggregation. If we were doing something that didn't have unique labels and required aggregation, pivot_table() would probably be a better option.
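To illustrate the non-unique case, here is a minimal sketch. The duplicated rows and the choice of aggfunc='first' (keep the first value seen for each sku/store_name pair) are assumptions made for this example; they are not part of the original question. 'first' is used because it also works for the non-numeric comp_url column, whereas the default mean aggregation would not.

```python
import pandas as pd

# Hypothetical sample data: 'CSA'/'A1025' appears twice, so the
# sku/store_name combination is no longer unique.
cols = ['store_name', 'sku', 'comp', 'price', 'ship', 'comp_url']
records = [['CSA', 'A1025', 'compA', 30.99, 9.99, 'some url'],
           ['CSA', 'A1025', 'compB', 28.99, 9.99, 'some url2'],
           ['CSB', 'A1025', 'compC', 32.99, 9.99, 'some url3']]
dupes = pd.DataFrame.from_records(records, columns=cols)

# unstack() cannot reshape an index with duplicate entries.
unstack_error = None
try:
    dupes.set_index(['sku', 'store_name']).unstack(1)
except ValueError as exc:
    unstack_error = exc
    print('unstack failed:', exc)

# pivot_table() aggregates the duplicates instead; 'first' keeps the
# first value per sku/store_name pair, including the string column.
wide = dupes.pivot_table(index='sku', columns='store_name',
                         values=['comp_url', 'price', 'ship'],
                         aggfunc='first')

# Put 'store_name' above the value labels, as in the unstack() example.
wide = wide.swaplevel(0, 1, axis=1).sort_index(axis=1)
print(wide)
```

Which aggregation is appropriate depends on the data: 'first' silently discards the other duplicates, so a different aggfunc (or de-duplicating beforehand) may be a better fit.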