
Melt dataframe with first two rows as variables

Apologies, but this has me stumped. I thought I could pass the following dataframe into a simple pd.melt, using iloc to reference my variables, but it wasn't working for me (I'll post the error in a moment).

Sample df:

   Date      0151        0561        0522    0912
0  Date      AVG Review  AVG Review  Review  Review
1  Date      NaN         NaN         NaN     NaN
2  01/01/18  2           2.5         4       5

As you can see, my ID is in the top row, the type of review is in the second row, the date sits in the first column, and the review scores are in the rows for each date.

What I'm trying to do is melt this df to get the following:

ID,   Date,     Review,         Score
0151, 01/01/18, Average Review, 2

I thought I could be cheeky and just pass the following:

pd.melt(df, id_vars=[df.iloc[0]], value_vars=df.iloc[1])

But this threw the error: 'Series' objects are mutable, thus they cannot be hashed

I've looked at similar answers involving pd.melt, and perhaps reshape or unpivot, but I'm lost on how to proceed.

Any help is much appreciated.

Edit for Nixon:

My first row has my unique IDs.

The second row has my observation, which in this case is a type of review (average, normal).

From the third row onward are the values assigned to the above observation; let's call this the score.

The first column has my dates, and each date's row holds the scores.
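Given the layout described in this edit, a pd.melt route also seems possible once the ID row is turned into plain string column labels. This is a minimal sketch, assuming the sheet is loaded without a header (so the IDs arrive as row 0, as in the answer's reconstruction) and that each ID has exactly one review type. The names raw, review_type, scores and long are illustrative only.

```python
import pandas as pd

# Sample laid out as described: row 0 = IDs, row 1 = review types,
# row 2 = empty, row 3 onward = scores; column 0 holds the dates
raw = pd.DataFrame([
    ["Date", "0151", "0561", "0522", "0912"],
    ["Date", "AVG Review", "AVG Review", "Review", "Review"],
    ["Date", None, None, None, None],
    ["01/01/18", 2, 2.5, 4, 5],
])

# Remember which review type belongs to each ID
ids = raw.iloc[0].tolist()
review_type = dict(zip(ids[1:], raw.iloc[1, 1:].tolist()))

# Keep only the score rows and label the columns with the ID row
scores = raw.iloc[3:].copy()
scores.columns = ids  # Date, 0151, 0561, 0522, 0912

# Melt using string labels (hashable), not Series
long = scores.melt(id_vars="Date", var_name="ID", value_name="Score")
long["Review"] = long["ID"].map(review_type)
long = long[["ID", "Date", "Review", "Score"]]
print(long)
```

The review type is mapped back in after melting because the ID alone already identifies it; if one ID could carry several review types, the MultiIndex approach in the answer would be the safer fit.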

An alternative to pd.melt is to set your rows as the column levels of a MultiIndex and then stack them. Your metadata will be stored in the index rather than in columns, though. Not sure if that matters.

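import pandas as pd

# Rebuild the sample: row 0 holds the IDs, row 1 the review types,
# row 2 is empty, and the last row holds the scores for 01/01/18
df = pd.DataFrame([
    ['Date',     '0151',        '0561',       '0522',   '0912'],
    ['Date',     'AVG Review',  'AVG Review', 'Review', 'Review'],
    ['Date',     'NaN',         'NaN',        'NaN',    'NaN'],
    ['01/01/18', 2,             2.5,          4,        5],
])

# Use the first column (the dates) as the row index
df = df.set_index(0)
df.index.name = 'Date'
# Turn the ID row and the review-type row into a two-level column MultiIndex
df.columns = pd.MultiIndex.from_arrays([df.iloc[0, :], df.iloc[1, :]], names=['ID', 'Review'])
# Drop the three header rows (all labelled 'Date'), keeping only the score rows
df = df.drop(df.index[[0, 1, 2]])

# Move both column levels into the row index
df.stack('ID').stack('Review')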

Output:

Date      ID    Review    
01/01/18  0151  AVG Review      2
          0522  Review          4
          0561  AVG Review    2.5
          0912  Review          5
dtype: object

You can easily turn the index back into columns with reset_index.
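As a sketch of that last step, the snippet below stacks both column levels and calls reset_index to get the ID/Date/Review/Score shape the question asked for. Two choices here are assumptions rather than part of the answer: it stacks both levels in one call, and it adds dropna(). As far as I know, the newer stack implementation (opt-in via future_stack in pandas 2.1+, the default in pandas 3) keeps the empty cells that an intermediate single-level stack creates, and dropna() guards against that either way. The column name Score is also just a choice.

```python
import pandas as pd

# Same reconstruction as in the answer's code
df = pd.DataFrame([
    ['Date',     '0151',       '0561',       '0522',   '0912'],
    ['Date',     'AVG Review', 'AVG Review', 'Review', 'Review'],
    ['Date',     'NaN',        'NaN',        'NaN',    'NaN'],
    ['01/01/18', 2,            2.5,          4,        5],
])
df = df.set_index(0)
df.index.name = 'Date'
df.columns = pd.MultiIndex.from_arrays([df.iloc[0, :], df.iloc[1, :]], names=['ID', 'Review'])
df = df.iloc[3:]  # keep only the score rows

# Stack both column levels into the index, drop any empty cells,
# then move Date/ID/Review back out into ordinary columns
long = df.stack(['ID', 'Review']).dropna().reset_index(name='Score')
long = long[['ID', 'Date', 'Review', 'Score']]
print(long)
```

Row order may differ between pandas versions, because the older stack implementation sorts the stacked levels and the newer one keeps the original column order.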
