
Turning column headers into observations in pandas

I'm not sure if my issue has a specific name (I remember a lecture where the teacher said that part of knowledge is knowing the names of things).

Anyway, I'm working with some legacy systems and my data is output as follows:

import pandas as pd

df = pd.DataFrame({'Shop': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Week 1': [15, 25, 11, 22, 0, -1, 15, 11, 76, 62],
                   'Week 2': [5, 44, 55, 21, 12, 51, -10, 25, 81, 46]})
print(df)


   Shop  Week 1  Week 2
0     1      15       5
1     2      25      44
2     3      11      55
3     4      22      21
4     5       0      12
5     6      -1      51
6     7      15     -10
7     8      11      25
8     9      76      81
9    10      62      46

In this instance, the week number should be an observation, and the number in each cell is the value that should be assigned to it.

What I'm trying to do is the following:

Transpose the DataFrame but keep Shop as the identifier, turning each week's value into its own observation (row). Taking only the first 2 shops as an example:

    Shop    Week Hour
0   1       1    15
1   1       2    5
2   2       1    25
3   2       2    44

What would be the most pythonic way to achieve this on a relatively medium-sized DataFrame (500 rows, 52 weeks)?

Using wide_to_long:

pd.wide_to_long(df,'Week ',i='Shop',j='week')
Out[770]: 
           Week 
Shop week       
1    1        15
2    1        25
3    1        11
4    1        22
5    1         0
6    1        -1
7    1        15
8    1        11
9    1        76
10   1        62
1    2         5
2    2        44
3    2        55
4    2        21
5    2        12
6    2        51
7    2       -10
8    2        25
9    2        81
10   2        46

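# To get the flat layout from the question, sort by shop, reset the index, and rename the value column:
pd.wide_to_long(df, 'Week ', i='Shop', j='week').sort_index(level=0).reset_index().rename(columns={'Week ': 'Hour'})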
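
For what it's worth, wide_to_long also accepts sep and suffix arguments, which avoids relying on a stub name with a trailing space. The following is only a sketch on a three-shop sample; the intermediate name week_no and the variable long_df are made up for illustration.

```python
import pandas as pd

# Small sample in the same shape as the question's data.
df = pd.DataFrame({
    'Shop': [1, 2, 3],
    'Week 1': [15, 25, 11],
    'Week 2': [5, 44, 55],
})

long_df = (
    # 'Week' is the stub, ' ' separates it from the numeric suffix.
    pd.wide_to_long(df, stubnames='Week', i='Shop', j='week_no',
                    sep=' ', suffix=r'\d+')
      .sort_index()        # order by Shop, then week number
      .reset_index()       # turn the MultiIndex back into columns
      # 'week_no' is a hypothetical temporary name, renamed here to 'Week'.
      .rename(columns={'Week': 'Hour', 'week_no': 'Week'})
)
print(long_df)
```

The temporary week_no name is needed because the value column is itself called Week until the final rename.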

You can rename the columns, apply pd.melt, and then sort_values:

# Use the text after the last space so multi-digit weeks (e.g. 'Week 52') parse correctly
df.columns = [i if not i.startswith('Week') else int(i.split()[-1]) for i in df]

# Sort by both Shop and Week so the week order within each shop is deterministic
res = pd.melt(df, id_vars='Shop', var_name='Week', value_name='Hour')\
        .sort_values(['Shop', 'Week']).reset_index(drop=True)

print(res)

    Shop Week  Hour
0      1    1    15
1      1    2     5
2      2    1    25
3      2    2    44
...
16     9    1    76
17     9    2    81
18    10    1    62
19    10    2    46
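
If you would rather not overwrite df.columns in place, a variation is to melt first and pull the week number out of the melted labels afterwards. This is a rough sketch rather than part of the answer above; the 'Week 10' column is a made-up example to show that multi-digit weeks are handled.

```python
import pandas as pd

# Sample data; 'Week 10' is hypothetical, added to exercise multi-digit weeks.
df = pd.DataFrame({
    'Shop': [1, 2, 3],
    'Week 1': [15, 25, 11],
    'Week 10': [5, 44, 55],
})

# Melt with the original headers, leaving df itself untouched.
res = df.melt(id_vars='Shop', var_name='Week', value_name='Hour')

# Extract the digits from labels such as 'Week 10' and convert to int.
res['Week'] = res['Week'].str.extract(r'(\d+)', expand=False).astype(int)

# Sort on both keys so the weeks are ordered numerically within each shop.
res = res.sort_values(['Shop', 'Week']).reset_index(drop=True)
print(res)
```

Converting Week to an integer before sorting matters once there are more than nine weeks, since the string 'Week 10' would otherwise sort before 'Week 2'.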

I would use something like this, though it's a bit messy with all the renaming:

# Rename columns with dict comprehension so it can extend to more than week 1 and week 2
df2 = (df.rename(columns={i: int(i.split()[-1]) for i in df.columns[1:]})
       .set_index('Shop')
       .stack()
       .reset_index()
       .rename(columns={'level_1':'Week', 0:'Hour'}))

>>> df2

    Shop  Week  Hour
0      1     1    15
1      1     2     5
2      2     1    25
3      2     2    44
4      3     1    11
5      3     2    55
6      4     1    22
7      4     2    21
8      5     1     0
9      5     2    12
10     6     1    -1
11     6     2    51
12     7     1    15
13     7     2   -10
14     8     1    11
15     8     2    25
16     9     1    76
17     9     2    81
18    10     1    62
19    10     2    46
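
As a rough check at the size mentioned in the question (500 shops, 52 weeks), the same set_index/stack idea can be tried on synthetic data. Everything below is an assumption for illustration: the random values, the seed, and the names wide and long_df. Naming the column axis before stacking is one way to avoid the 'level_1' and 0 renames.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the legacy output: 500 shops x 52 weeks of random values.
rng = np.random.default_rng(0)
n_shops, n_weeks = 500, 52
wide = pd.DataFrame(
    rng.integers(0, 100, size=(n_shops, n_weeks)),
    columns=[f'Week {w}' for w in range(1, n_weeks + 1)],
)
wide.insert(0, 'Shop', np.arange(1, n_shops + 1))

long_df = (
    wide.set_index('Shop')
        .rename(columns=lambda c: int(c.split()[-1]))  # 'Week 12' -> 12
        .rename_axis(columns='Week')                   # name the level that stack() creates
        .stack()
        .rename('Hour')                                # name the stacked values
        .reset_index()
)
print(long_df.shape)  # (26000, 3): one row per shop-week pair
```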
