python pandas add a lower level column to multi_index dataframe

Could someone help me with this task? I have data in a multi-level DataFrame produced by an unstack() operation:

Original df:
Density  Length  Range  Count
  15k    0.60  small    555
  15k    0.60    big     17
  15k    1.80  small    141
  15k    1.80    big     21
  15k    3.60  small    150
  15k    3.60    big     26
  20k    0.60  small   5543
  20k    0.60    big     22
  20k    1.80  small    553
  20k    1.80    big     25
  20k    3.60  small    422
  20k    3.60    big     35

df  = df.set_index(['Density','Length','Range']).unstack('Range')

# After unstack:
                  Count       
Range             big  small
Density Length              
15k     0.60       17    555
        1.80       21    141
        3.60       26    150
20k     0.60       22   5543
        1.80       25    553
        3.60       35    422

Now I am trying to add an extra column at level 1: the ratio small/big. I have tried the following syntaxes; none raises an error, but the outcomes differ:

#df[:]['ratio']=df['Count']['small']/df['Count']['big'] ## case 1. no error, no ratio
#df['Count']['ratio']=df['Count']['small']/df['Count']['big'] ## case 2. no error, no ratio
#df['ratio']=df['Count']['small']/df['Count']['big'] ## case 3. no error, ratio on column level 0
df['ratio']=df.ix[:,1]/df.ix[:,0]                    ## case 4. no error, ratio on column level 0

# After executing the code above, df:
                  Count         ratio
Range             big  small       
Density Length                     
15k     0.60       17    555  32.65
        1.80       21    141   6.71
        3.60       26    150   5.77
20k     0.60       22   5543 251.95
        1.80       25    553  22.12
        3.60       35    422  12.06

I don't understand why cases 1 and 2 raise no error yet add no ratio column, and why in cases 3 and 4 the ratio column ends up on level 0 rather than the expected level 1. I would also like to know whether there is a better or more concise way to achieve this. Case 4 is the best I can do, but I don't like referring to a column by implicit position instead of by name.

Thanks

Case 1:

df[:]['ratio']=df['Count']['small']/df['Count']['big'] 

df[:] returns a new DataFrame object that is distinct from df. Whether the two share underlying data depends on the pandas version, but either way a column added to the new object belongs to that object only:

In [69]: df[:] is df
Out[69]: False

So adding a column to that object has no effect on the original df. Since no reference to df[:] is kept, the object is garbage collected after the assignment, making the assignment useless.
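To make Case 1 concrete, here is a minimal sketch that rebuilds a small part of the question's frame and keeps a reference to the df[:] result so it can be inspected. The names raw and tmp are introduced here for illustration only and do not appear in the original post.

```python
import pandas as pd

# Rebuild part of the unstacked frame from the question (illustrative subset).
raw = pd.DataFrame({
    "Density": ["15k"] * 4,
    "Length": [0.6, 0.6, 1.8, 1.8],
    "Range": ["small", "big", "small", "big"],
    "Count": [555, 17, 141, 21],
})
df = raw.set_index(["Density", "Length", "Range"]).unstack("Range")

# Keep a reference to the object that df[:] creates (hypothetical name: tmp).
tmp = df[:]
tmp["ratio"] = tmp["Count", "small"] / tmp["Count", "big"]

print(tmp is df)                                    # False
print("ratio" in tmp.columns.get_level_values(0))   # True
print("ratio" in df.columns.get_level_values(0))    # False
```

Depending on the pandas version, the assignment to tmp may emit a SettingWithCopyWarning; the new column lands on tmp in either case and never on df.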


Case 2:

df['Count']['ratio']=df['Count']['small']/df['Count']['big'] 

uses chained indexing. Avoid chained indexing when making assignments. The documentation linked below explains why assignments using chained indexing on the left-hand side may not affect df.

If you set

pd.options.mode.chained_assignment = 'warn'

then pandas will warn you not to use chained indexing in assignments:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
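The chained form in Case 2 can be read as two separate steps, which the sketch below spells out. It is only an illustration: counts and raw are hypothetical names, and the data is a small subset of the question's table.

```python
import pandas as pd

raw = pd.DataFrame({
    "Density": ["15k"] * 4,
    "Length": [0.6, 0.6, 1.8, 1.8],
    "Range": ["small", "big", "small", "big"],
    "Count": [555, 17, 141, 21],
})
df = raw.set_index(["Density", "Length", "Range"]).unstack("Range")

# Step 1: df['Count'] is a __getitem__ call that returns a separate DataFrame.
counts = df["Count"]

# Step 2: the assignment is a __setitem__ call on that separate object.
counts["ratio"] = counts["small"] / counts["big"]

print("ratio" in counts.columns)   # True
print(df.columns.tolist())         # [('Count', 'big'), ('Count', 'small')]
```

In the one-line chained version nothing holds on to the intermediate object, so the new column disappears with it.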

Case 3:

df['ratio']=df['Count']['small']/df['Count']['big'] 

and Case 4:

df['ratio']=df.ix[:,1]/df.ix[:,0]

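both work, but it can be done more efficiently using the form below. (Note that .ix was deprecated in pandas 0.20 and removed in 1.0; on current versions, use .iloc for positional access.)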

df['ratio'] = df['Count','small']/df['Count','big']

Here is a microbenchmark showing that using df[tuple_index] is faster than chained indexing:

In [99]: %timeit df['Count']['small']
1000 loops, best of 3: 501 µs per loop

In [128]: %timeit df['Count','small']
100000 loops, best of 3: 8.91 µs per loop
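Outside IPython, a comparable measurement can be sketched with the standard timeit module. The figures will differ by machine and pandas version, so none are shown here; the variable names and the loop count of 1000 are arbitrary choices for this illustration.

```python
import timeit

import pandas as pd

raw = pd.DataFrame({
    "Density": ["15k"] * 4,
    "Length": [0.6, 0.6, 1.8, 1.8],
    "Range": ["small", "big", "small", "big"],
    "Count": [555, 17, 141, 21],
})
df = raw.set_index(["Density", "Length", "Range"]).unstack("Range")

n = 1000  # number of repetitions per measurement (arbitrary)

# Chained indexing: two lookups, with an intermediate DataFrame in between.
chained = timeit.timeit(lambda: df["Count"]["small"], number=n)

# Tuple indexing: a single lookup on the MultiIndex columns.
direct = timeit.timeit(lambda: df["Count", "small"], number=n)

print(f"chained: {chained / n * 1e6:.1f} us per lookup")
print(f"tuple:   {direct / n * 1e6:.1f} us per lookup")
```

Both expressions select the same values; only the lookup path differs.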

If you want ratio to be the level 1 label, then you must tell pandas that the level 0 label is Count. You can do that by assigning to df['Count','ratio']:

In [96]: df['Count','ratio'] = df['Count','small']/df['Count','big']

# In [97]: df
# Out[97]: 
#                Count                  
# Range            big small       ratio
# Density Length                        
# 15k     0.6       17   555   32.647059
#         1.8       21   141    6.714286
#         3.6       26   150    5.769231
# 20k     0.6       22  5543  251.954545
#         1.8       25   553   22.120000
#         3.6       35   422   12.057143
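For anyone who wants to reproduce this end to end, the following sketch rebuilds the question's data and applies the tuple assignment. The name raw is introduced here for illustration; everything else follows the code in the question and answer.

```python
import pandas as pd

# The twelve rows from the question's original df.
raw = pd.DataFrame({
    "Density": ["15k"] * 6 + ["20k"] * 6,
    "Length": [0.6, 0.6, 1.8, 1.8, 3.6, 3.6] * 2,
    "Range": ["small", "big"] * 6,
    "Count": [555, 17, 141, 21, 150, 26, 5543, 22, 553, 25, 422, 35],
})
df = raw.set_index(["Density", "Length", "Range"]).unstack("Range")

# Name both column levels so that 'ratio' sits under 'Count' at level 1.
df["Count", "ratio"] = df["Count", "small"] / df["Count", "big"]

print(df.columns.tolist())
# [('Count', 'big'), ('Count', 'small'), ('Count', 'ratio')]
print(df["Count", "ratio"].round(2).tolist())
# [32.65, 6.71, 5.77, 251.95, 22.12, 12.06]
```

Because every column is referred to by its labels, this also avoids the positional indexing that the question wanted to get away from in Case 4.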
