pandas interchangeable dual indexing?

I have a DataFrame and I build a dual index on it. The 'start' values never appear among the 'end' index values, and vice versa.

c_weights.rename(columns={0:'start',1:'end',2:'metric',3:'angular',4:'special',5:'cos_pi'}, inplace=True)
c_weights.set_index(['start','end'],inplace=True)
c_weights.head()

[Image: output of c_weights.head()]

I'd like to be able to call something like c_weights.loc[1,638] or c_weights.loc[638,1] and get the same row of data. To be clear, the two index combinations are always unique. How can this be done?

For the first case, you can just index with loc and pass a tuple as the row key:

c_weights.loc[(1,638)]

For the second case, it depends on whether you know in advance that you are passing the end first. If you do, just construct the tuple in the right order or reverse it: (638,1)[::-1] == (1, 638).
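When the order is not known in advance, the reverse-if-needed idea can be wrapped in a small helper. This is only a sketch: the toy c_weights frame and the name lookup_either_order are made up for illustration, and it assumes every (start, end) pair is unique.

```python
import pandas as pd

# Hypothetical stand-in for c_weights: 'start' and 'end' values never overlap.
c_weights = pd.DataFrame(
    {"start": [1, 2, 3], "end": [638, 639, 640], "metric": [0.1, 0.2, 0.3]}
).set_index(["start", "end"])


def lookup_either_order(frame, key):
    """Return the row for key, trying (a, b) first and then (b, a)."""
    if key in frame.index:
        return frame.loc[key]
    # Not found as given, so try the reversed tuple.
    return frame.loc[key[::-1]]


row_forward = lookup_either_order(c_weights, (1, 638))
row_reversed = lookup_either_order(c_weights, (638, 1))
```

Both calls return the same row; a key that matches in neither order still raises KeyError.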

To get to your point: since you say start and end are mutually exclusive, you can also use the following list comprehension:

l = (start, end)  # l = (end, start) returns the same row
c_weights.loc[[x for x in c_weights.index if (x == l) or (x == l[::-1])]]

If each start value and each end value also appears only once in the index, matching a single element of the pair is enough, so you can simplify this to:

c_weights.loc[[x for x in c_weights.index if (x[0] in l) or (x[1] in l)]]
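The same either-order filter can also be written as a boolean mask over the two index levels, which avoids looping over the index in Python. This is a sketch under assumptions: the toy frame and the helper name rows_for_pair are made up for illustration, and the level names are taken to be 'start' and 'end' as in the question.

```python
import pandas as pd

# Hypothetical stand-in for c_weights; 'start' and 'end' values never overlap.
c_weights = pd.DataFrame(
    {"start": [1, 2, 3], "end": [638, 639, 640], "metric": [0.1, 0.2, 0.3]}
).set_index(["start", "end"])


def rows_for_pair(frame, pair):
    """Select rows whose (start, end) equals pair in either order."""
    a, b = pair
    start = frame.index.get_level_values("start")
    end = frame.index.get_level_values("end")
    # True where the row matches (a, b) or the reversed pair (b, a).
    mask = ((start == a) & (end == b)) | ((start == b) & (end == a))
    return frame[mask]


forward = rows_for_pair(c_weights, (1, 638))
backward = rows_for_pair(c_weights, (638, 1))
```

Unlike a direct loc lookup, this returns a (possibly empty) DataFrame rather than raising KeyError when nothing matches.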

A dataframe is a wrapper around a NumPy ndarray to which a row index and a column index are assigned. We can define a second dataframe with different row or column indices that accesses the same ndarray. For example, let's first define df1, then define df2 with the same data but with the levels of the MultiIndex row index swapped. Leave the columns the same.

import pandas as pd
import numpy as np

np.random.seed([3,1415])

df1 = pd.DataFrame(np.random.rand(4, 2),
                   pd.MultiIndex.from_product([('a', 'b'), (1, 2)]),
                   ['col1', 'col2'])
df2 = pd.DataFrame(df1.values, df1.index.swaplevel(0, 1), df1.columns)

print(df1)

         col1      col2
a 1  0.444939  0.407554
  2  0.460148  0.465239
b 1  0.462691  0.016545
  2  0.850445  0.817744

print(df2)

         col1      col2
1 a  0.444939  0.407554
2 a  0.460148  0.465239
1 b  0.462691  0.016545
2 b  0.850445  0.817744

We can see the data is the same and the index levels are swapped. Data accessed through df2 is the same data as through df1, to the point of co-mutability. Let's change something in df1 and look at df2:

df1.loc[('a', 1), 'col1'] = 1.
print(df2)

         col1      col2
1 a  1.000000  0.407554
2 a  0.460148  0.465239
1 b  0.462691  0.016545
2 b  0.850445  0.817744

Now that we're convinced, observe that we have two dataframes through which we can access the same data. Let's define a function to do what the OP asked for.

ambigui_t = lambda t: df1.loc[t] if t in df1.index else df2.loc[t]

print(ambigui_t(('a', 1)))

col1    1.000000
col2    0.407554
Name: (a, 1), dtype: float64

print(ambigui_t((1, 'a')))

col1    1.000000
col2    0.407554
Name: (1, a), dtype: float64
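A variant of the same idea keeps a single frame plus a swapped copy of its index, so it does not depend on two frames sharing one underlying array. This is a sketch, not the answer's own code: the name make_ambiguous_lookup is hypothetical, the toy frame uses deterministic values instead of random ones, and the index is assumed to be unique.

```python
import numpy as np
import pandas as pd

# Toy frame shaped like df1 above, but with deterministic values.
frame = pd.DataFrame(
    np.arange(8.0).reshape(4, 2),
    index=pd.MultiIndex.from_product([("a", "b"), (1, 2)]),
    columns=["col1", "col2"],
)


def make_ambiguous_lookup(frame):
    """Build a lookup that accepts a key in either level order (hypothetical helper)."""
    swapped = frame.index.swaplevel(0, 1)  # same row order, levels reversed

    def lookup(key):
        if key in frame.index:
            return frame.loc[key]
        # Locate the key in the swapped index, then fetch that row by position.
        return frame.iloc[swapped.get_loc(key)]

    return lookup


lookup = make_ambiguous_lookup(frame)
row_ab = lookup(("a", 1))
row_ba = lookup((1, "a"))
```

Because the row is always read from the original frame, the returned Series carries the original label order, ('a', 1), whichever way the key was passed.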
