
pandas interchangeable dual indexing?

I have a DataFrame on which I build a two-level index. Values in 'start' never appear among the 'end' values, and vice versa.

c_weights.rename(columns={0:'start',1:'end',2:'metric',3:'angular',4:'special',5:'cos_pi'}, inplace=True)
c_weights.set_index(['start','end'],inplace=True)
c_weights.head()

[Image: output of c_weights.head()]

I'd like to be able to call something like c_weights.loc[1,638] or c_weights.loc[638,1] and get the same row of data. To be clear, each combination of the two index values is unique. How can this be done?

For the first case, you can index with loc and pass a tuple as the row key:

c_weights.loc[(1, 638)]

For the second case, it depends on whether you know in advance that you are passing the end value first. If you do, build the tuple in the right order or reverse it: (638, 1)[::-1] == (1, 638).
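When the order is not known in advance, one option is to try the key as given and fall back to the reversed key. The following is a minimal sketch, not the original answer's code: the small c_weights frame is made-up stand-in data, and lookup_pair is a hypothetical helper name.

```python
import pandas as pd

# Made-up stand-in for c_weights: 'start' and 'end' values never overlap.
c_weights = pd.DataFrame(
    {"start": [1, 1, 2], "end": [638, 639, 638], "metric": [0.1, 0.2, 0.3]}
).set_index(["start", "end"])


def lookup_pair(df, a, b):
    """Return the row for (a, b), falling back to (b, a) if (a, b) is absent."""
    key = (a, b) if (a, b) in df.index else (b, a)
    return df.loc[key]  # raises KeyError if neither order exists


row_forward = lookup_pair(c_weights, 1, 638)
row_reversed = lookup_pair(c_weights, 638, 1)
print(row_forward)
print(row_reversed)
```

Both calls return the same row, because (638, 1) is not in the index and the helper falls back to (1, 638).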

To get to your point: since your start and end values are mutually exclusive, you can also use the following list comprehension:

l = (start, end)  # l = (end, start) returns the same row
c_weights.loc[[x for x in c_weights.index if x == l or x == l[::-1]]]

Because the start and end values never overlap, you can also write this as a membership test on each level, which works for either order of l:

c_weights.loc[[x for x in c_weights.index if x[0] in l and x[1] in l]]

A DataFrame is a wrapper around a NumPy ndarray to which a row index and a column index are attached. We can define a second DataFrame with different row or column indices that accesses the same ndarray. For example, let's first define df1, then define df2 with the same data but with the levels of the MultiIndex row index swapped. The columns stay the same.

import pandas as pd
import numpy as np

np.random.seed([3,1415])

df1 = pd.DataFrame(np.random.rand(4, 2),
                   pd.MultiIndex.from_product([('a', 'b'), (1, 2)]),
                   ['col1', 'col2'])
df2 = pd.DataFrame(df1.values, df1.index.swaplevel(0, 1), df1.columns)

print(df1)

         col1      col2
a 1  0.444939  0.407554
  2  0.460148  0.465239
b 1  0.462691  0.016545
  2  0.850445  0.817744

print(df2)

         col1      col2
1 a  0.444939  0.407554
2 a  0.460148  0.465239
1 b  0.462691  0.016545
2 b  0.850445  0.817744

We can see that the data is the same and only the index levels are swapped. Data accessed through df1 is the same data as through df2, to the point of co-mutability, as long as both frames share one underlying array. Let's change something in df1 and look at df2:

df1.loc[('a', 1), 'col1'] = 1.
print(df2)

         col1      col2
1 a  1.000000  0.407554
2 a  0.460148  0.465239
1 b  0.462691  0.016545
2 b  0.850445  0.817744
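Whether a write to df1 shows up in df2 depends on the two frames actually sharing memory, which is not guaranteed across pandas versions: as far as I know, with Copy-on-Write enabled the DataFrame constructor copies a NumPy array it is given, so the frames would be independent. A small check, sketched with the same construction as above, reports what your installation does:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.rand(4, 2),
                   pd.MultiIndex.from_product([('a', 'b'), (1, 2)]),
                   ['col1', 'col2'])
df2 = pd.DataFrame(df1.values, df1.index.swaplevel(0, 1), df1.columns)

# True means both frames sit on the same buffer, so writes to one are
# visible in the other; False means df2 holds its own copy.
shares = bool(np.shares_memory(df1.to_numpy(), df2.to_numpy()))
print(pd.__version__, shares)
```

If this prints False, the lookups below still work, but df2 is a snapshot and later changes to df1 will not appear in it.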

Now that we're convinced, let's observe that we now have 2 dataframes from which we can access the same data. Let's define a function to do what the OP asked for.

ambigui_t = lambda t: df1.loc[t] if t in df1.index else df2.loc[t]

print(ambigui_t(('a', 1)))

col1    1.000000
col2    0.407554
Name: (a, 1), dtype: float64

print(ambigui_t((1, 'a')))

col1    1.000000
col2    0.407554
Name: (1, a), dtype: float64
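A related variation, which is not part of either answer above, is to keep a single frame that contains every row under both key orders, so that plain loc works either way. This is a sketch on made-up stand-in data; the level names "first" and "second" and the variable names are assumptions chosen for illustration.

```python
import pandas as pd

# Made-up stand-in for c_weights: 'start' and 'end' values never overlap.
c_weights = pd.DataFrame(
    {"start": [1, 1, 2], "end": [638, 639, 638], "metric": [0.1, 0.2, 0.3]}
).set_index(["start", "end"])

# Use neutral level names, since each level will hold both kinds of value.
pairs = c_weights.rename_axis(index=["first", "second"])

# Copy the rows and swap the index levels, keeping the same level names.
mirrored = pairs.copy()
mirrored.index = pairs.index.swaplevel(0, 1).set_names(pairs.index.names)

# Stack both orientations; sorting keeps MultiIndex lookups efficient.
symmetric = pd.concat([pairs, mirrored]).sort_index()

print(symmetric.loc[(1, 638)])
print(symmetric.loc[(638, 1)])
```

The index stays unique only because start and end values never overlap. The trade-off is that the data is stored twice and the two orientations are independent copies, so an assignment through one key order is not reflected under the other.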
