简体   繁体   中英

Vectorized dataframe look-up

I have two dataframes:

df = pd.DataFrame([['A', 'B'], ['B', 'A']], columns=['Mon', 'Tues'])

    Mon Tues
0   A   B
1   B   A

lookup = pd.DataFrame([[0, 1], [2, 3]], index=['A', 'B'], columns=df.columns)

    Mon Tues
A   0   1
B   2   3

For each day, for each key in the first dataframe, I would like to look up its value in the second dataframe. This is one way to do it:

output = pd.DataFrame()
for col in df.columns:
    output[col] = df[col].map(lookup[col])

    Mon Tues
0   0   3
1   2   1

Is there a way to get the same answer without the explicit loop?

You can use replace to exchange the values:

>>> df.replace(lookup)
   Mon  Tues
0    0     3
1    2     1

When passed to the replace method, DataFrames are treated similarly to nested dictionaries. For each column label, the column's values are mapped to the correspondingly indexed value in the lookup DataFrame.

I am assuming that each column data in df is a random order of the lookup.index and that lookup.index is in sorted order. So, going with it, here's a NumPy based vectorized solution, specifically using its advanced-indexing -

idx = np.argsort(df.values,1)
df_out = pd.DataFrame(lookup.values[idx,np.arange(idx.shape[1])],columns=df.columns)

Sample output -

In [41]: idx = np.argsort(df.values,1)

In [42]: pd.DataFrame(lookup.values[idx,np.arange(idx.shape[1])],columns=df.columns)
Out[42]: 
   Mon  Tues
0    0     3
1    2     1

If lookup.index is not in sorted order, we need to do some additional work -

sidx = lookup.index.argsort()
arr_out = lookup.values[idx,np.arange(idx.shape[1])][sidx]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM