
Pandas: Use multiple columns of a dataframe as index of another

I've got a large dataframe with my data in it, and another dataframe of the same first dimension that contains metadata about each point in time (eg, what trial number it was, what trial type it was).

What I want to do is slice the large dataframe using the values of the "metadataframe". I want to keep these separate (rather than storing the metadataframe as a multi-index of the larger one).

Right now, I am trying to do something like this:

def my_func(container):
   container.big_df.set_index(container.meta_df[['col1', 'col2']])
   container.big_df.loc['col1val', 'col2val'].plot()

However, this returns the following error:

ValueError: Must pass DataFrame with boolean values only

Note that this works fine if I only pass a single column to set_index.

Can anyone figure out what's going wrong here? Alternatively, can someone tell me that I'm doing this in a totally stupid and hacky way, and that there's a much better way to go about it? :)

MY SOLUTION

Thanks for the ideas. I played around with the indexing a bit, and this seems to be the easiest and fastest approach. I didn't like having to strip the index of its name, and transposing the values etc. seemed cumbersome. I noticed something interesting (and probably easy to fix in pandas):

dfa.set_index(dfb[['col1', 'col2']]) 

doesn't work, but

dfa.set_index([dfb.col1, dfb.col2])

does.

So, you can basically turn dfb into a list of columns, making set_index work, by the following convention:

dfa.set_index([dfb[col] for col in ['col1', 'col2']])
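
For illustration, here is a minimal end-to-end sketch of that convention. The frames and their contents (a "signal" column, "go"/"stop" values) are made up for the example, and it assumes both frames have the same number of rows in the same order. Note that set_index returns a new frame, so the result has to be assigned to something:

```python
import pandas as pd

# Hypothetical stand-ins for the data frame and its metadata frame.
dfa = pd.DataFrame({"signal": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]})
dfb = pd.DataFrame({
    "col1": ["go", "go", "go", "stop", "stop", "stop"],
    "col2": [1, 1, 2, 3, 3, 4],
})

# set_index returns a new frame; dfa itself keeps its original index.
indexed = dfa.set_index([dfb[col] for col in ["col1", "col2"]])

# Select every row whose metadata is col1 == "go" and col2 == 1.
subset = indexed.loc[("go", 1), :]
print(subset)
```

Because dfa is left untouched, the data and the metadata stay separate, and the indexed copy can be thrown away after slicing or plotting.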

Use MultiIndex.from_arrays() to create the index object:

import pandas as pd
df1 = pd.DataFrame({"A":[1,2,3], "B":["a","b","c"]})
df2 = pd.DataFrame({"C":[100,200,300]})
df2.index = pd.MultiIndex.from_arrays(df1.values.T)

print(df2)

The result:

       C
1 a  100
2 b  200
3 c  300

Change the first line to:

container.big_df.index=pd.MultiIndex.from_arrays(container.meta_df[['col1', 'col2']].values.T, names=['i1','i2'])
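
A possible variation, assuming a pandas version that provides MultiIndex.from_frame (0.24 or later, as far as I know): the index can be built from the metadata columns directly, without the .values.T step. The frames below are hypothetical stand-ins for container.big_df and container.meta_df:

```python
import pandas as pd

# Hypothetical stand-ins for container.big_df and container.meta_df.
big_df = pd.DataFrame({"value": [10.0, 20.0, 30.0]})
meta_df = pd.DataFrame({
    "col1": ["x", "x", "y"],
    "col2": [1, 2, 1],
    "other": [0, 0, 0],
})

# Work on a copy so the original big_df keeps its plain integer index.
indexed = big_df.copy()
indexed.index = pd.MultiIndex.from_frame(
    meta_df[["col1", "col2"]], names=["i1", "i2"]
)
print(indexed)
```

Like the from_arrays version, this assigns the index by position, so it only makes sense when both frames have the same length and row order.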

I implemented this with reference to this link:

import pandas as pd
 
employees = pd.DataFrame({
    'EmpCode': ['Emp001', 'Emp002', 'Emp003', 'Emp004', 'Emp005'],
    'Name': ['John', 'Doe', 'William', 'Spark', 'Mark'],
    'Occupation': ['Chemist', 'Statistician', 'Statistician',
                   'Statistician', 'Programmer'],
    'Date Of Join': ['2018-01-25', '2018-01-26', '2018-01-26', '2018-02-26',
                     '2018-03-16'],
    'Age': [23, 24, 34, 29, 40]})
 
print("\n --------- Before Index ----------- \n")
print(employees)
 
print("\n --------- Multiple Indexing ----------- \n")
print(employees.set_index(['Occupation', 'Age']))
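
As a rough follow-up sketch, once the columns are in the index, rows can be selected by the first level alone. The frame below is a trimmed copy of the employees data above, and the variable names by_job and statisticians are only for illustration:

```python
import pandas as pd

# Trimmed copy of the employees frame from the example above.
employees = pd.DataFrame({
    'Name': ['John', 'Doe', 'William', 'Spark', 'Mark'],
    'Occupation': ['Chemist', 'Statistician', 'Statistician',
                   'Statistician', 'Programmer'],
    'Age': [23, 24, 34, 29, 40]})

# Sorting the MultiIndex keeps label-based lookups well behaved.
by_job = employees.set_index(['Occupation', 'Age']).sort_index()

# Partial selection on the first level; the result is indexed by Age.
statisticians = by_job.loc['Statistician']
print(statisticians)
```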

Before: (image)

tempDf1 = tempDf.set_index(['Country', 'Region','Happiness_Rank','Happiness_Score','Economy_(GDP_per_Capita)'])
tempDf1

After: (image)
