
Pandas: Check two dataframes for matching values, then fill a row depending on the label

I used MATLAB throughout college as a math major, mostly for building equations and models. I am now learning Python, and pandas in particular. I am trying to look up the values in a column of one dataframe in a column of a different dataframe. Where they match, I want to write the corresponding label into the original dataframe.

For example, my first dataframe has a column of employees, and I want to figure out whether aliceB is Busy or Non-Busy and record that label in col3.

import numpy as np
import pandas as pd

df1 = {"col1": ["aliceA", "aliceB", "aliceC"], "col2": ["CO", "WA", "PA"]}
df1 = pd.DataFrame(df1)
df1['col3'] = np.nan  # placeholder column to be filled with the label
In []: df1
Out[]: 
     col1 col2  col3
0  aliceA   CO   NaN
1  aliceB   WA   NaN
2  aliceC   PA   NaN

df2 = {'col1': ["aliceB", "aliceA", "aliceC", "bobC", "bobB", "bobA"], 'col2': ['Busy', 'Non-Busy', 'Busy', 'Non-Busy', 'Non-Busy', 'Busy']}
df2 = pd.DataFrame(df2)
In []: df2
Out[]: 
     col1      col2
0  aliceB      Busy
1  aliceA  Non-Busy
2  aliceC      Busy
3    bobC  Non-Busy
4    bobB  Non-Busy
5    bobA      Busy

***Preferred Output***
Out[]: 
     col1 col2      col3
0  aliceA   CO  Non-Busy
1  aliceB   WA      Busy
2  aliceC   PA      Busy

For this kind of problem in MATLAB, I would take my two matrices and iterate through them with nested for loops to find the value. In Python I wrote:

for i in range(0, df2.shape[0]):
    for j in range(0, df1.shape[0]):
        if df2.col1[i] == df1.col1[j]:
            df1.col3[j] = df2.col2[i]

But I get this warning, and I have to press Control+C to interrupt the loop before I can continue:

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

df1
Out[]: 
     col1 col2      col3
0  aliceA   CO  Non-Busy
1  aliceB   WA      Busy
2  aliceC   PA      Busy

Technically this code works and my data is filled in, but I know this is probably a poor way to solve my problem. For this small example it doesn't force me to Control+C, but it does when my df1 is thousands of rows long.
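
The warning comes from chained indexing: df1.col3[j] = ... first selects the column and then assigns into that intermediate result, and pandas cannot guarantee the write reaches df1. As a minimal sketch, assuming both frames keep their default integer index, the same loop can write through a single .loc call instead. The placeholder column is created with None here (an assumption, not the original np.nan) so that string labels fit without changing the column's dtype. This only addresses the warning; the nested loop still compares every row of df2 with every row of df1, so it stays slow on large frames.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": ["aliceA", "aliceB", "aliceC"],
                    "col2": ["CO", "WA", "PA"]})
df2 = pd.DataFrame({"col1": ["aliceB", "aliceA", "aliceC", "bobC", "bobB", "bobA"],
                    "col2": ["Busy", "Non-Busy", "Busy", "Non-Busy", "Non-Busy", "Busy"]})

# Object-dtype placeholder so string labels can be stored directly
# (assumption: replaces the np.nan placeholder from the question).
df1["col3"] = None

for i in range(len(df2)):
    for j in range(len(df1)):
        if df2.loc[i, "col1"] == df1.loc[j, "col1"]:
            # One indexing operation on df1 itself: row label j, column "col3".
            df1.loc[j, "col3"] = df2.loc[i, "col2"]

result_loop = df1["col3"].tolist()
```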

A simple map: index df2 by col1 so that it acts as a lookup table, then map df1.col1 through it.

df1['col3'] = df1['col1'].map(df2.set_index('col1')['col2'])
df1
Out[31]: 
     col1 col2      col3
0  aliceA   CO  Non-Busy
1  aliceB   WA      Busy
2  aliceC   PA      Busy
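
Two cases the small example does not exercise are names in df1 that have no entry in df2, and names that appear more than once in df2. The sketch below is one way to handle both, under stated assumptions: the employee "zed", the repeated "aliceB" row, and the "Unknown" fallback label are hypothetical and not part of the original data. Because map looks values up by index label, the lookup Series should have a unique index, so duplicates are dropped first (keeping the first label per name).

```python
import pandas as pd

# Hypothetical data: "zed" has no match in df2, and "aliceB" appears twice there.
df1 = pd.DataFrame({"col1": ["aliceA", "aliceB", "zed"],
                    "col2": ["CO", "WA", "PA"]})
df2 = pd.DataFrame({"col1": ["aliceB", "aliceA", "aliceB"],
                    "col2": ["Busy", "Non-Busy", "Non-Busy"]})

# Keep the first label per name so the lookup index is unique.
lookup = df2.drop_duplicates("col1").set_index("col1")["col2"]

# Names missing from the lookup come back as NaN; replace them with a
# placeholder label ("Unknown" is an arbitrary choice for this sketch).
df1["col3"] = df1["col1"].map(lookup).fillna("Unknown")
```

Whether to keep the first duplicate, the last, or treat duplicates as an error depends on what the labels mean in the real data.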

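Using merge. Drop the placeholder col3 from df1 first; if both frames contain a col3, the result has col3_x and col3_y instead of a single col3:

df1.drop(columns='col3').merge(df2.rename(columns={'col2': 'col3'}), on='col1')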

     col1 col2      col3
0  aliceA   CO  Non-Busy
1  aliceB   WA      Busy
2  aliceC   PA      Busy
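
By default merge performs an inner join, so any row of df1 whose name is missing from df2 would be dropped from the result. A hedged sketch of a variant that keeps every row of df1 and reports which ones found no match, using the how, indicator, and validate parameters of DataFrame.merge. The unmatched employee "zed" is hypothetical and was added only to show the behaviour.

```python
import pandas as pd

# Hypothetical data: "zed" does not appear in df2.
df1 = pd.DataFrame({"col1": ["aliceA", "aliceB", "zed"],
                    "col2": ["CO", "WA", "PA"]})
df2 = pd.DataFrame({"col1": ["aliceB", "aliceA", "aliceC"],
                    "col2": ["Busy", "Non-Busy", "Busy"]})

# Rename the label column so it arrives in the result as col3.
labels = df2.rename(columns={"col2": "col3"})

merged = df1.merge(
    labels,
    on="col1",
    how="left",              # keep every row of df1, matched or not
    indicator=True,          # adds a _merge column: "both" or "left_only"
    validate="many_to_one",  # raise if a name occurs more than once in labels
)

# Names from df1 that had no label in df2.
unmatched = merged.loc[merged["_merge"] == "left_only", "col1"].tolist()
```

Unmatched rows keep a missing value in col3, and the _merge column can be dropped once it has been inspected.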
