How to compare two columns across two related DataFrames in pandas

I have one DataFrame called limits_df with the schema:

"County Name"   "State"   "One-Unit Limit"

This looks like:

import pandas as pd

data1 = {'County Name': ["A", "B", "C", "D"], 'State': ['AA', 'AB', 'AA', 'AC'], 'One-Unit Limit': [100, 200, 150, 300]}
limits_df = pd.DataFrame.from_dict(data1)

And I have another DataFrame called loans_df with the schema:

county  state   price   

This looks like:

data2 = {'county': ["B", "C", "A", "E"], 'state': ['AB', 'AC', 'AA', 'AF'], 'price': [300, 200, 150, 300]}
loans_df = pd.DataFrame.from_dict(data2)

I want to create a new column, loans_df["jumbo"], which is True when the loan price is greater than the limit for its corresponding county and state. Row by row, that would be:

loans_df["jumbo"] = False
for idx, loan in loans_df.iterrows():
    # Look up the limit for this loan's county and state
    match = limits_df.loc[(limits_df["County Name"] == loan["county"]) & (limits_df["State"] == loan["state"]), "One-Unit Limit"]
    # A loan with no matching limit stays False
    if not match.empty and loan["price"] > match.iloc[0]:
        loans_df.loc[idx, "jumbo"] = True

Doing this with iterrows() takes a really long time, since I need to create loans_df["jumbo"] and then modify what should be immutable data. Isn't there a simpler way to do this with apply() or map()?

IIUC, you could use

df2 = loans_df.merge(limits_df[['State', 'County Name', 'One-Unit Limit']], how='left',
                     left_on=['state', 'county'], right_on=['State', 'County Name'])
df2['jumbo'] = df2['price'] > df2['One-Unit Limit']

Here pd.merge with a left join matches a limit to every loan by state and county. You can then do a boolean comparison directly to set jumbo to True or False.

Note that when no limit is found for a loan's state/county, One-Unit Limit is NaN after the left join. A comparison against NaN evaluates to False, so jumbo is False for that loan.
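Since a missing limit and a price under the limit both end up as False, it can help to keep track of which loans actually matched. The following is a minimal sketch of the same left join on the sample data from the question, assuming each state/county pair appears at most once in limits_df. The has_limit column and the result variable are names made up for this illustration; indicator and validate are existing merge parameters.

```python
import pandas as pd

limits_df = pd.DataFrame({
    "County Name": ["A", "B", "C", "D"],
    "State": ["AA", "AB", "AA", "AC"],
    "One-Unit Limit": [100, 200, 150, 300],
})
loans_df = pd.DataFrame({
    "county": ["B", "C", "A", "E"],
    "state": ["AB", "AC", "AA", "AF"],
    "price": [300, 200, 150, 300],
})

merged = loans_df.merge(
    limits_df,
    how="left",
    left_on=["state", "county"],
    right_on=["State", "County Name"],
    validate="many_to_one",  # raises if a state/county pair has more than one limit
    indicator=True,          # adds a "_merge" column: "both" or "left_only"
)

# has_limit (hypothetical column name) separates "no limit found" from "under the limit"
merged["has_limit"] = merged["_merge"] == "both"
merged["jumbo"] = merged["price"] > merged["One-Unit Limit"]

result = merged[["county", "state", "price", "has_limit", "jumbo"]]
print(result)
```

Loans with has_limit equal to False can then be reviewed separately instead of being silently treated as non-jumbo.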

Put loans_df on the left of the merge so the result keeps the row order of loans_df. This assumes that loans_df has a default integer index and that each county/state pair appears at most once in limits_df:

loans_df['jumbo'] = pd.merge(loans_df, limits_df,
                             left_on=['county', 'state'],
                             right_on=['County Name', 'State'], how='left') \
                        .apply(lambda x: x['price'] > x['One-Unit Limit'], axis=1)

Another option is an inner merge followed by isin:

m = limits_df.merge(loans_df, left_on=['County Name', 'State'], right_on=['county', 'state'])
loans_df["jumbo"] = loans_df['county'].isin(m.loc[m['price'] > m['One-Unit Limit'], 'County Name'])
print(loans_df)

  county state  price  jumbo
0      B    AB    300   True
1      C    AC    200  False
2      A    AA    150   True
3      E    AF    300  False
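The isin check above matches on county name only, so two counties with the same name in different states would be treated as one. As for the map()-style lookup asked about in the question, here is a rough sketch that keys the limits by both county and state. It assumes the county/state pairs in limits_df are unique; limit_lookup, loan_keys and loan_limits are names invented for this example.

```python
import pandas as pd

limits_df = pd.DataFrame({
    "County Name": ["A", "B", "C", "D"],
    "State": ["AA", "AB", "AA", "AC"],
    "One-Unit Limit": [100, 200, 150, 300],
})
loans_df = pd.DataFrame({
    "county": ["B", "C", "A", "E"],
    "state": ["AB", "AC", "AA", "AF"],
    "price": [300, 200, 150, 300],
})

# Lookup Series keyed by (county, state); the keys must be unique for reindex to work
limit_lookup = limits_df.set_index(["County Name", "State"])["One-Unit Limit"]

# One (county, state) key per loan, in loan order
loan_keys = pd.MultiIndex.from_frame(loans_df[["county", "state"]])

# reindex yields NaN where a loan's key has no limit; realign to the index of loans_df
loan_limits = pd.Series(limit_lookup.reindex(loan_keys).to_numpy(), index=loans_df.index)

# Comparisons against NaN are False, so loans without a limit are not flagged
loans_df["jumbo"] = loans_df["price"] > loan_limits
print(loans_df)
```

This avoids building a merged frame and writes straight into loans_df, whatever its index happens to be.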
