
Pandas merge tables with two columns in common

I am working with two large data frames:

dataM: [image]

dataD: [image]

I want to join the two data frames on County, State and Year, but dataM has to retain all of its columns and only take the Deprivation Index Percent column from dataD. I also want to drop the rows whose county does not exist in one frame or the other. For instance, dataM has AK and its counties, but dataD has no AK, so I want to drop all of those rows from dataM. Likewise, if a county and state exist in both, I want to assign the Deprivation Index Percent to every row with that county in that state. I tried everything, but I can't make it work.

I tried this in many forms:

dataM = pd.merge(dataM, dataD, how='right', left_on=['County', 'State'], right_on=['County', 'State'])

Filtering for Baldwin County, which is in both data frames, I got this:

[image]

I don't understand why I am getting NaN if the county and state exist in both data frames. Please help me.

I think you need an inner join:

dataM = pd.merge(dataM, dataD[['depr_ind_col', 'County', 'State']], how='inner', left_on=['County', 'State'], right_on=['County', 'State'])
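As a rough illustration of that inner join, here is a minimal sketch on toy data. The frames and every value in them are invented, and `Deprivation Index Percent` is only assumed to be the real column name in dataD (`depr_ind_col` above is a placeholder). Since the key columns have the same names in both frames, `on=` can stand in for the `left_on`/`right_on` pair. The second merge uses `indicator=True` to list keys that exist in only one frame, which may help explain unexpected NaN or missing rows.

```python
import pandas as pd

# Toy stand-ins for the real frames; all values are invented.
dataM = pd.DataFrame({
    "County": ["Baldwin County", "Baldwin County", "Anchorage"],
    "State": ["GA", "AL", "AK"],
    "Births": [100, 200, 300],
})
dataD = pd.DataFrame({
    "County": ["Baldwin County", "Baldwin County", "Bibb County"],
    "State": ["GA", "AL", "GA"],
    "Deprivation Index Percent": [12.5, 9.0, 15.0],  # assumed column name
    "Other": [1, 2, 3],  # a column we do not want to carry over
})

keys = ["County", "State"]

# Inner join: keep only county/state pairs present in both frames,
# and bring over just the deprivation column from dataD.
merged = pd.merge(
    dataM,
    dataD[keys + ["Deprivation Index Percent"]],
    how="inner",
    on=keys,
)
print(merged)

# Diagnostic: an outer join with indicator=True adds a "_merge" column
# marking each key pair as "left_only", "right_only", or "both".
check = pd.merge(dataM[keys], dataD[keys], how="outer", on=keys, indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched)
```

If pairs that should match show up in `unmatched`, the key values probably differ in some small way between the frames (for example trailing spaces or different capitalization).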

After many tries, I ended up concatenating the county and state in dataM and assigning the result to a new column named "County, State". Then I just used a simple merge:

dataM = pd.merge(dataM , dataD, how='right', on=['County, State']) 
dataM = dataM[dataM['County, State'] == 'Baldwin County, GA']
dataM

[image]

That gave me the results I was looking for. I will split the county and state after this, and then drop the rows with NaN in Births.
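The follow-up steps could look roughly like this. It is only a sketch on invented data, assuming the combined key always has the form "county, state" with a single comma and space before the state code, and that `Births` is the column coming from dataM.

```python
import pandas as pd

# Invented stand-in for the frame produced by the right merge above.
merged = pd.DataFrame({
    "County, State": ["Baldwin County, GA", "Bibb County, GA"],
    "Births": [100.0, None],  # NaN where dataM had no matching row
    "Deprivation Index Percent": [12.5, 15.0],
})

# Split on the last ", " so a comma inside a county name would not break it.
merged[["County", "State"]] = merged["County, State"].str.rsplit(", ", n=1, expand=True)

# Drop rows that have no Births value, then drop the combined key column.
merged = merged.dropna(subset=["Births"]).drop(columns=["County, State"])
print(merged)
```

Dropping the NaN rows after a right merge leaves the same rows an inner merge would have kept, so `how='inner'` on the combined key should skip that extra step.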

Thank you for your help though!
