Pandas merge tables with two columns in common

Question

I am working with two large data frames:

dataM :

dataD

I want to join the two data frames on County, State, and Year. dataM has to retain all of its columns and pick up only the Deprivation Index Percent column from dataD. I also want to drop the rows whose county does not exist in both data frames. For instance, dataM has AK and its counties, but dataD has no AK, so I want to drop all of those rows from dataM. Likewise, if a county and state exist in both, I want to assign the Deprivation Index Percent to every row with that county in that state. I tried everything, but I can't make it work.

I tried this in many forms:

dataM = pd.merge(dataM, dataD, how='right', left_on=['County', 'State'], right_on=['County', 'State'])

Filtering the result for Baldwin County, which is in both data frames, I got this:

I don't understand why I am getting NaN if the county and state exist in both data frames. Please help me.

Answer 1

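I think you need an inner join, which keeps only the rows whose keys appear in both data frames (replace depr_ind_col with the actual name of the deprivation index column):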

dataM = pd.merge(dataM, dataD[['depr_ind_col', 'County', 'State']], how='inner', on=['County', 'State'])
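
To make the inner join concrete, here is a minimal sketch on toy data. The column names County, State, Year, Births and Deprivation Index Percent come from the question; the Population column and every value are invented for illustration, and the real tables may well differ. It also adds Year to the join keys, as the question asks, and runs an optional outer merge with indicator=True to list the key combinations that fail to match.

```python
import pandas as pd

# Toy stand-ins for the real tables; every value here is invented.
dataM = pd.DataFrame({
    "County": ["Baldwin County", "Baldwin County", "Anchorage"],
    "State": ["GA", "AL", "AK"],
    "Year": [2016, 2016, 2016],
    "Births": [500, 2300, 4000],
})
dataD = pd.DataFrame({
    "County": ["Baldwin County", "Baldwin County"],
    "State": ["GA", "AL"],
    "Year": [2016, 2016],
    "Deprivation Index Percent": [71.0, 35.0],
    "Population": [45000, 220000],  # hypothetical extra column we do not want
})

keys = ["County", "State", "Year"]

# Inner join: keep only key combinations present in both frames,
# and bring over just the one column of interest from dataD.
merged = pd.merge(
    dataM,
    dataD[keys + ["Deprivation Index Percent"]],
    how="inner",
    on=keys,
)

# Optional check: an outer merge with indicator=True adds a "_merge" column
# saying whether each key came from the left frame, the right frame, or both.
check = pd.merge(dataM, dataD[keys], how="outer", on=keys, indicator=True)
unmatched = check[check["_merge"] != "both"]

print(merged)
print(unmatched)
```

In this sketch the AK row disappears from merged because dataD has no matching key, and it shows up in unmatched instead. If keys that look identical appear in unmatched, differences in whitespace or capitalization are one possible cause worth checking.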

Answer 2

After many tries, I ended up concatenating the county and state in dataM and assigning the result to a new column named "County, State". Then I used a simple merge:

dataM = pd.merge(dataM, dataD, how='right', on=['County, State'])
dataM = dataM[dataM['County, State'] == 'Baldwin County, GA']
dataM

That gave me the results I was looking for. I will split the county and state after this, and then drop rows with NaN on Births.
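
The whole workflow might look roughly like the sketch below. It assumes dataD already carries a combined "County, State" column, which the merge above implies but the post does not show; all values are invented.

```python
import pandas as pd

# Toy data; values are invented, and the shape of dataD is an assumption.
dataM = pd.DataFrame({
    "County": ["Baldwin County", "Anchorage"],
    "State": ["GA", "AK"],
    "Births": [500, 4000],
})
dataD = pd.DataFrame({
    "County, State": ["Baldwin County, GA", "Bibb County, GA"],
    "Deprivation Index Percent": [71.0, 60.0],
})

# Build the combined key in dataM so both frames share one join column.
dataM["County, State"] = dataM["County"] + ", " + dataM["State"]

# Right join: keep every dataD key, attach dataM columns where they match.
merged = pd.merge(dataM, dataD, how="right", on=["County, State"])

# Split the combined key back into separate County and State columns.
parts = merged["County, State"].str.split(", ", expand=True)
merged["County"] = parts[0]
merged["State"] = parts[1]

# Rows that exist only in dataD have no Births value; drop them.
merged = merged.dropna(subset=["Births"])

print(merged)
```

A right join followed by dropping the rows with NaN in Births keeps the same rows an inner join on the combined key would, provided Births has no missing values in dataM itself.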

Thank you for your help though!
