Join Values from Upper-Level Aggregates to Lower-Level Aggregates in a Pandas Data Frame

I have two Pandas data frames.

The first data frame (county) has county-level data:

COUNTY_FIPS    COUNTY_INCOME    COUNTY_PERCENT_UNINSURED
      51001            42260                        16.7
      51003            72265                         7.6

The second data frame (tract) has Census tract-level data:

 TRACT_FIPS    TRACT_INCOME    TRACT_PERCENT_UNINSURED
51001090100           48861                       13.4
51001090200           42663                        9.4
51003090300           32532                       19.7
51003090100           55678                       12.1

I would like to join values from the upper-level aggregates (county-level data) to the lower-level aggregates (Census tract-level data). Note that the first five digits of TRACT_FIPS identify the county that a Census tract belongs to (see COUNTY_FIPS). My final data frame would look like this:

 TRACT_FIPS    TRACT_INCOME    TRACT_PERCENT_UNINSURED    COUNTY_INCOME    COUNTY_PERCENT_UNINSURED
51001090100           48861                       13.4            42260                        16.7
51001090200           42663                        9.4            42260                        16.7 
51003090300           32532                       19.7            72265                         7.6
51003090100           55678                       12.1            72265                         7.6

Here's what I have programmed so far (with some pseudocode):

county_income_values = []  # empty list of county income values
county_percent_uninsured_values = []  # empty list of county percent uninsured values

for tract_fips in tract['TRACT_FIPS']:  # iterate through all the values in the TRACT_FIPS column
    for county_fips in county['COUNTY_FIPS']:  # iterate through all the values in the COUNTY_FIPS column
        if tract_fips[0:5] == county_fips:  # if the first 5 digits of the tract FIPS match the county FIPS (assumes both are stored as strings)
            # TO DO: Find the index where the if statement evaluates to true, and append the
            #        county income value at that index to county_income_values
            # TO DO: Find the index where the if statement evaluates to true, and append the
            #        county percent uninsured value at that index to county_percent_uninsured_values
            pass  # placeholder until the TO DO items are implemented

If there is a more efficient way to go about solving this problem, then feel free to ignore my code above.

Thanks very much in advance!

You can use the merge method. First, extract the first five digits from the 'TRACT_FIPS' column of the second data frame (df2 below, i.e. tract). Then convert the 'COUNTY_FIPS' column of the first data frame (df1 below, i.e. county) to string, and merge on those two keys:

left = df2['TRACT_FIPS'].astype('str').str[:5]
right = df1['COUNTY_FIPS'].astype('str')

df2.merge(df1, left_on=left, right_on=right)
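For reference, here is a self-contained sketch of the same idea using the sample data from the question. It is one possible way to write it, not the only one. The names county_key and result are mine, and so are three choices: zero-padding the codes (which assumes 11-digit tract and 5-digit county FIPS codes that may have lost a leading zero when stored as integers), how="left" so that every tract row is kept, and validate="many_to_one" so that duplicate county rows raise an error.

```python
import pandas as pd

# Sample data copied from the question
county = pd.DataFrame({
    "COUNTY_FIPS": [51001, 51003],
    "COUNTY_INCOME": [42260, 72265],
    "COUNTY_PERCENT_UNINSURED": [16.7, 7.6],
})
tract = pd.DataFrame({
    "TRACT_FIPS": [51001090100, 51001090200, 51003090300, 51003090100],
    "TRACT_INCOME": [48861, 42663, 32532, 55678],
    "TRACT_PERCENT_UNINSURED": [13.4, 9.4, 19.7, 12.1],
})

# Assumption: tract FIPS codes have 11 digits and county FIPS codes have 5.
# Zero-padding restores a leading zero that integer storage would drop.
county_key = tract["TRACT_FIPS"].astype(str).str.zfill(11).str[:5]
county_str = county.assign(
    COUNTY_FIPS=county["COUNTY_FIPS"].astype(str).str.zfill(5)
)

result = (
    tract.assign(COUNTY_FIPS=county_key)   # temporary join key on the tract side
    .merge(
        county_str,
        on="COUNTY_FIPS",
        how="left",                        # keep every tract, matched or not
        validate="many_to_one",            # each county may appear only once
    )
    .drop(columns="COUNTY_FIPS")           # remove the temporary key again
)

print(result)
```

With how="left", a tract whose county is missing from the county table would still appear in the result, with NaN in the county columns, which makes unmatched rows easy to spot.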
