
Join Values from Upper-Level Aggregates to Lower-Level Aggregates in a Pandas Data Frame

I have two Pandas data frames.

The first data frame (county) has county-level data:

COUNTY_FIPS    COUNTY_INCOME    COUNTY_PERCENT_UNINSURED
      51001            42260                        16.7
      51003            72265                         7.6

The second data frame (tract) has Census tract-level data:

 TRACT_FIPS    TRACT_INCOME    TRACT_PERCENT_UNINSURED
51001090100           48861                       13.4
51001090200           42663                        9.4
51003090300           32532                       19.7
51003090100           55678                       12.1

I would like to join values from the upper-level aggregates (county-level data) to the lower-level aggregates (Census tract-level data). Note that the first five digits of TRACT_FIPS identify the county each Census tract is in (see COUNTY_FIPS). My final data frame would look like this:

 TRACT_FIPS    TRACT_INCOME    TRACT_PERCENT_UNINSURED    COUNTY_INCOME    COUNTY_PERCENT_UNINSURED
51001090100           48861                       13.4            42260                        16.7
51001090200           42663                        9.4            42260                        16.7 
51003090300           32532                       19.7            72265                         7.6
51003090100           55678                       12.1            72265                         7.6

Here's what I have programmed so far (with some pseudocode):

county_income_values = [] # empty list of county income values
county_percent_uninsured_values = [] # empty list of county percent uninsured values

for tract_fips in tract['TRACT_FIPS']: # iterate through all the tract FIPS codes in the TRACT_FIPS column
    for county_fips in county['COUNTY_FIPS']: # iterate through all the county FIPS codes in the COUNTY_FIPS column
        if str(tract_fips)[0:5] == str(county_fips): # if the first 5 digits of the tract FIPS match the county FIPS
            # TO DO: Find the index of where the if statement evaluates to true, and append the
            #        county income value at that index to county_income_values
            # TO DO: Find the index of where the if statement evaluates to true, and append the
            #        county percent uninsured value at that index to county_percent_uninsured_values

If there is a more efficient way to solve this problem, feel free to ignore my code above.

Thanks very much in advance!

You can use the function merge. First, extract the first five digits from the 'TRACT_FIPS' column of the second data frame (df2, the tract data). Then convert the 'COUNTY_FIPS' column of the first data frame (df1, the county data) to string and merge on the two resulting keys:

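# first five digits of each tract FIPS code, as strings (df2 is the tract data frame)
left = df2['TRACT_FIPS'].astype('str').str[:5]
# county FIPS codes as strings, so both keys have the same type (df1 is the county data frame)
right = df1['COUNTY_FIPS'].astype('str')

# join each tract row to its county row on the derived keys
df2.merge(df1, left_on=left, right_on=right)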
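
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the two data frames from the question and produces the desired layout. It is a variation on the merge above, not the only way to do it: the names tract_keyed, county_keyed and result are my own, the how="left" choice (keep every tract even if its county is missing) is an assumption about what you want, and the zfill calls are an added safeguard that assumes 11-digit tract codes and 5-digit county codes, in case the codes were read as integers and lost a leading zero.

```python
import pandas as pd

# Sample data copied from the question.
county = pd.DataFrame({
    "COUNTY_FIPS": [51001, 51003],
    "COUNTY_INCOME": [42260, 72265],
    "COUNTY_PERCENT_UNINSURED": [16.7, 7.6],
})
tract = pd.DataFrame({
    "TRACT_FIPS": [51001090100, 51001090200, 51003090300, 51003090100],
    "TRACT_INCOME": [48861, 42663, 32532, 55678],
    "TRACT_PERCENT_UNINSURED": [13.4, 9.4, 19.7, 12.1],
})

# Store the join key as an ordinary string column on both sides.
# zfill is an assumption-based safeguard: it restores leading zeros that
# are lost when FIPS codes are stored as integers (not needed for the
# Virginia codes in this sample, which start with 51).
tract_keyed = tract.assign(
    COUNTY_FIPS=tract["TRACT_FIPS"].astype(str).str.zfill(11).str[:5]
)
county_keyed = county.assign(
    COUNTY_FIPS=county["COUNTY_FIPS"].astype(str).str.zfill(5)
)

# A left merge keeps one row per tract, in the original tract order.
# The key column is dropped afterwards to match the layout in the question.
result = (
    tract_keyed
    .merge(county_keyed, on="COUNTY_FIPS", how="left")
    .drop(columns="COUNTY_FIPS")
)
print(result)
```

Keeping the key as a named column is a design choice: as far as I know, passing Series objects to left_on and right_on leaves an extra helper key column in the result (usually called key_0), which you would otherwise have to drop yourself.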


 