Join Values from Upper-Level Aggregates to Lower-Level Aggregates in a Pandas Data Frame
I have two Pandas data frames.
The first data frame (county) has county-level data:
COUNTY_FIPS  COUNTY_INCOME  COUNTY_PERCENT_UNINSURED
51001        42260          16.7
51003        72265          7.6
The second data frame (tract) has Census tract-level data:
TRACT_FIPS   TRACT_INCOME  TRACT_PERCENT_UNINSURED
51001090100  48861         13.4
51001090200  42663         9.4
51003090300  32532         19.7
51003090100  55678         12.1
I would like to join values from the upper-level aggregates (county-level data) to the lower-level aggregates (Census tract-level data). Note that the first five digits of TRACT_FIPS identify the county each Census tract is in (see COUNTY_FIPS). My final data frame would look like this:
TRACT_FIPS   TRACT_INCOME  TRACT_PERCENT_UNINSURED  COUNTY_INCOME  COUNTY_PERCENT_UNINSURED
51001090100  48861         13.4                     42260          16.7
51001090200  42663         9.4                      42260          16.7
51003090300  32532         19.7                     72265          7.6
51003090100  55678         12.1                     72265          7.6
Here's what I have programmed so far (with some pseudocode):
county_income_values = []  # empty list of county income values
county_percent_uninsured_values = []  # empty list of county percent uninsured values
for tract_fips in tract['TRACT_FIPS']:  # iterate through all the tract FIPS codes
    for county_fips in county['COUNTY_FIPS']:  # iterate through all the county FIPS codes
        if str(tract_fips)[0:5] == str(county_fips):  # the first 5 digits of the tract FIPS match the county FIPS
            # TO DO: Find the index where the if statement evaluates to true, and append the
            # county income value at that index to county_income_values
            # TO DO: Find the index where the if statement evaluates to true, and append the
            # county percent uninsured value at that index to county_percent_uninsured_values
            pass  # placeholder so the loop skeleton stays valid Python
If there is a more efficient way to go about solving this problem, then feel free to ignore my code above.
Thanks very much in advance!
You can use the function merge. First, you need to extract the first five digits from the column 'TRACT_FIPS' in the second dataframe. Then you can convert the column 'COUNTY_FIPS' to string and use both columns to merge on:
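# df1 is the county data frame and df2 is the tract data frame
left = df2['TRACT_FIPS'].astype('str').str[:5]  # county prefix of each tract code
right = df1['COUNTY_FIPS'].astype('str')  # county codes as strings
result = df2.merge(df1, left_on=left, right_on=right)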
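For a version that can be run end to end, here is a minimal sketch of the same idea, using the sample rows from the question. It is one possible arrangement, not the only one: instead of passing the two key Series to merge, it stores the five-digit prefix in a temporary COUNTY_FIPS column on a copy of the tract frame and drops that column afterwards. The names tract_keyed, county_keyed and result are made up for this illustration, and the code assumes the FIPS codes are stored as integers with no lost leading zeros (codes for states such as 01 would need zero-padding first).

```python
import pandas as pd

# Sample data copied from the question.
county = pd.DataFrame({
    "COUNTY_FIPS": [51001, 51003],
    "COUNTY_INCOME": [42260, 72265],
    "COUNTY_PERCENT_UNINSURED": [16.7, 7.6],
})
tract = pd.DataFrame({
    "TRACT_FIPS": [51001090100, 51001090200, 51003090300, 51003090100],
    "TRACT_INCOME": [48861, 42663, 32532, 55678],
    "TRACT_PERCENT_UNINSURED": [13.4, 9.4, 19.7, 12.1],
})

# Assumption: codes are complete (5-digit county, 11-digit tract), so the
# first five characters of the tract code are the county code.
# Hypothetical helper frames: both carry the county code as a string key.
tract_keyed = tract.assign(COUNTY_FIPS=tract["TRACT_FIPS"].astype(str).str[:5])
county_keyed = county.assign(COUNTY_FIPS=county["COUNTY_FIPS"].astype(str))

# how="left" keeps every tract row, even one without a matching county.
# validate="many_to_one" raises if a county code is duplicated in county_keyed.
result = (
    tract_keyed
    .merge(county_keyed, on="COUNTY_FIPS", how="left", validate="many_to_one")
    .drop(columns="COUNTY_FIPS")  # remove the temporary key again
)
print(result)
```

With how="left", a tract whose prefix has no match in the county frame would be kept with missing county values rather than silently dropped, which makes mismatched codes easier to spot.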