Joining two datasets with different number of observations

I have two datasets: one contains the population of each subregion, the other contains longitude/latitude points for those subregions. Since there are several geographical entries per subregion, I want to join the datasets so that the population is repeated in every corresponding geographical row. I have tried every dplyr join (inner_join, full_join, etc.) but can't make it work. Any help on this is greatly appreciated!

Dataset 1 (100 observations)

subregion    population 

adams        66949
alexander     7051
bond         17137
...          ...

Dataset 2 (10000 observations)

subregion   longitude   latitude

adams       -91.49563    40.21018
adams       -90.91121    40.19299
adams       -90.91694    39.75754
alexander   -89.20380    37.32247
...         ...          ...

Desired dataset

subregion   longitude   latitude   population

adams       -91.49563    40.21018  66949
adams       -90.91121    40.19299  66949
adams       -90.91694    39.75754  66949
alexander   -89.20380    37.32247   7051
...         ...          ...
#library(tibble) # uncomment if needed to access tribble()

B <- tribble(  # Using tribble to make quick, easy df's from your post
  ~subregion,    ~population ,
  "adams",        66949,
  "alexander",     7051,
  "bond",         17137)

A <- tribble(
  ~subregion,   ~longitude,   ~latitude,
  "adams",       -91.49563,    40.21018,
  "adams",       -90.91121,    40.19299,
  "adams",       -90.91694,    39.75754,
  "alexander",   -89.20380,    37.32247)

merge(A,B,by="subregion")

#> merge(A,B,by="subregion")
#  subregion longitude latitude population
#1     adams -91.49563 40.21018      66949
#2     adams -90.91121 40.19299      66949
#3     adams -90.91694 39.75754      66949
#4 alexander -89.20380 37.32247       7051

NOTE: bond is dropped, because it has no matching rows in A. Use

merge(A,B,by="subregion",all.x=TRUE,all.y=TRUE) 

if bond's population is required even though it has no lat/long (the missing values will be set to NA).

You can keep all rows from the first frame, the second frame, or both (padding with NAs) by setting all.x and all.y to the appropriate combination of TRUE and FALSE.
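
To make the all.x/all.y combinations concrete, here is a minimal sketch that rebuilds the two small frames with base data.frame() (so it runs without tibble) and tries each combination. The result names inner, left, right and full are only illustrative labels, not anything from the original post. The last part assumes dplyr may or may not be installed and is skipped if it is not.

```r
# Small stand-ins for the two datasets, using the values from the post.
A <- data.frame(
  subregion = c("adams", "adams", "adams", "alexander"),
  longitude = c(-91.49563, -90.91121, -90.91694, -89.20380),
  latitude  = c(40.21018, 40.19299, 39.75754, 37.32247)
)
B <- data.frame(
  subregion  = c("adams", "alexander", "bond"),
  population = c(66949, 7051, 17137)
)

# Default (all.x = FALSE, all.y = FALSE): only subregions found in both
# frames are kept, so bond is dropped.
inner <- merge(A, B, by = "subregion")

# all.x = TRUE: keep every row of A (the coordinate rows).
left <- merge(A, B, by = "subregion", all.x = TRUE)

# all.y = TRUE: keep every row of B; bond gets NA longitude/latitude.
right <- merge(A, B, by = "subregion", all.y = TRUE)

# all = TRUE is shorthand for all.x = TRUE, all.y = TRUE.
full <- merge(A, B, by = "subregion", all = TRUE)

print(inner)
print(right[right$subregion == "bond", ])

# Optional: a dplyr left join of the coordinates onto the populations.
# Skipped when dplyr is not available.
if (requireNamespace("dplyr", quietly = TRUE)) {
  print(dplyr::left_join(A, B, by = "subregion"))
}
```

In dplyr terms these four calls correspond roughly to inner_join, left_join, right_join and full_join, although merge() sorts the result by the key column by default, so row order can differ.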
