Match one column of data frame to another, pull in other columns, combine into large dataset

I've got a list of Store IDs and their Zipcodes in a 2 column numeric vector (in R). I'm using the "zipcode" package ( https://cran.rproject.org/web/packages/zipcode/zipcode.pdf ) and have access to the longitude/latitude coordinates for these zipcodes. The zipcode package ships a large data frame with the city, state, longitude, and latitude for every zip code.

I'm looking to get the longitude and latitude coordinates for my Zipcodes, and add them as columns 3 and 4 (i.e. Store ID, Zip Code, Longitude, Latitude).

Any thoughts? Thank you!

EDIT: I've tried the merge function, i.e. total<-merged(CleanData,zipcode, by=zip), and I'm getting an error. Is that because they must have the same number of columns?

The column name passed as the by argument has to be enclosed in quotes. In this example you don't need the by argument in merge at all if the zip code column is the only column the two data frames have in common, because merge joins on all shared column names by default.
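As a minimal sketch of the quoted form, the snippet below merges two small stand-in tables on a column named "zip". The data frames, the store_id column name, and the coordinate values are all hypothetical placeholders, not data from the zipcode package, and the example assumes both tables store zip as character strings.

```r
# Hypothetical stand-in for the asker's CleanData (store IDs and zip codes)
CleanData <- data.frame(
  store_id = c(1, 2, 3),
  zip = c("10001", "60601", "94105"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the package's lookup table; coordinates are placeholders
zipcode <- data.frame(
  zip = c("10001", "60601", "94105", "73301"),
  city = c("City A", "City B", "City C", "City D"),
  state = c("NY", "IL", "CA", "TX"),
  latitude = c(40.7, 41.9, 37.8, 30.3),
  longitude = c(-74.0, -87.6, -122.4, -97.7),
  stringsAsFactors = FALSE
)

# The join column is passed to `by` as a quoted string
total <- merge(CleanData, zipcode, by = "zip")

# Keep only the four columns the question asks for, in that order
total <- total[, c("store_id", "zip", "longitude", "latitude")]
print(total)
```

Writing by = zip without quotes makes R look for an object called zip instead of a column name, which is why the quoted string is needed.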

Example datasets:

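# cleanData: store IDs (id) keyed by zip code (z)
d1 <- tibble::tribble(~z, ~id, 131, 1, 114, 2, 155, 5)

# zipcode: lookup table with columns x and y for each zip code (z)
d2 <- tibble::tribble(~z, ~x, ~y, 131, 2, 5, 166, 2, 6, 162, 6, 5, 177, 7, 1, 114, 2, 1, 155, 5, 9)

# z is the only shared column, so merge() joins on it by default
result <- merge(d1, d2)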

gives

        z id x y
    1 114  2 2 1
    2 131  1 2 5
    3 155  5 5 9

You can remove any unnecessary columns from the result data frame with dplyr::select(). Suppose you don't need column y (which may be a state name, for example):

result <- dplyr::select(result, z, id, x)

Ended up using this: How to join (merge) data frames (inner, outer, left, right)?

Essentially, I used a left outer join because I wanted to keep all of the zipcodes in my store database. I believe the answer above would eliminate zipcodes not found in the second list of zipcodes.
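The difference described here can be sketched with base R's merge(), where all.x = TRUE requests a left outer join. The stores and zip_lookup tables, their column names, and the coordinate values below are hypothetical placeholders chosen so that one store zip code has no match in the lookup table.

```r
# Hypothetical store table; "99999" has no entry in the lookup table below
stores <- data.frame(
  store_id = c(1, 2, 3),
  zip = c("10001", "60601", "99999"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table with placeholder coordinates
zip_lookup <- data.frame(
  zip = c("10001", "60601"),
  longitude = c(-74.0, -87.6),
  latitude = c(40.7, 41.9),
  stringsAsFactors = FALSE
)

# Default merge is an inner join: the unmatched store is dropped
inner <- merge(stores, zip_lookup, by = "zip")

# all.x = TRUE keeps every row of the first table; missing coordinates become NA
left <- merge(stores, zip_lookup, by = "zip", all.x = TRUE)

print(inner)
print(left)
```

Here inner has one row fewer than stores, while left keeps all stores and leaves longitude and latitude as NA for the zip code that was not found.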
