
R equivalent to SAS's "IN=" data set option for including and excluding overlapping data

I'm usually a SAS user, but I was wondering whether there is a similar way in R to list the rows that are found in only one data frame when merging two of them. In SAS I would have used

data want;
    merge have1 (In=in1) have2 (IN=in2) ;
    if not in2;
run;

to find the entries that are only in have1. My R code is:

inner <- merge(have1, have2, by= "Date", all.x = TRUE, sort = TRUE)

I've tried setdiff() and anti_join(), but neither seems to give me what I want. Additionally, I would like to find a way to do the converse: find the entries in have1 and have2 that have the same "Date" value and keep the remaining variables from both data frames. For example, consider have1 with columns "Date", "ShotHeight", "ShotDistance" and have2 with columns "Date", "ThrowHeight", "ThrowDistance", so that the new data frame, call it "new", has columns "Date", "ShotHeight", "ShotDistance", "ThrowHeight", "ThrowDistance".

Assuming only one by-variable, the simplest solution is not to merge at all:

want <- subset(have1, !(county %in% have2$county))

This subsets have1 to exclude rows whose value of county also appears in have2. Here county stands for the by-variable; with the data frames in the question it would be Date.
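As a rough sketch of how this could look with the columns from the question, the base R snippet below builds two small data frames and applies the same idea with Date as the by-variable. The data values are invented for illustration, and the names only_in_have1, flagged, in1 and in2 are assumptions chosen to mirror the SAS step, not anything from the original post. It also shows one possible way to get the converse (rows whose Date is in both data frames) with merge(), whose default all = FALSE keeps only matching keys.

```r
# Toy data; the values are made up purely for illustration.
have1 <- data.frame(
  Date         = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03")),
  ShotHeight   = c(1.8, 2.0, 1.9),
  ShotDistance = c(10.5, 11.2, 9.8)
)
have2 <- data.frame(
  Date          = as.Date(c("2020-01-02", "2020-01-03", "2020-01-04")),
  ThrowHeight   = c(2.5, 2.7, 2.6),
  ThrowDistance = c(30.1, 28.4, 31.0)
)

# Rows of have1 whose Date does not occur in have2 (SAS: if not in2;)
only_in_have1 <- subset(have1, !(Date %in% have2$Date))

# Rows whose Date occurs in both data frames (SAS: if in1 and in2;),
# keeping the remaining columns from both sides.
new <- merge(have1, have2, by = "Date")

# Optional: emulate both SAS IN= flags with a full outer merge.
# in1/in2 are hypothetical helper columns added before merging.
flagged <- merge(
  transform(have1, in1 = TRUE),
  transform(have2, in2 = TRUE),
  by = "Date", all = TRUE
)
flagged$in1 <- !is.na(flagged$in1)  # TRUE where the row came from have1
flagged$in2 <- !is.na(flagged$in2)  # TRUE where the row came from have2
```

With the flag columns in place, subset(flagged, in1 & !in2) would give the have1-only rows and subset(flagged, in1 & in2) the overlap, which is close in spirit to filtering on in1 and in2 in the SAS data step.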
