Is there a function in R that will let me create a new data frame that contains the duplicated values from the first data frame?
This is my example. From this data frame I want to create a new data frame that contains only the rows whose value in column mbg matches a value in column tsg, omitting the other rows.
  mbr  mbg tsr tsg
1   1   g1   3  g4
2   2   g2   4  g3
3   3   g3   5  g2
4   4   g4   6  g1
5   5   g5   7  g5
6  NA <NA>   1  g6
7  NA <NA>   2  g7
So ideally it would return this data frame:
  mbr mbg tsr tsg
1   1  g1   3  g4
2   2  g2   4  g3
3   3  g3   5  g2
4   4  g4   6  g1
5   5  g5   7  g5
So far I've tried:
1) intersect(df$mbg,df$tsg)
but that only returns a list of the matches between the columns, e.g. g1, g2, etc.
2) df2<-[intersect(df$mbg,df$tsg),]
which returns this:
     mbr  mbg tsr  tsg
NA    NA <NA>  NA <NA>
NA.1  NA <NA>  NA <NA>
NA.2  NA <NA>  NA <NA>
NA.3  NA <NA>  NA <NA>
NA.4  NA <NA>  NA <NA>
I'm very new to R and trying to teach myself, so any advice would be amazing. Thank you!
You don't even need the intersect piece.
df2 <- df1[df1$mbg %in% df1$tsg, ]
The %in% operator returns a vector of TRUE / FALSE values, one per element of mbg, indicating whether that element is found among the values in tsg.
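To make the logical vector concrete, here is a minimal sketch in base R. The data frame is rebuilt from the table printed in the question, with stringsAsFactors = FALSE as an assumption (the original column types are not shown); keep is a name introduced here for illustration.

```r
# Rebuild the example data from the question (values copied from the printed table).
# Assumption: the g* columns are plain character vectors.
df1 <- data.frame(
  mbr = c(1, 2, 3, 4, 5, NA, NA),
  mbg = c("g1", "g2", "g3", "g4", "g5", NA, NA),
  tsr = c(3, 4, 5, 6, 7, 1, 2),
  tsg = c("g4", "g3", "g2", "g1", "g5", "g6", "g7"),
  stringsAsFactors = FALSE
)

# One logical value per row: does this row's mbg appear anywhere in tsg?
keep <- df1$mbg %in% df1$tsg
print(keep)

# Logical row indexing keeps only the rows where keep is TRUE.
df2 <- df1[keep, ]
print(df2)
```

With this data, keep is TRUE for the first five rows and FALSE for the two rows where mbg is NA, because tsg contains no NA for them to match.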
Alternatively, using the dplyr library (which, if you are new to R, I would recommend learning):
library(dplyr)
df2 <- filter(df1, mbg %in% tsg)
If you would simply like to remove the rows that contain NAs and write the result to a new data frame:
complete.df <- na.omit(df)
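Note that na.omit only coincides with the matching approach here because the non-matching rows happen to be the ones with NA. A small sketch on hypothetical data (toy and the value "g9" are made up for illustration) shows where the two differ:

```r
# Hypothetical data: row 3 has no NA, but its mbg value ("g9") never appears in tsg.
toy <- data.frame(
  mbg = c("g1", "g2", "g9", NA),
  tsg = c("g2", "g1", "g5", "g6"),
  stringsAsFactors = FALSE
)

by_na    <- na.omit(toy)                 # drops only row 4, the row containing NA
by_match <- toy[toy$mbg %in% toy$tsg, ]  # drops rows 3 and 4, the non-matching rows

print(nrow(by_na))     # 3
print(nrow(by_match))  # 2
```

So na.omit is a reasonable shortcut only if every row you want to drop is guaranteed to contain an NA.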
Assuming I'm interpreting what you're looking for correctly, you appear to be on the right track, just running into issues with syntax. Try this:
df2<-df[df$mbg %in% intersect(df$mbg,df$tsg),]
intersect(df$mbg, df$tsg) was returning the values that occur in both of those columns. Adding df before the brackets identifies the data frame you want a subset of, which you were missing before, and the df$mbg %in% part says that you want the rows where the value of mbg is included in the intersection.
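The difference between the two forms can be sketched in base R as follows. The data frame is rebuilt from the question's table (character columns are an assumption), and common, by_name, and same are names introduced here for illustration.

```r
# Example data copied from the question's printed table.
df <- data.frame(
  mbr = c(1, 2, 3, 4, 5, NA, NA),
  mbg = c("g1", "g2", "g3", "g4", "g5", NA, NA),
  tsr = c(3, 4, 5, 6, 7, 1, 2),
  tsg = c("g4", "g3", "g2", "g1", "g5", "g6", "g7"),
  stringsAsFactors = FALSE
)

# Values present in both columns.
common <- intersect(df$mbg, df$tsg)
print(common)

# Using those character values directly as a row index looks them up as row names.
# No row is named "g1", "g2", ..., so every requested row comes back filled with NA.
by_name <- df[common, ]
print(by_name)

# Testing membership instead yields a logical index, one value per row of df.
df2 <- df[df$mbg %in% common, ]
print(df2)

# For row selection, going through intersect() gives the same index as %in% on tsg alone.
same <- identical(df$mbg %in% common, df$mbg %in% df$tsg)
print(same)
```

This is also why the second attempt in the question printed five all-NA rows: the intersection holds five values, and none of them is a row name.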
library(dplyr)
# Keep the rows of df whose mbg value matches some tsg value in df
df %>%
semi_join(df, by = c('mbg' = 'tsg'))
# mbr mbg tsr tsg
# 1 1 g1 3 g4
# 2 2 g2 4 g3
# 3 3 g3 5 g2
# 4 4 g4 6 g1
# 5 5 g5 7 g5