Is there a function in R that lets me create a new data frame containing the duplicated values from the first data frame?

Question

This is my example. From this data frame I want to create a new data frame that contains only the rows whose value in column mbg also appears in column tsg, omitting the other rows.

  mbr  mbg tsr tsg
1   1   g1   3  g4
2   2   g2   4  g3
3   3   g3   5  g2
4   4   g4   6  g1
5   5   g5   7  g5
6  NA <NA>   1  g6
7  NA <NA>   2  g7

So ideally it would return this data frame:

  mbr  mbg tsr tsg
1   1   g1   3  g4
2   2   g2   4  g3
3   3   g3   5  g2
4   4   g4   6  g1
5   5   g5   7  g5

So far I've tried:

1) intersect(df$mbg,df$tsg) but that only returns a vector of the values that match between the columns, e.g. g1, g2, etc.

2) df2<-[intersect(df$mbg,df$tsg),]

which returns this:

     mbr  mbg tsr  tsg
NA    NA <NA>  NA <NA>
NA.1  NA <NA>  NA <NA>
NA.2  NA <NA>  NA <NA>
NA.3  NA <NA>  NA <NA>
NA.4  NA <NA>  NA <NA>

I'm very new to R and trying to teach myself, so any advice would be amazing. Thank you!

Answer 1

You don't even need the intersect piece.

df2 <- df1[df1$mbg %in% df1$tsg, ]

The %in% operator returns a vector of TRUE / FALSE values indicating whether each element of mbg is found among the values in tsg.

Alternatively, using the dplyr library (which, if you are new to R, I would recommend learning):
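To make the logical vector concrete, here is a minimal sketch that rebuilds the question's data by hand (the name df1 and the column values are copied from the printed table; stringsAsFactors = FALSE is an assumption so that mbg and tsg are plain character columns). Note that %in% gives FALSE rather than NA for the two missing mbg values, which is why those rows drop out.

```r
# Rebuild the example data frame from the question (typed in by hand).
df1 <- data.frame(
  mbr = c(1, 2, 3, 4, 5, NA, NA),
  mbg = c("g1", "g2", "g3", "g4", "g5", NA, NA),
  tsr = c(3, 4, 5, 6, 7, 1, 2),
  tsg = c("g4", "g3", "g2", "g1", "g5", "g6", "g7"),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per row: does this row's mbg value appear anywhere in tsg?
keep <- df1$mbg %in% df1$tsg
print(keep)

# Use the logical vector as a row index to keep only the matching rows.
df2 <- df1[keep, ]
print(df2)
```

The first five entries of keep are TRUE and the last two are FALSE, so df2 holds rows 1 to 5.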

library(dplyr)

df2 <- filter(df1, mbg %in% tsg)

Answer 2

If you would simply like to remove the NAs and write to a new data frame:

complete.df <- na.omit(df)
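On the question's data this gives the same five rows as matching mbg against tsg, but the two approaches answer different questions. The sketch below uses a small hypothetical data frame (the "g9" row is made up for illustration and is not in the original post) to show where they diverge.

```r
# Hypothetical variant of the question's data: row 3 has mbg = "g9",
# which is not NA but never appears in tsg.
df <- data.frame(
  mbg = c("g1", "g2", "g9", NA),
  tsg = c("g2", "g1", "g3", "g4"),
  stringsAsFactors = FALSE
)

by_na    <- na.omit(df)                # drops only the row containing NA
by_match <- df[df$mbg %in% df$tsg, ]   # drops the NA row and the "g9" row

print(nrow(by_na))     # 3
print(nrow(by_match))  # 2
```

So na.omit is enough only when every unmatched row happens to contain an NA, as in the original example.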

Answer 3

Assuming I'm interpreting what you're looking for correctly, you appear to be on the right track, just running into issues with syntax. Try this:

df2<-df[df$mbg %in% intersect(df$mbg,df$tsg),]

intersect(df$mbg, df$tsg) was returning the values that occur in both of those columns. Adding df before the brackets identifies the data frame you want a subset of, which you were missing before, and the df$mbg %in% part says that you want the rows where the value of mbg is included in the intersection.
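As a side note on the all-NA output in the question: indexing rows with a character vector makes R look those strings up in the row names, not in any column. The sketch below rebuilds the question's data by hand (stringsAsFactors = FALSE is an assumption) and contrasts the two forms.

```r
# Rebuild the example data frame from the question (typed in by hand).
df <- data.frame(
  mbr = c(1, 2, 3, 4, 5, NA, NA),
  mbg = c("g1", "g2", "g3", "g4", "g5", NA, NA),
  tsr = c(3, 4, 5, 6, 7, 1, 2),
  tsg = c("g4", "g3", "g2", "g1", "g5", "g6", "g7"),
  stringsAsFactors = FALSE
)

common <- intersect(df$mbg, df$tsg)  # character vector of shared values
print(common)
print(rownames(df))                  # "1" to "7", none of which is "g1", "g2", ...

# A character row index is matched against rownames(df); nothing matches,
# so every requested row comes back filled with NA.
wrong <- df[common, ]
print(wrong)

# Turning the values into a logical test on the mbg column selects real rows.
right <- df[df$mbg %in% common, ]
print(right)
```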

Answer 4

library(dplyr)

df %>% 
  semi_join(df, by = c('mbg' = 'tsg'))

#   mbr mbg tsr tsg
# 1   1  g1   3  g4
# 2   2  g2   4  g3
# 3   3  g3   5  g2
# 4   4  g4   6  g1
# 5   5  g5   7  g5

Answer 1
2 2020-04-10 14:48:01

Answer 2
0 2020-04-10 14:42:03

Answer 3
0 accepted 2020-04-10 14:42:46

Answer 4
0 2020-04-10 14:59:05
