How to subset one R dataframe with the values of another R dataframe?

I have two data frames in R:

Died.At <- c(22,40,72,41, ...)
Writer.At <- c(16, 18, 36, 36)
Name <- c("John Doe", "Edgar Poe", "Walt Whitman", "Jane Austen", ...)
Gender <- c("MALE", "MALE", "MALE", "FEMALE", ...)
Date.Of.Death <- c("2015-05-10", "1849-10-07", "1892-03-26","1817-07-18", ...)
Pet <- c("cat", "dog", "cat", "cat")
df1 = data.frame(Died.At, Writer.At, Name, Gender, Pet)
print(df1)
  Died.At Writer.At         Name Gender Pet
1      22        16     John Doe   MALE cat
2      40        18    Edgar Poe   MALE dog
3      72        36 Walt Whitman   MALE cat
4      41        36  Jane Austen FEMALE cat
.....

In df1, the values in Name are not unique (i.e. several rows share the same author).

The second data frame, df2, also has a Name column, which contains both authors from df1 (e.g. Jane Austen) and completely new authors. This data frame is also far larger.

print(length(unique(df1$Name)))
## output 1168
print(length(unique(df2$Name)))
## output 5572

I would like to subset df2 so that it contains only the names that also appear in df1.

My idea was to do this:

subset_df2 = df2[df2$Name == unique(df1$Name)]

However, I would expect there to be 1168 unique author names here:

print(length(unique(subset_df2$Name)))
## output 880

That's fewer than I was expecting. Where is my error?

You can use df2$Name %in% df1$Name, which returns a logical vector the length of df2$Name that is TRUE wherever the value of df2$Name occurs in df1$Name. (%in% is built on match(df2$Name, df1$Name), which returns the position of each match in df1$Name, or NA where there is none.) You can then use the logical vector to index df2.

subset_df2 <- df2[df2$Name %in% df1$Name, ]

See ?match.

As for why your code did not work, please see the output of this exercise:
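A minimal sketch of how the two calls differ, using small made-up data frames (the names and values below are illustrative assumptions, not the data from the question):

```r
# Toy data: illustrative values only, not the data from the question.
df1 <- data.frame(Name = c("Jane Austen", "Edgar Poe", "Jane Austen"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Name = c("Edgar Poe", "Emily Dickinson", "Jane Austen", "Mark Twain"),
                  Born = c(1809, 1830, 1775, 1835),
                  stringsAsFactors = FALSE)

# %in% gives one logical value per element of df2$Name
keep <- df2$Name %in% df1$Name
print(keep)

# match() gives the position of the first match in df1$Name, or NA if none
pos <- match(df2$Name, df1$Name)
print(pos)

# Either result can be turned into a row selection on df2
subset_a <- df2[keep, ]
subset_b <- df2[!is.na(pos), ]
print(identical(subset_a, subset_b))
```

Duplicated names in df1 do no harm here, so there is no need to call unique() on df1$Name first.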

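a <- LETTERS[sample(1:10, size = 15, replace = TRUE)]
# LETTERS has only 26 elements, so stop at 26 to avoid appending NAs
b <- c(unique(a), LETTERS[15:26])
# compare: element-wise equality, with unique(a) recycled along b
b == unique(a)
b[b == unique(a)]
# vs: membership test, independent of position
b %in% a
b[b %in% a]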
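Because the exercise above depends on random sampling, here is a deterministic sketch of the same effect with made-up vectors (big and small are hypothetical names). `==` compares position by position and recycles the shorter vector, so a value is kept only when it happens to line up with the same value; `%in%` tests membership regardless of position.

```r
# Made-up vectors for illustration
big   <- c("A", "B", "C", "D", "B", "A")
small <- c("B", "A", "C")

# Position-wise comparison: small is recycled to B A C B A C
eq <- big == small
print(eq)         # FALSE FALSE TRUE FALSE FALSE FALSE
print(big[eq])    # "C"

# Membership test: is each element of big found anywhere in small?
isin <- big %in% small
print(isin)       # TRUE TRUE TRUE FALSE TRUE TRUE
print(big[isin])  # "A" "B" "C" "B" "A"
```

The position-wise version silently drops most matches, which is consistent with getting fewer unique names than expected.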

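Note also that b %in% a is not equivalent to a %in% b: the first returns one value per element of b, the second one value per element of a. Therefore b[a %in% b] would yield an incorrect result.

Furthermore, when selecting rows of a data frame you need to supply both a row index and a column index, separated by a comma. The column index may be left empty to keep all columns, as in df2[rows, ]; with no comma, a single index selects columns instead.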
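A small sketch of the asymmetry, with made-up values (the vectors here are assumptions chosen to have different lengths):

```r
a <- c("x", "y", "x", "z", "y")   # made-up values
b <- c("y", "q", "z")

b_in_a <- b %in% a   # one logical per element of b (length 3)
a_in_b <- a %in% b   # one logical per element of a (length 5)
print(b_in_a)
print(a_in_b)

# Correct: index b with a vector computed over b
print(b[b_in_a])

# Incorrect: the index was computed over a, so positions do not line up
print(b[a_in_b])
```

The last line returns an element that is not in a at all, plus NA values for index positions past the end of b.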
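As a sketch of the indexing point, assuming a made-up two-column data frame (df and rows are hypothetical names): the comma decides whether a logical index is applied to rows or to columns.

```r
# Toy data frame: illustrative values only
df <- data.frame(Name = c("Jane Austen", "Edgar Poe", "Mark Twain"),
                 Pet  = c("cat", "dog", "cat"),
                 stringsAsFactors = FALSE)
rows <- c(TRUE, FALSE, TRUE)

# Row index before the comma, empty column index after it: keeps all columns
by_row <- df[rows, ]
print(by_row)

# No comma: the single index is applied to columns, not rows
by_col <- df[c(TRUE, FALSE)]
print(names(by_col))
```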
