Difficulty getting a subset in R

Question

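I am trying to subset a data set with the following requirements:

1. ethnicity is xyz
2. education is a bachelor's degree or higher, i.e. Bachelor's Degree or Graduate Degree
3. I then want to see the income brackets of the people who meet the requirements above. The brackets look like $30,000 - $39,999 or $100,000 - $124,999.
4. Finally, as my final output, I want to see the subset obtained from the third item (above) together with the column saying whether these people are religious. In the data set, the values are religious and not religious.

So it would look something like this: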

income                 religious
$30,000 - $39,999      not religious
$50,000 - $59,999      religious
....                   ....
....                   ....

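Please bear in mind that the people listed satisfy requirements 1 and 2.

Please keep in mind that I am new to programming. I have been trying for a very long time and have gone through many posts, but I can't seem to get anything to work. How do I fix this? Someone please help.

To keep the post clear, I am posting what I have tried below (feel free to ignore it, as it is probably garbage).

I have tried many variations of the following just to get to step 3, but have failed miserably and am about to smash my head on the keyboard: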

df$income[which(df$ethnicity == "xyz" & df$education %in% c("Bachelor's Degree", "Graduate Degree"), ]

I have also tried:

race <- df$ethnicity == "xyz"
ba_ma_phd <- df$education %in% c("Graduate Degree", "Bachelor's Degree")
income_sub <- df$income[ba_ma_phd & race]

I believe income_sub gets me to step 3, but I have no idea how to get from there to step 4.

Answer 1

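library(dplyr)

# Keep rows meeting requirements 1 and 2, then summarise income per religious group.
# Note: min() and max() need income to be character or an ordered factor;
# they raise an error on an unordered factor.
df %>%
  filter(ethnicity == "xyz" &
         education %in% c("Bachelor's Degree", "Graduate Degree")) %>%
  group_by(religious) %>%
  summarize(lower_bound = min(income),
            upper_bound = max(income))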
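
If the goal is the two-column listing from the question rather than a per-group summary, a minimal dplyr sketch could look like the following. The toy data frame and its values are made up for illustration; only the column names come from the question.

```r
library(dplyr)

# Hypothetical toy data; only the column names come from the question
df <- data.frame(
  ethnicity = c("xyz", "xyz", "abc", "xyz"),
  education = c("Bachelor's Degree", "High School",
                "Graduate Degree", "Graduate Degree"),
  income    = c("$30,000 - $39,999", "$50,000 - $59,999",
                "$100,000 - $124,999", "$50,000 - $59,999"),
  religious = c("not religious", "religious", "religious", "religious"),
  stringsAsFactors = FALSE
)

# Requirements 1 and 2 as row filters, then keep only the two columns of interest
result <- df %>%
  filter(ethnicity == "xyz",
         education %in% c("Bachelor's Degree", "Graduate Degree")) %>%
  select(income, religious)

result
```

Here filter() handles the rows and select() handles the columns, so the two concerns stay separate.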

Answer 2

Converting my comment to an answer, as it was getting too long.

First, your code is almost there. Because income is a vector rather than a data frame, you do not need the trailing comma, i.e. you can use

df$income[which(df$ethnicity == "xyz" &
         df$education %in% c("Bachelor's Degree", "Graduate Degree"))]
 # note no comma after the closing parenthesis
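
To make the vector versus data frame point concrete, here is a small sketch on a made-up data frame (the values are hypothetical; only the column names come from the question). It shows that a single index works on the income vector, while adding a comma fails.

```r
# Hypothetical toy data
df <- data.frame(
  ethnicity = c("xyz", "abc", "xyz"),
  education = c("Graduate Degree", "Graduate Degree", "High School"),
  income    = c("$30,000 - $39,999", "$50,000 - $59,999", "$100,000 - $124,999"),
  stringsAsFactors = FALSE
)

keep <- df$ethnicity == "xyz" &
  df$education %in% c("Bachelor's Degree", "Graduate Degree")

# A vector has one dimension, so it takes a single index
income_sub <- df$income[keep]

# A trailing comma asks for a second dimension, which a vector does not have,
# so this call raises an error; try() captures it instead of stopping the script
comma_fails <- inherits(try(df$income[keep, ], silent = TRUE), "try-error")

income_sub
comma_fails
```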

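If you want to create a subset of your data, do not include df$income at the start; just use df and keep the comma. This subsets the rows of your data but keeps all the columns.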

sub_df <- df[which(df$ethnicity == "xyz" &
       df$education %in% c("Bachelor's Degree", "Graduate Degree")), ]
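
The answer does not say why which() is kept around the condition. One practical effect, sketched here on a hypothetical data frame with a missing ethnicity, is that which() returns only the positions that are TRUE, whereas a bare logical index containing NA produces an all-NA row.

```r
# Hypothetical toy data with a missing ethnicity in row 2
df <- data.frame(
  ethnicity = c("xyz", NA, "abc"),
  education = c("Graduate Degree", "Graduate Degree", "Bachelor's Degree"),
  stringsAsFactors = FALSE
)

# Evaluates to TRUE, NA, FALSE
keep <- df$ethnicity == "xyz" &
  df$education %in% c("Bachelor's Degree", "Graduate Degree")

by_logical <- df[keep, ]         # the NA in the index yields an all-NA row
by_which   <- df[which(keep), ]  # which() keeps only the TRUE positions

nrow(by_logical)
nrow(by_which)
```

If the data has no missing values in the columns being tested, both forms return the same rows.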

To look at the income levels of the subset, you can use table

table(sub_df$income)

You can use table again to examine the number of observations in each income bracket by religious status.

table(sub_df$income, sub_df$religious)
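
As an illustration of what the two-way table contains, here is a sketch on a made-up subset (the values are hypothetical). Naming the arguments to table() is optional; it only labels the two dimensions in the printed output.

```r
# Hypothetical subset; in practice this would be the sub_df built from the real data
sub_df <- data.frame(
  income    = c("$30,000 - $39,999", "$50,000 - $59,999", "$50,000 - $59,999"),
  religious = c("not religious", "religious", "religious"),
  stringsAsFactors = FALSE
)

# Rows are income brackets, columns are religious status, cells are counts
counts <- table(income = sub_df$income, religious = sub_df$religious)
counts

# Individual cells can be looked up by name
counts["$50,000 - $59,999", "religious"]
```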

If you want to select just the income and religious columns, you can also do this using [

sub_df[c("religious", "income")]
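
The row filter and the column selection can also be combined into a single [ call. This is a sketch on a hypothetical data frame, with the columns listed in the order shown in the question's desired output.

```r
# Hypothetical toy data; only the column names come from the question
df <- data.frame(
  ethnicity = c("xyz", "xyz", "abc"),
  education = c("Bachelor's Degree", "High School", "Graduate Degree"),
  income    = c("$30,000 - $39,999", "$50,000 - $59,999", "$100,000 - $124,999"),
  religious = c("not religious", "religious", "religious"),
  stringsAsFactors = FALSE
)

# Row positions that meet requirements 1 and 2
rows <- which(df$ethnicity == "xyz" &
              df$education %in% c("Bachelor's Degree", "Graduate Degree"))

# Rows before the comma, columns after it
result <- df[rows, c("income", "religious")]
result
```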
