Having difficulty obtaining subset in R

I'm trying to subset a data set with the following requirements:

  1. ethnicity is xyz
  2. education is Bachelor's Degree and above, i.e. Bachelor's Degree or Graduate Degree
  3. I then want to look at the income bracket of those who meet the above requirements. A bracket would be something like $30,000 - $39,999, or $100,000 - $124,999.
  4. Finally, as my final output, I want to see the subset obtained from the third item (above) with the column of whether or not those individuals are religious. In the data set, that corresponds to religious and not religious.

So it would look something like this:

   income               religious
$30,000 - $39,999      not religious
$50,000 - $59,999         religious
  ....                    ....
  ....                    ....

Keep in mind that those listed satisfy requirements 1 and 2.

Please bear in mind that I am new to programming. I've tried to figure this out for a very long time and have dug through many posts. I can't seem to get anything to work. How do I fix this? Someone please help.


So as to not take away from the clarity of the post, I'll post what I've tried below (but feel free to ignore it as it's probably rubbish).

I have tried many variations of the following just to get to step 3, but have failed miserably, and am about to bash my head with the keyboard:

df$income[which(df$ethnicity == "xyz" & df$education %in% c("Bachelor's Degree", "Graduate Degree"), ]

I've also tried:

race <- df$ethnicity == "xyz"
ba_ma_phd <- df$education %in% c("Graduate Degree", "Bachelor's Degree")
income_sub <- df$income[ba_ma_phd & race]

I believe income_sub gets me up to step 3, but I have no idea how to get it to step 4.

library(dplyr)

df %>%
  filter(ethnicity == "xyz" & 
         education %in% c("Bachelor's Degree", "Graduate Degree")) %>%
  group_by(religious) %>%
  summarize(lower_bound = min(income),
            upper_bound = max(income) )

Changed my comment to an answer, as it was a bit too long.

First, your code: you are almost there. As income is a vector rather than a dataframe, you do not need the trailing comma, i.e. you can use

df$income[which(df$ethnicity == "xyz" &
         df$education %in% c("Bachelor's Degree", "Graduate Degree"))]
 # note no comma after the closing parenthesis of which()
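To see the difference the comma makes, here is a minimal sketch on a toy data frame. The rows are invented for illustration and only the column names come from the question; the tryCatch wrapper is there just so the script keeps running after the failing call.

```r
# Toy data: every value below is made up for illustration.
df <- data.frame(
  ethnicity = c("xyz", "xyz", "abc", "xyz"),
  education = c("Bachelor's Degree", "High School",
                "Graduate Degree", "Graduate Degree"),
  income    = c("$30,000 - $39,999", "$20,000 - $29,999",
                "$100,000 - $124,999", "$50,000 - $59,999"),
  religious = c("not religious", "religious", "religious", "religious"),
  stringsAsFactors = FALSE
)

# Row positions that meet requirements 1 and 2.
# which() also drops any NA comparisons instead of passing them along.
keep <- which(df$ethnicity == "xyz" &
              df$education %in% c("Bachelor's Degree", "Graduate Degree"))

# A vector has a single dimension, so it takes one index and no comma.
income_sub <- df$income[keep]
print(income_sub)

# With a comma, R looks for a second dimension that a vector does not have.
with_comma <- tryCatch(df$income[keep, ], error = function(e) "error")
print(with_comma)
```

On this toy data only the first and fourth rows qualify, so income_sub holds two brackets, while the version with the comma fails.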

If you want to create a subsetted data frame, then do not include df$income at the start; just use df and keep the comma this time. This will subset the rows of your data, but keep all columns.

sub_df <- df[which(df$ethnicity == "xyz" &
       df$education %in% c("Bachelor's Degree", "Graduate Degree")), ]

To then look at the income levels for the subsetted data, you can use table:

table(sub_df$income)

You can again use table to examine the counts of observations for each income by religious status.

table(sub_df$income, sub_df$religious)
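As a rough illustration of what the two-way table contains, the sketch below builds sub_df from a small invented data frame (the values are assumptions, not the asker's data) and cross-tabulates it.

```r
# Toy data: every value below is made up for illustration.
df <- data.frame(
  ethnicity = c("xyz", "xyz", "xyz", "abc", "xyz"),
  education = c("Bachelor's Degree", "Graduate Degree", "Graduate Degree",
                "Graduate Degree", "High School"),
  income    = c("$30,000 - $39,999", "$30,000 - $39,999", "$50,000 - $59,999",
                "$50,000 - $59,999", "$30,000 - $39,999"),
  religious = c("not religious", "religious", "religious",
                "religious", "not religious"),
  stringsAsFactors = FALSE
)

# Keep the rows that meet requirements 1 and 2, with all columns.
sub_df <- df[which(df$ethnicity == "xyz" &
                   df$education %in% c("Bachelor's Degree", "Graduate Degree")), ]

# Rows are income brackets, columns are religious status, cells are counts.
counts <- table(sub_df$income, sub_df$religious)
print(counts)
```

Each cell counts the people in one income bracket with one religious status, so the cells sum to the number of rows in sub_df.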

If you just want to select the income and religious columns, you can also do this using [

sub_df[c("religious", "income")]
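If a single call is preferred, base R's subset() can filter the rows and pick the columns together. This is a different route from the [ indexing above, sketched here on invented toy data rather than the asker's data set.

```r
# Toy data: every value below is made up for illustration.
df <- data.frame(
  ethnicity = c("xyz", "xyz", "abc", "xyz"),
  education = c("Bachelor's Degree", "High School",
                "Graduate Degree", "Graduate Degree"),
  income    = c("$30,000 - $39,999", "$20,000 - $29,999",
                "$100,000 - $124,999", "$50,000 - $59,999"),
  religious = c("not religious", "religious", "religious", "religious"),
  stringsAsFactors = FALSE
)

# The second argument filters rows (requirements 1 and 2);
# select keeps only the two columns wanted in the final output.
result <- subset(
  df,
  ethnicity == "xyz" & education %in% c("Bachelor's Degree", "Graduate Degree"),
  select = c(income, religious)
)
print(result)
```

Like which(), subset() treats rows where the condition is NA as not matching. It is mainly meant for interactive use, so the explicit [ form may be the safer choice inside functions.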
