Difficulty getting a subset in R

Question

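I'm trying to subset a dataset with the following requirements:

1. ethnicity is xyz
2. education is a bachelor's degree or higher, i.e. Bachelor's Degree or Graduate Degree
3. Then I want to look at the income brackets of the people who satisfy the conditions above. A bracket would be something like $30,000 - $39,999 or $100,000 - $124,999.
4. Finally, as my final output, I want to see the subset obtained in item 3 (above) together with a column showing whether these people are religious. In the dataset, this corresponds to religious and not religious.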

So it would look something like this:

   income               religious
$30,000 - $39,999      not religious
$50,000 - $59,999         religious
  ....                    ....
  ....                    ....

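Please keep in mind that the people listed must satisfy requirements 1 and 2.

Also keep in mind that I'm new to programming. I've been trying for a long time and have gone through many posts, but I can't seem to get anything to work. How do I fix this? Somebody please help.

To keep the post uncluttered, I'll put what I've tried below (but feel free to ignore it, as it may be garbage).

I've tried many variations of the following just to get to step 3, but failed miserably, and I'm about to smash my head on the keyboard: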

df$income[which(df$ethnicity == "xyz" & df$education %in% c("Bachelor's Degree", "Graduate Degree"), ]

I've also tried:

race <- df$ethnicity == "xyz"
ba_ma_phd <- df$education %in% c("Graduate Degree", "Bachelor's Degree")
income_sub <- df$income[ba_ma_phd & race]

I believe income_sub gets me to step 3, but I have no idea how to get from there to step 4.

Answer 1

library(dplyr)

df %>%
  filter(ethnicity == "xyz" & 
         education %in% c("Bachelor's Degree", "Graduate Degree")) %>%
  group_by(religious) %>%
  summarize(lower_bound = min(income),
            upper_bound = max(income) )

Answer 2

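Turning my comment into an answer, as it got too long.

First, your code is almost there. Because income is a vector rather than a data frame, the trailing comma isn't needed. That is, you can use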

df$income[which(df$ethnicity == "xyz" & 
         df$education %in% c("Bachelor's Degree", "Graduate Degree"))] 
 # note: no comma after the closing parenthesis of which()
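
The difference between indexing a vector and indexing a data frame can be sketched with a small example. The `toy` data frame and its values below are invented for illustration and are not the asker's data.

```r
# Toy data; all values are made up for illustration.
toy <- data.frame(
  ethnicity = c("xyz", "abc", "xyz"),
  income    = c("$30,000 - $39,999", "$50,000 - $59,999", "$100,000 - $124,999"),
  stringsAsFactors = FALSE
)

keep <- toy$ethnicity == "xyz"

# A vector has a single dimension, so it takes one index and no comma.
incomes <- toy$income[keep]

# A data frame has rows and columns, so the row index is followed by a comma;
# leaving the column slot empty keeps every column.
rows <- toy[keep, ]

# By contrast, toy$income[keep, ] stops with an
# "incorrect number of dimensions" error.
```

Here `incomes` is a plain character vector, while `rows` is still a data frame with both columns.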

If you want to create a subset of the data, don't start with df$income; just use df and keep the comma. This subsets your data but keeps all the columns.

sub_df <- df[which(df$ethnicity == "xyz" &
       df$education %in% c("Bachelor's Degree", "Graduate Degree")), ]

To look at the income levels in the subsetted data, you can use table

table(sub_df$income)

You can use table again to check the number of observations at each income level by religious status.

table(sub_df$income, sub_df$religious)

If you only want to select the income and religious columns, you can also do that with [

sub_df[c("religious", "income")]
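
Putting the pieces together, here is a minimal end-to-end sketch in base R. The `toy` data frame stands in for the asker's `df`; its rows, and the extra "High School" and "abc" values, are invented so the example can run on its own.

```r
# Toy data standing in for the asker's df; every value here is invented.
toy <- data.frame(
  ethnicity = c("xyz", "xyz", "abc", "xyz", "xyz"),
  education = c("Bachelor's Degree", "High School", "Graduate Degree",
                "Graduate Degree", "Bachelor's Degree"),
  income    = c("$30,000 - $39,999", "$50,000 - $59,999", "$30,000 - $39,999",
                "$100,000 - $124,999", "$30,000 - $39,999"),
  religious = c("not religious", "religious", "religious",
                "religious", "religious"),
  stringsAsFactors = FALSE
)

# Requirements 1 and 2 become the row filter.
keep <- toy$ethnicity == "xyz" &
  toy$education %in% c("Bachelor's Degree", "Graduate Degree")

# Steps 3 and 4: keep matching rows and only the two columns of interest.
result <- toy[which(keep), c("income", "religious")]
print(result)

# Number of observations in each income bracket by religious status.
counts <- table(result$income, result$religious)
print(counts)
```

Filtering rows and selecting columns in one `[` call gives the two-column layout asked for in the question; `which()` is kept as in the answer above, and it also drops any rows where the condition evaluates to NA.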

2 answers

Answer 1
1 2015-10-04 22:22:10

Answer 2
1 Accepted 2015-10-04 22:57:57
