Renaming labels of a factor in R

I have census data for males and females, organized by age group:

library(tidyverse)

url <- "https://www2.census.gov/programs-surveys/popest/datasets/2010-2018/counties/asrh/cc-est2018-alldata-54.csv"

if (!file.exists("./datafiles/cc-est2018-alldata-54.csv"))
  download.file(url, destfile = "./datafiles/cc-est2018-alldata-54.csv", mode = "wb")

popSample <- read.csv("./datafiles/cc-est2018-alldata-54.csv") %>%
  filter(AGEGRP != 0 & YEAR == 1) %>%
  select("STNAME", "CTYNAME", "AGEGRP", "TOT_POP", "TOT_MALE", "TOT_FEMALE")

popSample$AGEGRP <- as.factor(popSample$AGEGRP)

Then I plot the relationship between male and female population, faceted by age group (1-18, currently treated as integers):

g <- ggplot(popSample, aes(x=TOT_MALE, y=TOT_FEMALE)) +
  geom_point(alpha = 0.5, colour="darkblue") +
  scale_x_log10() +
  scale_y_log10() +
  facet_wrap(~AGEGRP) +
  stat_smooth(method = "lm", col = "darkred", size=.75) +
  labs(title = "F vs. M Population across all Age Groups", x = "Total Male (log10)", y = "Total Female (log10)") +
  theme_light()

g

This produces the following plot: https://share.getcloudapp.com/v1ur6O4e

Question: I am trying to convert the column AGEGRP from "int" to "factor", and to change the factor labels from "1", "2", "3", ... "18" to "AgeGroup1", "AgeGroup2", "AgeGroup3", ... "AgeGroup18".

When I try this code, the observations in my AGEGRP column are all replaced with NA:

popSample$AGEGRP <- factor(popSample$AGEGRP, levels = c("0 to 4", "5 to 9", "10 to 14", "15 to 19", "20 to 24", "25 to 29", "30 to 34", "35 to 39", "40 to 44", "45 to 49", "50 to 54", "55 to 59", "60 to 64", "65 to 69", "70 to 74", "75 to 79", "80 to 84", "85+"))

https://share.getcloudapp.com/qGuo1O4y

Thanks for your help,

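popSample$AGEGRP <- factor(popSample$AGEGRP, labels = c("0 to 4", "5 to 9", "10 to 14", "15 to 19", "20 to 24", "25 to 29", "30 to 34", "35 to 39", "40 to 44", "45 to 49", "50 to 54", "55 to 59", "60 to 64", "65 to 69", "70 to 74", "75 to 79", "80 to 84", "85+"))

Use labels rather than levels here: levels must match the values already in the column (1 to 18), which is why the original attempt returned NA. Note that a label has to be supplied for every level.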

Or

levels(popSample$AGEGRP) <- c("0 to 4", "5 to 9", "10 to 14", "15 to 19", "20 to 24", "25 to 29", "30 to 34", "35 to 39", "40 to 44", "45 to 49", "50 to 54", "55 to 59", "60 to 64", "65 to 69", "70 to 74", "75 to 79", "80 to 84", "85+")

should also work.
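As a minimal sketch of why passing the age ranges as `levels` yields NA, consider a toy factor (the vector `x` below is made up for illustration and is not taken from the census file). `levels` tells `factor()` which existing values to look for, while `labels` supplies new names for those values:

```r
# Toy stand-in for AGEGRP: a factor whose levels are "1", "2", "3"
x <- as.factor(c(1, 2, 3, 2))

# `levels` must match the existing values; none of them is "0 to 4",
# so every observation becomes NA
wrong <- factor(x, levels = c("0 to 4", "5 to 9", "10 to 14"))

# `labels` renames the existing levels, in order
right <- factor(x, labels = c("0 to 4", "5 to 9", "10 to 14"))

# Assigning to levels() has the same effect on an existing factor
also_right <- x
levels(also_right) <- c("0 to 4", "5 to 9", "10 to 14")

print(wrong)
print(right)
print(also_right)
```

Both working variants rename by position, so the replacement vector has to follow the order of the existing levels.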

Reading the csv again:

library(tidyverse)

url <- "https://www2.census.gov/programs-surveys/popest/datasets/2010-2018/counties/asrh/cc-est2018-alldata-54.csv"
popSample <- read.csv(url) %>%
  filter(AGEGRP != 0 & YEAR == 1) %>%
  select("STNAME", "CTYNAME", "AGEGRP", "TOT_POP", "TOT_MALE", "TOT_FEMALE")

If you only want to add the prefix "AgeGroup" to the facet labels, you can do:

ggplot(popSample, aes(x=TOT_MALE, y=TOT_FEMALE)) +
  geom_point(alpha = 0.5, colour="darkblue") +
  scale_x_log10() +
  scale_y_log10() +
  facet_wrap(~AGEGRP,labeller=labeller(AGEGRP = function(i)paste0("AgeGroup",i))) +
  stat_smooth(method = "lm", col = "darkred", size=.75) +
  labs(title = "F vs. M Population across all Age Groups", 
  x = "Total Male (log10)", y = "Total Female (log10)") +
  theme_light()
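If the facet strips should show the age ranges rather than a prefix, one option is a named lookup vector, which `labeller()` accepts in place of a function. This is only a sketch: `age_labels` and the data frame `toy` are names introduced here for illustration, the numbers in `toy` are made up, and the mapping assumes that codes 1 to 18 correspond to the ranges in the order listed in the question.

```r
# Age ranges in code order (1 = "0 to 4", ..., 18 = "85+"), as listed in the question
lvls <- c("0 to 4", "5 to 9", "10 to 14", "15 to 19", "20 to 24", "25 to 29",
          "30 to 34", "35 to 39", "40 to 44", "45 to 49", "50 to 54", "55 to 59",
          "60 to 64", "65 to 69", "70 to 74", "75 to 79", "80 to 84", "85+")

# Named lookup: names are the AGEGRP codes as strings, values are the strip labels
age_labels <- setNames(lvls, seq_along(lvls))

# Hypothetical toy data standing in for popSample
toy <- data.frame(AGEGRP = factor(c(1, 2, 18)),
                  TOT_MALE = c(100, 200, 50),
                  TOT_FEMALE = c(110, 190, 80))

# Build the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = TOT_MALE, y = TOT_FEMALE)) +
    geom_point() +
    facet_wrap(~AGEGRP, labeller = labeller(AGEGRP = age_labels))
  # print(p) draws it
}
```

This leaves the data untouched and changes only what is printed on the facet strips.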

If you need a new factor, then you need to re-create it (as in @Annet's answer below):

lvls = c("0 to 4", "5 to 9", "10 to 14", "15 to 19", 
"20 to 24", "25 to 29", "30 to 34", "35 to 39", 
"40 to 44", "45 to 49", "50 to 54", "55 to 59",
 "60 to 64", "65 to 69", "70 to 74", "75 to 79", "80 to 84", "85+")
# this works even though AGEGRP has already been converted to a factor;
# if you read the csv again, skip the as.factor() step
popSample$AGEGRP = factor(lvls[popSample$AGEGRP],levels=lvls)

Then plot:

ggplot(popSample, aes(x=TOT_MALE, y=TOT_FEMALE)) +
      geom_point(alpha = 0.5, colour="darkblue") +
      scale_x_log10() +
      scale_y_log10() +
      facet_wrap(~AGEGRP) +
      stat_smooth(method = "lm", col = "darkred", size=.75) +
      labs(title = "F vs. M Population across all Age Groups", 
      x = "Total Male (log10)", y = "Total Female (log10)") +
      theme_light()
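The line `factor(lvls[popSample$AGEGRP], levels = lvls)` relies on R using a factor's integer codes when the factor is used as an index. A small sketch with made-up values shows this, and also why the shortcut only works when the codes coincide with the original numbers (that is, when every group from 1 upward is present):

```r
lvls <- c("0 to 4", "5 to 9", "10 to 14")

# All groups present: the integer codes 1, 2, 3 coincide with the original values
f_full <- factor(c(1, 3, 2))
full_labels <- lvls[f_full]

# Group 1 missing: the levels are "2" and "3", but the codes are 1 and 2,
# so indexing by the factor is off by one
f_gap <- factor(c(2, 3))
shifted <- lvls[f_gap]

# Safer: convert the level text back to numbers before indexing
idx <- as.integer(as.character(f_gap))
fixed <- factor(lvls[idx], levels = lvls)

print(full_labels)
print(shifted)
print(fixed)
```

In the question's data all 18 groups remain after filtering out `AGEGRP == 0`, so the direct indexing is fine there.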

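To change all the factor labels with one function, you can use forcats::fct_relabel (forcats ships as part of the tidyverse, which you have already loaded). The changed factor labels carry over to the plot facets, and the order of the levels stays the same.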

The first few entries:

# before relabelling
popSample$AGEGRP[1:4]
#> [1] 1 2 3 4
#> Levels: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

# after relabelling
forcats::fct_relabel(popSample$AGEGRP, ~paste0("AgeGroup", .))[1:4]
#> [1] AgeGroup1 AgeGroup2 AgeGroup3 AgeGroup4
#> 18 Levels: AgeGroup1 AgeGroup2 AgeGroup3 AgeGroup4 AgeGroup5 ... AgeGroup18

Or, using base R, reassign the levels:

levels(popSample$AGEGRP) <- paste0("AgeGroup", levels(popSample$AGEGRP))
popSample$AGEGRP[1:4]
#> [1] AgeGroup1 AgeGroup2 AgeGroup3 AgeGroup4
#> 18 Levels: AgeGroup1 AgeGroup2 AgeGroup3 AgeGroup4 AgeGroup5 ... AgeGroup18
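One caveat with the base R version is that running it a second time prefixes the levels again ("AgeGroupAgeGroup1"). A possible guard is sketched below; `add_prefix` is a hypothetical helper written for this example, not part of base R or forcats, and the vector `x` is made up.

```r
# Hypothetical helper: prefix factor levels only if they are not prefixed yet
add_prefix <- function(f, prefix = "AgeGroup") {
  old <- levels(f)
  levels(f) <- ifelse(startsWith(old, prefix), old, paste0(prefix, old))
  f
}

x <- factor(c(1, 2, 10, 2))

once  <- add_prefix(x)
twice <- add_prefix(once)  # second call leaves the levels unchanged

print(levels(once))
print(levels(twice))

# Only the level names change; the underlying integer codes stay the same
print(as.integer(x))
print(as.integer(once))
```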
