
Recode a column in a data frame in R, based on < or > conditions

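I have a data frame df with 2 columns. One of them is income; the other, level, doesn't matter here (it is categorical). income is numeric, and I want to recode it as categorical too. For example:

if income < 130000, use the name income = "Less than 130000"
if income >= 130000 but < 500000, use the name income = "Between 130000 and 500000"
if income >= 500000 but <= 2000000, use the name income = "Between 500000 and 2000000"

This is what I tried: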

df %>%  mutate_at(vars(one_of(df$income)), 
            function(x) case_when(
              x < 130000 ~ "less than 130000",
              x <500000 ~ "between 130000 and 500000",
              x <=20000000  ~ "between 500000 and 2000000"
            )) 

But it doesn't work. Any help is appreciated.

This is head(df); please read ingresoph as income.

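See below. We can remove the need for function(x) as well as the _at suffix, and refer to the income column directly inside mutate().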

library(dplyr)

df %>% mutate(income =
             case_when(
               income < 130000   ~ "less than 130000",
               income < 500000   ~ "between 130000 and 500000",
               income <= 2000000 ~ "between 500000 and 2000000",
               TRUE ~ as.character(NA)  # anything above 2000000 becomes NA
             ))

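Basically, only use mutate_at when there is a specific reason for it (e.g. you want to pick out all numeric columns, or all character columns, etc.).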
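To illustrate the kind of case where selecting columns by type does make sense, here is a minimal sketch. It assumes dplyr 1.0.0 or later, where across() supersedes mutate_at(); the data frame toy and its columns are made up for this example and are not from the question.

```r
library(dplyr)

# Hypothetical toy data: two numeric columns and one character column
toy <- data.frame(
  income = c(100000, 300000),
  rent   = c(12000, 36000),
  level  = c("a", "b")
)

# Divide every numeric column by 1000; the character column is left untouched
toy_k <- toy %>%
  mutate(across(where(is.numeric), ~ .x / 1000))
```

For a single, known column such as income, plain mutate() is simpler and easier to read.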

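Also, if you want NA for any values outside the listed ranges, make sure you wrap it in as.character(). Otherwise your mutate will throw an error because of the differing data types (logical and character).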
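A self-contained sketch of the same idea, so the result can be checked on a few values. The sample incomes and the names recoded and income_group are assumptions for illustration; NA_character_ is used here as an equivalent of as.character(NA). Because case_when() evaluates its conditions in order, each lower bound is implied by the condition before it.

```r
library(dplyr)

# Hypothetical sample incomes, including one above the last range
df <- data.frame(income = c(100000, 488274.1, 997413.1, 15000000))

recoded <- df %>%
  mutate(income_group = case_when(
    income < 130000   ~ "less than 130000",
    income < 500000   ~ "between 130000 and 500000",
    income <= 2000000 ~ "between 500000 and 2000000",
    TRUE              ~ NA_character_  # typed NA keeps every branch character
  ))
```

Writing the result to a new column (income_group) rather than overwriting income keeps the original numbers available for later checks.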

A for loop with conditional expressions can also do this, using only the base package.

# written in R version 4.2.1
# example data frame
level  = letters[c(1, 1, 2, 2, 3, 3, 3, 3, 4, 4)]
income = c(997413.1, 1922400.2, 488274.1, 1016208.6, 806846.4,
           100000.0, 15000000.0, 907597.5, 810698.2, 2057985.5)

df = data.frame(income, level = factor(level))
df$desc = "Other"  # default for incomes outside every range below
for (i in seq_len(nrow(df))) {
  # the conditions are checked in order, so each lower bound is implied
  if (df$income[i] < 130000) {
    df$desc[i] = "less than 130000"
  } else if (df$income[i] < 500000) {
    df$desc[i] = "Between 130000 and 500000"
  } else if (df$income[i] <= 2000000) {
    df$desc[i] = "Between 500000 and 2000000"
  }
}
df$desc = factor(df$desc)

Result:

df
#       income level                       desc
#1    997413.1     a Between 500000 and 2000000
#2   1922400.2     a Between 500000 and 2000000
#3    488274.1     b  Between 130000 and 500000
#4   1016208.6     b Between 500000 and 2000000
#5    806846.4     c Between 500000 and 2000000
#6    100000.0     c           less than 130000
#7  15000000.0     c                      Other
#8    907597.5     c Between 500000 and 2000000
#9    810698.2     d Between 500000 and 2000000
#10  2057985.5     d                      Other

str(df)
#'data.frame':   10 obs. of  3 variables:
# $ income: num  997413 1922400 488274 1016209 806846 ...
# $ level : Factor w/ 4 levels "a","b","c","d": 1 1 2 2 3 3 3 3 4 4
# $ desc  : Factor w/ 4 levels "Between 130000 and 500000",..: 2 2 1 2 2 3 4 2 2 4
