Recode a column in a data frame in R, based on < or > conditions
I have a data frame df with two columns, income and level. The level column does not matter here; it is already categorical. The income column is numeric, and I want to recode it as categorical as well: if income < 130000, use the label "Less than 130000"; if income < 500000 but >= 130000, use "Between 130000 and 500000"; and if income > 500000 but <= 2000000, use "Between 500000 and 2000000". This is what I tried:
df %>% mutate_at(vars(one_of(df$income)),
function(x) case_when(
x < 130000 ~ "less than 130000",
x <500000 ~ "between 130000 and 500000",
x <=20000000 ~ "between 500000 and 2000000"
))
But it does not work. Any help is appreciated.
See below: we can drop both function(x) and the _at suffix.
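library(dplyr)

# case_when checks the conditions in order, so each one only needs an upper bound
df %>% mutate(income =
  case_when(
    income < 130000 ~ "less than 130000",
    income < 500000 ~ "between 130000 and 500000",
    income <= 2000000 ~ "between 500000 and 2000000",
    TRUE ~ as.character(NA)  # anything above 2000000 becomes NA
  ))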
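In general, only reach for mutate_at when there is a specific reason to, e.g. when you want to transform all numeric columns or all character columns at once.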
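For the "all numeric columns" case, a minimal sketch might look like the following. It uses across(), which superseded the scoped _at verbs in dplyr 1.0.0, so it assumes dplyr 1.0.0 or later. The data frame df2, its savings column, and the helper recode_amount are hypothetical names made up for this illustration.

```r
library(dplyr)

# Hypothetical data frame with two numeric columns, for illustration only
df2 <- data.frame(
  level   = c("a", "b", "c"),
  income  = c(100000, 300000, 900000),
  savings = c(50000, 600000, 150000)
)

# Hypothetical helper: the same binning rules as above, applied to any numeric vector
recode_amount <- function(x) {
  case_when(
    x < 130000   ~ "less than 130000",
    x < 500000   ~ "between 130000 and 500000",
    x <= 2000000 ~ "between 500000 and 2000000",
    TRUE         ~ NA_character_
  )
}

# Recode every numeric column; the character column level is left untouched
out <- df2 %>% mutate(across(where(is.numeric), recode_amount))
```

For a single column such as income, the plain mutate call shown earlier is simpler and easier to read.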
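Also, if you want any values outside these ranges to become NA, make sure to wrap the NA in as.character(); otherwise your mutate will throw an error because the data types differ (logical versus character).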
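The type difference is easy to check in base R. This small sketch only inspects the classes involved; the variable names are arbitrary. It also shows that NA_character_ is an equivalent, slightly shorter spelling of as.character(NA).

```r
# A bare NA is logical, which clashes with the character labels
plain_na <- class(NA)

# Both of these are character-typed missing values
typed_na   <- class(NA_character_)
wrapped_na <- class(as.character(NA))

# The two typed spellings are interchangeable
same_value <- identical(as.character(NA), NA_character_)
```

As far as I know, recent dplyr releases are more lenient about a bare NA inside case_when, but using a typed NA works in both old and new versions.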
A for loop with conditional expressions can also do this using only base R.
# written in R version 4.2.1
# example data frame
level <- letters[c(1, 1, 2, 2, 3, 3, 3, 3, 4, 4)]
income <- c(997413.1, 1922400.2, 488274.1, 1016208.6, 806846.4,
            100000.0, 15000000.0, 907597.5, 810698.2, 2057985.5)
df <- data.frame(income, level = factor(level))
df$desc <- "Other"  # default label for incomes outside every range

for (i in seq_len(nrow(df))) {
  # the branches are checked in order, so each one only needs an upper bound;
  # an income of exactly 500000 falls into the last range
  if (df$income[i] < 130000) {
    df$desc[i] <- "less than 130000"
  } else if (df$income[i] < 500000) {
    df$desc[i] <- "Between 130000 and 500000"
  } else if (df$income[i] <= 2000000) {
    df$desc[i] <- "Between 500000 and 2000000"
  }
}
df$desc <- factor(df$desc)
#
Result:
df
#       income level                       desc
#1    997413.1     a Between 500000 and 2000000
#2   1922400.2     a Between 500000 and 2000000
#3    488274.1     b  Between 130000 and 500000
#4   1016208.6     b Between 500000 and 2000000
#5    806846.4     c Between 500000 and 2000000
#6    100000.0     c           less than 130000
#7  15000000.0     c                      Other
#8    907597.5     c Between 500000 and 2000000
#9    810698.2     d Between 500000 and 2000000
#10  2057985.5     d                      Other
str(df)
#'data.frame': 10 obs. of 3 variables:
# $ income: num 997413 1922400 488274 1016209 806846 ...
# $ level : Factor w/ 4 levels "a","b","c","d": 1 1 2 2 3 3 3 3 4 4
# $ desc  : Factor w/ 4 levels "Between 130000 and 500000",..: 2 2 1 2 2 3 4 2 2 4
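The loop can also be replaced by a single vectorized call. The sketch below uses base R's cut() on the same income values; it is an alternative technique, not the approach used in the answer above. One caveat: with right = FALSE every interval is closed on the left, so an income of exactly 2000000 would land in "Other" rather than in the 500000 to 2000000 range. None of the example values sit on a boundary, so the labels match the loop's result here.

```r
income <- c(997413.1, 1922400.2, 488274.1, 1016208.6, 806846.4,
            100000.0, 15000000.0, 907597.5, 810698.2, 2057985.5)

# Bin the incomes into left-closed intervals [lower, upper)
desc <- cut(
  income,
  breaks = c(-Inf, 130000, 500000, 2000000, Inf),
  labels = c("less than 130000", "Between 130000 and 500000",
             "Between 500000 and 2000000", "Other"),
  right = FALSE
)

# Count how many incomes fall into each bin
counts <- table(desc)
```

Unlike factor() on a character vector, cut() keeps the factor levels in the order of the labels argument instead of sorting them alphabetically, which is usually what you want for ordered ranges.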