
Convert data.frame column format from character to factor

I would like to change the format (class) of some columns of my data.frame object (mydf) from character to factor.

I don't want to do this at the point where I read the text file with the read.table() function.

Any help would be appreciated.

Hi, welcome to the world of R.

mtcars  # look at this built-in data set
str(mtcars)  # shows the class of each variable (all numeric)

# one approach is to index with the $ sign and use the as.factor function
mtcars$am <- as.factor(mtcars$am)
# another approach is to index with [ , 'name']
mtcars[, 'cyl'] <- as.factor(mtcars[, 'cyl'])
str(mtcars)  # now look at the classes

This also works for character, Date, integer, and other classes.
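The mtcars columns above start out numeric. As a minimal sketch with a hypothetical character column (the data frame df and its values are made up for illustration), the same $ approach turns character strings into a factor whose levels are the sorted unique values:

```r
# Hypothetical data frame with a character column (not from the original answer)
df <- data.frame(name = c("b", "a", "b"), stringsAsFactors = FALSE)
class(df$name)   # "character"

df$name <- as.factor(df$name)
class(df$name)   # "factor"
levels(df$name)  # "a" "b": the sorted unique values become the levels
```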

Since you're new to R, I'd suggest you have a look at these two websites:

R reference manuals: http://cran.r-project.org/manuals.html

R Reference card: http://cran.r-project.org/doc/contrib/Short-refcard.pdf

# To do it for all columns
df[] <- lapply(df, factor)  # the "[]" keeps the data.frame structure

# To do it for some columns whose names are in a vector named 'col_names'
col_names <- names(df)  # e.g. all names; replace with the subset you need
df[col_names] <- lapply(df[col_names], factor)

Explanation: all data frames are lists, and the result of [ used with a multi-valued index is likewise a list, so looping over it is a job for lapply. The assignment above creates a list of factors that the `[<-.data.frame` method then puts back into the data frame df.
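A minimal sketch of why the "[]" matters, using a hypothetical two-column data frame: lapply() on its own returns a plain list, while assigning into df[] writes the factors back into the existing data frame.

```r
# Hypothetical data frame with two character columns
df <- data.frame(x = c("a", "b"), y = c("u", "v"), stringsAsFactors = FALSE)

# Without "[]", assigning the lapply() result would replace df with a plain list
as_list <- lapply(df, factor)
class(as_list)         # "list"

# With "[]", the list of factors is written back into the existing data frame
df[] <- lapply(df, factor)
class(df)              # "data.frame"
sapply(df, is.factor)  # x TRUE, y TRUE
```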

Another strategy would be to convert only those columns where the number of unique items is below some criterion, for example fewer than the base-10 log of the number of rows:

cols.to.factor <- sapply( df, function(col) length(unique(col)) < log10(length(col)) )
df[ cols.to.factor] <- lapply(df[ cols.to.factor] , factor)
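To make the threshold concrete, here is a sketch on hypothetical data with 1000 rows, so the cut-off is log10(1000) = 3. Note that the criterion ignores the column's type: any column with few enough distinct values, numeric or character, is converted.

```r
# Hypothetical data: 1000 rows, so log10(1000) = 3 is the cut-off
n  <- 1000
df <- data.frame(
  id    = seq_len(n),                       # 1000 unique values: left as is
  group = rep(c("a", "b"), length.out = n), # 2 unique values: becomes a factor
  stringsAsFactors = FALSE
)

cols.to.factor <- sapply(df, function(col) length(unique(col)) < log10(length(col)))
cols.to.factor     # id FALSE, group TRUE

df[cols.to.factor] <- lapply(df[cols.to.factor], factor)
sapply(df, class)  # id "integer", group "factor"
```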

You could use dplyr::mutate_if() to convert all character columns, or dplyr::mutate_at() to convert selected named columns, to factors (in dplyr 1.0.0 and later both are superseded by across(), but they still work):

library(dplyr)

# all character columns to factor:
df <- mutate_if(df, is.character, as.factor)

# select character columns 'char1', 'char2', etc. to factor:
df <- mutate_at(df, vars(char1, char2), as.factor)

If you want to change all character variables in your data.frame to factors after you've already loaded your data, you can do it like this (here for a data.frame called dat):

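# logical vector: TRUE for columns of class character
character_vars <- sapply(dat, is.character)
# dat[...] (list-style) keeps a data.frame even if only one column is selected
dat[character_vars] <- lapply(dat[character_vars], as.factor)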

This creates a logical vector identifying which columns are of class character, then applies as.factor to those columns.

Sample data:

dat <- data.frame(var1 = c("a", "b"),
                  var2 = c("hi", "low"),
                  var3 = c(0, 0.1),
                  stringsAsFactors = FALSE
                  )
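One caveat worth checking, sketched here with a hypothetical variant of the sample data that has only one character column: dat[, cols] drops to a plain vector when a single column is selected, whereas dat[cols] always returns a data frame, so list-style indexing is the safer choice for the lapply() assignment.

```r
# Hypothetical variant of the sample data with a single character column
dat <- data.frame(var1 = c("a", "b"),
                  var3 = c(0, 0.1),
                  stringsAsFactors = FALSE)

character_vars <- sapply(dat, is.character)  # var1 TRUE, var3 FALSE

class(dat[, character_vars])  # "character": matrix-style indexing drops to a vector
class(dat[character_vars])    # "data.frame": list-style indexing keeps the structure

# List-style indexing converts the single character column correctly
dat[character_vars] <- lapply(dat[character_vars], as.factor)
str(dat)
```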

Another short way is the compound assignment pipe (%<>%) from the magrittr package. The line below converts the character column mycolumn to a factor in place.

library(magrittr)

mydf$mycolumn %<>% factor
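%<>% pipes a value into a function and assigns the result back to the left-hand side. A minimal sketch with a hypothetical mydf checks that it gives the same result as the explicit base-R assignment:

```r
library(magrittr)

# Hypothetical data frame with a character column named mycolumn
mydf <- data.frame(mycolumn = c("x", "y", "x"), stringsAsFactors = FALSE)

a <- mydf
a$mycolumn %<>% factor            # compound assignment pipe

b <- mydf
b$mycolumn <- factor(b$mycolumn)  # explicit equivalent

identical(a, b)  # TRUE
```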

I've been doing it with a loop. Here only character variables are transformed to factors:

# seq_len() is safe for zero columns; [[ ]] also works on tibbles
for (i in seq_len(ncol(data))) {
    if (is.character(data[[i]])) {
        data[[i]] <- factor(data[[i]])
    }
}
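Here is a minimal sketch that wraps the loop into a reusable function; the name chars_to_factors and the sample data are hypothetical. R modifies a copy of the data frame inside the function, so the converted result has to be returned and assigned back.

```r
# Hypothetical helper: convert every character column of a data frame to factor
chars_to_factors <- function(data) {
  for (i in seq_len(ncol(data))) {
    if (is.character(data[[i]])) {
      data[[i]] <- factor(data[[i]])
    }
  }
  data  # return the modified copy
}

dat <- data.frame(var1 = c("a", "b"), var3 = c(0, 0.1), stringsAsFactors = FALSE)
out <- chars_to_factors(dat)
sapply(out, class)  # var1 "factor", var3 "numeric"
```

The original dat is left unchanged, so write dat <- chars_to_factors(dat) to convert it in place.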

You can use across() with the new dplyr 1.0.0:

library(dplyr)

df <- mtcars 
#To turn 1 column to factor
df <- df %>% mutate(cyl = factor(cyl))

#Turn columns to factor based on their type. 
df <- df %>% mutate(across(where(is.character), factor))

#Based on the position
df <- df %>% mutate(across(c(2, 4), factor))

#Change specific columns by their name
df <- df %>% mutate(across(c(cyl, am), factor))
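Because mtcars has no character columns, the where(is.character) line above changes nothing there. A minimal sketch on a small hypothetical tibble shows it doing real work, and also shows all_of() for the case where the column names are stored in a character vector (here a hypothetical col_names), as in the base-R answer earlier:

```r
library(dplyr)

# Hypothetical data with two character columns and one numeric column
df <- tibble(a = c("x", "y"), b = c("u", "v"), n = c(1, 2))

# All character columns to factor
df1 <- df %>% mutate(across(where(is.character), factor))

# Columns whose names are stored in a character vector
col_names <- c("a", "n")
df2 <- df %>% mutate(across(all_of(col_names), factor))

sapply(df1, class)  # a "factor", b "factor", n "numeric"
sapply(df2, class)  # a "factor", b "character", n "factor"
```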

Unless you need to identify the columns automatically, I found this to be the simplest solution:

df$name <- as.factor(df$name)

This makes the column name in data frame df a factor.

We can also use the modify_if() function from purrr. It takes a predicate function .p, applies it to every element (column) of the data set, and applies the function .f to those elements where the predicate returns TRUE.

  • I used modify_if because it preserves the input type and returns an output of the same type.
  • Another variation is map_if, which returns a list instead.
library(dplyr)  # provides the starwars data set and %>%
library(purrr)  # provides modify_if()

starwars %>% modify_if(~ is.character(.x), ~ factor(.x))

# A tibble: 87 x 14
   name   height  mass hair_color skin_color eye_color birth_year sex   gender homeworld species
   <fct>   <int> <dbl> <fct>      <fct>      <fct>          <dbl> <fct> <fct>  <fct>     <fct>  
 1 Luke ~    172    77 blond      fair       blue            19   male  mascu~ Tatooine  Human  
 2 C-3PO     167    75 NA         gold       yellow         112   none  mascu~ Tatooine  Droid  
 3 R2-D2      96    32 NA         white, bl~ red             33   none  mascu~ Naboo     Droid  
 4 Darth~    202   136 none       white      yellow          41.9 male  mascu~ Tatooine  Human  
 5 Leia ~    150    49 brown      light      brown           19   fema~ femin~ Alderaan  Human  
 6 Owen ~    178   120 brown, gr~ light      blue            52   male  mascu~ Tatooine  Human  
 7 Beru ~    165    75 brown      light      blue            47   fema~ femin~ Tatooine  Human  
 8 R5-D4      97    32 NA         white, red red             NA   none  mascu~ Tatooine  Droid  
 9 Biggs~    183    84 black      light      brown           24   male  mascu~ Tatooine  Human  
10 Obi-W~    182    77 auburn, w~ fair       blue-gray       57   male  mascu~ Stewjon   Human  
# ... with 77 more rows, and 3 more variables: films <list>, vehicles <list>, starships <list>
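The difference between the two bullets above can be checked on a small hypothetical data frame: modify_if() hands back a data frame, while map_if() returns a plain list of columns.

```r
library(purrr)

# Hypothetical data frame with one character and one numeric column
df <- data.frame(name = c("a", "b"), height = c(1.5, 1.7), stringsAsFactors = FALSE)

m1 <- modify_if(df, is.character, factor)  # keeps the data.frame class
m2 <- map_if(df, is.character, factor)     # always returns a list

class(m1)  # "data.frame"
class(m2)  # "list"
```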

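Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.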

 