How to convert all columns where entries have length ≤1 to numeric?

I have a data frame with ~80 columns, and ~20-40 of those columns hold single-digit integers that were stored as characters. The other character columns are complete sentences (so, length >>> 1), and so get coerced to NA if I try mutate_if(is.character, as.numeric).

I would like to transform those efficiently, and based on this question, I was hoping for something like this:

df %>% map_if(is.character & length(.) <= 1, as.numeric)

However, that doesn't work. I'm hoping for a tidy solution, maybe using purrr?
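For context, here is a minimal base R sketch of why the attempt above fails. The vector `col` and the helper `is_short_chr` are hypothetical names made up for illustration. `is.character` is a function, so it cannot be combined with `&` like a logical value, and `length()` counts the entries of a column rather than the characters in each entry; `nchar()` is what measures string width.

```r
# Hypothetical column of single-digit strings
col <- c("1", "2", "3")

n_elements <- length(col)  # number of entries in the column
n_chars <- nchar(col)      # number of characters in each entry

# Combining a function object with `&` raises an error
attempt <- try(is.character & (length(col) <= 1), silent = TRUE)
failed <- inherits(attempt, "try-error")

# A predicate has to be a function that is called on each column
is_short_chr <- function(x) is.character(x) && all(nchar(x) <= 1)
is_short_chr(col)
```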

The best function for these situations is type_convert(), from readr:

"[type_convert() re-converts character columns in a data frame], which is useful if you need to do some manual munging - you can read the columns in as character, clean it up with (eg) regular expressions and other transformations, and then let readr take another stab at parsing it."

So, all you need to do is add it at the end of your pipe:

df %>% ... %>% type_convert() 
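As a rough illustration, the sketch below applies type_convert() to a small made-up data frame (the column names `sentence` and `score` are assumptions, not from the question). Note that type_convert() re-guesses every character column, not only the single-character ones, so a column that happens to look like dates or logicals would presumably be converted as well.

```r
library(readr)

# Hypothetical example data: every column is stored as character
df <- data.frame(
  sentence = c("first answer", "second answer", "third answer"),
  score = c("1", "2", "3"),
  stringsAsFactors = FALSE
)

# Let readr re-guess the type of each character column
converted <- type_convert(df)

str(converted)
```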

Alternatively, we can use type.convert from base R, which automatically detects each column's type from its values and converts it:

df[] <- type.convert(df, as.is = TRUE)
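A small sketch of the base R call on made-up data (column names are hypothetical). With as.is = TRUE, strings that cannot be read as numbers stay character instead of becoming factors, and the `df[] <-` form keeps the object a data frame.

```r
# Hypothetical example data: every column is stored as character
df <- data.frame(
  sentence = c("first answer", "second answer", "third answer"),
  score = c("1", "2", "3"),
  stringsAsFactors = FALSE
)

# Convert each column to the most specific type its values allow
df[] <- type.convert(df, as.is = TRUE)

sapply(df, class)
```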

If the constraint is to look only at columns whose entries all have a single character:

i1 <- !colSums(nchar(as.matrix(df)) > 1)
df[i1] <- type.convert(df[i1], as.is = TRUE)
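To make the selection step concrete, here is a sketch on made-up, all-character data (the column names and the variable `short_cols` are hypothetical). It is a slight variant of the snippet above that selects columns by name rather than by a logical index, and it passes as.is = TRUE so that single letters stay character.

```r
# Hypothetical example data: every column is stored as character
df <- data.frame(
  sentence = c("ab", "bc", "de"),
  digit = c("1", "2", "3"),
  letter = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# Character width of every cell, as a matrix with one column per variable
widths <- nchar(as.matrix(df))

# Names of the columns in which no entry is longer than one character
short_cols <- names(df)[colSums(widths > 1) == 0]

# Re-guess types for those columns only
df[short_cols] <- type.convert(df[short_cols], as.is = TRUE)

sapply(df, class)
```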

If we want to use tidyverse, there is parse_guess from readr:

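library(tidyverse)  # attaches dplyr and readr
# The predicate must be a function or a ~ formula, not a bare expression
df %>%
     mutate_if(~ is.character(.) && all(nchar(.) == 1), parse_guess)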

You could check the nchar of each column in mutate_if:

library(dplyr)
df %>%  mutate_if(~all(nchar(.) == 1) & is.character(.), as.numeric) 

Using example data:

df <- data.frame(a = c("ab", "bc", "de", "de", "ef"), 
                 b = as.character(1:5), stringsAsFactors = FALSE)

df1 <- df %>% mutate_if(~all(nchar(.) == 1) & is.character(.), as.numeric) 

str(df1)
#'data.frame':  5 obs. of  2 variables:
# $ a: chr  "ab" "bc" "de" "de" ...
# $ b: num  1 2 3 4 5
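In dplyr 1.0.0 and later, the scoped verbs such as mutate_if are superseded by across(). A sketch of the same conversion in that style, reusing the example data above (the helper name `is_single_char` and the result name `df2` are made up here):

```r
library(dplyr)

df <- data.frame(a = c("ab", "bc", "de", "de", "ef"),
                 b = as.character(1:5), stringsAsFactors = FALSE)

# Predicate: a character column whose entries are all one character wide
is_single_char <- function(x) is.character(x) && all(nchar(x) == 1)

# Convert only the columns selected by the predicate
df2 <- df %>%
  mutate(across(where(is_single_char), as.numeric))

str(df2)
```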

You could do the same with map_if as well; however, it returns a list, so you need to convert the result back to a data frame:

library(purrr)

df %>% 
   map_if(~all(nchar(.) == 1) & is.character(.), as.numeric) %>% 
   as.data.frame(., stringsAsFactors = FALSE)
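If the conversion back to a data frame feels clumsy, purrr also provides modify_if(), which returns an object of the same type as its input. A sketch with the same example data (the result name `df3` is made up here):

```r
library(purrr)

df <- data.frame(a = c("ab", "bc", "de", "de", "ef"),
                 b = as.character(1:5), stringsAsFactors = FALSE)

# modify_if() keeps the data frame structure, so no as.data.frame() is needed
df3 <- modify_if(df, ~ is.character(.) && all(nchar(.) == 1), as.numeric)

str(df3)
```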
