
Create and populate a column from another column in R

The objective is to create two new columns from the Code column of the data below: one with the numbers and another with the codes (as a factor). How do I do that? I tried ifelse(), but it gave an error.

structure(list(Potreiro = structure(c(3L, 3L, 3L, 3L, 
3L, 4L, 3L, 4L, 3L, 4L), .Label = c("1A", "6B", "7A", "7B"), class = 
"factor"), Code = structure(c(4L, 1L, 8L, 3L, 2L, 4L, 6L, 5L, 8L, 7L ), 
.Label = c("2", "3", "4", "5", "50%", "70%", "ac", "ad", "av", "cd", "de", 
"Dem"), class = "factor")), .Names = c("Potreiro", "Code"), row.names = 
c(NA, 10L), class = "data.frame")

thanks!!!

library(dplyr)
library(stringr)

df <- structure(list(Potreiro = structure(c(3L, 3L, 3L, 3L, 3L, 4L, 3L, 4L, 3L, 4L), 
.Label = c("1A", "6B", "7A", "7B"), 
class = "factor"), 
Code = structure(c(4L, 1L, 8L, 3L, 2L, 4L, 6L, 5L, 8L, 7L ), 
.Label = c("2", "3", "4", "5", "50%", "70%", "ac", "ad", "av", "cd", "de", "Dem"), 
class = "factor")), 
.Names = c("Potreiro", "Code"), 
row.names = c(NA, 10L), 
class = "data.frame")



df %>% 
  mutate(
    number = str_extract_all(Code, "\\d+"),
    word = str_extract(Code, "\\D[^%]")
  )

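The regex for number, "\\d+", matches one or more consecutive digits. Because str_extract_all() returns a list, rows without digits show an empty entry rather than NA. The regex for word, "\\D[^%]", matches a non-digit character followed by any character other than %, so values such as 50% give NA.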

The result:

   Potreiro Code number word
1        7A    5      5 <NA>
2        7A    2      2 <NA>
3        7A   ad          ad
4        7A    4      4 <NA>
5        7A    3      3 <NA>
6        7B    5      5 <NA>
7        7A  70%     70 <NA>
8        7B  50%     50 <NA>
9        7A   ad          ad
10       7B   ac          ac
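
The same split can also be sketched in base R, as a variant rather than the answer's own method. This version assumes that a numeric number column with NA for non-matches is wanted, and that a code is any value made up only of letters. The vector code below is typed in by hand from the Code column shown above.

```r
# Values copied by hand from the Code column in the question
code <- c("5", "2", "ad", "4", "3", "5", "70%", "50%", "ad", "ac")

# Keep the leading run of digits, or NA when the value does not start with one
has_digit <- grepl("^[0-9]+", code)
number <- ifelse(has_digit, sub("^([0-9]+).*$", "\\1", code), NA)
number <- as.numeric(number)

# Keep values made up only of letters, NA otherwise
word <- ifelse(grepl("^[A-Za-z]+$", code), code, NA)

result <- data.frame(Code = code, number = number, word = word)
result
```

Here number is 5, 2, NA, 4, 3, 5, 70, 50, NA, NA and word is NA except for the ad and ac rows. Wrap word in factor() if a factor column is needed.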

I would do this:

df <- structure(list(Potreiro = structure(c(3L, 3L, 3L, 3L, 3L, 4L, 3L, 4L, 3L, 4L), 
.Label = c("1A", "6B", "7A", "7B"), 
class = "factor"), 
Code = structure(c(4L, 1L, 8L, 3L, 2L, 4L, 6L, 5L, 8L, 7L ), 
.Label = c("2", "3", "4", "5", "50%", "70%", "ac", "ad", "av", "cd", "de", "Dem"), 
class = "factor")), 
.Names = c("Potreiro", "Code"), 
row.names = c(NA, 10L), 
class = "data.frame")

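# TRUE where the value contains at least one digit (grepl() is vectorized)
arenum <- grepl('[[:digit:]]', df$Code)
# use NA for the missing side: NaN would be coerced to the string "NaN"
df$codenum <- ifelse(arenum, as.character(df$Code), NA)
df$codechar <- ifelse(!arenum, as.character(df$Code), NA)
df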

If you want only values made up entirely of digits to count as numbers (so that 50% is treated as a code), change arenum:

arenum <- gsub('[[:digit:]]', '', df$Code) == ''
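
To see how the two definitions of arenum differ, here is a small sketch on three values picked from the data (the names contains_digit and only_digits are made up for this illustration):

```r
# Three representative values from the Code column
code <- c("5", "70%", "ad")

# First definition: TRUE when the value contains any digit
contains_digit <- grepl("[[:digit:]]", code)

# Second definition: TRUE when nothing is left after removing the digits
only_digits <- gsub("[[:digit:]]", "", code) == ""

data.frame(code, contains_digit, only_digits)
```

Only 70% is classified differently: it contains digits, but it is not made up of digits alone.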

Here is an option using extract() from tidyr:

library(dplyr)
library(tidyr)
df %>% 
  extract(Code, into = c('number', 'word'), '(\\d*)([a-z]*)', remove = FALSE, convert = TRUE)
#  Potreiro Code number word
#1        7A    5      5     
#2        7A    2      2     
#3        7A   ad     NA   ad
#4        7A    4      4     
#5        7A    3      3     
#6        7B    5      5     
#7        7A  70%     70     
#8        7B  50%     50     
#9        7A   ad     NA   ad
#10       7B   ac     NA   ac


 