How to manipulate digits in a character string in R?

I feel like I have a super easy question but for the life of me I can't find it when googling or searching here (or I don't know the correct terms to find a solution) so here goes.

I have a large amount of text in R in which I want to identify all numbers/digits, and add a specific number to them, for example 5.

So just as a small example, if this were my text:

text <- c("Hi. It is 6am. I want to leave at 7am")

I want the output to be:

> text
[1] "Hi. It is 11am. I want to leave at 12am"

But I also need the addition applied to each individual digit, so if this is the text:

text <- c("Hi. It is 2017. I am 35 years old.")

...I want the output to be:

> text
[1] "Hi. It is 75612. I am 810 years old."

I have tried 'grabbing' the numbers from the string and adding 5, but I don't know how to then get them back into the original string so I can get the full text back.

How should I go about this? Thanks in advance!

Here is how I would handle the times. I search for a number that is followed by am or pm and substitute in a math expression, which gsubfn then evaluates. This is fairly flexible, but the current implementation requires whole hours. I added an am_pm argument in case you want to swap am and pm, but I did not try to detect whether the time crosses from am to pm. Also note that I did not code in rolling over from 12 to 1: if the result exceeds 12, you will get a number bigger than 12.

text1 <- c("Hi. It is 6am. I want to leave at 7am")
text2 <- c("It is 9am. I want to leave at 10am, but the cab comes at 11am. Can I push my flight to 12am?")

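change_time <- function(text, hours, sign, am_pm){
  # build a backticked arithmetic expression, e.g. `(\1+5)`am, for gsubfn to evaluate
  string_change <- glue::glue("`(\\1{sign}{hours})`{am_pm}")

  # wrap every hour that precedes am/pm in that expression, then evaluate it
  gsub("(\\d+)(?=am|pm)(am|pm)", string_change, text, perl = TRUE) |>
    gsubfn::fn$c()
}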

change_time(text = text1, hours = 5, sign = "+", am_pm = "am")
#> [1] "Hi. It is 11am. I want to leave at 12am"

change_time(text = text2, hours = 3, sign = "-", am_pm = "pm")
#> [1] "It is 6pm. I want to leave at 7pm, but the cab comes at 8pm. Can I push my flight to 9pm?"
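
# The same approach, applied to each digit separately:
library(magrittr)  # provides the %>% pipe used below

text1 <- c("Hi. It is 2017. I am 35 years old.")
text2 <- c("Hi. It is 6am. I want to leave at 7am")

change_number <- function(text, change, sign){
  string_change <- glue::glue("`(\\1{sign}{change})`")
  # wrap every single digit in a backticked expression, then let gsubfn evaluate it
  gsub("(\\d)", string_change, text, perl = TRUE) %>%
    gsubfn::fn$c()
}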

change_number(text = text1, change = 5, sign = "+")
#>[1] "Hi. It is 75612. I am 810 years old."

change_number(text = text2, change = 5, sign = "+")
#>[1] "Hi. It is 11am. I want to leave at 12am"
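
The rollover from 12 to 1 and the switch between am and pm, which the time example above leaves out, could be sketched in base R with modular arithmetic on a 24-hour clock. This is only a sketch under stated assumptions: shift_time is a hypothetical name, times are whole hours written like 6am or 12pm, and 12am is taken to mean midnight. Note that it does clock arithmetic rather than plain addition, so 7am plus 5 hours becomes 12pm, not 12am.

```r
# shift_time is a hypothetical helper, not part of the answer above
shift_time <- function(x, hours) {
  pattern <- "([0-9]+)(am|pm)"
  m <- gregexpr(pattern, x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    h      <- as.integer(sub(pattern, "\\1", hits))  # hour part, e.g. 6
    suffix <- sub(pattern, "\\2", hits)              # "am" or "pm"
    # convert to a 24-hour clock: 12am -> 0, 12pm -> 12, 1pm -> 13, ...
    h24   <- (h %% 12) + ifelse(suffix == "pm", 12, 0)
    new24 <- (h24 + hours) %% 24                     # wrap around midnight
    new12 <- new24 %% 12
    new12[new12 == 0] <- 12                          # 0 and 12 both display as 12
    paste0(new12, ifelse(new24 < 12, "am", "pm"))
  })
  x
}

shift_time("Hi. It is 6am. I want to leave at 7am", 5)
#> [1] "Hi. It is 11am. I want to leave at 12pm"

shift_time("Can I push my flight to 12am?", -3)
#> [1] "Can I push my flight to 9pm?"
```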

This works perfectly. Many thanks, @AndS. I tweaked (or rather, simplified) your code to fit my needs better. I was determined to figure out the other text myself, haha, so thanks for showing me how!

Something quick and dirty with base R:

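add_n = \(x, n, by_digit = FALSE) {
  # match single digits or whole runs of digits
  if (by_digit) ptrn = "[0-9]" else ptrn = "[0-9]+"
  tmp       = gregexpr(ptrn, x)   # match positions, one list element per string
  raw       = regmatches(x, tmp)  # the matched substrings
  raw_plusn = lapply(raw, \(x) as.integer(x) + n)
  # write the shifted numbers back into the positions they came from
  for (i in seq_along(x)) regmatches(x[i], tmp[i]) = raw_plusn[i]
  x
}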

text = c(
  "Hi. It is 6am. I want to leave at 7am", 
  "wow it's 505 dollars and 19 cents",
  "Hi. It is 2017. I am 35 years old."
)

> add_n(text, 5)
# [1] "Hi. It is 11am. I want to leave at 12am"
# [2] "wow it's 510 dollars and 24 cents"      
# [3] "Hi. It is 2022. I am 40 years old."     

> add_n(text, -2)
# [1] "Hi. It is 4am. I want to leave at 5am" "wow it's 503 dollars and 17 cents"    
# [3] "Hi. It is 2015. I am 33 years old."   

> add_n(text, 5, by_digit = TRUE)
# [1] "Hi. It is 11am. I want to leave at 12am"
# [2] "wow it's 10510 dollars and 614 cents"   
# [3] "Hi. It is 75612. I am 810 years old."  
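
As a possible simplification, the explicit loop in add_n may not be needed, because the replacement form of regmatches() accepts the whole list of matches from gregexpr() together with a list of replacement vectors. A minimal sketch under that assumption (add_n2 is a hypothetical name, not part of the answer above):

```r
# add_n2 is a hypothetical, loop-free variant of add_n
add_n2 <- function(x, n, by_digit = FALSE) {
  ptrn <- if (by_digit) "[0-9]" else "[0-9]+"
  m <- gregexpr(ptrn, x)
  # replace every match in every string in one assignment
  regmatches(x, m) <- lapply(
    regmatches(x, m),
    function(hit) as.character(as.integer(hit) + n)
  )
  x
}

add_n2("Hi. It is 2017. I am 35 years old.", 5)
#> [1] "Hi. It is 2022. I am 40 years old."

add_n2("Hi. It is 2017. I am 35 years old.", 5, by_digit = TRUE)
#> [1] "Hi. It is 75612. I am 810 years old."
```

This is also the step the question asks about: the match data in m records where each number was found, so the assignment puts the new values back into the original text.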

Here's a tidyverse solution:

library(dplyr)
library(tidyr)
library(stringr)

data.frame(text) %>%
  # separate `text` into individual characters:
  separate_rows(text, sep = "(?<!^)(?!$)") %>%
  # add `5` to any digit:
  mutate(
    # if you detect a digit...
    text = ifelse(str_detect(text, "\\d"),
                  # ... extract it, convert it to numeric, add `5`:
                  as.numeric(str_extract(text, "\\d")) + 5,
                  # ... else leave `text` as is:
                  text)
  ) %>%
  # string the characters back together:
  summarise(text = str_c(text, collapse = ""))
# A tibble: 1 × 1
  text                                   
  <chr>                                  
1 Hi. It is 11am. I want to leave at 12am

Data 1:

text <- c("Hi. It is 6am. I want to leave at 7am")

Note that the same code works for the second text as well without any change:

# A tibble: 1 × 1
  text                                
  <chr>                               
1 Hi. It is 75612. I am 810 years old.

Data 2:

text <- c("Hi. It is 2017. I am 35 years old.")
