
Finding out if the 4th character in a string is a number or a letter in R

This follows on from the question linked below.

How to test if the first three characters in a string are letters or numbers in r?

How do I extend it to also check that the 4th character is numeric? For instance, my data frame looks like this.

ID   X
1   MJF34
2   GA249D
3   DEW235R
4   4SDFR3
5   DAS3
6   BHFS7

So again, I want the first three characters in the string to be letters, and I also want the 4th to be a digit (0-9). If both rules are met, I want the first three letters of the X variable pasted into a new column. If not, I want it to say "FR". Hence the final dataset is as follows.

ID    X       Y
1    MJF34   MJF 
2    GA249D  FR
3    DEW235R DEW
4    4SDFR3  FR
5    DAS3    DAS
6    BHFS7   FR

What I have so far that checks the first three letters is:

sub_string<-substr(df$X, 1, 3)

df$Y<-ifelse(grepl('[0-9]',sub_string), "FR", sub_string)

I have tried to expand it to check the 4th character as well, but it doesn't seem to work.

sub_number<-substr(df$X, 4, 4)
df$Y<-ifelse(grepl('[0-9]',sub_string) && !grepl('[0-9]',sub_number), "FR", sub_string)

I'm probably doing something obviously wrong but can't seem to figure it out. Thanks in advance.

I would use a logical index like this:

idx <- grepl("^[A-Z]{3}\\d", df$X) # you can use ignore.case=TRUE too
df$Y <- "FR"
df[idx, "Y"] <- substr(df[idx, "X"], 1, 3)

#  ID       X   Y
#1  1   MJF34 MJF
#2  2  GA249D  FR
#3  3 DEW235R DEW
#4  4  4SDFR3  FR
#5  5    DAS3 DAS
#6  6   BHFS7  FR
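
For anyone who wants to run this end to end, here is a minimal sketch of the same idea. The way df is built below is an assumption reconstructed from the table in the question, and [0-9] is used in place of \\d purely for readability. It also turns on ignore.case = TRUE, as mentioned in the comment above, so lowercase IDs would match too.

```r
# Sample data reconstructed from the question (assumed construction).
df <- data.frame(
  ID = 1:6,
  X  = c("MJF34", "GA249D", "DEW235R", "4SDFR3", "DAS3", "BHFS7"),
  stringsAsFactors = FALSE
)

# TRUE where X starts with three letters immediately followed by a digit.
# ignore.case = TRUE makes [A-Z] accept lowercase letters as well.
idx <- grepl("^[A-Z]{3}[0-9]", df$X, ignore.case = TRUE)

# Default every row to "FR", then overwrite only the matching rows.
df$Y <- "FR"
df$Y[idx] <- substr(df$X[idx], 1, 3)

print(df)
```

Because the pattern is anchored with ^, a match further along the string (for example "4ABC1") does not count.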

Based on the code you posted you can use this:

x = c("MJF34", "GA249D", "DEW235R")

ifelse(grepl('[0-9]',substr(x, 1, 3)) | !grepl('[0-9]',substr(x, 4, 4)), "FR", substr(x, 1, 3))

# [1] "MJF" "FR"  "DEW"

You can store this as a function if you want to use it again in your code:

vec = c("MJF34", "GA249D", "DEW235R")

UpdateVector = function(x) ifelse(grepl('[0-9]',substr(x, 1, 3)) | !grepl('[0-9]',substr(x, 4, 4)), "FR", substr(x, 1, 3))

UpdateVector(vec)

# [1] "MJF" "FR"  "DEW"
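
As far as I can tell, the attempt in the question goes wrong in two places. First, && is meant for single TRUE/FALSE values: older R versions only looked at the first element, and recent versions raise an error for longer vectors, so the element-wise | (or &) is needed inside ifelse. Second, the row should be rejected when either rule is broken, which calls for OR rather than AND. The sketch below spells the two checks out on all six values from the question; the variable names are just illustrative.

```r
x <- c("MJF34", "GA249D", "DEW235R", "4SDFR3", "DAS3", "BHFS7")

# Rule 1 is broken if any of the first three characters is a digit.
digit_in_first3 <- grepl("[0-9]", substr(x, 1, 3))

# Rule 2 is satisfied if the 4th character is a digit.
fourth_is_digit <- grepl("[0-9]", substr(x, 4, 4))

# Element-wise OR: flag the value if either rule is broken.
bad <- digit_in_first3 | !fourth_is_digit

y <- ifelse(bad, "FR", substr(x, 1, 3))
print(y)
```

One caveat: testing for "no digit in the first three characters" is not quite the same as "three letters". A value such as "A-B1" would pass this check, so an anchored pattern like ^[A-Z]{3}[0-9] is stricter if punctuation can occur in your data.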

The stringr package may be useful in your case:

library(dplyr)
library(stringr)    

df %>%
  mutate(Y = if_else(str_detect(X, "^[A-Z]{3}[0-9]"), 
                     str_sub(X, start = 1, end = 3), 
                     "FR"))

Output:

# A tibble: 6 x 3
     ID       X     Y
  <int>   <chr> <chr>
1     1   MJF34   MJF
2     2  GA249D    FR
3     3 DEW235R   DEW
4     4  4SDFR3    FR
5     5    DAS3   DAS
6     6   BHFS7    FR
