
Using stringr to extract a name from a character variable

I have a character variable (Min3$Name) built from file names, each of which includes a person's surname. I also have a list, called "Name", which includes all of those surnames plus others that do not appear in the files. Can I use stringr to make a new column containing just the surname from each file name? I have tried:

Min3$Name2 <- as.character(str_match_all(Min3$Name , Name))

However, the problem is that the list has 63 names and the df only includes 25 of them, so I get this error:

Error in `$<-.data.frame`(`*tmp*`, Names, value = c("character(0)", 
"character(0)",  : 
 replacement has 63 rows, data has 25
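
For context, the value in the error message has 63 elements, one per entry in Name, while the data frame has 25 rows. A data frame refuses any column assignment whose length is larger than its row count. The sketch below reproduces the same kind of error in base R with a hypothetical 3-row data frame and a 5-element value; the names df, x and y are made up for illustration.

```r
# Hypothetical 3-row data frame standing in for Min3
df <- data.frame(x = 1:3)

# Assigning a 5-element vector to a 3-row data frame fails;
# tryCatch() captures the error message instead of stopping the script
msg <- tryCatch({
  df$y <- c("a", "b", "c", "d", "e")
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)
```

The fix is therefore to produce exactly one result per row of Min3, rather than one result per candidate surname.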

Thanks

EDIT: Here is the df I'm working with

> dput(head(Min3, 1))
structure(list(Min_1 = 136.075840266223, Min_2 = 114.131164725458, 
    Min_3 = 109.639994444444, Min_4 = 103.885620833333, Min_5 = 97.1868380634391, 
    Min_6 = 92.3339222222222, Min_7 = 91.5180047619048, Min_8 = 90.1389770833333, 
    Min_9 = 84.5778222222222, Min_10 = 83.6758497495826, 
    Name = "Sale_A Export for Alafoti Fa'osiliva 37599.csv", 
    Game = structure(c("Sale_A", "Export", "for", "Alafoti", 
    "Fa'osiliva 37599.csv"), .Dim = c(1L, 5L)), 
    Date = structure(17623, class = "Date")), 
    .Names = c("Min_1", "Min_2", "Min_3", "Min_4", "Min_5", "Min_6", "Min_7", 
    "Min_8", "Min_9", "Min_10", "Name", "Game", "Date"), 
    row.names = "Sale_A Export for Alafoti Fa'osiliva 37599.csv", 
    class = "data.frame")

The Name variable holds the name of the csv file each row came from; the files were run through a loop as a group of 25.

I also have a list of surnames which has 63 names in total:

Name
 [1] "Alo"          "Bower"        "Kerrod"       "Milasinovich" "Morris"       "Rigby"        "Schonert"     "Waller"
 [9] "Annett"       "Cutting"      "Singleton"    "Taufete'e"    "Williams"     "Barry"        "Clegg"        "Kitchener"
[17] "O'Callaghan"  "Phillips"     "Hill"         "Kirwan"       "Lewis"        "Fa'osiliva"   "Hill"

I'm trying to create a new variable, Min3$Name2, which extracts the person's surname from the Min3$Name variable.

Hope that's a bit clearer! Thanks

This worked for me, but let me know if it gives you problems.

I wasn't able to reproduce your problem with a single row, so I extended your data. Just a heads-up that in the future, you may want to provide a few rows to deal with list-list interactions, which this looks to be.

library(stringr)

# Add a second row and swap in a different surname
test <- rbind(Min3, Min3)
test$Name[2] <- "Sale_A Export for Alafoti O'Callaghan 37599.csv"

# Running down test$Name, make a new column...
test$newName <- sapply(test$Name, function(x)

      # str_match_all returns a list with one element per pattern in Name.
      # Everything except the matches is empty and gets removed if you unlist it
       unlist(str_match_all(x, Name)))

# Check in the console.  Looks ok to me!
test$newName
[1] "Fa'osiliva"  "O'Callaghan"
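
One caveat with the approach above: if a file name contains none of the surnames, or more than one, sapply() returns a list rather than a character vector. A minimal sketch that always yields exactly one value per row is shown below. The vectors files and surnames and the helper first_match() are hypothetical stand-ins, not part of the original data, and fixed() is used on the assumption that the surnames should be matched literally rather than as regular expressions.

```r
library(stringr)

# Hypothetical stand-ins for Min3$Name and the Name vector
files <- c("Sale_A Export for Alafoti Fa'osiliva 37599.csv",
           "Sale_A Export for Some O'Callaghan 12345.csv",
           "Sale_A Export for Nobody Known 00000.csv")
surnames <- c("Alo", "Fa'osiliva", "O'Callaghan", "Hill", "Hill")

# Return the first candidate found in x, or NA if none is present
first_match <- function(x, candidates) {
  hits <- candidates[str_detect(x, fixed(candidates))]
  if (length(hits) == 0) NA_character_ else hits[1]
}

# vapply() guarantees a character vector with one entry per file name
name2 <- vapply(files, first_match, character(1),
                candidates = unique(surnames), USE.NAMES = FALSE)
print(name2)
```

Because the result has the same length as the input, it can be assigned straight to a new column such as Min3$Name2.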

You can collapse your vector of names into a single "or" regular expression. I only used two names in my example to show the idea.

names <- c('Alo', "Fa'osiliva")
names.pattern <- paste0(names, collapse = "|")
names.pattern
#[1] "Alo|Fa'osiliva"

str_extract_all(Min3$Name, pattern = names.pattern)
#[[1]]
#[1] "Fa'osiliva"
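
To get a plain character vector that can go straight into a new column, str_extract() (without _all) may be more convenient, since it returns one value per input string and NA where nothing matches. The sketch below also wraps the alternation in word boundaries so that a short surname such as "Alo" is not picked up inside a longer word. The vectors files and surnames are hypothetical examples, and the word-boundary wrapping is an assumption about what the asker wants.

```r
library(stringr)

# Hypothetical stand-ins for Min3$Name and the Name vector
files <- c("Sale_A Export for Alafoti Fa'osiliva 37599.csv",
           "Sale_A Export for Some O'Callaghan 12345.csv",
           "Sale_B Export for Tom Alonso 11111.csv")
surnames <- c("Alo", "Fa'osiliva", "O'Callaghan", "Hill", "Hill")

# Build one regex: \b(?:Alo|Fa'osiliva|O'Callaghan|Hill)\b
names.pattern <- paste0("\\b(?:",
                        paste0(unique(surnames), collapse = "|"),
                        ")\\b")

# One result per file name; NA where no surname is found
name2 <- str_extract(files, names.pattern)
print(name2)
```

With real data the last step would be something like Min3$Name2 <- str_extract(Min3$Name, names.pattern). Surnames containing regex metacharacters would need escaping first.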
