I want to add a column to an existing data frame where the value of newColumn comes from a capture group of a regex applied to another value in the same row. The only thing I've come up with that works so far is this (probably not very R-like) standard approach of looping over the rows, but it is awfully slow for a data frame of around 1.5 million rows.
Data frame with columns: ID, Text, NewColumn
At the moment I use this:
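library(stringr)  # str_match() comes from stringr
df$newColumn <- rep("", nrow(df))  # pre-allocate the new column
for (row in seq_len(nrow(df))) {
  # column 2 of str_match()'s result holds the first capture group
  df$newColumn[row] <- str_match(df$Text[row], regex)[1, 2]
}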
I tried using apply/lapply after reading several posts, but none of my approaches produced the expected result. Is this even possible with a function from the apply family, and if so, how?
Example: given
regex <- "^[0-9]*([a-zA-Z]*)$"
and a table like the following:
ID  Text
---------------
1   231Ben
2   112Claudine
3   538Julia
I would expect:
ID  Text         NewColumn
---------------------------
1   231Ben       Ben
2   112Claudine  Claudine
3   538Julia     Julia
str_match, gsub/sub, etc. are vectorized, so there is no need to loop through the rows when the pattern is the same for every row:
df1$NewColumn <- gsub("\\d+", "", df1$Text)
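The gsub() call above strips the digits rather than extracting the capture group itself. A minimal base R sketch that keeps the question's anchored regex and uses a backreference instead is below; the df1 data is rebuilt from the example table, so it is an assumption about the real data's shape.

```r
# Example data from the question, named df1 as in this answer
df1 <- data.frame(ID = 1:3,
                  Text = c("231Ben", "112Claudine", "538Julia"),
                  stringsAsFactors = FALSE)

# The question's pattern: leading digits, then capture group 1 of letters
regex <- "^[0-9]*([a-zA-Z]*)$"

# sub() is vectorized over df1$Text; "\\1" replaces each whole match
# with the text of capture group 1
df1$NewColumn <- sub(regex, "\\1", df1$Text)
df1$NewColumn
# returns c("Ben", "Claudine", "Julia")
```

One difference to keep in mind: where a value does not match the pattern at all, sub() returns it unchanged, whereas str_match() returns NA.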
Or with stringr functions:
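library(stringr)
# str_match(): column 1 is the full match, column 2 the first capture group
df1$NewColumn <- str_match(df1$Text, "([A-Za-z]+)")[, 2]
# str_extract() returns only the matched text, so no capture group is needed
str_extract(df1$Text, "[A-Za-z]+")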
#[1] "Ben" "Claudine" "Julia"
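To tie this back to the question's own regex and its apply-family question, here is a sketch (example data rebuilt from the question's table) that replaces the loop with a single vectorized str_match() call, and, for comparison, a vapply() version. The vapply() form does work, but it still calls str_match() once per row, so it should not be expected to be much faster than the loop; the vectorized call is the one to prefer.

```r
library(stringr)

# Example data from the question
df <- data.frame(ID = 1:3,
                 Text = c("231Ben", "112Claudine", "538Julia"),
                 stringsAsFactors = FALSE)
regex <- "^[0-9]*([a-zA-Z]*)$"

# Vectorized: one str_match() call over the whole column.
# Column 2 of the result matrix is the first capture group.
df$NewColumn <- str_match(df$Text, regex)[, 2]

# apply-family equivalent: valid, but one str_match() call per element
via_vapply <- vapply(df$Text,
                     function(x) str_match(x, regex)[1, 2],
                     character(1),
                     USE.NAMES = FALSE)

identical(df$NewColumn, via_vapply)
```

Rows that do not match the pattern get NA in both versions.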