R - Extract substring
I have a column containing a list of names, each of which includes a title.
Example:
Ryerson, Master. John Borie
Corey, Mrs. Percy C (Mary Phyllis Elizabeth Miller)
I would like to extract the titles from the names, i.e. Mr, Mrs, Master, etc.
Function:
In[79]:
mystring="Wilkes, Master. James (Ellen Needs)"
In[80]:
substr(sub(".*,", "", mystring),2,which(strsplit(sub(".*,", "", mystring),"")[[1]]==".")-1)
Out[80]:
[1] "Master"
When I test the above function on one name, it works fine. But when I apply the same function to the column with the list of names, it extracts only two characters.
Example: Ryerson, Master. John Borie
I would like to see 'Master' extracted from this name, whereas I see 'Ma'.
[436] "Mi" "Mi" "Mr" "Mr" "Mr" "Mr" "Mr" "Mr" "Ms" "Mr" "Ma" "Mi" "Mr" "Mi" "Ma"
I don't know what's wrong with the function. I'd appreciate your help!
Based on the example shown, we can match one or more characters that are not a comma ([^,]+), followed by a comma and one or more spaces (\\s+), from the beginning (^) of the string, or (|) a dot (\\.) followed by any characters up to the end of the string (.*), and replace the match with ''.
gsub("^[^,]+,\\s+|\\..*$", "", str1)
#[1] "Master" "Mrs"
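As for why the original one-liner returns only two characters on a column: a likely cause is that strsplit(...)[[1]] looks only at the first element, so the dot position found in the first name is reused as the stop index for every name. The sketch below reproduces this with a hypothetical two-name vector (nms and its contents are assumptions, not the asker's data) and shows one vectorized alternative using regexpr().

```r
# Hypothetical sample; the first name's title ("Mr") is two characters long.
nms <- c("Braund, Mr. Owen Harris",
         "Ryerson, Master. John Borie")

# Drop everything up to the comma, leaving a leading space before the title.
rest <- sub(".*,", "", nms)

# Original approach: [[1]] keeps only the first element's characters,
# so a single dot position is computed and reused for every name.
stop_first <- which(strsplit(rest, "")[[1]] == ".") - 1
truncated <- substr(rest, 2, stop_first)

# Vectorized alternative: regexpr() returns the first dot position per element.
dot_pos <- as.integer(regexpr(".", rest, fixed = TRUE))
titles <- substr(rest, 2, dot_pos - 1)

print(truncated)  # [1] "Mr" "Ma"
print(titles)     # [1] "Mr"     "Master"
```

If the first name in the real column has a two-letter title such as Mr or Ms, this would explain the "Mi", "Mr", "Ma" output in the question.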
If the title is always the second 'word', then word from stringr can be used (the trailing dot is kept):
library(stringr)
word(str1, 2)
#[1] "Master." "Mrs."
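Where stringr is not available, the same "second word" idea can be sketched in base R. This is only an illustration under the assumption that words are separated by single spaces; the variable names second and titles are made up for the example, and the last step strips the trailing dot.

```r
str1 <- c("Ryerson, Master. John Borie",
          "Corey, Mrs. Percy C (Mary Phyllis Elizabeth Miller)")

# Split each name on single spaces and pick the second piece of each.
pieces <- strsplit(str1, " ", fixed = TRUE)
second <- vapply(pieces, `[`, character(1), 2)

# Remove the trailing dot, if any.
titles <- sub("\\.$", "", second)

print(second)  # [1] "Master." "Mrs."
print(titles)  # [1] "Master" "Mrs"
```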
Data used in the examples above:
str1 <- c("Ryerson, Master. John Borie",
"Corey, Mrs. Percy C (Mary Phyllis Elizabeth Miller)")
If the names in the vector contain spaces, e.g. Mr. Mahesh, you can try this code:
my <- c("MR. Arun", "Master. mahesh")
y <- do.call(rbind,strsplit(my," "))
z <- y[,1]
print(z)
[1] "MR." "Master."
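One caveat with the rbind approach: when the names have different numbers of words, rbind() recycles the shorter rows and emits a warning. A minimal sketch that avoids building the matrix is shown below; the vector my2 is a hypothetical example with unequal word counts, and first_word is an illustrative name.

```r
# Hypothetical input where the names have different numbers of words.
my2 <- c("MR. Arun", "Master. mahesh kumar")

# Take the first space-separated piece of each element directly.
first_word <- vapply(strsplit(my2, " ", fixed = TRUE), `[`, character(1), 1)

# Equivalent regex form: drop everything from the first whitespace onward.
first_word_re <- sub("\\s.*$", "", my2)

print(first_word)     # [1] "MR."     "Master."
print(first_word_re)  # [1] "MR."     "Master."
```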