R - Extract substring

I have a column with a list of names, each of which includes a title.

Example:

Ryerson, Master. John Borie
Corey, Mrs. Percy C (Mary Phyllis Elizabeth Miller)

I would like to extract the titles from the names, i.e. Mr, Mrs, Master, etc.

Function:

In[79]:
mystring="Wilkes, Master. James (Ellen Needs)"
In[80]:
substr(sub(".*,", "", mystring),2,which(strsplit(sub(".*,", "", mystring),"")[[1]]==".")-1)
Out[80]:
[1] "Master"

When I test the above function on a single name, it works fine. But when I apply the same function to the column with the list of names, it extracts only two characters.

Example: Ryerson, Master. John Borie

I would like to see 'Master' extracted from this name, whereas I see 'Ma'.

[436] "Mi" "Mi" "Mr" "Mr" "Mr" "Mr" "Mr" "Mr" "Ms" "Mr" "Ma" "Mi" "Mr" "Mi" "Ma"

I don't know what's wrong with the function. I'd appreciate your help!

Based on the example shown, we can match one or more characters that are not a comma ( [^,]+ ), followed by a comma and one or more spaces ( \\s+ ), from the beginning ( ^ ) of the string, or ( | ) a dot ( \\. ) followed by any characters up to the end of the string ( .* ), and replace the match with '' .

gsub("^[^,]+,\\s+|\\..*$", "", str1)
#[1] "Master" "Mrs"  
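
As for why the original function returns only two characters on a column: substr() and sub() are vectorised, but strsplit(...)[[1]] keeps only the first element, so the stop position is computed once from the first name and reused for every row. The sketch below reproduces this with a hypothetical three-name vector (names_vec and its first entry are assumptions, not the asker's actual data) and shows one possible vectorised fix using regexpr().

```r
# Hypothetical stand-in for the real column; the first title has two letters
names_vec <- c("Braund, Mr. Owen Harris",
               "Ryerson, Master. John Borie",
               "Corey, Mrs. Percy C (Mary Phyllis Elizabeth Miller)")

# Drop everything up to and including the comma (a leading space remains)
rest <- sub(".*,", "", names_vec)

# Original approach: [[1]] looks at the first name only,
# so a single stop position is recycled for all rows
stop_first <- which(strsplit(rest, "")[[1]] == ".") - 1
broken <- substr(rest, 2, stop_first)

# Vectorised alternative: regexpr() gives the first "." position per element
stop_each <- regexpr(".", rest, fixed = TRUE) - 1
fixed <- substr(rest, 2, stop_each)

print(broken)
print(fixed)
```

If the first name in the real column happens to carry a two-letter title such as Mr, that would explain the "Mi", "Mr", "Ma" output in the question.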

If the title is always the second 'word', then word from stringr can be used:

library(stringr)
word(str1, 2)
#[1] "Master." "Mrs."   
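
Note that word(str1, 2) keeps the trailing period and relies on the title being the second space-separated token. A minimal sketch of where that assumption can break, using a hypothetical vector str2 whose first surname contains a space; the base R strsplit() call only mimics picking the second word, and the capture-group pattern is one possible alternative rather than the only one:

```r
# Hypothetical input: the first surname is made of two words
str2 <- c("van Billiard, Mr. Austin Blyler",
          "Ryerson, Master. John Borie")

# Second whitespace-delimited token, mimicking "take word 2" in base R
second_token <- vapply(strsplit(str2, "\\s+"), function(p) p[2], character(1))

# Anchor on the comma and the period instead of on word position
title <- sub("^[^,]+,\\s+([^.]+)\\..*$", "\\1", str2)

print(second_token)
print(title)
```

Here the positional approach picks up part of the surname for the first entry, while the comma-and-period pattern still isolates the title.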

data

str1 <- c("Ryerson, Master. John Borie", 
       "Corey, Mrs. Percy C (Mary Phyllis Elizabeth Miller)")

If the names in the vector contain spaces, e.g. "Mr. Mahesh", you can try this code:

my <- c("MR. Arun", "Master. mahesh")
y <- do.call(rbind,strsplit(my," "))
z <- y[,1]
print(z)
[1] "MR."     "Master."
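
The rbind() step works cleanly when every name splits into the same number of pieces; with names of differing lengths it recycles values and can emit a warning. As a hedged alternative, the sketch below takes the first token of each element directly and also strips the trailing period. The vector my2 is a hypothetical ragged example, not data from the question.

```r
# Hypothetical input where the names split into different numbers of tokens
my2 <- c("MR. Arun Kumar", "Master. mahesh")

# First space-separated token of each element, no matrix needed
first_token <- vapply(strsplit(my2, " ", fixed = TRUE),
                      function(p) p[1], character(1))

# Drop the trailing period, if present
titles <- sub("\\.$", "", first_token)

print(first_token)
print(titles)
```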
