R - Extract substring

Question

I have a column containing a list of names, each of which includes a title.

Example:

Ryerson, Master. John Borie
Corey, Mrs. Percy C (Mary Phyllis Elizabeth Miller)

I would like to extract the titles from the names, i.e. Mr, Mrs, Master, etc.

Function:

mystring <- "Wilkes, Master. James (Ellen Needs)"
substr(sub(".*,", "", mystring), 2, which(strsplit(sub(".*,", "", mystring), "")[[1]] == ".") - 1)
#[1] "Master"

When I test the above function on one name, it works fine. But when I apply the same function to the column with the list of names, it extracts only two characters.

Example: Ryerson, Master. John Borie

I would like to see 'Master' extracted from this name, but I see 'Ma' instead.

[436] "Mi" "Mi" "Mr" "Mr" "Mr" "Mr" "Mr" "Mr" "Ms" "Mr" "Ma" "Mi" "Mr" "Mi" "Ma"

I don't know what's wrong with the function. I'd appreciate your help!

Answer 1

Based on the example shown, we can match one or more characters that are not a comma ( [^,]+ ), followed by a comma and one or more spaces ( \\s+ ), from the beginning ( ^ ) of the string, or ( | ) a dot ( \\. ) followed by any characters until the end of the string ( .*$ ), and replace the match with '' .

gsub("^[^,]+,\\s+|\\..*$", "", str1)
#[1] "Master" "Mrs"
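
Since the question is about a whole column, here is a minimal sketch of applying the same pattern to a data frame. The data frame `df` and its columns `Name` and `Title` are assumed names for illustration; they are not taken from the question.

```r
# Hypothetical data frame; `df`, `Name` and `Title` are assumed names.
df <- data.frame(
  Name = c("Ryerson, Master. John Borie",
           "Corey, Mrs. Percy C (Mary Phyllis Elizabeth Miller)"),
  stringsAsFactors = FALSE
)

# gsub() is vectorized over its input, so each row is processed on its own.
df$Title <- gsub("^[^,]+,\\s+|\\..*$", "", df$Name)

df$Title
#[1] "Master" "Mrs"
```

Because the pattern is applied to each element separately, no per-row index has to be computed by hand.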

If the title is always the second 'word', then word from stringr can be used; note that it keeps the trailing dot:

library(stringr)
word(str1, 2)
#[1] "Master." "Mrs."
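
As for why the expression in the question returns only two characters on a column: `strsplit(...)[[1]]` looks at the first element only, so the stop position passed to `substr` is computed once and reused for every name. The two-character results are consistent with a first name whose title is short, such as "Mr". The sketch below reproduces this with a made-up vector (`nms` and the entry "Doe, Mr. John" are assumptions) and then computes the dot position per element with `regexpr` instead.

```r
# Made-up input; the first entry is an assumed example with a short title.
nms <- c("Doe, Mr. John", "Ryerson, Master. John Borie")

# Text after the comma, still with a leading space.
rest <- sub(".*,", "", nms)

# Expression from the question: [[1]] uses only the first element,
# so the stop position is a single number recycled for every name.
stop_first <- which(strsplit(rest, "")[[1]] == ".") - 1
truncated <- substr(rest, 2, stop_first)

# Position of the first literal dot, computed for each element.
dot_pos <- as.integer(regexpr(".", rest, fixed = TRUE))
titles <- substr(rest, 2, dot_pos - 1)

truncated
#[1] "Mr" "Ma"
titles
#[1] "Mr"     "Master"
```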

data

str1 <- c("Ryerson, Master. John Borie", 
       "Corey, Mrs. Percy C (Mary Phyllis Elizabeth Miller)")

Answer 2

If the names in the vector contain spaces, e.g. "Mr. Mahesh", you can try this code:

my <- c("MR. Arun", "Master. mahesh")
y <- do.call(rbind,strsplit(my," "))
z <- y[,1]
print(z)
[1] "MR."     "Master."
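
When the names split into different numbers of pieces, `rbind` has to recycle the shorter rows and can emit a warning. A possible variant, sketched here under that assumption, takes the first piece of each split directly and also drops the trailing dot. The names `first_token` and `titles`, and the longer second entry, are illustrative only.

```r
# Second entry is a made-up longer name to show unequal piece counts.
my <- c("MR. Arun", "Master. mahesh kumar")

# Take the first space-separated piece of each element.
first_token <- vapply(strsplit(my, " ", fixed = TRUE), `[`, character(1), 1)

# Drop a trailing dot, if present.
titles <- sub("\\.$", "", first_token)

titles
#[1] "MR"     "Master"
```

Like the code above, this assumes the title is the first piece of each string, which differs from the "Surname, Title. Given names" layout in the question.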

Answer 1: score 2, accepted, 2016-05-26 07:08:25
Answer 2: score 1, 2016-05-26 08:59:13
