Extracting first names in R

Question

Say I have a vector of people's names in my data frame:

names <- c("Bernice Ingram", "Dianna Dean", "Philip Williamson", "Laurie Abbott",
           "Rochelle Price", "Arturo Fisher", "Enrique Newton", "Sarah Mann",
           "Darryl Graham", "Arthur Hoffman")

I want to create a vector with the first names. All I know about them is that they come first in each string above and that they're followed by a space. In other words, this is what I'm looking for:

"Bernice" "Dianna"  "Philip" "Laurie" "Rochelle"
"Arturo"  "Enrique" "Sarah"  "Darryl" "Arthur"

I've found a similar question here, but the answers (especially this one) haven't helped much. So far I've tried a few variations of functions from the grep family, and the closest I got to something useful was running strsplit(names, " ") to split each name into its parts, and then strsplit(names, " ")[[1]][1] to get just the first name of the first person. I've been trying to tweak this last command to give me a whole vector of first names, to no avail.

Answer 1

Use sapply to extract the first element of each split name:

> sapply(strsplit(names, " "), `[`, 1)
 [1] "Bernice"  "Dianna"   "Philip"   "Laurie"   "Rochelle" "Arturo"   "Enrique" 
 [8] "Sarah"    "Darryl"   "Arthur"
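For readers unfamiliar with passing `[` to sapply, a small sketch of what that call does: `[` is R's ordinary subsetting function, so the extra argument 1 is forwarded to it for every element of the list. The vapply variant is an alternative not used in the answer; it fixes the result type up front.

```r
names <- c("Bernice Ingram", "Dianna Dean", "Philip Williamson", "Laurie Abbott",
           "Rochelle Price", "Arturo Fisher", "Enrique Newton", "Sarah Mann",
           "Darryl Graham", "Arthur Hoffman")

# strsplit() returns a list holding one character vector per name
parts <- strsplit(names, " ")

# Passing `[` with the extra argument 1 is the same as applying
# function(x) x[1] to every element of the list
first_sapply <- sapply(parts, function(x) x[1])

# vapply() declares that each result is a single string, so it always
# returns a character vector (even character(0) for empty input)
first_vapply <- vapply(parts, `[`, character(1), 1)
```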

Some comments:

The above works just fine. To make it a bit more general, you could change the split argument of strsplit from " " to "\\s+", which matches runs of whitespace (multiple spaces or tabs). You could also use gsub to extract everything before the first space directly. That last approach uses only one function call and is likely to be faster (though I haven't benchmarked it).
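A minimal sketch of both suggestions, assuming the names may contain tabs or repeated spaces (the `messy` vector is made-up example data). Splitting on "\\s+" does not help with leading whitespace, since the first piece would then be an empty string, so `trimws()` is applied first.

```r
# Hypothetical messy input: double space, tab, leading space
messy <- c("Sarah  Mann", "Darryl\tGraham", " Arthur Hoffman")
clean <- trimws(messy)

# Split on any run of whitespace and keep the first piece of each name
first_split <- sapply(strsplit(clean, "\\s+"), `[`, 1)

# Single call: delete everything from the first whitespace character onward
first_gsub <- gsub("\\s.*", "", clean)
```

Both give "Sarah", "Darryl", "Arthur" here; without `trimws()`, the strsplit version would return "" for the third name.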

Answer 2

For what you want, here's a pretty unorthodox way to do it:

read.table(text = names, header = FALSE, stringsAsFactors=FALSE, fill = TRUE)[[1]]
# [1] "Bernice"  "Dianna"   "Philip"   "Laurie"   "Rochelle" "Arturo"   "Enrique"  "Sarah"   
# [9] "Darryl"   "Arthur"
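This works because read.table splits each line on whitespace by default and `fill = TRUE` pads shorter rows. One caveat: `?read.table` notes that the number of columns is taken from the first five lines (or from `col.names` if that is longer), which can be wrong with `fill = TRUE`, and suggests specifying `col.names`. A sketch under that assumption, with made-up names where the longest one comes after line five:

```r
# Hypothetical input: the three-part name appears on line 6
names3 <- c("Ann Lee", "Bo Ng", "Cy Po", "Di Wu", "Ed O'Neil", "Mary Ann Smith")

# quote = "" so an apostrophe (as in O'Neil) is not treated as a quote mark
n_fields <- max(count.fields(textConnection(names3), quote = ""))

# Supplying col.names makes read.table allocate enough columns for every row
first <- read.table(text = names3, header = FALSE, stringsAsFactors = FALSE,
                    fill = TRUE, quote = "",
                    col.names = paste0("V", seq_len(n_fields)))[[1]]
```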

Answer 3

This seems to work:

unlist(strsplit(names,' '))[seq(1,2*length(names),2)]

This assumes that no first or last name contains a space.
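To see why that assumption matters: the positional trick relies on every name contributing exactly two pieces to the flattened vector. A sketch with a hypothetical middle name shows the positions shifting, while splitting per element (as in Answer 1) is unaffected:

```r
names_mid <- c("Mary Ann Smith", "John Doe")

# Flattened pieces: "Mary" "Ann" "Smith" "John" "Doe"
parts <- unlist(strsplit(names_mid, " "))

# Positions 1 and 3 now point at "Mary" and "Smith", which is wrong
by_position <- parts[seq(1, 2 * length(names_mid), 2)]

# Taking the first element of each split keeps names separate: "Mary" "John"
by_element <- sapply(strsplit(names_mid, " "), `[`, 1)
```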

Answer 4

Using a regular expression with gsub:

> gsub("^(.*?)\\s.*", "\\1", names)
 [1] "Bernice"  "Dianna"   "Philip"   "Laurie"   "Rochelle" "Arturo"   "Enrique"  "Sarah"   
 [9] "Darryl"   "Arthur"
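In the pattern above, `^(.*?)` lazily captures everything before the first whitespace character, `\\s.*` matches the rest of the string, and `"\\1"` replaces the whole string with the capture. Two equivalent sketches (alternatives, not part of the original answer): `sub()` suffices because only one match per string is needed, and `regexpr()` plus `regmatches()` extracts the match directly.

```r
names <- c("Bernice Ingram", "Dianna Dean", "Philip Williamson", "Laurie Abbott",
           "Rochelle Price", "Arturo Fisher", "Enrique Newton", "Sarah Mann",
           "Darryl Graham", "Arthur Hoffman")

# Remove the first whitespace character and everything after it
first_sub <- sub("\\s.*$", "", names)

# Extract the leading run of non-space characters; note that regmatches()
# silently drops elements with no match (e.g. a name starting with a space)
first_rm <- regmatches(names, regexpr("^\\S+", names))
```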

4 answers

Answer 1: score 10, accepted, 2013-10-11 15:22:09
Answer 2: score 5, 2013-10-11 16:53:58
Answer 3: score 3, 2013-10-11 15:25:43
Answer 4: score 3, 2013-10-11 15:26:48
