
Extracting first names in R

Say I have a vector of people's names in my dataframe:

names <- c("Bernice Ingram", "Dianna Dean", "Philip Williamson", "Laurie Abbott",
           "Rochelle Price", "Arturo Fisher", "Enrique Newton", "Sarah Mann",
           "Darryl Graham", "Arthur Hoffman")

I want to create a vector with the first names. All I know about them is that they come first in each element of the vector above and that they're followed by a space. In other words, this is what I'm looking for:

"Bernice" "Dianna"  "Philip" "Laurie" "Rochelle"
"Arturo"  "Enrique" "Sarah"  "Darryl" "Arthur"

I've found a similar question here, but the answers (especially this one) haven't helped much. So far, I've tried a couple of variations of functions from the grep family, and the closest I could get to something useful was running strsplit(names, " ") to split each name into its parts and then strsplit(names, " ")[[1]][1] to get just the first name of the first person. I've been trying to tweak this last command to give me a whole vector of first names, to no avail.

Use sapply to extract the first name:

> sapply(strsplit(names, " "), `[`, 1)
 [1] "Bernice"  "Dianna"   "Philip"   "Laurie"   "Rochelle" "Arturo"   "Enrique" 
 [8] "Sarah"    "Darryl"   "Arthur"

Some comments:

The above works just fine. To make it a bit more general, you could change the split parameter of strsplit from " " to "\\s+", which covers multiple spaces. You could also use gsub to directly extract everything before a space. This last approach uses only one function call and is likely to be faster (but I haven't checked with a benchmark).
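
As a rough sketch of the two variations described in these comments, the snippet below uses a small hypothetical vector with irregular whitespace (a double space and a tab), which is not part of the original question. It uses sub rather than gsub, since only one replacement per string is needed; both exist in base R.

```r
# Hypothetical input: a double space and a tab between first and last name
messy <- c("Bernice Ingram", "Dianna  Dean", "Philip\tWilliamson")

# Variation 1: split on one or more whitespace characters,
# then keep the first piece of each element
first_split <- sapply(strsplit(messy, "\\s+"), `[`, 1)

# Variation 2: delete everything from the first whitespace
# character onward, in a single vectorized call
first_sub <- sub("\\s.*$", "", messy)

print(first_split)
print(first_sub)
```

Neither variation handles leading whitespace; if that can occur in the data, applying trimws() to the input first is one option.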

For what you want, here's a pretty unorthodox way to do it:

read.table(text = names, header = FALSE, stringsAsFactors=FALSE, fill = TRUE)[[1]]
# [1] "Bernice"  "Dianna"   "Philip"   "Laurie"   "Rochelle" "Arturo"   "Enrique"  "Sarah"   
# [9] "Darryl"   "Arthur"  

This seems to work:

unlist(strsplit(names,' '))[seq(1,2*length(names),2)]

This assumes that no first or last names have spaces in them.
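
To illustrate why that assumption matters, here is a small sketch with a hypothetical vector (not from the question) in which one entry has three words. Positional indexing on the flattened vector then drifts out of step, while taking the first piece of each split element does not depend on the word count.

```r
# Hypothetical names; the first entry has three words
people <- c("Mary Ann Smith", "Sarah Mann")

# Positional indexing assumes exactly two words per name
flat <- unlist(strsplit(people, " "))
by_position <- flat[seq(1, 2 * length(people), 2)]

# Taking the first piece per element works for any word count
per_element <- vapply(strsplit(people, " "), `[`, character(1), 1)

print(by_position)   # second entry is "Smith", not "Sarah"
print(per_element)   # "Mary" "Sarah"
```

Note that neither approach can tell whether "Mary Ann" is meant as a two-word first name; both simply treat the first space as the boundary.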

Using a regular expression with gsub:

> gsub("^(.*?)\\s.*", "\\1", names)
 [1] "Bernice"  "Dianna"   "Philip"   "Laurie"   "Rochelle" "Arturo"   "Enrique"  "Sarah"   
 [9] "Darryl"   "Arthur"  
