R - remove anything after comma from column
I'd like to strip this column so that it shows only the last name: if there is a comma, I'd like to remove the comma and anything after it. I have a data column that is a mix of bare last names and "Last, First" entries. The data looks as follows:
Last Name
Sample, A
Tester
Wilfred, Nancy
Day, Bobby Jean
Morris
You could use gsub() and some regex:
> x <- 'Day, Bobby Jean'
> gsub("(.*),.*", "\\1", x)
[1] "Day"
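One thing to watch with this pattern: the leading (.*) is greedy, so when a value contains more than one comma it keeps everything up to the last comma rather than the first. A minimal sketch of the difference follows; the two-comma value is made up for illustration (it is not in the question's data), and the anchored pattern is just one possible alternative.

```r
# Hypothetical value with two commas, for illustration only
x <- "Day, Bobby, Jr"

# Greedy capture: (.*) backtracks only as far as the LAST comma
greedy <- gsub("(.*),.*", "\\1", x)

# Capturing only non-comma characters stops at the FIRST comma
first_comma <- sub("^([^,]*),.*", "\\1", x)

print(greedy)       # "Day, Bobby"
print(first_comma)  # "Day"
```

For data where each value has at most one comma, as in the question, both patterns give the same result.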
You can use gsub:
gsub(",.*", "", c("last only", "last, first"))
# [1] "last only" "last"
The pattern ",.*" says: replace the comma (,) and every character after it (.*) with nothing ("").
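Since the question is about a column rather than a loose vector, here is a small sketch of applying the same pattern to a data frame column. The data frame df and the column name last_name are assumptions made for the example; the values are the ones from the question.

```r
# Hypothetical data frame; the column name last_name is an assumption
df <- data.frame(
  last_name = c("Sample, A", "Tester", "Wilfred, Nancy",
                "Day, Bobby Jean", "Morris"),
  stringsAsFactors = FALSE
)

# For single-line values, ",.*" consumes the rest of the string,
# so a single replacement with sub() is enough
df$last_name <- sub(",.*", "", df$last_name)

print(df$last_name)
# [1] "Sample"  "Tester"  "Wilfred" "Day"     "Morris"
```

Values without a comma are returned unchanged, because the pattern simply does not match them.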
str1 <- c("Sample, A", "Tester", "Wilfred, Nancy", "Day, Bobby Jean", "Morris")
library(stringr)
# perl() is deprecated in stringr >= 1.0; a plain pattern string supports lookahead
str_extract(str1, '[A-Za-z]+(?=(,|\\b))')
#[1] "Sample" "Tester" "Wilfred" "Day" "Morris"
This matches a run of letters ([A-Za-z]+) and extracts the first run that is followed by a comma (,) or a word boundary.
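Because the pattern only accepts letters, it may cut a surname short when the name contains an apostrophe or a hyphen. The sketch below compares a base R version of the lookahead extraction (regexpr() with perl = TRUE, which is a swapped-in equivalent, not the stringr call above) against simply removing the comma and what follows. The names are made up for illustration.

```r
# Made-up names containing an apostrophe and a hyphen, for illustration only
x <- c("O'Brien, Pat", "Smith-Jones, Al", "Morris")

# Base R version of the lookahead extraction (perl = TRUE enables lookahead)
m <- regexpr("[A-Za-z]+(?=(,|\\b))", x, perl = TRUE)
letters_only <- regmatches(x, m)

# Removing the comma and everything after it keeps the whole surname
whole_name <- sub(",.*", "", x)

print(letters_only)  # "O" "Smith" "Morris"
print(whole_name)    # "O'Brien" "Smith-Jones" "Morris"
```

Note that regmatches() drops elements with no match, so the result can be shorter than the input if some values contain no letters at all.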
This will work:
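library(stringr)
# Note: sep = "," also splits unquoted "Last, First" values at the comma
a <- read.delim("C:\\Desktop\\a.csv", row.names = NULL, header = TRUE,
                stringsAsFactors = FALSE, sep = ",")
a <- as.matrix(a)
# Replace the comma and everything after it with an empty string
Data <- str_replace_all(string = a, pattern = "\\,.*$", replacement = "")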
Also try strsplit:
string <- c("Sample, A", "Tester", "Wilfred, Nancy", "Day, Bobby Jean", "Morris")
sapply(strsplit(string, ","), "[", 1)
#[1] "Sample" "Tester" "Wilfred" "Day" "Morris"
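A slightly stricter variant of the same idea, as a sketch: fixed = TRUE treats the comma as a literal string instead of a regular expression, and vapply() declares that each element must produce exactly one character value. The variable names parts and last_names are chosen for this example only.

```r
string <- c("Sample, A", "Tester", "Wilfred, Nancy", "Day, Bobby Jean", "Morris")

# fixed = TRUE treats "," as a literal string rather than a regular expression
parts <- strsplit(string, ",", fixed = TRUE)

# vapply() checks that each element yields exactly one character value
last_names <- vapply(parts, function(p) p[1], character(1))

print(last_names)
# [1] "Sample"  "Tester"  "Wilfred" "Day"     "Morris"
```

The type check in vapply() makes the result predictable (always a character vector), whereas sapply() may return a list for some inputs.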