Unique characters from a column of concatenated strings
I have a data.frame with a string column "city" whose values are letters concatenated with ; as the separator:
dt = data.frame(id = letters[1:6],
city = c("A;B","B;D","A;D;G","A;C","F;G","C;D"))
dt
# id city
# 1 a A;B
# 2 b B;D
# 3 c A;D;G
# 4 d A;C
# 5 e F;G
# 6 f C;D
I would like to get the unique individual letters from the "city" column:
city = c("A","B","C","D","F","G")
How can I do this?
A cleaner solution is:
dt= data.frame(id=letters[1:6],city = c("A;B","B;D","A;D;G","A;C","F;G","C;D"))
city=strsplit(as.character(dt$city), ";")  # split each string on ";" (returns a list)
city=sort(unique(unlist(city)))            # flatten, drop duplicates, sort
city
# [1] "A" "B" "C" "D" "F" "G"
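If this is needed in more than one place, the same steps can be wrapped in a small function. The sketch below is only an illustration: `unique_tokens` is a hypothetical name, not part of base R or of the original answer, and the handling of `NA`, surrounding spaces and empty tokens is an added assumption about what messy input might look like.

```r
# Hypothetical helper (not from the original answer): unique, sorted tokens
# from a vector of delimiter-separated strings.
unique_tokens <- function(x, sep = ";") {
  x <- as.character(x)                 # also covers factor columns
  x <- x[!is.na(x)]                    # assumption: missing values are skipped
  parts <- unlist(strsplit(x, sep, fixed = TRUE))
  parts <- trimws(parts)               # assumption: tolerate "A; B" style spacing
  sort(unique(parts[nzchar(parts)]))   # drop empty tokens, deduplicate, sort
}

dt <- data.frame(id = letters[1:6],
                 city = c("A;B", "B;D", "A;D;G", "A;C", "F;G", "C;D"))
result <- unique_tokens(dt$city)
print(result)
```

Using `fixed = TRUE` here means `sep` is treated as a literal string, so the helper should also behave sensibly for separators such as `|` or `.`.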
Data:
dt= data.frame(id=letters[1:6],city = c("A;B","B;D","A;D;G","A;C","F;G","C;D"))
> dt
id city
1 a A;B
2 b B;D
3 c A;D;G
4 d A;C
5 e F;G
6 f C;D
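Split the city column, using as.character to convert it to character strings first (this matters when the column is a factor, which was the data.frame default before R 4.0.0):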
city <- unlist(strsplit(as.character(dt$city), ";", fixed = TRUE))
> city
[1] "A" "B" "B" "D" "A" "D" "G" "A" "C" "F" "G" "C" "D"
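The `fixed = TRUE` argument makes no visible difference for `;`, but it starts to matter once the separator is a regular-expression metacharacter. A small sketch, assuming a made-up input that uses `|` as the delimiter:

```r
# Made-up input for illustration: "|" as the separator instead of ";"
x <- c("A|B", "B|D")

# Without fixed = TRUE the pattern is a regular expression, and "|" is the
# alternation operator, so the split is not on the literal bar.
regex_split <- strsplit(x, "|")

# With fixed = TRUE the pattern is matched literally.
fixed_split <- strsplit(x, "|", fixed = TRUE)

print(unlist(fixed_split))
```

Comparing `regex_split` with `fixed_split` in an interactive session shows the difference directly.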
Now use unique and order to get the output:
city <- unique(city)
> city
[1] "A" "B" "D" "G" "C" "F"
city <- city[order(city)]
> city
[1] "A" "B" "C" "D" "F" "G"
> dput(city)
c("A", "B", "C", "D", "F", "G")
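For a plain character vector like this one, indexing by `order()` and calling `sort()` should give the same result, so the last two steps can be collapsed if preferred. A short check, reusing the split values shown above:

```r
# Values as produced by the strsplit/unlist step above
city <- c("A", "B", "B", "D", "A", "D", "G", "A", "C", "F", "G", "C", "D")

u <- unique(city)          # first occurrences, in original order
by_order <- u[order(u)]    # the two-step approach from the answer
by_sort  <- sort(u)        # the one-step alternative

print(identical(by_order, by_sort))
```

`order()` remains the better fit when the permutation itself is needed, for example to reorder another vector in parallel.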
Edit: updated for the OP's new data.
Edit 2: updated to drop sapply, since strsplit is apparently vectorized. Thanks @Cris!
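To illustrate the point about vectorization: `strsplit` accepts a whole character vector and returns a list with one element per input string, so an explicit element-wise loop adds nothing. In the sketch below `lapply` stands in for the `sapply` call that the earlier version of this answer used; that earlier code is not shown here, so the loop is only an assumed reconstruction.

```r
x <- c("A;B", "B;D", "A;D;G")

# One call on the whole vector: a list with one character vector per string
vectorized <- strsplit(x, ";", fixed = TRUE)

# Element-wise equivalent (assumed shape of the older, loop-based version)
looped <- lapply(x, function(s) strsplit(s, ";", fixed = TRUE)[[1]])

print(identical(vectorized, looped))
print(lengths(vectorized))
```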