Split vector of string elements into list
I have a vector:
my_vec <-
c("Iceland", "06/2010,60% ,38% ,1% ,1% ,0% ", "11/2010,63% ,36% ,1% ,0% ,0% ",
"05/2011,59% ,38% ,2% ,1% ,0% ", "11/2011,56% ,40% ,3% ,0% ,1% ",
"05/2012,60% ,36% ,2% ,2% ,0% ", "11/2012,59% ,40% ,1% ,0% ,0% ",
"05/2013,60% ,38% ,1% ,0% ,1% ", "11/2013,55% ,43% ,2% ,0% ,0% ",
"06/2014,59% ,39% ,2% ,0% ,0% ", "Montenegro", "05/2011,11% ,41% ,36% ,11% ,1% ",
"11/2011,12% ,43% ,32% ,12% ,1% ", "05/2012,8% ,35% ,38% ,14% ,5% ",
"11/2012,9% ,35% ,34% ,18% ,4% ", "05/2013,10% ,39% ,32% ,16% ,3% ",
"11/2013,10% ,34% ,36% ,19% ,1% ", "06/2014,15% ,47% ,27% ,11% ,0% ",
"Republic of Serbia ", "05/2012,3% ,31% ,43% ,20% ,3% ", "11/2012,5% ,28% ,43% ,21% ,3% ",
"05/2013,6% ,29% ,44% ,18% ,3% ", "11/2013,7% ,34% ,39% ,18% ,2% ",
"06/2014,11% ,40% ,33% ,16% ")
The vector contains both country names and comma-delimited values. I would like to split the vector into a list by country name.
I tried:
library(stringr)
split(my_vec, which(str_detect(my_vec, "[aeiou]")))
but the output is not correct:
$`1`
[1] "Iceland" "05/2011,59% ,38% ,2% ,1% ,0% "
[3] "11/2012,59% ,40% ,1% ,0% ,0% " "06/2014,59% ,39% ,2% ,0% ,0% "
[5] "11/2011,12% ,43% ,32% ,12% ,1% " "05/2013,10% ,39% ,32% ,16% ,3% "
[7] "Republic of Serbia " "05/2013,6% ,29% ,44% ,18% ,3% "
$`11`
[1] "06/2010,60% ,38% ,1% ,1% ,0% " "11/2011,56% ,40% ,3% ,0% ,1% "
[3] "05/2013,60% ,38% ,1% ,0% ,1% " "Montenegro"
[5] "05/2012,8% ,35% ,38% ,14% ,5% " "11/2013,10% ,34% ,36% ,19% ,1% "
[7] "05/2012,3% ,31% ,43% ,20% ,3% " "11/2013,7% ,34% ,39% ,18% ,2% "
$`19`
[1] "11/2010,63% ,36% ,1% ,0% ,0% " "05/2012,60% ,36% ,2% ,2% ,0% "
[3] "11/2013,55% ,43% ,2% ,0% ,0% " "05/2011,11% ,41% ,36% ,11% ,1% "
[5] "11/2012,9% ,35% ,34% ,18% ,4% " "06/2014,15% ,47% ,27% ,11% ,0% "
[7] "11/2012,5% ,28% ,43% ,21% ,3% " "06/2014,11% ,40% ,33% ,16% "
Each list element should be named after its country and contain only that country's rows.
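A note on why the attempt scrambles the groups: `which()` returns only the positions of the country names (three numbers for `my_vec`), and `split()` recycles that short vector as the grouping variable, so elements are dealt out round-robin. The sketch below reproduces this on a small stand-in vector; the vector `x` and its values are made up for illustration, and base `grepl()` stands in for `str_detect()`.

```r
# Hypothetical stand-in for my_vec: two header lines, four data lines
x <- c("Iceland", "06/2010,60%", "11/2010,63%",
       "Montenegro", "05/2011,11%", "05/2012,8%")

# Positions of the lines that contain a lowercase vowel (the country names)
idx <- which(grepl("[aeiou]", x))

# split() recycles idx to the length of x, i.e. 1, 4, 1, 4, 1, 4,
# so every other element lands in the same group
bad <- split(x, idx)
bad
```

What is needed instead is a grouping vector with one entry per element of `x`.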
This isn't what you asked for, but it might be closer to the direction you're heading. It's too long for a comment, so I'm posting it as an answer.
I've written a function called read.mtable that is a wrapper for a for loop that lets you read data into a list of data.frames (which is what it seems like you have here). It's part of my "SOfun" package on GitHub, so you can install it using:
library(devtools)
install_github("mrdwab/SOfun") ## for `read.mtable`; current devtools expects "user/repo"
With your sample vector, I would use it like this:
read.mtable(textConnection(my_vec),
chunkId = "^[[:alpha:]]",
header = FALSE, fill = TRUE,
sep = ",", strip.white = TRUE)
# $Iceland
# V1 V2 V3 V4 V5 V6
# 1 06/2010 60% 38% 1% 1% 0%
# 2 11/2010 63% 36% 1% 0% 0%
# 3 05/2011 59% 38% 2% 1% 0%
# 4 11/2011 56% 40% 3% 0% 1%
# 5 05/2012 60% 36% 2% 2% 0%
# 6 11/2012 59% 40% 1% 0% 0%
# 7 05/2013 60% 38% 1% 0% 1%
# 8 11/2013 55% 43% 2% 0% 0%
# 9 06/2014 59% 39% 2% 0% 0%
#
# $Montenegro
# V1 V2 V3 V4 V5 V6
# 1 05/2011 11% 41% 36% 11% 1%
# 2 11/2011 12% 43% 32% 12% 1%
# 3 05/2012 8% 35% 38% 14% 5%
# 4 11/2012 9% 35% 34% 18% 4%
# 5 05/2013 10% 39% 32% 16% 3%
# 6 11/2013 10% 34% 36% 19% 1%
# 7 06/2014 15% 47% 27% 11% 0%
#
# $`Republic of Serbia `
# V1 V2 V3 V4 V5 V6
# 1 05/2012 3% 31% 43% 20% 3%
# 2 11/2012 5% 28% 43% 21% 3%
# 3 05/2013 6% 29% 44% 18% 3%
# 4 11/2013 7% 34% 39% 18% 2%
# 5 06/2014 11% 40% 33% 16%
Updated:
u <- unlist(gregexpr("^[[:alpha:]]", my_vec))
w <- which(u==1)
s <- setNames(split(my_vec[-w], cumsum(u + 1)[-w]), my_vec[w])
Result:
> s
$Iceland
[1] "06/2010,60% ,38% ,1% ,1% ,0% " "11/2010,63% ,36% ,1% ,0% ,0% "
[3] "05/2011,59% ,38% ,2% ,1% ,0% " "11/2011,56% ,40% ,3% ,0% ,1% "
[5] "05/2012,60% ,36% ,2% ,2% ,0% " "11/2012,59% ,40% ,1% ,0% ,0% "
[7] "05/2013,60% ,38% ,1% ,0% ,1% " "11/2013,55% ,43% ,2% ,0% ,0% "
[9] "06/2014,59% ,39% ,2% ,0% ,0% "
$Montenegro
[1] "05/2011,11% ,41% ,36% ,11% ,1% " "11/2011,12% ,43% ,32% ,12% ,1% "
[3] "05/2012,8% ,35% ,38% ,14% ,5% " "11/2012,9% ,35% ,34% ,18% ,4% "
[5] "05/2013,10% ,39% ,32% ,16% ,3% " "11/2013,10% ,34% ,36% ,19% ,1% "
[7] "06/2014,15% ,47% ,27% ,11% ,0% "
$`Republic of Serbia `
[1] "05/2012,3% ,31% ,43% ,20% ,3% " "11/2012,5% ,28% ,43% ,21% ,3% "
[3] "05/2013,6% ,29% ,44% ,18% ,3% " "11/2013,7% ,34% ,39% ,18% ,2% "
[5] "06/2014,11% ,40% ,33% ,16% "
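In case the one-liner is hard to follow, here is the same idea traced step by step on a small stand-in vector (the vector `x` and its values are made up for illustration). `gregexpr()` returns 1 where a line starts with a letter and -1 otherwise, so `u + 1` is 2 on header lines and 0 on data lines, and the running sum stays constant within each country's block.

```r
# Hypothetical stand-in for my_vec
x <- c("Iceland", "06/2010,60%", "11/2010,63%", "Montenegro", "05/2011,11%")

# 1 where the line starts with a letter (a country name), -1 where it does not
u <- unlist(gregexpr("^[[:alpha:]]", x))

# Running block id: increases by 2 at each country name, unchanged on data lines
g <- cumsum(u + 1)

# Positions of the country names
w <- which(u == 1)

# Drop the header lines, split the rest by block id, then label with the names
s <- setNames(split(x[-w], g[-w]), x[w])
s
```

One thing to watch for: a country name with no data lines under it would leave its block id unused, and `setNames()` would then receive more names than groups.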
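You can also easily convert this into a list of data.frames (per @AnandaMahto's answer). Note that read.csv defaults to header = TRUE, so in the output shown here the first data row of each country is consumed as column names; pass header = FALSE to keep every row: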
> lapply(s, function(x) read.csv(text=x))
$Iceland
X06.2010 X60. X38. X1. X1..1 X0.
1 11/2010 63% 36% 1% 0% 0%
2 05/2011 59% 38% 2% 1% 0%
3 11/2011 56% 40% 3% 0% 1%
4 05/2012 60% 36% 2% 2% 0%
5 11/2012 59% 40% 1% 0% 0%
6 05/2013 60% 38% 1% 0% 1%
7 11/2013 55% 43% 2% 0% 0%
8 06/2014 59% 39% 2% 0% 0%
$Montenegro
X05.2011 X11. X41. X36. X11..1 X1.
1 11/2011 12% 43% 32% 12% 1%
2 05/2012 8% 35% 38% 14% 5%
3 11/2012 9% 35% 34% 18% 4%
4 05/2013 10% 39% 32% 16% 3%
5 11/2013 10% 34% 36% 19% 1%
6 06/2014 15% 47% 27% 11% 0%
$`Republic of Serbia `
X05.2012 X3. X31. X43. X20. X3..1
1 11/2012 5% 28% 43% 21% 3%
2 05/2013 6% 29% 44% 18% 3%
3 11/2013 7% 34% 39% 18% 2%
4 06/2014 11% 40% 33% 16%
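If you want every row kept as data, a variant along these lines should work. It is only a sketch on a small made-up list `s` (shortened rows, hypothetical values); `header = FALSE` stops the first row from becoming column names, and `strip.white = TRUE` removes the trailing spaces after each percentage.

```r
# Hypothetical stand-in for the split result, with shortened rows
s <- list(
  Iceland    = c("06/2010,60% ,38% ", "11/2010,63% ,36% "),
  Montenegro = c("05/2011,11% ,41% ", "11/2011,12% ,43% ", "05/2012,8% ,35% ")
)

# Read each character vector as CSV text, one element per row
dfs <- lapply(s, function(x) {
  read.csv(text = x, header = FALSE, strip.white = TRUE,
           stringsAsFactors = FALSE)
})
dfs
```

The columns come back with the default names V1, V2, and so on, so you would still need to assign meaningful names yourself.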
You could try:
indx <- grep("^[A-Za-z ]", my_vec) # positions of the country names in the vector
indx2 <- diff(c(indx, length(my_vec)+1))-1 # number of data rows that follow each country name
Then split the vector without the country names, using the country names replicated by those counts as the grouping variable:
split(my_vec[-indx], rep(my_vec[indx], indx2))
#$Iceland
#[1] "06/2010,60% ,38% ,1% ,1% ,0% " "11/2010,63% ,36% ,1% ,0% ,0% "
#[3] "05/2011,59% ,38% ,2% ,1% ,0% " "11/2011,56% ,40% ,3% ,0% ,1% "
#[5] "05/2012,60% ,36% ,2% ,2% ,0% " "11/2012,59% ,40% ,1% ,0% ,0% "
#[7] "05/2013,60% ,38% ,1% ,0% ,1% " "11/2013,55% ,43% ,2% ,0% ,0% "
#[9] "06/2014,59% ,39% ,2% ,0% ,0% "
#$Montenegro
#[1] "05/2011,11% ,41% ,36% ,11% ,1% " "11/2011,12% ,43% ,32% ,12% ,1% "
#[3] "05/2012,8% ,35% ,38% ,14% ,5% " "11/2012,9% ,35% ,34% ,18% ,4% "
#[5] "05/2013,10% ,39% ,32% ,16% ,3% " "11/2013,10% ,34% ,36% ,19% ,1% "
#[7] "06/2014,15% ,47% ,27% ,11% ,0% "
#$`Republic of Serbia `
#[1] "05/2012,3% ,31% ,43% ,20% ,3% " "11/2012,5% ,28% ,43% ,21% ,3% "
#[3] "05/2013,6% ,29% ,44% ,18% ,3% " "11/2013,7% ,34% ,39% ,18% ,2% "
#[5] "06/2014,11% ,40% ,33% ,16% "
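To make the index arithmetic concrete, here it is on a small stand-in vector (made-up values, with the country names deliberately out of alphabetical order). One caveat worth knowing: splitting by a character vector sorts the group names alphabetically, which happens to match the input order for the sample data. Wrapping the labels in `factor()` with explicit levels, as sketched below, is one way to keep the original order; it assumes no country name appears twice.

```r
# Hypothetical stand-in vector, names not in alphabetical order
x <- c("Serbia", "05/2012,3%", "Iceland", "06/2010,60%", "11/2010,63%")

indx <- grep("^[A-Za-z ]", x)             # positions of the names
runs <- diff(c(indx, length(x) + 1)) - 1  # data rows after each name
labels <- rep(x[indx], runs)              # one label per data row

# Character grouping: group names come out sorted alphabetically
by_alpha <- split(x[-indx], labels)

# Factor with explicit levels: group names keep the input order
by_input <- split(x[-indx], factor(labels, levels = x[indx]))
names(by_alpha)
names(by_input)
```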
To convert it to a list of data.frames:
lapply(split(my_vec[-indx], rep(my_vec[indx], indx2)),
function(x) read.table(text=x, sep=",", header=FALSE, stringsAsFactors=FALSE, fill=TRUE))