
Grab two variables from a structured character vector and create a data frame

Consider the following vector:

vector <- c("0:00 0,6 0:00", "5:00 1,2 5:00","9:30 0,9 22:00","16:00 1,0","21:30 0,9")

Each element contains:

an hour, a number (for instance "0,6"), and a second hour (or blank)

The elements look structured: the ":" is always followed by two digits ("00" or "30"), then a space, then a number written with a decimal comma.

I want to create a data frame containing the first hour and the number, like:

#Expected result:
df
$hours $value
#0:00   0.6
#5:00   1.2
#9:30   0.9
#16:00  1.0
#21:30  0.9

You can try:

data.frame(hours = sapply(strsplit(vector, " "), function(x) x[1]),
value = sapply(strsplit(vector, " "), function(x) x[2]))

  hours value
1  0:00   0,6
2  5:00   1,2
3  9:30   0,9
4 16:00   1,0
5 21:30   0,9

It first splits each element of the vector on spaces with strsplit(), then combines the first and second pieces of each element into a data.frame. Note that value still contains the original strings with a decimal comma.
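The same idea can be written so that the vector is split only once. This is a minimal sketch, not part of the original answer: the names parts and df_split are made up for illustration, and vapply() is used in place of sapply() only to make the expected return type explicit.

```r
vector <- c("0:00 0,6 0:00", "5:00 1,2 5:00", "9:30 0,9 22:00", "16:00 1,0", "21:30 0,9")

# Split every element on single spaces; the result is a list of character vectors
parts <- strsplit(vector, " ", fixed = TRUE)

# Take the first and second piece of each element.
# vapply() checks that each extraction returns exactly one string.
df_split <- data.frame(
  hours = vapply(parts, function(x) x[1], character(1)),
  value = vapply(parts, function(x) x[2], character(1)),
  stringsAsFactors = FALSE
)

print(df_split)
```

Splitting once avoids repeating the strsplit() call for each column; the values are still character strings at this point.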

If you also want to replace the comma with a decimal point:

data.frame(hours = sapply(strsplit(vector, " "), function(x) x[1]),
value = sub(",", ".", sapply(strsplit(vector, " "), function(x) x[2])))

  hours value
1  0:00   0.6
2  5:00   1.2
3  9:30   0.9
4 16:00   1.0
5 21:30   0.9

It does the same as the code above, but also uses sub() to replace the comma in the second piece with a decimal point. The result is still text, not a number.
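If the value column should be numeric rather than text, the output of sub() can be wrapped in as.numeric(). The following is a small sketch under that assumption; the names parts, value_chr and df_num are illustrative only.

```r
vector <- c("0:00 0,6 0:00", "5:00 1,2 5:00", "9:30 0,9 22:00", "16:00 1,0", "21:30 0,9")

parts <- strsplit(vector, " ", fixed = TRUE)

# Second piece of each element, with the decimal comma swapped for a point
value_chr <- sub(",", ".", vapply(parts, `[`, character(1), 2), fixed = TRUE)

df_num <- data.frame(
  hours = vapply(parts, `[`, character(1), 1),
  value = as.numeric(value_chr),  # convert the text to numbers
  stringsAsFactors = FALSE
)

str(df_num)
```

With a numeric column, arithmetic such as mean(df_num$value) works directly.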

Or:

df <- read.table(text = vector, sep = " ", dec = ",", as.is = TRUE, fill = TRUE)[, 1:2]
colnames(df) <- c("hours", "value")

  hours value
1  0:00   0.6
2  5:00   1.2
3  9:30   0.9
4 16:00   1.0
5 21:30   0.9

It reads the vector as if it were a text file, one element per line, using a space as the field separator and a comma as the decimal mark, and then keeps the first two columns. fill = TRUE pads the elements that have no third field.
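To see what read.table() produced, the column types can be inspected. This is only a quick check, assuming the same vector as in the question; df_rt is an illustrative name.

```r
vector <- c("0:00 0,6 0:00", "5:00 1,2 5:00", "9:30 0,9 22:00", "16:00 1,0", "21:30 0,9")

# Parse each element as one line of space-separated fields;
# dec = "," makes "0,6" parse as the number 0.6
df_rt <- read.table(text = vector, sep = " ", dec = ",",
                    as.is = TRUE, fill = TRUE)[, 1:2]
colnames(df_rt) <- c("hours", "value")

# hours stays character (as.is = TRUE); value is parsed as numeric
str(df_rt)
```

Unlike the sub() version, this approach yields a numeric value column without an explicit conversion step.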

Try:

vec1<-sapply(strsplit(vector," "),"[")
df<-plyr::ldply(vec1,function(x) x[1:2])
names(df)<-c("hours","value")       
df$value<-gsub(",",".",df$value)

Result:

  hours value
1  0:00   0.6
2  5:00   1.2
3  9:30   0.9
4 16:00   1.0
5 21:30   0.9

Another fun solution is to use word() from the stringr package, i.e.

library(stringr)
data.frame(hours = word(vector, 1), 
           values = as.numeric(sub(',', '.', word(vector, 2), fixed = TRUE)), 
           stringsAsFactors = FALSE)

which gives:

  hours values
1  0:00    0.6
2  5:00    1.2
3  9:30    0.9
4 16:00    1.0
5 21:30    0.9
