How to extract the high numeric values from a data frame in R
I am presenting a small data frame here that comes from a model output file. I extracted the required parameters, time and WatBalR, and converted them into a data frame. The complete code starts here.
library(stringr)
x <- readLines("G:/Rlearning/Mohsin-FM/Balance.out")
a <- grep("[T]", x, value = T)
b <- grep("Time", a , value = T)
c <- b[-c(1,2)]
d <- grep("WatBalR", x, value = T)
The data looks like this:
data <- data.frame(time =c, watbalr = d)
> data
time watbalr
1 Time [T] 3.0000 WatBalR [%] 0.040
2 Time [T] 6.0000 WatBalR [%] 0.024
3 Time [T] 9.0000 WatBalR [%] 0.044
4 Time [T] 30.0000 WatBalR [%] 0.034
I checked the classes: the two vectors are character and data is a data frame, as shown below.
> c
[1] " Time [T] 3.0000" " Time [T] 6.0000"
[3] " Time [T] 9.0000" " Time [T] 30.0000"
> class(c)
[1] "character"
> d
[1] " WatBalR [%] 0.040" " WatBalR [%] 0.024"
[3] " WatBalR [%] 0.044" " WatBalR [%] 0.034"
> class(d)
[1] "character"
> class(data)
[1] "data.frame"
The code to extract the required values is shown below. However, it only returns time values from 0 to 9; for any value above 9 it starts again from 0 to 9.
times <- sub("^.+?(\\d)", "\\1", c)
WatBlaR <- sub("^.+?(\\d)", "\\1", d)
times <- as.numeric(times)
WatBlaR <- as.numeric(WatBlaR)
# plot
plot(x = times, y = WatBlaR)
The results for the 4 values in the data frame above are shown below.
> times
[1] 3 6 9 0
But the required results for time are
3, 6, 9, 30
When I extract the model data from the daily output, the values come out as
> times
0,1,2,3,4,5,6,7,8,9, 0,1,2,3,4,5,6,7,8,9, 0,1,2,3,4,5,6,7,8,9
It just repeats the sequence 0 to 9 across all the available times. The required output should look like this:
> times
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30
So you are trying to extract values from a character vector in R. The base string functions are not as rich as might be desired for these kinds of situations. Consider adding stringr, which is very handy for this kind of work.
library(stringr)
# I will create a toy df
df <- data.frame(A=c(1,2,16,5), B=c(0.1, 0.4, 0.6, 0.8), C=c('3.0000 WatBalR', '3.0000 WatBalR', '12.0000 WatBalR', '6.0000 WatBalR'),
stringsAsFactors = FALSE)
# now I can extract with a simple regex pattern
times <- as.numeric(str_extract(df$C, '^[0-9]+'))
Here we use str_extract, whose signature is (data on which to operate, regex pattern). We are also using $ to select the column of the data.frame by name, which keeps the code legible and lets us pass exactly what we need.
I can also get the doubles easily:
watblar <- as.double(str_extract(df$B, '^[.0-9]+'))
and the types are correct:
> str(times)
num [1:4] 3 3 12 6
> str(watblar)
num [1:4] 0.1 0.4 0.6 0.8
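The toy data above has the number at the start of each string, whereas the strings in the question have it at the end. As a minimal sketch, assuming the strings look exactly like the ones printed in the question, the same str_extract call with an unanchored pattern might be applied as follows (the names time_raw and watbalr_raw are made up for illustration):

```r
library(stringr)

# Assumed input: copies of the character vectors printed in the question
time_raw    <- c(" Time [T] 3.0000", " Time [T] 6.0000",
                 " Time [T] 9.0000", " Time [T] 30.0000")
watbalr_raw <- c(" WatBalR [%] 0.040", " WatBalR [%] 0.024",
                 " WatBalR [%] 0.044", " WatBalR [%] 0.034")

# One or more digits, an optional dot, then any further digits.
# str_extract() returns the first match in each string, and the labels
# "Time [T]" and "WatBalR [%]" contain no digits, so this is the value.
num_pattern <- "[0-9]+\\.?[0-9]*"

times   <- as.numeric(str_extract(time_raw, num_pattern))
watbalr <- as.numeric(str_extract(watbalr_raw, num_pattern))
```

Because the pattern takes the whole run of digits rather than a single one, a value such as 30.0000 is kept intact.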
You can extract the numbers using sub from the base package, looking for a pattern of the form: digits, an optional decimal point, and any further digits. This is how you could do it:
library(magrittr) ## For pipe %>%
# Some sample data
data <- data.frame(time = c(" Time [T] 3.0000",
" Time [T] 6.0000",
" Time [T] 9.0000",
" Time [T] 30.0000"),
watbalr = c(" WatBalR [%] 0.040",
" WatBalR [%] 0.024",
" WatBalR [%] 0.044",
" WatBalR [%] 0.034"), stringsAsFactors = FALSE)
## Extract pattern and convert to numeric:
times <- sub("[^[:digit:]]*(\\d+\\.?\\d*).*", "\\1", data$time) %>%
as.numeric
WatBalR <- sub("[^[:digit:]]*(\\d+\\.?\\d*).*", "\\1", data$watbalr) %>%
as.numeric
> times
# [1] 3 6 9 30
> WatBalR
# [1] 0.040 0.024 0.044 0.034
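One thing to keep in mind with sub is that a string with no match is returned unchanged, so as.numeric then yields NA with a warning. As a rough sketch of an alternative in base R, the hypothetical helper extract_number below (not part of any package) uses regexpr and regmatches and returns NA quietly for strings that contain no number:

```r
# Hypothetical helper: pull the first decimal number out of each string.
extract_number <- function(x) {
  # regexpr() locates the first match in each element (-1 if there is none)
  m <- regexpr("[0-9]+\\.?[0-9]*", x)
  out <- rep(NA_real_, length(x))
  # regmatches() returns only the matched substrings, so fill by position
  out[m != -1] <- as.numeric(regmatches(x, m))
  out
}

# Illustrative input; the last element deliberately has no number
time_raw <- c(" Time [T] 3.0000", " Time [T] 30.0000", "no number here")
result <- extract_number(time_raw)
```

Here result holds 3, 30 and NA, with the output kept the same length as the input.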