
How to extract the high numeric values from a data frame in R

I am presenting a small data frame here. It comes from a model output file: I extracted the required parameters, time and WatBalR, and converted them into a data frame. The complete code starts here.

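library(stringr)

# Read the model output file line by line
x <- readLines("G:/Rlearning/Mohsin-FM/Balance.out")

# Note: the regex "[T]" is a character class that matches any line containing
# a capital T, not the literal text "[T]"
a <- grep("[T]", x, value = TRUE)
b <- grep("Time", a, value = TRUE)

# Drop the first two "Time" lines, then keep the water-balance lines
c <- b[-c(1, 2)]
d <- grep("WatBalR", x, value = TRUE)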

The data looks like this:

data <- data.frame(time = c, watbalr = d)

> data


                         time                          watbalr
1  Time       [T]        3.0000  WatBalR  [%]              0.040
2  Time       [T]        6.0000  WatBalR  [%]              0.024
3  Time       [T]        9.0000  WatBalR  [%]              0.044
4  Time       [T]       30.0000  WatBalR  [%]              0.034

I checked the classes: c and d are character vectors, and data is a data frame, as shown below.

> c
[1] " Time       [T]        3.0000" " Time       [T]        6.0000"
[3] " Time       [T]        9.0000" " Time       [T]       30.0000"

> class(c)
[1] "character" 



> d
[1] " WatBalR  [%]              0.040" " WatBalR  [%]              0.024"
[3] " WatBalR  [%]              0.044" " WatBalR  [%]              0.034"

> class(d)
[1] "character"

> class(data)
[1] "data.frame"

The code to extract the required values is shown below. However, it only returns times from 0 to 9: any value above 9 starts again from 0.

times   <- sub("^.+?(\\d)", "\\1", c)
WatBlaR <- sub("^.+?(\\d)", "\\1", d)

times   <- as.numeric(times)
WatBlaR <- as.numeric(WatBlaR)

# plot 
plot(x = times, y = WatBlaR)

The results for the four values in the data frame above are:

> times
[1] 3 6 9 0

But the required results for time are:

3, 6, 9, 30

When I extract the model data from the daily data, the values come out as:

> times    
0,1,2,3,4,5,6,7,8,9, 0,1,2,3,4,5,6,7,8,9, 0,1,2,3,4,5,6,7,8,9

It just repeats the sequence 0 to 9 over all the available times. The required output should look like this:

> times
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30
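A minimal sketch of two base-R fixes for the code above, assuming the strings look exactly like the printed c and d (copied in below as c_lines and d_lines, illustrative names that avoid shadowing c()). The original pattern captures a single digit with (\\d) and relies on the lazy .+? stopping right before the first digit; the truncated results appear to come from R's default regex engine (TRE) handling that lazy quantifier differently, whereas PCRE (perl = TRUE) stops where expected. Stripping all leading non-digit characters avoids the lazy quantifier entirely.

```r
# Sample strings copied from the question (the real ones come from readLines())
c_lines <- c(" Time       [T]        3.0000", " Time       [T]        6.0000",
             " Time       [T]        9.0000", " Time       [T]       30.0000")
d_lines <- c(" WatBalR  [%]              0.040", " WatBalR  [%]              0.024",
             " WatBalR  [%]              0.044", " WatBalR  [%]              0.034")

# Option 1: keep the original lazy pattern, but run it with PCRE
times_pcre <- as.numeric(sub("^.+?(\\d)", "\\1", c_lines, perl = TRUE))

# Option 2: delete every leading non-digit character, leaving the whole number
times   <- as.numeric(sub("^[^0-9]*", "", c_lines))
WatBalR <- as.numeric(sub("^[^0-9]*", "", d_lines))

print(times)
print(WatBalR)
```

Option 2 keeps every digit of the number, so daily times from 10 to 30 are no longer cut down to a single digit.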

So you are trying to extract values from a character vector in R. The base string functions are not as convenient as one might want for this kind of task. Consider adding stringr, which is very handy for this kind of work.

library(stringr)

# I will create a toy df
df <- data.frame(A=c(1,2,16,5), B=c(0.1, 0.4, 0.6, 0.8), C=c('3.0000  WatBalR', '3.0000  WatBalR', '12.0000  WatBalR', '6.0000  WatBalR'),
            stringsAsFactors = FALSE)

# now I can extract with a simple regex pattern
times <- as.numeric(str_extract(df$C, '^[0-9]+'))

Here str_extract takes two arguments: the data to operate on and the regex pattern. We also use $ to select the column of the data.frame by name, which keeps the code legible and passes exactly the column we need.

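I can also get the doubles easily (df$B is already numeric here; str_extract converts it to character first, so this mainly shows a pattern for decimal numbers):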

watblar <- as.double(str_extract(df$B, '^[.0-9]+'))

and the types are correct:

> str(times)
 num [1:4] 3 3 12 6
> str(watblar)
 num [1:4] 0.1 0.4 0.6 0.8
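The toy column C starts with the number, which is why the anchored pattern '^[0-9]+' works there; on the strings from the question, which begin with a label such as " Time       [T]", that pattern would return NA. A minimal sketch of the same extract-the-first-number idea on the question's strings, written with base R's regexpr() and regmatches() so it runs without stringr (with stringr, str_extract(x, "[0-9]+\\.?[0-9]*") should give the same values). The variable names are illustrative.

```r
# Strings copied from the question's output
time_lines <- c(" Time       [T]        3.0000", " Time       [T]        6.0000",
                " Time       [T]        9.0000", " Time       [T]       30.0000")
wat_lines  <- c(" WatBalR  [%]              0.040", " WatBalR  [%]              0.024",
                " WatBalR  [%]              0.044", " WatBalR  [%]              0.034")

# Unanchored: one or more digits, an optional dot, then any further digits
num_pattern <- "[0-9]+\\.?[0-9]*"

# regexpr() locates the first match in each string; regmatches() extracts it
times   <- as.numeric(regmatches(time_lines, regexpr(num_pattern, time_lines)))
watbalr <- as.numeric(regmatches(wat_lines, regexpr(num_pattern, wat_lines)))

str(times)
str(watbalr)
```

Note that regmatches() silently drops strings with no match, so if some lines might lack a number, str_extract() (which returns NA for them) keeps the result aligned with the input.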

You can extract the numbers using sub from the base package, skipping any leading non-digit characters and capturing a pattern of the form

  • one or more digits, followed by
  • a dot (optional), followed by
  • any number of digits (optional)

This is how you could do it:

library(magrittr)   ## For pipe %>%

# Some sample data
data <- data.frame(time = c(" Time       [T]        3.0000", 
                " Time       [T]        6.0000",
                " Time       [T]        9.0000", 
                " Time       [T]       30.0000"),
        watbalr = c(" WatBalR  [%]              0.040", 
                " WatBalR  [%]              0.024", 
                " WatBalR  [%]              0.044", 
                " WatBalR  [%]              0.034"),    stringsAsFactors = FALSE)

## Extract pattern and convert to numeric:
times <- sub("[^[:digit:]]*(\\d+\\.?\\d*).*", "\\1", data$time) %>%
        as.numeric
WatBalR  <- sub("[^[:digit:]]*(\\d+\\.?\\d*).*", "\\1", data$watbalr) %>%
        as.numeric

> times
# [1]  3  6  9 30
> WatBalR
# [1] 0.040 0.024 0.044 0.034
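To check the daily case from the question (times 1 to 30), here is a minimal sketch that applies the same pattern to generated lines. The daily strings are hypothetical, built with sprintf() to mimic the layout shown in the question, and the magrittr pipe is replaced by a nested call so the snippet needs no packages.

```r
# Hypothetical daily output lines for days 1 to 30, laid out like the question's
daily <- sprintf(" Time       [T]  %12.4f", 1:30)

# Same pattern as in the answer: skip leading non-digits, capture the number,
# and discard whatever follows it
num_pattern <- "[^[:digit:]]*(\\d+\\.?\\d*).*"
times <- as.numeric(sub(num_pattern, "\\1", daily))

print(times)
```

From R 4.1.0 on, the native pipe gives the same result without magrittr: sub(num_pattern, "\\1", daily) |> as.numeric().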
