
Convert Stem and Leaf to a Vector in R Efficiently

I need to verify summary statistics (mean, standard deviation, etc.) for many stem and leaf plots, so I have written some functions to convert a stem and leaf plot into a vector, since it is easy to obtain statistics from vectors in R.

A stem and leaf plot can be entered as a matrix or a data frame where each row is a string. The "|" symbol marks the decimal point: the stem to its left is the integer part, and each digit to its right is the tenths digit of one observation. For example, the stem and leaf plot below

100 | 9
102 | 601
104 | 0678
106 | 5
108 | 649
110 | 3857
112 | 56
114 | 29

can be entered as

> example.stem = rbind("100|9", "102|601", "104|0678", "106|5", "108|649", "110|3857", "112|56", "114|29")

My two functions that perform the conversion of this stem and leaf plot are

## Convert a single row into a vector
> convert.row = function(current){

  temp.split = as.vector(strsplit(current, split="|", fixed=TRUE)[[1]])

  int = temp.split[1]

  dec = temp.split[2]
  dec = (strsplit(dec, ""))[[1]]

  temp.string = NULL

  for(i in 1:length(dec)){
    temp.string[i] = paste(int, dec[i], sep=".")
  }

  result = as.numeric(temp.string)
  return(result)
  }


## Convert matrix or dataframe with a stem and leaf plot into a vector
> stem.to.vec = function(df){  

  df = data.frame(df, stringsAsFactors = FALSE)

  result.vec = NULL

  for(i in 1:nrow(df)){
    current = df[i, ]
    result.vec = c(result.vec, convert.row(current))
  }

  return(result.vec)
  }    

We can verify this works because we know the solution:

> solution = c(100.9, 102.6,102.0,102.1,104.0,104.6,104.7,104.8,106.5,108.6,108.4,108.9,110.3,110.8,110.5,110.7,112.5,112.6, 114.2, 114.9)
> stem.to.vec(example.stem) == solution

Although this solution works, it is neither elegant nor efficient. Each row is split into strings, the pieces are pasted back together into new strings, and those strings are then converted into numeric values; the result vector is also grown one row at a time inside a loop. It can therefore be slow for very large stem and leaf plots.

Can anyone suggest a better and more efficient solution with fewer conversions?

This is far from pretty, but I think you're going to have to do some back and forth conversion anyway.

Use read.table to suck the data in, then divide each digit on the right-hand side by 10 and add it to the corresponding value on the left-hand side.

out <- read.table(text=example.stem, sep="|", colClasses=c("numeric","character"))
res <- unlist(Map(`+`, out$V1, lapply(strsplit(out$V2,""), function(x) as.numeric(x)/10)))
res
# [1] 100.9 102.6 102.0 102.1 104.0 104.6 104.7 104.8 106.5 108.6 108.4 108.9
# [13] 110.3 110.8 110.5 110.7 112.5 112.6 114.2 114.9

identical(solution,res)
#[1] TRUE
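
The same stem-plus-tenths arithmetic can also be written without `Map`, by repeating each stem once per leaf digit. This is only a minimal sketch under a few assumptions: the data is held as a plain character vector rather than the one-column matrix from the question, every row has at least one leaf, and the names `parts`, `stems`, `leaves`, `digits` and `res2` are made up for illustration.

```r
# Same sample data as in the question, here as a plain character vector
# (an assumption; the question builds a one-column matrix with rbind).
example.stem <- c("100|9", "102|601", "104|0678", "106|5",
                  "108|649", "110|3857", "112|56", "114|29")

# Split each row at the literal "|" into a stem and a string of leaves.
parts  <- strsplit(example.stem, "|", fixed = TRUE)
stems  <- as.numeric(vapply(parts, `[`, character(1), 1))
leaves <- vapply(parts, `[`, character(1), 2)

# One numeric digit per observation, in row order.
digits <- as.numeric(unlist(strsplit(leaves, "")))

# Repeat each stem once per leaf digit, then add the digits as tenths.
res2 <- rep(stems, nchar(leaves)) + digits / 10
res2
```

Because this route adds floating-point numbers, a tolerant check such as `all.equal(res2, solution)` is generally a safer comparison than `identical()`, even though `identical()` happens to succeed for the sample data above.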

Split and paste approach:

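Split each row at "|" to get a list of stem/leaf pairs, then split the second element of each pair (the leaves) into single digits. Finally, paste the stem onto each digit with "." as the separator and convert the result to numeric.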

x <- sapply(strsplit(example.stem, "[|]"), 
            function(x) { paste(x[1], unlist(strsplit(x[2], "")), sep= ".") })

as.numeric(unlist(x))                           

# [1] 100.9 102.6 102.0 102.1 104.0 104.6 104.7 104.8 106.5 108.6 108.4 108.9 
# [13] 110.3 110.8 110.5 110.7 112.5 112.6 114.2 114.9
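
Since the original goal was to check summary statistics, the split-and-paste idea could be wrapped in a small helper and fed straight into `mean()` and `sd()`. The sketch below is one possible way to do that; the function name `stem_to_vec` and the variable names are hypothetical, and it assumes the input is a character vector or a one-column character matrix.

```r
# Hypothetical helper: split-and-paste conversion wrapped in a function.
# Assumes `stem` is a character vector or one-column character matrix
# whose rows look like "100|9".
stem_to_vec <- function(stem) {
  pieces <- strsplit(as.character(stem), "|", fixed = TRUE)
  values <- lapply(pieces, function(p) {
    # p[1] is the stem, p[2] the leaves; paste the stem onto each leaf digit.
    paste(p[1], strsplit(p[2], "")[[1]], sep = ".")
  })
  as.numeric(unlist(values))
}

example.stem <- rbind("100|9", "102|601", "104|0678", "106|5",
                      "108|649", "110|3857", "112|56", "114|29")

vals <- stem_to_vec(example.stem)

# Summary statistics to compare against the ones being verified.
summary.stats <- c(n = length(vals), mean = mean(vals), sd = sd(vals))
summary.stats
```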

Here is a nested lapply version that does all the combining as strings, then converts the output value to numeric:

out <- unlist(lapply(strsplit(example.stem, "[|]"), function(x){
  lapply(x[2], function(y){
    as.numeric(paste(x[1], unlist(strsplit(y, "")), sep = "."))
  })
}))

> identical(solution, out)
[1] TRUE
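
To judge which variant is actually faster on large inputs, it may help to time them on a bigger synthetic plot. The following is only a rough sketch: the synthetic data, the seed, the row count, and the function names `string.way` and `arith.way` are all assumptions made for illustration, and timings will vary by machine.

```r
set.seed(1)       # arbitrary seed, only for reproducibility
n.rows <- 10000L  # arbitrary size for the synthetic plot

# Synthetic stems (integers) and 1 to 6 random leaf digits per row.
stems  <- seq.int(100L, by = 2L, length.out = n.rows)
leaves <- vapply(sample(1:6, n.rows, replace = TRUE),
                 function(k) paste(sample(0:9, k, replace = TRUE), collapse = ""),
                 character(1))
big.stem <- paste(stems, leaves, sep = "|")

# String-based variant: paste stem and digit, convert once at the end.
string.way <- function(stem) {
  as.numeric(unlist(lapply(strsplit(stem, "|", fixed = TRUE), function(x) {
    paste(x[1], strsplit(x[2], "")[[1]], sep = ".")
  })))
}

# Arithmetic variant: repeat each stem per leaf digit and add tenths.
arith.way <- function(stem) {
  parts <- strsplit(stem, "|", fixed = TRUE)
  st <- as.numeric(vapply(parts, `[`, character(1), 1))
  lf <- vapply(parts, `[`, character(1), 2)
  rep(st, nchar(lf)) + as.numeric(unlist(strsplit(lf, ""))) / 10
}

# Check that both variants agree (within tolerance), then time each one.
agree <- all.equal(string.way(big.stem), arith.way(big.stem))
agree
system.time(string.way(big.stem))
system.time(arith.way(big.stem))
```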
