
Splitting list elements, expanding the list

I'm doing a kind of optical character recognition and have run into the following issue. I store the glyphs in a list of binary matrices. They can differ in size, but their maximum possible width is wid = 3 columns (it may be any defined constant, not just 3). In some cases, after the first stage of processing, I get data that look like this:

myll <- list(matrix(c(0, 0, 0, 1, 1, 0), ncol = 2),
             matrix(c(0), ncol = 1),
             matrix(c(1, 1, 0), ncol = 3),
             matrix(c(1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1), ncol = 7),
             matrix(c(1, 1, 1, 1), ncol = 2))
# [[1]]
#      [,1] [,2]
# [1,]    0    1
# [2,]    0    1
# [3,]    0    0
# 
# [[2]]
#      [,1]
# [1,]    0
# 
# [[3]]
#      [,1] [,2] [,3]
# [1,]    1    1    0
# 
# [[4]]
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,]    1    1    1    0    0    0    1
# [2,]    0    1    0    1    0    0    1
# [3,]    1    1    1    1    0    0    1
# 
# [[5]]
#      [,1] [,2]
# [1,]    1    1
# [2,]    1    1

So some glyphs may not have been separated, for whatever reason. This happens only with glyphs of the maximum possible width. Moreover, there may be some junk at the end of the matrix. I have to split such matrices into pieces of width ncol = wid, leaving the last piece (the junk) as is. I then store these pieces as separate elements of the list to get the following output:

# [[1]]
#      [,1] [,2]
# [1,]    0    1
# [2,]    0    1
# [3,]    0    0
# 
# [[2]]
#      [,1]
# [1,]    0
# 
# [[3]]
#      [,1] [,2] [,3]
# [1,]    1    1    0
# 
# [[4]]
#      [,1] [,2] [,3]
# [1,]    1    1    1
# [2,]    0    1    0
# [3,]    1    1    1
# 
# [[5]]
#      [,1] [,2] [,3]
# [1,]    0    0    0
# [2,]    1    0    0
# [3,]    1    0    0
# 
# [[6]]
#      [,1]
# [1,]    1
# [2,]    1
# [3,]    1
# 
# [[7]]
#      [,1] [,2]
# [1,]    1    1
# [2,]    1    1
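
As a small illustration of the splitting rule described above (not part of the original post), the column-to-piece assignment for the 7-column glyph can be worked out with integer arithmetic. The names n_col, block and widths are illustrative only.

```r
wid <- 3      # maximum glyph width, as in the question
n_col <- 7    # width of the unseparated glyph myll[[4]]

# Piece index for every column: columns 1-3 -> 1, columns 4-6 -> 2, column 7 -> 3
block <- ceiling(seq_len(n_col) / wid)
block
# [1] 1 1 1 2 2 2 3

# Width of each resulting piece; the last one is the leftover "junk"
widths <- as.vector(table(block))
widths
# [1] 3 3 1
```

This matches elements [[4]], [[5]] and [[6]] of the desired output: two full-width glyphs followed by a single leftover column.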

At the moment I can do this with the help of these functions:

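# Split the first `wid` columns off a glyph matrix that is too wide;
# narrower matrices are returned unchanged.
checkGlyphs <- function(gl_m, wid = 3) {
  if (ncol(gl_m) > wid) {
    # drop = FALSE keeps both pieces as matrices, even with one row or one column
    list(gl_m[, 1:wid, drop = FALSE], gl_m[, -(1:wid), drop = FALSE])
  } else {
    gl_m
  }
}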

separateGlyphs <- function(myll, wid = 3) {
  # One pass: every matrix wider than `wid` becomes a two-element list
  presplit <- lapply(myll, checkGlyphs, wid)
  is_split <- vapply(presplit, is.list, logical(1))

  # Each split element contributes its pieces, every other element counts once
  total_new_length <- sum(lengths(presplit[is_split])) + sum(!is_split)

  # Copy the pieces into a flat, preallocated list
  splitted <- vector("list", length = total_new_length)
  spl_index <- 1
  for (i in seq_along(presplit)) {
    if (!is_split[i]) {
      splitted[[spl_index]] <- presplit[[i]]
      spl_index <- spl_index + 1
    } else {
      for (j in seq_along(presplit[[i]])) {
        splitted[[spl_index]] <- presplit[[i]][[j]]
        spl_index <- spl_index + 1
      }
    }
  }

  # Recurse while any piece is still wider than `wid`
  if (any(vapply(splitted, ncol, integer(1)) > wid)) {
    separateGlyphs(splitted, wid)
  } else {
    splitted
  }
}

But I believe there is a faster and more convenient way to achieve the same result (without for loops, this element-by-element reassignment inside them, and then recursion if needed O_o).

I would be thankful for any suggestions on this or, alternatively, for recommendations of OCR packages for R.

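This should do the trick, with the values in final being what you're after. Note that cbind requires all matrices to have the same number of rows, and that idx cuts the combined matrix into pieces two columns wide.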

combined <- do.call(cbind, lapply(myll, unlist))
idx <- seq(1, ncol(combined), 2)
final <- do.call(list, lapply(idx, function(x) combined[, x:(x+1)]))
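
For the myll shown in the question, where the matrices differ in height and wid = 3, one possible loop-free sketch splits each matrix on its own and then flattens the result by one level. split_glyph is a hypothetical helper name, and the sketch assumes every matrix has at least one column.

```r
# Example data from the question
myll <- list(matrix(c(0, 0, 0, 1, 1, 0), ncol = 2),
             matrix(c(0), ncol = 1),
             matrix(c(1, 1, 0), ncol = 3),
             matrix(c(1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1), ncol = 7),
             matrix(c(1, 1, 1, 1), ncol = 2))

# Hypothetical helper: cut one matrix into consecutive pieces of at most `wid` columns
split_glyph <- function(m, wid = 3) {
  starts <- seq(1, ncol(m), by = wid)        # first column of each piece
  ends <- pmin(starts + wid - 1, ncol(m))    # last column, clipped to the matrix width
  # drop = FALSE keeps one-column and one-row pieces as matrices
  Map(function(s, e) m[, s:e, drop = FALSE], starts, ends)
}

# Split every element, then concatenate the per-matrix lists into one flat list
final <- do.call(c, lapply(myll, split_glyph, wid = 3))

length(final)
# [1] 7
sapply(final, ncol)
# [1] 2 1 3 3 3 1 2
```

Because each matrix is cut into all of its pieces in one step, no recursion is needed, and a trailing piece narrower than wid is kept as is.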
