R: repeat individual numbers within numeric sets in a nested list

I have a list of alphanumeric character vectors:

lst <- list(c("三垣3-19", "6", "81497", "79992", "79101", 
"77760", "75973", "75411", "74666"), c("蒼龍1-01", "2", "66249", "65474", "66803", "64238"), c("蒼龍1-02", "1", "64238"), "蒼龍1-03")

[[1]]
[1] "三垣3-19" "6"        "81497"    "79992"   
[5] "79101"    "77760"    "75973"    "75411"   
[9] "74666"   

[[2]]
[1] "蒼龍1-01" "2"        "66249"    "65474"   
[5] "66803"    "64238"   

[[3]]
[1] "蒼龍1-02" "1"        "64238"   

[[4]]
[1] "蒼龍1-03"

The second element of each vector (i.e. 6, 2, 1) is the total number of lines to be drawn to connect stars, which are identified by the HIP numbers that follow it. Each pair of HIP numbers indicates a line drawn between two stars.

Hence 81497 79992 in [[1]] means "draw a line between star 81497 and star 79992", and so on.

In the case of a continuous line, such as [[1]], the numbers between "81497" and "74666" should be repeated so that there is no break in the line.

Thus, in the case of [[1]], "79992" "79101" "77760" "75973" "75411" should each be repeated to give the following result:

[[1]]
 [1] "三垣3-19" "6"        "81497"    "79992"   
 [5] "79992"    "79101"    "79101"    "77760"   
 [9] "77760"    "75973"    "75973"    "75411"   
[13] "75411"    "74666"   

[[2]]
[1] "蒼龍1-01" "2"        "66249"    "65474"   
[5] "66803"    "64238"   

[[3]]
[1] "蒼龍1-02" "1"        "64238"    "64238"   

[[4]]
[1] "蒼龍1-03"

Since the second element of each vector is the total number of lines to be drawn, a validity test can be coded to indicate whether any numbers need to be repeated. Thus 6 in [[1]] means that 6 pairs of HIP numbers (i.e. 6 * 2 = 12 elements) should follow. When the validity test fails, I would like R to repeat the numbers between the third and final elements so that the continuous line can be drawn.


The partial solution I managed to cobble together is as follows:

lapply(lst, function(x) x[2]) == (lengths(lst)-2)/2
[1] FALSE  TRUE FALSE    NA

This tests the HIP values for validity. Only [[2]] in the original list passes. [[1]] and [[3]] are the vectors that need work.

To repeat individual values in the middle of a vector, I could do this:
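The same check can also be written so that it returns a plain logical vector and handles a name-only vector such as [[4]] explicitly. This is only a minimal sketch; the helper name `is_valid` is hypothetical and not part of the original post.

```r
lst <- list(
  c("三垣3-19", "6", "81497", "79992", "79101", "77760", "75973", "75411", "74666"),
  c("蒼龍1-01", "2", "66249", "65474", "66803", "64238"),
  c("蒼龍1-02", "1", "64238"),
  "蒼龍1-03"
)

# Hypothetical helper: TRUE when the number of HIP values that follow
# equals twice the declared number of lines; NA when no count is given.
is_valid <- function(x) {
  if (length(x) < 2) return(NA)
  n_lines <- as.numeric(x[2])   # declared number of lines
  n_hip   <- length(x) - 2      # HIP values actually present
  n_hip == 2 * n_lines
}

valid <- vapply(lst, is_valid, logical(1))
valid
# [1] FALSE  TRUE FALSE    NA
```

Using `vapply()` with `logical(1)` guarantees one logical value per vector, which makes the result easy to use as an index later on.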

> x <- c(1,2,3,4,5)
> x[2:4] <- lapply(x[2:4], function(x) rep(x, 2))
> unlist(x)
[1] 1 2 2 3 3 4 4 5

However, because lst is a list of vectors, I cannot do:

lst[2:4] <- lapply(lst[2:4], function(x) rep(x, 2))

to get the same result. The fact that the end index (4 in this case) has to be derived from lengths(lst) further complicates the matter.

I suppose the final code would use ifelse() to combine the two pieces described above.
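One possible way around the indexing problem, sketched here on the assumption that only the values strictly between the first and last HIP number should be duplicated, is to do the repetition inside each vector with `rep(..., each = 2)`. The helper name `repeat_inner` is hypothetical.

```r
# Hypothetical helper: duplicate every value strictly between the first
# and last element of a vector, leaving the two end points alone.
repeat_inner <- function(v) {
  n <- length(v)
  if (n < 3) return(v)          # nothing lies between the end points
  c(v[1], rep(v[-c(1, n)], each = 2), v[n])
}

repeat_inner(c(1, 2, 3, 4, 5))
# [1] 1 2 2 3 3 4 4 5

# Applied to the HIP part of the first vector in the question:
hip <- c("81497", "79992", "79101", "77760", "75973", "75411", "74666")
expanded <- repeat_inner(hip)
length(expanded)
# [1] 12
```

Because the length is computed inside the helper, the end index does not have to be looked up from lengths(lst) separately.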


Clarification of the rule:

The second element of each vector is the desired number of distinct HIP pairs, each pair drawing one line.

[[2]] is valid because 2 pairs of numbers follow, which matches the value given in its second element, so the numbers need not be repeated.

In this case, the lines most probably form a cross rather than a continuous line. So the rule should be applied only in the case of a continuous line, such as [[1]].

As for [[3]], because there is only one point, the number is repeated as a rule, so that the count given by the second element still holds.


BUG INQUIRY

@TUSHAr: Your code seems to generate NA values when elements within the vectors contain non-numeric values.

lst <- list(c("三垣3-19", "6", "81497", "79992A", "79101", 
              "77760", "75973A", "75411", "74666"), c("蒼龍1-01", "2", "66249", "65474", "66803B", "64238"), c("蒼龍1-02", "1", "64238"), "蒼龍1-03")

Run the code with the above data and you get:

[[1]]
 [1] "三垣3-19" "6"        "81497"    NA         NA        
 [6] "79101"    "79101"    "77760"    "77760"    NA        
[11] NA         "75411"    "75411"    "74666"   

[[2]]
[1] "蒼龍1-01" "2"        "66249"    "65474"    NA        
[6] "64238"   

[[3]]
[1] "蒼龍1-02" "1"        "64238"    "64238"   

[[4]]
[1] "蒼龍1-03"

What is causing this, and is there a way to fix it?
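A likely cause, stated here as an assumption because it depends on which version of the code was run: if the HIP values are passed through as.integer() or as.numeric() at any point, every string that is not a pure number becomes NA. The short example below illustrates the effect on made-up input taken from the data above.

```r
hip <- c("81497", "79992A", "79101")

# Numeric conversion cannot parse "79992A", so it yields NA (with a warning).
converted <- suppressWarnings(as.integer(hip))
converted
# [1] 81497    NA 79101

# rep() works on character vectors directly, so no conversion is needed.
doubled <- rep(hip, each = 2)
doubled
# [1] "81497"  "81497"  "79992A" "79992A" "79101"  "79101"
```

Keeping the values as character strings throughout avoids the problem, since only the count in the second element ever needs to be treated as a number.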

Store the first value of each vector in lst in a separate variable, id, to avoid unnecessary subsetting during processing.

id = lapply(lst,function(t){t[1]})

Remove the first element, which is already stored in id.

lst = lapply(lst,function(t){
    t=t[-1]
    #if(length(t)>0){
    #    as.integer(t)
    #}
})

Loop through the processed lst object:

temp = lapply(lst,function(t){
    # Use the first value as the desired number of pairs, `reqdpairs`.
    reqdpairs = as.numeric(t[1])
    # Remove the first value so that `t` contains only HIP numbers.
    t = t[-1]
    # Count the existing pairs; if this already matches `reqdpairs` (case [[2]]), no processing is needed.
    noofpairs = floor(length(t)/2)
    # Check whether `t` holds more than one HIP number. The `else if` branch covers case [[3]].
    if(length(t)>1){
        # If `noofpairs` differs from `reqdpairs`, use `rep` on the inner elements (excluding the first and last element) of the vector.
        if(noofpairs!=reqdpairs){
            pairs = c(reqdpairs,t[1],rep(t[-c(1,length(t))],each=2),t[length(t)])
        }else{
            # No processing is required, so just prepend `reqdpairs` to `t` as it is.
            pairs = c(reqdpairs,t)
        }
    }else if(length(t)==1){
        # A single HIP number is repeated once to form a pair.
        pairs = rep(t[1],times=2)
        pairs = c(reqdpairs,pairs)
    }else{
        # No HIP numbers at all (case [[4]]).
        pairs = NULL
    }
    # Return the result as a character vector.
    as.character(pairs)
}
)

This step merges id with temp to produce the desired output format. It is basically just a concatenation step.

mapply(function(x,y){c(x,y)},id,temp)


#[[1]]
#[1] "三垣3-19" "6"        "81497"    "79992"    "79992"    "79101"    "79101"    "77760"    "77760"    "75973"   
#[11] "75973"    "75411"    "75411"    "74666"   

#[[2]]
#[1] "蒼龍1-01" "2"        "66249"    "65474"    "66803"    "64238"   

#[[3]]
#[1] "蒼龍1-02" "1"        "64238"    "64238"   

#[[4]]
#[1] "蒼龍1-03"
