Extract elements from different levels of a nested list

I have a nested list of academic authors, such as:

> str(content)
List of 3
 $ author-retrieval-response:List of 1
  ..$ :List of 6
  .. ..$ @status       : chr "found"
  .. ..$ @_fa          : chr "true"
  .. ..$ coredata      :List of 3
  .. .. ..$ dc:identifier : chr "AUTHOR_ID:55604964500"
  .. .. ..$ document-count: chr "6"
  .. .. ..$ cited-by-count: chr "13"
  .. ..$ h-index       : chr "3"
  .. ..$ coauthor-count: chr "7"
  .. ..$ preferred-name:List of 2
  .. .. ..$ surname   : chr "García Cruz"
  .. .. ..$ given-name: chr "Gustavo Adolfo"
 $ author-retrieval-response:List of 1
  ..$ :List of 6
  .. ..$ @status       : chr "found"
  .. ..$ @_fa          : chr "true"
  .. ..$ coredata      :List of 3
  .. .. ..$ dc:identifier : chr "AUTHOR_ID:56595713900"
  .. .. ..$ document-count: chr "4"
  .. .. ..$ cited-by-count: chr "21"
  .. ..$ h-index       : chr "3"
  .. ..$ coauthor-count: chr "5"
  .. ..$ preferred-name:List of 2
  .. .. ..$ surname   : chr "Akimov"
  .. .. ..$ given-name: chr "Alexey"
 $ author-retrieval-response:List of 1
  ..$ :List of 6
  .. ..$ @status       : chr "found"
  .. ..$ @_fa          : chr "true"
  .. ..$ coredata      :List of 3
  .. .. ..$ dc:identifier : chr "AUTHOR_ID:12792624600"
  .. .. ..$ document-count: chr "10"
  .. .. ..$ cited-by-count: chr "117"
  .. ..$ h-index       : chr "6"
  .. ..$ coauthor-count: chr "7"
  .. ..$ preferred-name:List of 2
  .. .. ..$ surname   : chr "Alecke"
  .. .. ..$ given-name: chr "Björn"

I am interested in extracting the following values:

dc:identifier, document-count, cited-by-count, h-index, coauthor-count, surname, given-name

and parsing them into a data-frame-like structure.

I have two issues. The first is that I cannot access the different levels of my list. While content[[3]] returns the elements of the third sub-list/author, I have not found a way to access that author's sub-lists:

> content[[3]][[2]]
Error in content[[3]][[2]] : subscript out of bounds

I also imagine that, once I can access them, I cannot simply use sapply, because the elements I'd like to extract from my list are not all at the same level.

Here is the dput of the first three elements of my list:

structure(list(`author-retrieval-response` = list(structure(list(
    `@status` = "found", `@_fa` = "true", coredata = structure(list(
        `dc:identifier` = "AUTHOR_ID:55604964500", `document-count` = "6", 
        `cited-by-count` = "13"), .Names = c("dc:identifier", 
    "document-count", "cited-by-count")), `h-index` = "3", `coauthor-count` = "7", 
    `preferred-name` = structure(list(surname = "García Cruz", 
        `given-name` = "Gustavo Adolfo"), .Names = c("surname", 
    "given-name"))), .Names = c("@status", "@_fa", "coredata", 
"h-index", "coauthor-count", "preferred-name"))), `author-retrieval-response` = list(
    structure(list(`@status` = "found", `@_fa` = "true", coredata = structure(list(
        `dc:identifier` = "AUTHOR_ID:56595713900", `document-count` = "4", 
        `cited-by-count` = "21"), .Names = c("dc:identifier", 
    "document-count", "cited-by-count")), `h-index` = "3", `coauthor-count` = "5", 
        `preferred-name` = structure(list(surname = "Akimov", 
            `given-name` = "Alexey"), .Names = c("surname", "given-name"
        ))), .Names = c("@status", "@_fa", "coredata", "h-index", 
    "coauthor-count", "preferred-name"))), `author-retrieval-response` = list(
    structure(list(`@status` = "found", `@_fa` = "true", coredata = structure(list(
        `dc:identifier` = "AUTHOR_ID:12792624600", `document-count` = "10", 
        `cited-by-count` = "117"), .Names = c("dc:identifier", 
    "document-count", "cited-by-count")), `h-index` = "6", `coauthor-count` = "7", 
        `preferred-name` = structure(list(surname = "Alecke", 
            `given-name` = "Björn"), .Names = c("surname", "given-name"
        ))), .Names = c("@status", "@_fa", "coredata", "h-index", 
    "coauthor-count", "preferred-name")))), .Names = c("author-retrieval-response", 
"author-retrieval-response", "author-retrieval-response"))

Many thanks for your help!
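On the first issue: the str() output shows that each top-level element is a "List of 1", so content[[3]][[2]] has nothing to return. The author's fields appear to sit one level further down, under content[[3]][[1]]. A minimal sketch, rebuilding only the third author from the dput above so that it runs on its own (the names third and author are stand-ins introduced here, not part of the original data):

```r
# Stand-in for content[[3]]: an unnamed list of length one wrapping the author
third <- list(list(
  `@status` = "found", `@_fa` = "true",
  coredata = list(`dc:identifier` = "AUTHOR_ID:12792624600",
                  `document-count` = "10",
                  `cited-by-count` = "117"),
  `h-index` = "6", `coauthor-count` = "7",
  `preferred-name` = list(surname = "Alecke", `given-name` = "Björn")
))

length(third)                              # 1, so third[[2]] is out of bounds

# Step through the wrapper first, then index by name at each level
author <- third[[1]]
author[["h-index"]]                        # "6"
author[["coredata"]][["cited-by-count"]]   # "117"
author[["preferred-name"]][["surname"]]    # "Alecke"
```

With the full list, the same path would be content[[3]][[1]][["coredata"]][["cited-by-count"]].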

Consider using rapply (recursive apply) to flatten all nested child and grandchild elements, inside an lapply that runs across the three top-level elements. Then transpose each result with t() and pass it to a data.frame() call. Here my_list is the list from the question (called content there).

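# Flatten each author's nested list into a one-row data frame
flat_list <- lapply(my_list, function(author) data.frame(t(rapply(author, function(v) v[1]))))

# Stack the rows; unname() keeps the repeated list names out of the row names
final_df <- do.call(rbind, unname(flat_list))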

Output

final_df

#   X.status X._fa coredata.dc.identifier coredata.document.count coredata.cited.by.count h.index coauthor.count preferred.name.surname preferred.name.given.name
# 1    found  true  AUTHOR_ID:55604964500                       6                      13       3              7            García Cruz            Gustavo Adolfo
# 2    found  true  AUTHOR_ID:56595713900                       4                      21       3              5                 Akimov                    Alexey
# 3    found  true  AUTHOR_ID:12792624600                      10                     117       6              7                 Alecke                     Björn
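
The result still carries the two status columns and stores every value as text. If only the seven requested fields are needed, one possible follow-up is to select them by name and convert the counts to integers. This is a sketch under assumptions: it rebuilds a two-row stand-in for final_df with the column names shown above so that it runs on its own, and the names wanted, count_cols and authors are introduced here for illustration.

```r
# Two-row stand-in for final_df, using the column names from the output above
final_df <- data.frame(
  X.status = c("found", "found"),
  X._fa = c("true", "true"),
  coredata.dc.identifier = c("AUTHOR_ID:55604964500", "AUTHOR_ID:56595713900"),
  coredata.document.count = c("6", "4"),
  coredata.cited.by.count = c("13", "21"),
  h.index = c("3", "3"),
  coauthor.count = c("7", "5"),
  preferred.name.surname = c("García Cruz", "Akimov"),
  preferred.name.given.name = c("Gustavo Adolfo", "Alexey"),
  stringsAsFactors = FALSE
)

# Keep only the seven fields named in the question
wanted <- c("coredata.dc.identifier", "coredata.document.count",
            "coredata.cited.by.count", "h.index", "coauthor.count",
            "preferred.name.surname", "preferred.name.given.name")
authors <- final_df[, wanted]

# Convert the count columns from text to integer; as.character() guards
# against factor columns, which data.frame() creates by default before R 4.0
count_cols <- c("coredata.document.count", "coredata.cited.by.count",
                "h.index", "coauthor.count")
authors[count_cols] <- lapply(authors[count_cols],
                              function(col) as.integer(as.character(col)))

str(authors)
```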
