Extract elements from different levels of a nested list
I have a nested list of academic authors such as:
> str(content)
List of 3
$ author-retrieval-response:List of 1
..$ :List of 6
.. ..$ @status : chr "found"
.. ..$ @_fa : chr "true"
.. ..$ coredata :List of 3
.. .. ..$ dc:identifier : chr "AUTHOR_ID:55604964500"
.. .. ..$ document-count: chr "6"
.. .. ..$ cited-by-count: chr "13"
.. ..$ h-index : chr "3"
.. ..$ coauthor-count: chr "7"
.. ..$ preferred-name:List of 2
.. .. ..$ surname : chr "García Cruz"
.. .. ..$ given-name: chr "Gustavo Adolfo"
$ author-retrieval-response:List of 1
..$ :List of 6
.. ..$ @status : chr "found"
.. ..$ @_fa : chr "true"
.. ..$ coredata :List of 3
.. .. ..$ dc:identifier : chr "AUTHOR_ID:56595713900"
.. .. ..$ document-count: chr "4"
.. .. ..$ cited-by-count: chr "21"
.. ..$ h-index : chr "3"
.. ..$ coauthor-count: chr "5"
.. ..$ preferred-name:List of 2
.. .. ..$ surname : chr "Akimov"
.. .. ..$ given-name: chr "Alexey"
$ author-retrieval-response:List of 1
..$ :List of 6
.. ..$ @status : chr "found"
.. ..$ @_fa : chr "true"
.. ..$ coredata :List of 3
.. .. ..$ dc:identifier : chr "AUTHOR_ID:12792624600"
.. .. ..$ document-count: chr "10"
.. .. ..$ cited-by-count: chr "117"
.. ..$ h-index : chr "6"
.. ..$ coauthor-count: chr "7"
.. ..$ preferred-name:List of 2
.. .. ..$ surname : chr "Alecke"
.. .. ..$ given-name: chr "Björn"
I am interested in extracting the following values:
dc:identifier, document-count, cited-by-count, h-index, coauthor-count, surname, given-name
and parsing them into a data-frame-like structure.
I have two issues. The first is that I cannot access the different levels of my list. Indeed, while content[[3]] returns the elements of the third sub-list/author, I have not found a way to access the sublists of the third author, that is:
> content[[3]][[2]]
Error in content[[3]][[2]] : subscript out of bounds
The second is that, once I can access them, I imagine I cannot simply use sapply, as the elements I'd like to parse from my list are not all at the same level.
Here is the dput of the first three elements of my list:
structure(list(`author-retrieval-response` = list(structure(list(
`@status` = "found", `@_fa` = "true", coredata = structure(list(
`dc:identifier` = "AUTHOR_ID:55604964500", `document-count` = "6",
`cited-by-count` = "13"), .Names = c("dc:identifier",
"document-count", "cited-by-count")), `h-index` = "3", `coauthor-count` = "7",
`preferred-name` = structure(list(surname = "García Cruz",
`given-name` = "Gustavo Adolfo"), .Names = c("surname",
"given-name"))), .Names = c("@status", "@_fa", "coredata",
"h-index", "coauthor-count", "preferred-name"))), `author-retrieval-response` = list(
structure(list(`@status` = "found", `@_fa` = "true", coredata = structure(list(
`dc:identifier` = "AUTHOR_ID:56595713900", `document-count` = "4",
`cited-by-count` = "21"), .Names = c("dc:identifier",
"document-count", "cited-by-count")), `h-index` = "3", `coauthor-count` = "5",
`preferred-name` = structure(list(surname = "Akimov",
`given-name` = "Alexey"), .Names = c("surname", "given-name"
))), .Names = c("@status", "@_fa", "coredata", "h-index",
"coauthor-count", "preferred-name"))), `author-retrieval-response` = list(
structure(list(`@status` = "found", `@_fa` = "true", coredata = structure(list(
`dc:identifier` = "AUTHOR_ID:12792624600", `document-count` = "10",
`cited-by-count` = "117"), .Names = c("dc:identifier",
"document-count", "cited-by-count")), `h-index` = "6", `coauthor-count` = "7",
`preferred-name` = structure(list(surname = "Alecke",
`given-name` = "Björn"), .Names = c("surname", "given-name"
))), .Names = c("@status", "@_fa", "coredata", "h-index",
"coauthor-count", "preferred-name")))), .Names = c("author-retrieval-response",
"author-retrieval-response", "author-retrieval-response"))
Many thanks for your help!
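On the first issue, the str() output above shows that each top-level element is a list of length 1 whose only member is an unnamed list holding the six fields, which would explain why content[[3]][[2]] is out of bounds. A minimal sketch of direct access, assuming the structure in the dput above; `content` is rebuilt here with a single author so the snippet runs on its own, and `author` is just an illustrative name:

```r
# One-author stand-in for the `content` list from the question
content <- list(`author-retrieval-response` = list(list(
  `@status` = "found", `@_fa` = "true",
  coredata = list(`dc:identifier` = "AUTHOR_ID:56595713900",
                  `document-count` = "4", `cited-by-count` = "21"),
  `h-index` = "3", `coauthor-count` = "5",
  `preferred-name` = list(surname = "Akimov", `given-name` = "Alexey"))))

# Each top-level element has exactly one member, so [[2]] does not exist
n_members <- length(content[[1]])

# Step through the single unnamed wrapper with [[1]] first
author <- content[[1]][[1]]

h_index    <- author[["h-index"]]                  # second-level field
identifier <- author$coredata[["dc:identifier"]]   # third-level field
surname    <- author[["preferred-name"]]$surname   # third-level field
```

Names containing hyphens, colons or @ are not syntactic R names, so they need [[ ]] with a string, or backticks after $.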
Consider rapply (a recursive apply function) to flatten all nested child and grandchild elements, called inside an lapply that runs across the three top-level parent elements. Then transpose each result with t() and pass it to the data.frame() constructor.
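# my_list is the nested list (named `content` in the question)
# rapply() flattens each author to a named character vector; t() turns it into one row
flat_list <- lapply(my_list, function(x) data.frame(t(rapply(x, function(x) x[1]))))
# unname() drops the repeated list names so they are not used as row names
final_df <- do.call(rbind, unname(flat_list))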
Output
final_df
# X.status X._fa coredata.dc.identifier coredata.document.count coredata.cited.by.count h.index coauthor.count preferred.name.surname preferred.name.given.name
# 1 found true AUTHOR_ID:55604964500 6 13 3 7 García Cruz Gustavo Adolfo
# 2 found true AUTHOR_ID:56595713900 4 21 3 5 Akimov Alexey
# 3 found true AUTHOR_ID:12792624600 10 117 6 7 Alecke Björn
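The result above still carries the two status columns, and every value is text because the source list stores the counts as strings. A possible follow-up, sketched under the assumption that the column names come out as in the output shown; the helper `make_author`, the object `wanted` and the shortened column names are illustrative choices, not part of the original answer:

```r
# Hypothetical helper that builds one author in the same shape as the dput
make_author <- function(id, docs, cited, h, co, surname, given) {
  list(list(`@status` = "found", `@_fa` = "true",
            coredata = list(`dc:identifier` = id, `document-count` = docs,
                            `cited-by-count` = cited),
            `h-index` = h, `coauthor-count` = co,
            `preferred-name` = list(surname = surname, `given-name` = given)))
}
my_list <- list(
  `author-retrieval-response` =
    make_author("AUTHOR_ID:56595713900", "4", "21", "3", "5", "Akimov", "Alexey"),
  `author-retrieval-response` =
    make_author("AUTHOR_ID:12792624600", "10", "117", "6", "7", "Alecke", "Bj\u00f6rn"))

# Same flattening as above; stringsAsFactors = FALSE keeps text as character
flat_list <- lapply(my_list, function(x)
  data.frame(t(rapply(x, function(v) v[1])), stringsAsFactors = FALSE))
final_df <- do.call(rbind, unname(flat_list))

# Drop the two status columns, leaving the seven requested fields
wanted <- final_df[, setdiff(names(final_df), c("X.status", "X._fa"))]
names(wanted) <- c("identifier", "document_count", "cited_by_count",
                   "h_index", "coauthor_count", "surname", "given_name")

# Convert the count columns from character to integer
num_cols <- c("document_count", "cited_by_count", "h_index", "coauthor_count")
wanted[num_cols] <- lapply(wanted[num_cols], as.integer)
str(wanted)
```

Selecting by name rather than by position keeps the step readable, but it does depend on data.frame() mangling the names exactly as in the output above.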