How can I efficiently convert a data frame into a list of lists of arbitrary length?

Question

I am trying to reshape a data frame for more efficient storage and retrieval. Each row contains a "parent" (key) value, which is not unique across rows, and a child value (actually a set of 3 attributes: 1 character and 2 numeric). I want to transform this data frame into a list that has exactly one top-level entry for each unique parent key, with a number of sub-lists determined by the number of children associated with that parent. Here are some sample data:

pcm <- data.frame(parent = c("middle", "middle", "might", "might", 
                     "might", "million", "million", "millions"),
              child = c("of", "school", "be", "have", "not", "in", 
                     "to", "of"),
              count = c(476, 165, 1183, 619, 321, 490, 190, 269))

The output should be a list with 4 top-level elements (named "middle", "might", "million", "millions"), each holding a varying number of sub-lists with named members $child and $count. For example, lookup4[["middle"]] contains $children[[1]] with $child = "of" and $count = 476, and $children[[2]] with $child = "school" and $count = 165.
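To make the target shape concrete, the entry for "middle" could be written out by hand roughly as follows. This is only a sketch of the structure described above; the object name expected is made up for illustration.

```r
# Hand-built version of the structure described above, for one parent only.
# `expected` is an illustrative name, not part of the original code.
expected <- list(
  middle = list(
    children = list(
      list(child = "of",     count = 476),
      list(child = "school", count = 165)
    )
  )
)

# Access pattern the lookup is meant to support
second.child <- expected[["middle"]]$children[[2]]$child
str(expected)
```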

The code below works, but is extremely slow (several hours on a 300,000-row data frame with 8 GB of RAM). I have capped the number of children in the output at 6, but that doesn't seem to have made a big difference.

lookup4 <- list()
parents <- unique(pcm$parent)
n.parents <- length(parents)
for (i in 1:n.parents) {
    words <- pcm$child[pcm$parent == parents[i]]
    counts <- pcm$count[pcm$parent == parents[i]]
    probs <- pcm$prob[pcm$parent == parents[i]]
    n.children <- min(c(NROW(words), 6))
    ngram.tail <- list()
    for (k in 1:n.children) {
        ngram.tail[[k]] <- list(word = words[k], 
        count = counts[k], 
        prob = probs[k])
    }
    lookup4[[parents[i]]] <- list(children = ngram.tail)
}

Could I speed it up by eliminating the 'for' loop? If so, how would I code the transformation?

Answer 1

Try this:

I'll assume the data frame is called parents:

parents.list <- as.list(as.data.frame(t(parents)))
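One thing worth checking before relying on this: t() goes through as.matrix(), so a data frame with mixed column types is coerced to a single character matrix and numeric columns come back as strings. A small sketch, using a made-up three-row stand-in for parents:

```r
# Small stand-in for the data frame the answer calls `parents`
# (values borrowed from the question's sample data).
parents <- data.frame(parent = c("middle", "middle", "might"),
                      child  = c("of", "school", "be"),
                      count  = c(476, 165, 1183))

# With mixed column types, the transpose is a character matrix,
# so `count` is no longer numeric after this step.
m <- t(parents)
is.character(m)

# One list element per original row
parents.list <- as.list(as.data.frame(t(parents)))
length(parents.list)
```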

If you want the row names of parents to be the names of the list:

parents.list <- setNames(split(parents, seq(nrow(parents))), rownames(parents))
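For the nested structure the question actually describes (one entry per unique parent, each with a children list), grouping with split() may be closer to what is needed. The loop in the question rescans the whole data frame once per parent, whereas split() groups the rows in a single pass. The following is a minimal sketch under some assumptions: it uses the question's sample data, keeps only the child and count fields, and introduces max.children as an illustrative name for the cap of 6. It has not been benchmarked on the 300,000-row data.

```r
pcm <- data.frame(parent = c("middle", "middle", "might", "might",
                             "might", "million", "million", "millions"),
                  child  = c("of", "school", "be", "have", "not", "in",
                             "to", "of"),
                  count  = c(476, 165, 1183, 619, 321, 490, 190, 269),
                  stringsAsFactors = FALSE)

max.children <- 6  # illustrative name for the cap used in the question

# Group each column by parent once, instead of filtering per parent
children <- split(pcm$child, pcm$parent)
counts   <- split(pcm$count, pcm$parent)

# Build one entry per parent; Map() keeps the parent names from `children`
lookup4 <- Map(function(w, n) {
  k <- seq_len(min(length(w), max.children))
  list(children = lapply(k, function(i) list(child = w[i], count = n[i])))
}, children, counts)

str(lookup4[["middle"]])
```

A prob column, if the real data has one, could presumably be split and passed to Map() in the same way as count.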

Answer 1 (2016-04-06 06:45:11)
