R: iterate a function over the matching elements in 2 lists

I'm trying to perform k-means clustering on each element (a data frame) of a list. From the output of each kmeans() call, I took the "centers" that match each data frame and collected all the centers into another list.

Next, I want to use the function get.knnx() to take each set of centers generated by k-means, go back to the original data frame, and sample the 500 data points closest to each center, to get a good subsample of the data. (I did not use the cluster membership assigned by kmeans because the data used to fit the k-means model is itself just a subsample of the original dataset, used for training.)

Each data frame has the same structure: many rows of samples and 107 columns of variables, but the 1st and 2nd columns are just data labels, such as the actual drug treatment.

Here is a link to 2 sample data sets: https://drive.google.com/drive/folders/1B8JQY94Z-BHTZEKlV4dvUDocmiyppBDa?usp=sharing

library(tidyverse)
library(purrr)
#take data into list
mylist <- list(df1,df2,df3...)

#perform Kmeans cluster
#scale datainput and drop the data label column
Kmeans.list <- map(.x = mylist,
               .f = ~kmeans(scale(.x[,-c(1:2)]),
                            centers =15,
                            nstart=50,
                            iter.max = 100)) %>% 
                purrr::set_names(c("df1", "df2"))

#Isolate the centers info into another list
Kmeans_centers <- map(Kmeans.list, ~.x$centers)

#trying to use map2
y <- map2(.x = mylist,.y=Kmeans_centers,
     .f=~get.knnx(scale(.x[,-c(1:2)],.y, 500)))

Thanks to help from the legends on Stack Overflow, I managed to make the kmeans step work and get the list of centers. Now I want to apply the same logic with map2().

The error I get from map2 is "Error in scale.default(.x[, -c(1:2)], .y, 500) : length of 'center' must equal the number of columns of 'x'"

However, both lists have 7 elements, so I don't quite know what went wrong.

An additional question concerns the ~ in the .f= argument. I read that if I pass a function as input, I don't need to add ~. In this case, however, if I remove the ~, the error says x is not found. So why is ~ needed here, and should I always put ~ in front of the function I pass to map()?

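You should apply the scale function only to the data frame. In your call the closing parenthesis of scale() is misplaced, so .y and 500 are passed to scale() as its center and scale arguments instead of going to get.knnx(), which is what the error message is complaining about.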

library(purrr)
library(FNN)

map2(.x = mylist,.y=Kmeans_centers, .f=~get.knnx(scale(.x[,-c(1:2)]),.y, 500))
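Since the original data frames are not reproduced here, the following is only a minimal sketch of the corrected call on toy data. The helper make_df(), the column names, centers = 3 and k = 5 are all assumptions chosen to keep the example small; they are not taken from the question.

```r
library(purrr)
library(FNN)

set.seed(1)

# Hypothetical stand-in for one data frame:
# 2 label columns followed by 4 numeric columns
make_df <- function(n) {
  data.frame(
    id        = seq_len(n),
    treatment = sample(c("a", "b"), n, replace = TRUE),
    matrix(rnorm(n * 4), ncol = 4)
  )
}

mylist <- list(df1 = make_df(100), df2 = make_df(120))

# k-means on the scaled numeric columns (labels dropped)
Kmeans.list    <- map(mylist, ~ kmeans(scale(.x[, -c(1:2)]), centers = 3, nstart = 10))
Kmeans_centers <- map(Kmeans.list, ~ .x$centers)

# scale() receives only the data; the centers and k go to get.knnx()
nn <- map2(mylist, Kmeans_centers,
           ~ get.knnx(scale(.x[, -c(1:2)]), .y, k = 5))

# nn.index has one row per center and one column per neighbour
dim(nn$df1$nn.index)

# Rows of the first data frame that are among the nearest neighbours of any center
sub1 <- mylist$df1[unique(as.vector(nn$df1$nn.index)), ]
```

Each element of the result is a list with nn.index (row numbers in the scaled data) and nn.dist (the corresponding distances), so the row numbers can be used to subset the original data frame as in the last line.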

~ is a formula-based shorthand for defining the function to apply, in which the first argument is referred to as .x and the second as .y. It is an alternative to writing out an anonymous function, which would look like this:

map2(.x = mylist,.y=Kmeans_centers, function(a, b) get.knnx(scale(a[,-c(1:2)]),b, 500))
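As for whether ~ is always needed: it is only needed when you write an expression in terms of .x and .y. Without the ~, R evaluates that expression immediately as an ordinary call, before map2() ever runs, and there is no .x in the calling environment, hence the "not found" error. A function that already exists can be passed by name with no ~. A small sketch on toy lists (the lists x and y are made up for illustration):

```r
library(purrr)

x <- list(1:3, 4:6)
y <- list(10, 20)

# Formula shorthand: .x and .y are the paired elements
r_formula <- map2(x, y, ~ .x * .y)

# Equivalent explicit anonymous function
r_anon <- map2(x, y, function(a, b) a * b)

# An existing function is passed as-is, with no ~
r_named <- map2(x, y, `*`)

# All three forms give the same result
identical(r_formula, r_anon) && identical(r_anon, r_named)
```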
