Select specific names from a list of data frames in R
Sample data:
df <- data.frame(names=letters[1:10],name1=rnorm(10,1,1),name2=rexp(10,2))
list <- list(df,df)
vec_name <- c("f","i","c") # desired row names
I would like to select, from each data frame in the list, the rows whose names are given in vec_name.
Desired outcome:
[[1]]
names value1 value2
6 nd:f -1.6323952 0.3117470
9 nd:i 1.8270855 0.2475741
3 nd:c 0.6978422 0.4695581 # the ordering does matter; must be as seen in vec_name
[[2]]
names value1 value2
6 ad:f -1.6323952 0.3117470
9 ad:i 1.8270855 0.2475741
3 ad:c 0.6978422 0.4695581
Desired output 2: a single data frame, which I believe would just be do.call(rbind, list). However, the clean names from vec_name should be used instead:
names value1 value2
1 f -1.6323952 0.3117470
2 i 1.8270855 0.2475741
3 c 0.6978422 0.4695581
4 f -1.6323952 0.3117470
5 i 1.8270855 0.2475741
6 c 0.6978422 0.4695581
I have tried sapply, lapply, ... for example:
lapply(list, function(x) x[grepl(vec_name,x$names),])
EDIT: Please see the edited question above.
You were almost there. The warning message says:
Warning messages:
1: In grepl(vec_name, x$names) :
argument 'pattern' has length > 1 and only the first element will be used
The reason is that you pass a vector to grepl, which expects a single regex as its pattern (see ?regex). What you want to do instead is match the contents:
lapply(list, function(x) x[match(vec_name,x$names),])
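A minimal sketch of why match fits the ordering requirement here, using a small made-up data frame (the value column and the names by_match and by_in are illustrative assumptions, not part of the question). match returns one position per element of vec_name, in the order of vec_name, whereas a %in% filter keeps the original row order:

```r
# Toy data, assumed for illustration only
df <- data.frame(names = letters[1:10], value = 1:10, stringsAsFactors = FALSE)
vec_name <- c("f", "i", "c")

# match() gives the row position of each requested name, in vec_name order
idx <- match(vec_name, df$names)
by_match <- df[idx, ]

# %in% only tests membership, so rows come back in their original order
by_in <- df[df$names %in% vec_name, ]

print(by_match)
print(by_in)
```

If a requested name is absent, match returns NA for it and the subset contains an all-NA row, so it may be worth checking for NA in idx before subsetting.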
This will give you a list of data.frame objects. If you want to combine them afterwards, just use:
do.call(rbind, lapply(list, function(x) x[match(vec_name,x$names),]))
Or use ldply from library(plyr):
library(plyr)
ldply(list, function(x) x[match(vec_name,x$names),])
# names name1 name2
# 1 f 2.01421228 0.4489627
# 2 i 0.28899891 0.8323940
# 3 c -0.01746007 1.5309936
# 4 f 2.01421228 0.4489627
# 5 i 0.28899891 0.8323940
# 6 c -0.01746007 1.5309936
As a side remark: avoid using the names of built-in functions such as list for your variables, to avoid unwanted effects.
Update
Taking the comments into account (vec_name does not completely match the names in the data.frame), you should first clean the names and then do the match. This assumes, however, that your 'uncleaned' names consist of the clean name preceded by a prefix and a colon (':'); if that is not the case, adapt the regex in the gsub statement:
ldply(list, function(x) x[match(vec_name, gsub(".*:(.*)", "\\1", x$names)),])
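The same idea can be sketched in base R without plyr, also writing the clean names back into the names column as asked for in the second desired output. The sample data, the helper pick_rows, and the variable names df_list and combined are assumptions made for illustration; the prefixes nd: and ad: are taken from the desired outcome in the question.

```r
# Assumed sample data with prefixed names, mimicking the desired outcome
df1 <- data.frame(names = paste0("nd:", letters[1:10]), value1 = 1:10,
                  stringsAsFactors = FALSE)
df2 <- data.frame(names = paste0("ad:", letters[1:10]), value1 = 11:20,
                  stringsAsFactors = FALSE)
df_list <- list(df1, df2)
vec_name <- c("f", "i", "c")

# Hypothetical helper: strip everything up to the last colon, then match
pick_rows <- function(x) {
  clean <- sub("^.*:", "", x$names)
  out <- x[match(vec_name, clean), ]
  out$names <- vec_name          # replace prefixed names with the clean ones
  out
}

combined <- do.call(rbind, lapply(df_list, pick_rows))
rownames(combined) <- NULL       # number the rows 1..n
print(combined)
```

Overwriting out$names with vec_name is only safe when every requested name is found; otherwise the unmatched rows would carry a name but NA values.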
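For the first output:
output1 <- lapply(list, function(elt) {
  # one column per entry of vec_name; values > 0 mark the rows whose name matches
  resmatch <- sapply(vec_name, function(x) regexpr(x, elt$names))
  # row index of the match for each entry, in the order of vec_name
  elt <- elt[apply(resmatch, 2, function(rg) which(rg > 0)), ]
  colnames(elt) <- c("names", "value1", "value2")
  return(elt)
})
> output1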
[[1]]
names value1 value2
6 nd:f -0.2132962 0.7618105
9 nd:i -0.6580247 0.6010379
3 nd:c 0.9302625 0.1490061
[[2]]
names value1 value2
6 nd:f -0.2132962 0.7618105
9 nd:i -0.6580247 0.6010379
3 nd:c 0.9302625 0.1490061
For the second output, you can do what you wanted to:
output2<-do.call(rbind,output1)
> output2
names value1 value2
6 nd:f -0.2132962 0.7618105
9 nd:i -0.6580247 0.6010379
3 nd:c 0.9302625 0.1490061
61 nd:f -0.2132962 0.7618105
91 nd:i -0.6580247 0.6010379
31 nd:c 0.9302625 0.1490061
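The row names 61, 91 and 31 appear because rbind keeps the original row names and makes duplicates unique. If consecutive row numbers are preferred, as in the second desired output of the question, they can be reset afterwards. A small sketch with made-up data (the names parts and stacked are illustrative assumptions):

```r
# Toy data, assumed for illustration only
df <- data.frame(names = letters[1:10], value1 = 1:10, stringsAsFactors = FALSE)
parts <- list(df[c(6, 9, 3), ], df[c(6, 9, 3), ])

stacked <- do.call(rbind, parts)
print(rownames(stacked))   # row names carried over from the pieces

# Dropping the row names makes R renumber the rows from 1
rownames(stacked) <- NULL
print(stacked)
```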