Pass variable as column name to dplyr?
I have a very ugly dataset that is a flat file of a relational database. A minimal reproducible example is here:
df <- data.frame(col1 = c(letters[1:4],"c"),
col1.p = 1:5,
col2 = c("a","c","l","c","l"),
col2.p = 6:10,
col3= letters[3:7],
col3.p = 11:20)
I need to be able to identify the '.p' value for the 'col#' that has the "c". My previous question on SO got the first part: In R, find the column that contains a string for each row. I'm providing that code here for context.
tmp <- which(projectdata=='Transmission and Distribution of Electricity', arr.ind=TRUE)
cnt <- ave(tmp[,"row"], tmp[,"row"], FUN=seq_along)
maxnames <- paste0("max",sequence(max(cnt)))
projectdata[maxnames] <- NA
projectdata[maxnames][cbind(tmp[,"row"],cnt)] <- names(projectdata)[tmp[,"col"]]
rm(tmp, cnt, maxnames)
This results in a data frame that looks like this:
df
col1 col1.p col2 col2.p col3 col3.p max1
1 a 1 a 6 c 11 col3
2 b 2 c 7 d 12 col2
3 c 3 l 8 e 13 col1
4 d 4 c 9 f 14 col2
5 c 5 l 10 g 15 col1
6 a 1 a 6 c 16 col3
7 b 2 c 7 d 17 col2
8 c 3 l 8 e 18 col1
9 d 4 c 9 f 19 col2
10 c 5 l 10 g 20 col1
When I tried to get the ".p" that matched the value in "max1", I kept getting errors. I thought the approach would be:
df %>%
mutate(my.p = eval(as.name(paste0(max1,'.p'))))
Error: object 'col3.p' not found
Clearly, this did not work, so I thought maybe this was similar to passing a column name in a function, where I need to use 'get'. That also didn't work.
df %>%
mutate(my.p = get(as.name(paste0(max1,'.p'))))
Error: invalid first argument
df %>%
mutate(my.p = get(paste0(max1,'.p')))
Error: object 'col3.p' not found
I found something that gets rid of this error, using data.table, from a different but related problem, here: http://codereply.com/answer/7y2ra3/dplyr-error-object-found-using-rle-mutate.html . However, it gives me the "col3.p" value for every row. That is the column matching max1 for the first row, df$max1[1].
library('dplyr')
library('data.table') # must have the data.table package
df %>%
tbl_dt(df) %>%
mutate(my.p = get(paste0(max1,'.p')))
Source: local data table [10 x 8]
col1 col1.p col2 col2.p col3 col3.p max1 my.p
1 a 1 a 6 c 11 col3 11
2 b 2 c 7 d 12 col2 12
3 c 3 l 8 e 13 col1 13
4 d 4 c 9 f 14 col2 14
5 c 5 l 10 g 15 col1 15
6 a 1 a 6 c 16 col3 16
7 b 2 c 7 d 17 col2 17
8 c 3 l 8 e 18 col1 18
9 d 4 c 9 f 19 col2 19
10 c 5 l 10 g 20 col1 20
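A likely reason every row ends up with col3.p: a symbol can hold only one name, so when paste0(max1, '.p') yields a whole vector of names, only the first one is used for the lookup. The sketch below illustrates this with as.name() in base R, using a hypothetical stand-alone max1 vector rather than the real data; get() likewise looks up a single object rather than one object per row.

```r
# Hypothetical stand-alone copy of the first five max1 values
max1 <- c("col3", "col2", "col1", "col2", "col1")

nm <- paste0(max1, ".p")   # character vector with five column names
sym <- as.name(nm)         # as.name() keeps only the first element

print(nm)
print(sym)                 # a single symbol: col3.p
```

This is why a per-row lookup needs either grouping by row or a vectorised index, not a single call on the whole column.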
Using the lazyeval interp approach (from this SO: How to pass dynamic column names in dplyr into custom function?) doesn't work for me. Perhaps I am implementing it incorrectly?
library(lazyeval)
library(dplyr)
df %>%
mutate_(my.p = interp(~colp, colp = as.name(paste0(max1,'.p'))))
I get an error:
Error in paste0(max1, ".p") : object 'max1' not found
Ideally, the new column my.p will equal the appropriate p value, based on the column identified in max1.
I can do this all with ifelse, but I am trying to do it with less code and to make it applicable to the next ugly flat table.
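For reference, the nested ifelse version might look roughly like the sketch below. It assumes only the three col# columns from the example, and it fills in max1 by hand from the printed output above, so it would have to be extended for every additional column.

```r
# Example data from the question; max1 is copied from the printed output
df <- data.frame(col1 = c(letters[1:4], "c"),
                 col1.p = 1:5,
                 col2 = c("a", "c", "l", "c", "l"),
                 col2.p = 6:10,
                 col3 = letters[3:7],
                 col3.p = 11:20,
                 stringsAsFactors = FALSE)
df$max1 <- rep(c("col3", "col2", "col1", "col2", "col1"), 2)

# One ifelse branch per candidate column; the last branch is the fallback
df$my.p <- ifelse(df$max1 == "col1", df$col1.p,
           ifelse(df$max1 == "col2", df$col2.p, df$col3.p))

print(df$my.p)
```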
We can do this with data.table. We convert the 'data.frame' to a 'data.table' (setDT(df)); grouped by the row sequence, we get the value of the column named by the paste0 output and assign (:=) it to a new column ('my.p').
library(data.table)
setDT(df)[, my.p:= get(paste0(max1, '.p')), 1:nrow(df)]
df
# col1 col1.p col2 col2.p col3 col3.p max1 my.p
# 1: a 1 a 6 c 11 col3 11
# 2: b 2 c 7 d 12 col2 7
# 3: c 3 l 8 e 13 col1 3
# 4: d 4 c 9 f 14 col2 9
# 5: c 5 l 10 g 15 col1 5
# 6: a 1 a 6 c 16 col3 16
# 7: b 2 c 7 d 17 col2 7
# 8: c 3 l 8 e 18 col1 3
# 9: d 4 c 9 f 19 col2 9
#10: c 5 l 10 g 20 col1 5
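If neither data.table nor dplyr is available, the same lookup can be sketched in base R with matrix indexing. This is an alternative technique, not the approach above: it assumes every value column is named with a ".p" suffix, and it fills in max1 by hand from the printed output. The helper names p_cols, p_mat, and col_idx are illustrative only.

```r
# Example data from the question; max1 is copied from the printed output
df <- data.frame(col1 = c(letters[1:4], "c"),
                 col1.p = 1:5,
                 col2 = c("a", "c", "l", "c", "l"),
                 col2.p = 6:10,
                 col3 = letters[3:7],
                 col3.p = 11:20,
                 stringsAsFactors = FALSE)
df$max1 <- rep(c("col3", "col2", "col1", "col2", "col1"), 2)

# Keep only the ".p" columns so the matrix stays integer, not character
p_cols <- grep("\\.p$", names(df), value = TRUE)
p_mat <- as.matrix(df[p_cols])

# For each row, find which ".p" column max1 points to
col_idx <- match(paste0(df$max1, ".p"), p_cols)

# A two-column (row, column) index matrix picks one cell per row
df$my.p <- p_mat[cbind(seq_len(nrow(df)), col_idx)]

print(df$my.p)
```

Because the index is built for all rows at once, this avoids grouping by row, which may matter on larger tables.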