Using strsplit within dplyr::mutate (without tibble::data_frame) raises “Evaluation error: non-character argument”
Edit: there was a typo in my df creation, with a missing _ on the last value of MediaName; this is now corrected.
I want to create a new variable TrialId in a data frame, taken as part of the value of another variable MediaName, depending on the value of a third variable Phase. I thought I could do that using strsplit and ifelse within a dplyr::mutate, as follows:
library(dplyr)

# Creating a simple data frame for the example
df <- data.frame(Phase = c(rep("Familiarisation", 8), rep("Test", 3)),
                 MediaName = c("Flip_A1_G1", "Reg_B2_S1", "Reg_A2_G1", "Flip_B1_S1",
                               "Reg_A1_G2", "Flip_B2_S2", "Reg_A2_G2", "Flip_B1_S2",
                               "HC_A1L", "TC_B1R", "RC_BL_2R"))

# Creating a new column
df <- df %>%
  mutate(TrialId = ifelse(Phase == "Familiarisation",
                          sapply(strsplit(MediaName, "_"), "[", 2),
                          sapply(strsplit(MediaName, "_"), "[", 1)))
The expected result is:
> df$TrialId
[1] "A1" "B2" "A2" "B1" "A1" "B2" "A2" "B1" "HC" "TC" "RC"
However, this gives me the following error, which I believe is caused by strsplit:
Error in mutate_impl(.data, dots) :
Evaluation error: non-character argument.
I know from this SO question that I can easily solve my issue by defining, in this small example, my data frame as a tibble::data_frame, without knowing why this solves the issue. I can't do exactly that, though, because in my actual code df comes from reading a csv file (with read.csv()). I thought that using df <- df %>% as_tibble() %>% mutate(...) would solve the issue in a similar way, but it doesn't (why?).
Is there a way to actually use tibble even when reading files? Or is there another way of achieving what I need, maybe without using strsplit?
I also read in this other SO question that you can use tidyr::separate, but it doesn't do exactly what I want, as I need to keep either the first or the second value depending on the value of Phase.
You can try:
library(tidyverse)

# your first data
df_old <- data.frame(Phase = c(rep("Familiarisation", 8), rep("Test", 3)),
                     MediaName = c("Flip_A1_G1", "Reg_B2_S1", "Reg_A2_G1", "Flip_B1_S1",
                                   "Reg_A1_G2", "Flip_B2_S2", "Reg_A2_G2", "Flip_B1_S2",
                                   "HC_A1L", "TC_B1R", "RC_BL2R"))

df_old %>%
  separate(MediaName, into = letters[1:3], sep = "_", fill = "left", remove = FALSE) %>%
  select(Phase, MediaName, TrialId = b)
             Phase  MediaName TrialId
1  Familiarisation Flip_A1_G1      A1
2  Familiarisation  Reg_B2_S1      B2
3  Familiarisation  Reg_A2_G1      A2
4  Familiarisation Flip_B1_S1      B1
5  Familiarisation  Reg_A1_G2      A1
6  Familiarisation Flip_B2_S2      B2
7  Familiarisation  Reg_A2_G2      A2
8  Familiarisation Flip_B1_S2      B1
9             Test     HC_A1L      HC
10            Test     TC_B1R      TC
11            Test    RC_BL2R      RC
This is a hardcoded solution based on the provided sample data. Separate MediaName by "_"; where a value yields only two pieces instead of three, fill = "left" pads it with NA on the left side. Finally, select the columns you need.
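To see what fill = "left" does, here is a minimal sketch on a hypothetical two-row data frame (toy and parts are illustrative names, not part of the original data). It assumes tidyr is installed.

```r
library(tidyr)

# Hypothetical sample: one value with three pieces, one with only two.
toy <- data.frame(MediaName = c("Flip_A1_G1", "HC_A1L"),
                  stringsAsFactors = FALSE)

# fill = "left" pads the short row with NA on the left, so the wanted
# piece lands in column b for both rows.
parts <- separate(toy, MediaName, into = c("a", "b", "c"),
                  sep = "_", fill = "left", remove = FALSE)
print(parts)
```

This only works while every Test value has exactly two pieces. A value such as "RC_BL_2R" has three, so its first piece would land in column a rather than b.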
With your new data it is somewhat more complicated, but you can try:
df %>%
  add_column(MediaName_keep = df$MediaName) %>%
  group_by(MediaName_keep) %>%
  separate_rows(MediaName, sep = "_") %>%
  mutate(n = 1:n()) %>%
  filter((Phase == "Familiarisation" & n == 2) | (Phase == "Test" & n == 1)) %>%
  select(Phase, MediaName = MediaName_keep, TrialId = MediaName)
# A tibble: 11 x 3
# Groups:   MediaName [11]
   Phase           MediaName  TrialId
   <fctr>          <fctr>     <chr>
 1 Familiarisation Flip_A1_G1 A1
 2 Familiarisation Reg_B2_S1  B2
 3 Familiarisation Reg_A2_G1  A2
 4 Familiarisation Flip_B1_S1 B1
 5 Familiarisation Reg_A1_G2  A1
 6 Familiarisation Flip_B2_S2 B2
 7 Familiarisation Reg_A2_G2  A2
 8 Familiarisation Flip_B1_S2 B1
 9 Test            HC_A1L     HC
10 Test            TC_B1R     TC
11 Test            RC_BL_2R   RC
The idea is the same: separate, but this time into rows, number the new rows within each MediaName_keep group, and then filter according to your needs.
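The same "pick the n-th piece depending on Phase" idea can also be sketched in base R, without reshaping into rows. This is only an illustration under the assumption that MediaName is a character column; pieces and idx are hypothetical helper names.

```r
# Sample data from the question, with strings kept as character.
df <- data.frame(Phase = c(rep("Familiarisation", 8), rep("Test", 3)),
                 MediaName = c("Flip_A1_G1", "Reg_B2_S1", "Reg_A2_G1", "Flip_B1_S1",
                               "Reg_A1_G2", "Flip_B2_S2", "Reg_A2_G2", "Flip_B1_S2",
                               "HC_A1L", "TC_B1R", "RC_BL_2R"),
                 stringsAsFactors = FALSE)

# Split every name once, then choose which piece to keep per row:
# the 2nd piece for Familiarisation rows, the 1st piece otherwise.
pieces <- strsplit(df$MediaName, "_", fixed = TRUE)
idx <- ifelse(df$Phase == "Familiarisation", 2L, 1L)

df$TrialId <- vapply(seq_along(pieces),
                     function(k) pieces[[k]][idx[k]],
                     character(1))
print(df$TrialId)
```

Because the piece is selected by position per row, the number of underscores in a Test value does not matter here.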
The problem you encountered arises because the strings were automatically converted to a factor (data.frame() and read.csv() did this by default before R 4.0.0), and strsplit() cannot be applied to a non-character object. My solution simply converts MediaName to character type first.
library(dplyr)

df <- df %>%
  # as.character() is safe whether MediaName is a factor or already character
  dplyr::mutate(MediaName = as.character(MediaName)) %>%
  dplyr::mutate(TrialId = ifelse(Phase == "Familiarisation",
                                 sapply(strsplit(MediaName, "_"), "[", 2),
                                 sapply(strsplit(MediaName, "_"), "[", 1)))

solution <- c("A1", "B2", "A2", "B1", "A1", "B2", "A2", "B1", "HC", "TC", "RC")
identical(solution, df$TrialId)
[1] TRUE
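Since the real data comes from read.csv(), the conversion can also be avoided at read time. The sketch below uses a hypothetical two-row CSV passed through the text argument instead of a file, and shows both the factor case that triggers the error and the character case that does not. It also hints at why as_tibble() does not help: it keeps the column types it is given, so a column that is already a factor stays a factor.

```r
# Hypothetical CSV content standing in for the real file.
csv_text <- "Phase,MediaName
Familiarisation,Flip_A1_G1
Test,HC_A1L"

# Strings read as factors (the default before R 4.0.0).
df_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)
# Strings kept as character.
df_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# strsplit() rejects the factor column; capture the error message.
err <- tryCatch(strsplit(df_fct$MediaName, "_"),
                error = function(e) conditionMessage(e))
print(err)

# The character column splits without complaint.
print(strsplit(df_chr$MediaName, "_"))
```

readr::read_csv() is another option: it returns a tibble and leaves strings as character.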