
Using strsplit within dplyr::mutate (without tibble::data_frame) raises "Evaluation error: non-character argument"

Edit: there was a typo in my df creation, with a missing _ in the last value of MediaName; this is now corrected.

I want to create a new variable TrialId in a data frame, taken from part of the value of another variable MediaName, with the part chosen according to the value of a third variable Phase. I thought I could do that using strsplit and ifelse within dplyr::mutate, as follows:

library(dplyr)

# Creating a simple data frame for the example
df <- data.frame(Phase = c(rep("Familiarisation",8),rep("Test",3)),
                 MediaName = c("Flip_A1_G1","Reg_B2_S1","Reg_A2_G1","Flip_B1_S1",
                               "Reg_A1_G2","Flip_B2_S2","Reg_A2_G2","Flip_B1_S2",
                               "HC_A1L","TC_B1R","RC_BL_2R"))

# Creating a new column
df <- df %>%
  mutate(TrialId = ifelse(Phase == "Familiarisation",
                          sapply(strsplit(MediaName, "_"), "[", 2),
                          sapply(strsplit(MediaName, "_"), "[", 1)))

The expected result is:

> df$TrialId
[1] "A1" "B2" "A2" "B1" "A1" "B2" "A2" "B1" "HC" "TC" "RC"

However, this gives me the following error, which I believe is caused by strsplit:

Error in mutate_impl(.data, dots) : 
  Evaluation error: non-character argument.

I know from this SO question that, in this small example, I can easily solve my issue by defining my data frame as a tibble::data_frame, although I don't know why this solves the issue. I can't do exactly that, though, because in my actual code df comes from reading a CSV file (with read.csv()). I thought that using df <- df %>% as_tibble() %>% mutate(...) would solve the issue in a similar way, but it doesn't (why?).

Is there a way to actually use tibble even when reading files? Or is there another way of achieving what I need to do, perhaps without using strsplit?

I'm also reading in this other SO question that you can use tidyr::separate, but it doesn't do exactly what I want, as I need to keep either the first or the second value depending on the value of Phase.

You can try:

library(tidyverse)
# your first data 
df_old <- data.frame(Phase = c(rep("Familiarisation",8),rep("Test",3)),
                 MediaName = c("Flip_A1_G1","Reg_B2_S1","Reg_A2_G1","Flip_B1_S1",
                               "Reg_A1_G2","Flip_B2_S2","Reg_A2_G2","Flip_B1_S2",
                               "HC_A1L","TC_B1R","RC_BL2R"))
df_old %>% 
  separate(MediaName, into=letters[1:3], sep="_", fill = "left", remove = FALSE) %>% 
  select(Phase, MediaName, TrialId=b)
             Phase  MediaName TrialId
1  Familiarisation Flip_A1_G1      A1
2  Familiarisation  Reg_B2_S1      B2
3  Familiarisation  Reg_A2_G1      A2
4  Familiarisation Flip_B1_S1      B1
5  Familiarisation  Reg_A1_G2      A1
6  Familiarisation Flip_B2_S2      B2
7  Familiarisation  Reg_A2_G2      A2
8  Familiarisation Flip_B1_S2      B1
9             Test     HC_A1L      HC
10            Test     TC_B1R      TC
11            Test    RC_BL2R      RC

This is a hardcoded solution tailored to the provided sample data. Separate by "_"; if a value splits into only two pieces instead of three, fill with NA from the left side. Finally, select the columns you need.
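To see what fill = "left" does on its own, here is a minimal sketch on a hypothetical two-row data frame (toy and out are made-up names for illustration; stringsAsFactors is set explicitly so the result does not depend on the R version):

```r
library(tidyr)

# Hypothetical toy data: one value with three pieces, one with only two.
toy <- data.frame(MediaName = c("Flip_A1_G1", "HC_A1L"),
                  stringsAsFactors = FALSE)

# With fill = "left", a value that yields fewer pieces than `into`
# is padded with NA on the left, so its pieces shift to the right.
out <- separate(toy, MediaName, into = c("a", "b", "c"),
                sep = "_", fill = "left", remove = FALSE)

out$a  # "Flip" NA
out$b  # "A1"   "HC"
out$c  # "G1"   "A1L"
```

This is why column b holds the wanted piece in both phases for the first data set: the second piece of a three-part name and the first piece of a two-part name end up in the same column.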

Edit

With your new data it is somewhat more complicated, but you can try:

df %>% 
  add_column(MediaName_keep=df$MediaName) %>% 
  group_by(MediaName_keep) %>% 
  separate_rows(MediaName, sep="_") %>% 
  mutate(n=1:n()) %>% 
  filter((Phase == "Familiarisation" & n == 2) | (Phase == "Test" & n == 1)) %>% 
  select(Phase, MediaName=MediaName_keep, TrialId=MediaName)
# A tibble: 11 x 3
# Groups:   MediaName [11]
             Phase  MediaName TrialId
            <fctr>     <fctr>   <chr>
 1 Familiarisation Flip_A1_G1      A1
 2 Familiarisation  Reg_B2_S1      B2
 3 Familiarisation  Reg_A2_G1      A2
 4 Familiarisation Flip_B1_S1      B1
 5 Familiarisation  Reg_A1_G2      A1
 6 Familiarisation Flip_B2_S2      B2
 7 Familiarisation  Reg_A2_G2      A2
 8 Familiarisation Flip_B1_S2      B1
 9            Test     HC_A1L      HC
10            Test     TC_B1R      TC
11            Test   RC_BL_2R      RC

The idea is the same: separate, but this time into rows. Number the new rows within each MediaName_keep group, then filter according to your needs.

The problem you encountered arises because the strings were automatically converted to a factor, and strsplit() cannot be applied to a non-character object. My solution simply converts MediaName to a character vector first.

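library(dplyr)

df <- df %>%
  # Convert the factor column to character so strsplit() accepts it
  # (as.character() is also harmless if the column is already character)
  mutate(MediaName = as.character(MediaName)) %>%
  # Keep the second piece for Familiarisation rows, the first piece otherwise
  mutate(TrialId = ifelse(Phase == "Familiarisation",
                          sapply(strsplit(MediaName, "_"), "[", 2),
                          sapply(strsplit(MediaName, "_"), "[", 1)))
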
solution<- c("A1", "B2", "A2", "B1", "A1", "B2", "A2", "B1", "HC", "TC", "RC")
identical(solution, df$TrialId)
[1] TRUE
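
To make the factor explanation concrete, here is a minimal base-R sketch on hypothetical two-row data (df_fac, df_chr, csv_df and the inline CSV text are made up for illustration). stringsAsFactors is set explicitly because its default differs between R versions. It also suggests why as_tibble() did not help in the question: converting an existing data frame keeps its factor columns as factors, so the conversion has to happen on the column itself or when the file is read.

```r
# Hypothetical data, built once with factors and once with plain strings.
df_fac <- data.frame(MediaName = c("Flip_A1_G1", "HC_A1L"),
                     stringsAsFactors = TRUE)
df_chr <- data.frame(MediaName = c("Flip_A1_G1", "HC_A1L"),
                     stringsAsFactors = FALSE)

class(df_fac$MediaName)  # "factor"
class(df_chr$MediaName)  # "character"

# strsplit() rejects the factor column; capture the error message
# instead of stopping the script.
err_msg <- tryCatch(strsplit(df_fac$MediaName, "_"),
                    error = function(e) conditionMessage(e))

# On a character column the split works as intended.
first_piece <- sapply(strsplit(df_chr$MediaName, "_"), "[", 1)

# Converting the factor with as.character() is enough to make it work.
second_piece <- sapply(strsplit(as.character(df_fac$MediaName), "_"), "[", 2)

# When reading a file, the conversion can be avoided at the source.
# `text =` stands in for a real file path here.
csv_df <- read.csv(text = "Phase,MediaName\nTest,HC_A1L",
                   stringsAsFactors = FALSE)
class(csv_df$MediaName)  # "character"
```

With the column read as character, the original mutate() call from the question should run without the as.character() step.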
