R: split a string into multiple columns when the strings have different lengths (dplyr)

Question

I have animal tracking data where each animal was encountered over time and the sex was recorded at each encounter. There are three types of encounters (type1, type2, and type3). Each row represents an animal, and each encounter is classified as M (male) or F (female). Each character in a type column represents one encounter (e.g. MMMM is an animal seen four times and recorded as male each time).

Sample data:

animal.ID    type1         type2       type3
1            MMMMMMM       M           M
2            MFMM          M           M
3            FFM           F           F
4            FFFFFFFFF     F           F  
5            MM            M           M

I want to know whether the sex (male or female) was recorded consistently for each animal.

I want to produce something like this, where a column indicates whether sex was recorded consistently (1) or not (0):

animal.ID    type1         type2       type3    consistent
1            MMMMMMM       M           M         1
2            MFMM          M           M         0
3            FFM           F           F         0
4            FFFFFFFFF     F           F         1
5            MM            M           M         1

I can use if_else to get the 'consistent' column for the type2 and type3 data:

library(dplyr)

df %>%
  mutate(consistent = if_else(type2 == type3, 1, 0))

But I can't include the type1 data, since each string has multiple characters and the number of characters differs from string to string.

One approach could be to use str_split to split type1 into multiple columns, but I don't know how to do that given the different number of characters in each string.
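The difficulty goes away if type1 is not forced into equal-width columns. A minimal base R sketch, using the sample values retyped from the question (`chars`, `padded` and `same_within` are illustrative names, not from the original): `strsplit(x, "")` returns a list whose elements may differ in length, and if a column layout is really wanted, each element can be padded with `NA` to the longest length before binding the rows.

```r
# type1 values retyped from the sample data above
type1 <- c("MMMMMMM", "MFMM", "FFM", "FFFFFFFFF", "MM")

# Split every string into single characters; the result is a list,
# so each animal can have a different number of encounters
chars <- strsplit(type1, "")
lengths(chars)
#> [1] 7 4 3 9 2

# Optional: pad each element with NA to the longest length and bind rows,
# giving one column per encounter (the layout str_split was meant to produce)
padded <- do.call(rbind, lapply(chars, `length<-`, max(lengths(chars))))
dim(padded)
#> [1] 5 9

# Per-animal check that every encounter within type1 has the same sex
same_within <- vapply(chars, function(ch) all(ch == ch[1]), logical(1))
same_within
#> [1]  TRUE FALSE FALSE  TRUE  TRUE
```

This only covers agreement within type1; type2 and type3 still have to be compared as in the answers.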

Answer 1

One approach may be to use strsplit and unlist, checking that all characters are equal to type2 (in addition to checking that type2 equals type3).

library(dplyr)

df %>%
  rowwise() %>%  # evaluate the condition one animal (row) at a time
  mutate(consistent = ifelse(type2 == type3 & all(unlist(strsplit(type1, "")) == type2), 1, 0))

Output

# A tibble: 5 x 5
  animal.ID type1     type2 type3 consistent
      <int> <chr>     <chr> <chr>      <dbl>
1         1 MMMMMMM   M     M              1
2         2 MFMM      M     M              0
3         3 FFM       F     F              0
4         4 FFFFFFFFF F     F              1
5         5 MM        M     M              1
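The same row-by-row condition can also be written without rowwise(). A minimal base R sketch, assuming the sample data from the question (rebuilt inline so the block runs on its own); `is_consistent` is a hypothetical helper name, and mapply() walks the three columns in parallel, one animal at a time:

```r
df <- data.frame(
  animal.ID = 1:5,
  type1 = c("MMMMMMM", "MFMM", "FFM", "FFFFFFFFF", "MM"),
  type2 = c("M", "M", "F", "F", "M"),
  type3 = c("M", "M", "F", "F", "M"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when type2 equals type3 and
# every character of type1 equals type2
is_consistent <- function(t1, t2, t3) {
  t2 == t3 && all(strsplit(t1, "")[[1]] == t2)
}

# as.integer() turns TRUE/FALSE into 1/0 and drops the names mapply() adds
df$consistent <- as.integer(mapply(is_consistent, df$type1, df$type2, df$type3))
df$consistent
#> [1] 1 0 0 1 1
```

Using `&&` is safe here because each call of the helper receives single values, not whole columns.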

Answer 2

We can use charToRaw to get the "raw" representation of type1 and assign 1 if all of its characters are the same.

# 1 if every character (byte) of type1 is the same, 0 otherwise
df$consistent <- +(sapply(df$type1, function(x) length(unique(charToRaw(x)))) == 1)
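Why this works: charToRaw() returns one byte per character, so "M" and "F" become `4d` and `46`, and a string is uniform exactly when it has a single distinct byte. As written, the check only looks at type1. A sketch under the assumption that type2 and type3 should be included too: paste the three columns into one string per animal first (the same idea Answer 3 implements with unite()). `n_bytes` and `all_types` are illustrative names, and the data are retyped from the question.

```r
type1 <- c("MMMMMMM", "MFMM", "FFM", "FFFFFFFFF", "MM")
type2 <- c("M", "M", "F", "F", "M")
type3 <- c("M", "M", "F", "F", "M")

# One raw byte per character: "M" is 0x4d and "F" is 0x46
charToRaw("MFMM")
#> [1] 4d 46 4d 4d

# Illustrative helper: number of distinct bytes in a single string
n_bytes <- function(x) length(unique(charToRaw(x)))

# Include type2 and type3 by pasting all three columns together first
all_types <- paste0(type1, type2, type3)
consistent <- +(vapply(all_types, n_bytes, integer(1), USE.NAMES = FALSE) == 1)
consistent
#> [1] 1 0 0 1 1
```

Comparing bytes is fine for single-byte codes like M and F; for multibyte characters, splitting with `strsplit(x, "")` is the safer choice.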

Using dplyr, we can apply the same logic:

library(dplyr)

df %>%
  rowwise() %>%
  mutate(consistent = +(n_distinct(charToRaw(type1)) == 1))


#  animal.ID type1     type2 type3 consistent
#      <int> <chr>     <chr> <chr>      <int>
#1         1 MMMMMMM   M     M              1
#2         2 MFMM      M     M              0
#3         3 FFM       F     F              0
#4         4 FFFFFFFFF F     F              1
#5         5 MM        M     M              1

data

df <- structure(list(animal.ID = 1:5, type1 = c("MMMMMMM", "MFMM", 
"FFM", "FFFFFFFFF", "MM"), type2 = c("M", "M", "F", "F", "M"), 
type3 = c("M", "M", "F", "F", "M")), class = "data.frame", row.names = c(NA, -5L))

Answer 3

Another solution, using @Ronak Shah's logic:

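library(tidyverse)

df %>%
  # paste type1, type2 and type3 into one string per animal (keep the originals)
  unite("all_type", starts_with("type"), sep = "", remove = FALSE) %>%
  # 1 if the combined string contains only one distinct character
  mutate(consistent = map_int(strsplit(all_type, ""), ~ +(n_distinct(.x) == 1)))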
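For readers without the tidyverse, the unite() + n_distinct() idea can be sketched in base R. This is an assumed equivalent, not part of the original answer: `do.call(paste0, ...)` plays the role of unite(), and `length(unique())` stands in for n_distinct(). The data are retyped from the question.

```r
df <- data.frame(
  animal.ID = 1:5,
  type1 = c("MMMMMMM", "MFMM", "FFM", "FFFFFFFFF", "MM"),
  type2 = c("M", "M", "F", "F", "M"),
  type3 = c("M", "M", "F", "F", "M"),
  stringsAsFactors = FALSE
)

# Like unite(): paste every column whose name starts with "type", row by row
type_cols <- df[startsWith(names(df), "type")]
df$all_type <- do.call(paste0, type_cols)

# 1 if the combined string holds a single distinct character, else 0
df$consistent <- vapply(
  strsplit(df$all_type, ""),
  function(ch) as.integer(length(unique(ch)) == 1L),
  integer(1)
)
df$consistent
#> [1] 1 0 0 1 1
```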


Question description

3 solutions

Solution 1
3, accepted, 2020-06-08 00:21:03

Solution 2
1, 2020-06-08 00:55:07

Solution 3
0, 2020-06-08 07:55:24
