
R: split a character string into multiple columns when the strings have different lengths (dplyr)

I have animal tracking data where each animal was encountered over time and its sex was recorded at each encounter. There are three types of encounters (type1, type2, and type3). Each row represents an animal, and each encounter is classified as M (male) or F (female). Each character in a type column represents one encounter (e.g., MMMM is an animal seen four times and recorded as male each time).

Sample data:

animal.ID    type1         type2       type3
1            MMMMMMM       M           M
2            MFMM          M           M
3            FFM           F           F
4            FFFFFFFFF     F           F  
5            MM            M           M

I want to know whether the sex (male or female) was recorded consistently for each animal.

I want to produce something like this, where a column indicates whether sex was recorded consistently (1) or not (0):

animal.ID    type1         type2       type3    consistent
1            MMMMMMM       M           M         1
2            MFMM          M           M         0
3            FFM           F           F         0
4            FFFFFFFFF     F           F         1
5            MM            M           M         1

I can use if_else to get the consistent column for the type2 and type3 data:

library(dplyr)

df %>%
   mutate(consistent = if_else(type2 == type3, 1, 0))

But I can't include the type1 data, since each string contains multiple characters and the number of characters differs from string to string.

One approach could be to use str_split to split type1 into multiple columns, but I don't know how to do that given the different number of characters in each string.
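A minimal base-R sketch of the column-splitting idea, assuming the sample data shown above: split each type1 string into single characters with strsplit(), pad the shorter results with NA up to the length of the longest string, and bind them as columns. The column names type1_1, type1_2, and so on are hypothetical choices for illustration, not anything from the original post.

```r
# Sample data from the question
df <- data.frame(
  animal.ID = 1:5,
  type1 = c("MMMMMMM", "MFMM", "FFM", "FFFFFFFFF", "MM"),
  type2 = c("M", "M", "F", "F", "M"),
  type3 = c("M", "M", "F", "F", "M"),
  stringsAsFactors = FALSE
)

# Split each type1 string into a vector of single characters
chars <- strsplit(df$type1, "")

# Pad every vector with NA so all have the length of the longest string
n_max <- max(lengths(chars))
padded <- vapply(chars,
                 function(x) c(x, rep(NA_character_, n_max - length(x))),
                 character(n_max))

# vapply() returns one column per animal, so transpose to one row per animal
split_cols <- t(padded)
colnames(split_cols) <- paste0("type1_", seq_len(n_max))  # hypothetical names
wide <- cbind(df, split_cols)

# An animal is consistent if every non-missing encounter shows the same sex
enc_cols <- c(colnames(split_cols), "type2", "type3")
wide$consistent <- as.integer(apply(wide[enc_cols], 1,
                                    function(r) length(unique(na.omit(r))) == 1))
wide
```

Padding with NA keeps the result rectangular; na.omit() then drops the padding before the distinct letters are counted.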

One approach may be to use strsplit and unlist, checking that all characters are equal to type2 (in addition to checking that type2 equals type3).

df %>%
  rowwise() %>%
  mutate(consistent = ifelse(type2 == type3 & all(unlist(strsplit(type1, "")) == type2), 1, 0))

Output

# A tibble: 5 x 5
  animal.ID type1     type2 type3 consistent
      <int> <chr>     <chr> <chr>      <dbl>
1         1 MMMMMMM   M     M              1
2         2 MFMM      M     M              0
3         3 FFM       F     F              0
4         4 FFFFFFFFF F     F              1
5         5 MM        M     M              1
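The same check can be done without rowwise() by splitting all type1 strings at once (strsplit() is vectorised) and comparing each animal's characters with its type2 and type3 values. A minimal base-R sketch, assuming the sample data from the question:

```r
df <- data.frame(
  animal.ID = 1:5,
  type1 = c("MMMMMMM", "MFMM", "FFM", "FFFFFFFFF", "MM"),
  type2 = c("M", "M", "F", "F", "M"),
  type3 = c("M", "M", "F", "F", "M"),
  stringsAsFactors = FALSE
)

# One character vector per animal
chars <- strsplit(df$type1, "")

# For each animal, every type1 character and type3 must equal type2
same_sex <- mapply(function(x, t2, t3) all(x == t2) && t3 == t2,
                   chars, df$type2, df$type3)

df$consistent <- as.integer(same_sex)
df
```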

We can use charToRaw to get the "raw" representation of type1 and assign 1 if all the values are the same.

df$consistent <- +(sapply(df$type1, function(x) length(unique(charToRaw(x)))) == 1)
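To illustrate the idea: charToRaw() converts a string to its byte values, one byte per character for plain ASCII letters such as M and F, so a string whose bytes take only one distinct value is a single repeated letter. This short sketch, using the type1 values from the sample data, shows the intermediate counts. Note that this check looks at type1 only; it does not compare against type2 or type3.

```r
# Each character maps to its byte code: "M" is 0x4d and "F" is 0x46
charToRaw("MFMM")
#> [1] 4d 46 4d 4d

type1 <- c("MMMMMMM", "MFMM", "FFM", "FFFFFFFFF", "MM")

# Number of distinct byte codes in each string; 1 means a single letter throughout
n_bytes <- vapply(type1, function(x) length(unique(charToRaw(x))), integer(1))

# Unary + turns the logical result into 1/0
consistent <- +(n_bytes == 1)
consistent
```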

Using dplyr, we can apply the same logic:

library(dplyr)

df %>%
  rowwise() %>%
  mutate(consistent = +(n_distinct(charToRaw(type1)) == 1))


#  animal.ID type1     type2 type3 consistent
#      <int> <chr>     <chr> <chr>      <int>
#1         1 MMMMMMM   M     M              1
#2         2 MFMM      M     M              0
#3         3 FFM       F     F              0
#4         4 FFFFFFFFF F     F              1
#5         5 MM        M     M              1

data

df <- structure(list(animal.ID = 1:5, type1 = c("MMMMMMM", "MFMM", 
"FFM", "FFFFFFFFF", "MM"), type2 = c("M", "M", "F", "F", "M"), 
type3 = c("M", "M", "F", "F", "M")), class = "data.frame", row.names = c(NA, -5L))

Another solution, using @Ronak Shah's logic:

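library(tidyverse)
# Paste all type columns together, then check for a single distinct letter;
# map_int() returns an integer column rather than a list column
df %>% 
      unite("all_type", starts_with("type"), sep = "", remove = FALSE) %>% 
      mutate(consistent = map_int(strsplit(all_type, ""), ~ +(n_distinct(.x) == 1)))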
