R: split a character string into multiple columns when the strings have different lengths (dplyr)
I have animal tracking data in which each animal was encountered over time, and its sex was recorded at each encounter. There are three types of encounters (type1, type2, and type3). Each row represents an animal, and each encounter is classified as M (male) or F (female). Each character in a type column represents one encounter (e.g. MMMM is an animal seen four times and recorded as male each time).
Sample data:
animal.ID type1 type2 type3
1 MMMMMMM M M
2 MFMM M M
3 FFM F F
4 FFFFFFFFF F F
5 MM M M
I want to know whether the sex (male or female) was recorded consistently for each animal.
I want to produce something like this, where a column indicates whether sex was recorded consistently (1) or not (0):
animal.ID type1 type2 type3 consistent
1 MMMMMMM M M 1
2 MFMM M M 0
3 FFM F F 0
4 FFFFFFFFF F F 1
5 MM M M 1
I can use if_else to get the 'consistent' column for the type2 and type3 data:
df %>%
mutate(consistent = if_else(type2 == type3, 1, 0))
But I can't include the type1 data, since each string contains multiple characters and the number of characters differs from string to string.
One approach could be to use str_split to split type1 into multiple columns, but I don't know how to do that given the different number of characters in each string.
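For the literal question in the title, here is a minimal base-R sketch (not taken from the answers below) that splits type1 into one column per encounter, padding shorter strings with NA so every row has the same width. The column names enc1, enc2, … are hypothetical, and the sketch only considers type1; type2 and type3 could be added as extra columns the same way.

```r
# Sample data from the question (ID and type1 only)
df <- data.frame(animal.ID = 1:5,
                 type1 = c("MMMMMMM", "MFMM", "FFM", "FFFFFFFFF", "MM"),
                 stringsAsFactors = FALSE)

# Split each string into a vector of single characters
chars <- strsplit(df$type1, "")

# Pad every vector with NA up to the length of the longest string
n_max <- max(lengths(chars))
padded <- lapply(chars, function(x) { length(x) <- n_max; x })

# Bind the padded vectors into a wide data frame, one column per encounter
wide <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(wide) <- paste0("enc", seq_len(n_max))  # hypothetical column names
wide <- cbind(df["animal.ID"], wide)

# An animal is consistent if its non-missing encounters share a single value
wide$consistent <- as.integer(
  apply(wide[-1], 1, function(r) length(unique(na.omit(r))) == 1)
)
```

Padding with NA is what makes the differing lengths manageable: once every row has n_max columns, the vectors can be stacked with rbind, and na.omit() ignores the padding when checking consistency.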
One approach may be to use strsplit and unlist, checking that all characters are equal to type2 (in addition to checking that type2 equals type3). rowwise() makes strsplit() and all() operate on one animal at a time.
library(dplyr)
df %>%
rowwise() %>%
mutate(consistent = ifelse(type2 == type3 & all(unlist(strsplit(type1, "")) == type2), 1, 0))
Output
# A tibble: 5 x 5
animal.ID type1 type2 type3 consistent
<int> <chr> <chr> <chr> <dbl>
1 1 MMMMMMM M M 1
2 2 MFMM M M 0
3 3 FFM F F 0
4 4 FFFFFFFFF F F 1
5 5 MM M M 1
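The same check can be written without rowwise(). A sketch, assuming type2 and type3 each hold a single character as in the sample data: every character of type1 equals type2 exactly when type1 equals type2 repeated nchar(type1) times, and both strrep() and nchar() are vectorised.

```r
# Sample data from the question
df <- data.frame(animal.ID = 1:5,
                 type1 = c("MMMMMMM", "MFMM", "FFM", "FFFFFFFFF", "MM"),
                 type2 = c("M", "M", "F", "F", "M"),
                 type3 = c("M", "M", "F", "F", "M"),
                 stringsAsFactors = FALSE)

# type1 is consistent with type2 iff it equals type2 repeated nchar(type1)
# times; combined with type2 == type3 this checks all encounters at once
df$consistent <- as.integer(df$type2 == df$type3 &
                            df$type1 == strrep(df$type2, nchar(df$type1)))
```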
We can use charToRaw to get the "raw" (byte) representation of type1 and assign 1 if the bytes are all the same. Note that this checks only type1; it does not compare type1 with type2 and type3.
df$consistent <- +(sapply(df$type1, function(x) length(unique(charToRaw(x)))) ==1)
Using dplyr, we can apply the same logic:
library(dplyr)
df %>%
rowwise() %>%
mutate(consistent = +(n_distinct(charToRaw(type1)) == 1))
# animal.ID type1 type2 type3 consistent
# <int> <chr> <chr> <chr> <int>
#1 1 MMMMMMM M M 1
#2 2 MFMM M M 0
#3 3 FFM F F 0
#4 4 FFFFFFFFF F F 1
#5 5 MM M M 1
data
df <- structure(list(animal.ID = 1:5, type1 = c("MMMMMMM", "MFMM",
"FFM", "FFFFFFFFF", "MM"), type2 = c("M", "M", "F", "F", "M"),
type3 = c("M", "M", "F", "F", "M")), class = "data.frame", row.names = c(NA, -5L))
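An alternative to charToRaw (a sketch, not from the original answer) is a regular expression with a backreference: ^(.)\1*$ captures the first character and then requires every following character to repeat it. Like the charToRaw version, it looks at type1 only. Note that charToRaw compares bytes, which is fine for single-byte codes such as M and F.

```r
type1 <- c("MMMMMMM", "MFMM", "FFM", "FFFFFFFFF", "MM")

# ^(.) captures the first character; \\1*$ requires every remaining
# character up to the end of the string to be that same character
same_char <- grepl("^(.)\\1*$", type1, perl = TRUE)
consistent <- as.integer(same_char)
```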
Another solution, using the logic from @Ronak Shah's answer, that checks all three type columns at once:
library(tidyverse)
df %>%
unite("all_type", starts_with("type"), sep = "", remove = FALSE) %>%
mutate(consistent = map_int(strsplit(all_type, ""), ~ +(n_distinct(.x) == 1)))
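The combined check can also be done in base R without splitting the strings. A minimal sketch: paste the three type columns into one string per animal (as unite() does above), then apply the backreference regex ^(.)\1*$, which matches only when every character equals the first.

```r
# Sample data from the question
df <- data.frame(animal.ID = 1:5,
                 type1 = c("MMMMMMM", "MFMM", "FFM", "FFFFFFFFF", "MM"),
                 type2 = c("M", "M", "F", "F", "M"),
                 type3 = c("M", "M", "F", "F", "M"),
                 stringsAsFactors = FALSE)

# One string per animal containing every recorded encounter
all_type <- paste0(df$type1, df$type2, df$type3)

# Consistent iff every character repeats the first one
df$consistent <- as.integer(grepl("^(.)\\1*$", all_type, perl = TRUE))
```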