Find data frame rows that contain a certain character only once

Question

Sorry for the potential duplicate, but I don't really know how to phrase my request. I work in R and I would like to identify data frame cells that contain a certain character exactly once.

In my df I have a column a that contains formulas stored as strings, e.g.

           a
1    y~x1+x2
2    y~x2+x3
3 y~x1+x2+x3
4    y~x2+x4
5 y~x1+x3+x4

and I would like to keep the rows whose formulas in column a have 2 explanatory variables, i.e. that contain only one "+". The idea would be to filter, or to add a kind of dummy variable, so that the output would look like

           a b
1    y~x1+x2 1
2    y~x2+x3 1
3 y~x1+x2+x3 0
4    y~x2+x4 1
5 y~x1+x3+x4 0

Hope that's clear enough. Thanks for helping,
Val

Answer 1

You can use gsub with [^+] to delete every character except +, and then nchar to count how many + are left.

x$b <- +(nchar(gsub("[^+]", "", x$a)) == 1)
x
#           a b
#1    y~x1+x2 1
#2    y~x2+x3 1
#3 y~x1+x2+x3 0
#4    y~x2+x4 1
#5 y~x1+x3+x4 0
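The question also mentions filtering. A minimal base R sketch that rebuilds the example data, computes the same dummy as above, and then keeps only the flagged rows by logical indexing; the name `two_vars` is hypothetical, chosen only for illustration:

```r
# Rebuild the example data (same values as the Data section below)
x <- data.frame(
  a = c("y~x1+x2", "y~x2+x3", "y~x1+x2+x3", "y~x2+x4", "y~x1+x3+x4"),
  stringsAsFactors = FALSE
)

# Dummy: 1 if the formula string contains exactly one "+", else 0
x$b <- +(nchar(gsub("[^+]", "", x$a)) == 1)

# Keep only the rows flagged by the dummy (logical row indexing)
two_vars <- x[x$b == 1, , drop = FALSE]
two_vars
```

`subset(x, b == 1)` should return the same rows; the bracket form is shown because it also behaves predictably inside functions, where `subset()`'s non-standard evaluation can surprise.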

Or use gregexpr:

lapply(gregexpr("\\+", x$a), length) == 1
#[1]  TRUE  TRUE FALSE  TRUE FALSE

Or use it with lengths, as suggested by @ThomasIsCoding:

lengths(gregexpr("\\+", x$a)) == 1
#[1]  TRUE  TRUE FALSE  TRUE FALSE
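One caveat with the gregexpr() versions, assuming a formula could contain no + at all (for example a single-predictor y~x1): gregexpr() returns a match vector of -1 for a string with no match, so its length is still 1 and that row would be flagged. A sketch of the issue and one base R workaround using regmatches(), which returns character(0) when nothing matches; the vector `f` is made up for illustration:

```r
# Strings with zero, one and two "+" characters
f <- c("y~x1", "y~x1+x2", "y~x1+x2+x3")

# gregexpr() gives -1 for "no match", so the first string still has length 1
lengths(gregexpr("\\+", f))
#[1] 1 1 2

# regmatches() returns character(0) for strings without a match,
# so lengths() now gives the true number of "+" characters
n_plus <- lengths(regmatches(f, gregexpr("\\+", f)))
n_plus
#[1] 0 1 2
n_plus == 1
#[1] FALSE  TRUE FALSE
```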

Or use grepl:

grepl("^[^+]*\\+[^+]*$", x$a)
#[1]  TRUE  TRUE FALSE  TRUE FALSE
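The grepl() pattern works because it is anchored: it reads as start of string, any run of non-+ characters, one literal +, another run of non-+ characters, end of string. A sketch that generalises it to exactly n occurrences; `has_exactly_n_plus` is a hypothetical helper name, not a base R function:

```r
# Hypothetical helper: TRUE where a string contains exactly n "+" characters.
# ^ and $ anchor the whole string, [^+]* matches a run of non-"+" characters,
# and (\+[^+]*){n} requires exactly n "+" characters, each followed by
# another run of non-"+" characters.
has_exactly_n_plus <- function(s, n) {
  grepl(paste0("^[^+]*(\\+[^+]*){", n, "}$"), s)
}

f <- c("y~x1", "y~x1+x2", "y~x1+x2+x3")
has_exactly_n_plus(f, 1)
#[1] FALSE  TRUE FALSE
has_exactly_n_plus(f, 2)
#[1] FALSE FALSE  TRUE
```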

Or use strsplit:

sapply(strsplit(x$a, ""), function(y) sum(y == "+")==1)
#[1]  TRUE  TRUE FALSE  TRUE FALSE
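A further base R variant, sketched here rather than taken from the posted answers: count the + characters as the string length minus its length after deleting them. With fixed = TRUE, gsub() treats + as a literal, so no escaping is needed:

```r
f <- c("y~x1+x2", "y~x2+x3", "y~x1+x2+x3", "y~x2+x4", "y~x1+x3+x4")

# Number of "+" = original length minus length with every "+" removed;
# fixed = TRUE makes "+" a literal pattern rather than a regex quantifier
n_plus <- nchar(f) - nchar(gsub("+", "", f, fixed = TRUE))
n_plus
#[1] 1 1 2 1 2
n_plus == 1
#[1]  TRUE  TRUE FALSE  TRUE FALSE
```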

Data:

x <- read.table(header=TRUE, text="a
1  y~x1+x2
2  y~x2+x3
3  y~x1+x2+x3
4  y~x2+x4
5  y~x1+x3+x4", stringsAsFactors = FALSE)

Answer 2

Another base R solution uses gregexpr, i.e.

df$b <- +(lengths(gregexpr("\\+",df$a))==1)

such that

> df
           a b
1    y~x1+x2 1
2    y~x2+x3 1
3 y~x1+x2+x3 0
4    y~x2+x4 1
5 y~x1+x3+x4 0

DATA

df <- structure(list(a = c("y~x1+x2", "y~x2+x3", "y~x1+x2+x3", "y~x2+x4", 
"y~x1+x3+x4")), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5"))

Answer 3

A third base R alternative, assuming there are always at least two predictors in the formula:

df$b <- +(!grepl("\\+.*\\+", df$a))

df
           a b
1    y~x1+x2 1
2    y~x2+x3 1
3 y~x1+x2+x3 0
4    y~x2+x4 1
5 y~x1+x3+x4 0
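Answer 3 relies on every formula having at least two predictors: a one-predictor string such as y~x1 has no + and would be flagged 1. Since column a holds formula strings, a hedged alternative (not from any of the answers) is to parse each string and count the right-hand-side variables directly with base R's as.formula() and all.vars(). This sketch assumes every string is a valid two-sided formula; the extra y~x1 row is added only for illustration:

```r
df <- data.frame(
  a = c("y~x1+x2", "y~x2+x3", "y~x1+x2+x3", "y~x2+x4", "y~x1+x3+x4", "y~x1"),
  stringsAsFactors = FALSE
)

# Parse each string into a formula; for a two-sided formula, element [[3]]
# is the right-hand side, and all.vars() lists its distinct variable names
n_vars <- vapply(df$a, function(s) length(all.vars(as.formula(s)[[3]])),
                 integer(1), USE.NAMES = FALSE)

# Dummy: 1 if the formula has exactly two explanatory variables
df$b <- +(n_vars == 2)
df
```

Because all.vars() counts distinct names, y~x1*x2 would count as two variables here, whereas a + count would see none; whether that is desirable depends on what "2 explanatory variables" should mean for interaction terms.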

Question

3 answers

Answer 1
3 2020-04-15 15:14:57

Answer 2
1 2020-04-15 15:26:51

Answer 3
1 2020-04-15 15:32:22
