How to count occurrences in a specific column that meet a condition (4 letters)

Question

x = c(1,2,3,4,5)
y = c("AA","BB","CC", "AAAA","BBBB")
data1 = data.frame(x,y)
data1

I want the output to be the number of times a 4-letter value occurs in the y column. The desired output would be 2.

I want to count the number of times a 4-letter factor observation occurs in a given column of a data frame. How do I do this?

Answer 1

If you only want to extract and count factor values that have exactly 4 letters (any letters, not necessarily the same), you can do this:

Step 1--Define a pattern to match:

# ^ and $ anchor the match, so only values of exactly 4 letters qualify
pattern <- "^[A-Za-z]{4}$"

Step 2--Define a function to extract only the raw matches:

extract <- function(x) unlist(regmatches(x, gregexpr(pattern, x, perl = TRUE)))

Step 3--Apply the function to the data of interest:

extract(data1$y)

And that's the result:

[1] "AAAA" "BBBB"

Step 4--To count the number of matches, you can use length:

length(extract(data1$y))
[1] 2
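If only the count is needed, the extraction step can be skipped. The following is a minimal sketch, not part of the original answer: it uses base R's grepl with sum, and it adds one longer string ("CCCCCC", an assumption for illustration) to the question's data to show how an anchored pattern differs from an unanchored one such as "\\w{4}".

```r
# Example values from the question, plus one longer string (added for illustration)
y <- c("AA", "BB", "CC", "AAAA", "BBBB", "CCCCCC")

# Anchored pattern: the whole string must consist of exactly 4 letters
n_exact <- sum(grepl("^[A-Za-z]{4}$", y))
n_exact
# [1] 2

# Unanchored pattern: also matches 4 word characters inside "CCCCCC"
n_loose <- sum(grepl("\\w{4}", y))
n_loose
# [1] 3
```

On the question's original data both patterns give 2; they only differ once longer strings are present.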

EDIT: Alternatively, you can use str_extract from the stringr package:

Step 1: Store the result in a vector extr:

library(stringr)
extr <- str_extract(data1$y, "^[A-Za-z]{4}$")

Step 2: Using length, the negation operator ! and is.na, a function that tests for NA and evaluates to TRUE or FALSE, you can count the number of times that test evaluates to FALSE (that is, how many values matched):

length(extr[!is.na(extr)])
[1] 2
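The counting step works because str_extract returns NA for every element without a match. A small base R sketch of that logic follows; the vector extr_example is written out by hand here (an assumption standing in for the str_extract result on the example data), so it runs without stringr.

```r
# Hand-written stand-in for the str_extract result on the example data:
# NA where nothing matched, the matched string otherwise
extr_example <- c(NA, NA, NA, "AAAA", "BBBB")

# is.na() flags the non-matches; ! flips that to flag the matches
matched <- !is.na(extr_example)
matched
# [1] FALSE FALSE FALSE  TRUE  TRUE

# Subsetting and taking the length, as in the answer
n_by_length <- length(extr_example[matched])

# Summing the logical vector gives the same count (TRUE counts as 1)
n_by_sum <- sum(matched)
n_by_sum
# [1] 2
```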

Answer 2

Maybe you can try nchar, if the strings in column y always consist of letters:

sum(nchar(as.vector(data1$y)) == 4)
[1] 2
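A short sketch of the same idea with the column stored explicitly as a factor, which is what the question describes. The stringsAsFactors = TRUE argument is set here as an assumption, since R 4.0 and later keep strings as character by default. It also shows how the same length vector can be used to pull out the matching rows.

```r
# Rebuild the example data with y stored as a factor
data1 <- data.frame(
  x = 1:5,
  y = c("AA", "BB", "CC", "AAAA", "BBBB"),
  stringsAsFactors = TRUE
)

# nchar() needs character input, so convert the factor to its labels first
len <- nchar(as.character(data1$y))

# Count the values that are exactly 4 characters long
n_four <- sum(len == 4)
n_four
# [1] 2

# Keep only the rows whose y value has 4 characters
four_rows <- data1[len == 4, ]
```

Note that nchar counts any character, not just letters, so this relies on the column containing letters only.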
