Multiple values in one cell

Question

I have data that looks somewhat like this:

number    type    results
1         5       x, y, z
2         6       a
3         8       x
1         5       x, y

Basically, I have data in Excel where a few individual cells contain several comma-separated values, and I need to count each of those values after subsetting the data on certain conditions.

Question: In R, how do I get a total of 5 when subsetting the data with number == 1 and type == 5?

Answer 1

If you need the total count, another option is str_count() after subsetting:

library(stringr)
with(df, sum(str_count(results[number==1 & type==5], "[a-z]"), na.rm = TRUE))
#[1] 5

Or with gregexpr() from base R:

with(df, sum(lengths(gregexpr("[a-z]", results[number==1 & type==5])), na.rm = TRUE))
#[1] 5
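One caveat with the lengths() version: gregexpr() returns -1 for an element that has no match, and lengths() counts that as one match. A small sketch, using a hypothetical value "123" that contains no letters:

```r
# One element with two letters and one with none ("123" is a made-up example)
m <- gregexpr("[a-z]", c("x, y", "123"))

# lengths() reports 1 for "123" because gregexpr() returns -1 there
lengths(m)
#> [1] 2 1

# Counting only positive match positions gives the intended total
sum(unlist(m) > 0)
#> [1] 2
```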

If some elements may have no match for the pattern, use:

with(df, sum(unlist(lapply(gregexpr("[a-z]", 
         results[number==1 & type==5]), `>`, 0)), na.rm = TRUE))
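Note that the "[a-z]" pattern counts letters, so it gives 5 here only because every value is a single lowercase letter. If the real values can be longer words, a base R sketch that counts the comma-separated items themselves (assuming they are separated by a comma plus optional spaces) could look like this:

```r
# Example data from the question
df <- data.frame(
  number  = c(1, 2, 3, 1),
  type    = c(5, 6, 8, 5),
  results = c("x, y, z", "a", "x", "x, y"),
  stringsAsFactors = FALSE
)

# Keep only the cells whose row meets both conditions
sel <- df$results[df$number == 1 & df$type == 5]

# Split each cell on commas, dropping the surrounding spaces
items <- strsplit(sel, "\\s*,\\s*")

# lengths() gives the number of items per cell; add them up
total <- sum(lengths(items))
total
#> [1] 5
```

With a value such as "apple, pear" this counts 2 items, whereas str_count(..., "[a-z]") would count 9 letters.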

Answer 2

Here is an option using dplyr and tidyr. filter() keeps the rows that meet the conditions, separate_rows() splits the comma-separated values into separate rows, group_by() groups the data by value, and tally() counts the rows in each group.

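library(dplyr)
library(tidyr)

dt2 <- dt %>%
  filter(number == 1, type == 5) %>%   # keep rows meeting both conditions
  separate_rows(results) %>%           # one row per comma-separated value
  group_by(results) %>%
  tally()                              # count rows per value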
# # A tibble: 3 x 2
#   results     n
#     <chr> <int>
# 1       x     2
# 2       y     2
# 3       z     1

Or you can replace group_by() and tally() with a single count(results) call:

dt2 <- dt %>%
  filter(number == 1, type == 5) %>%
  separate_rows(results) %>%
  count(results)
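For comparison, the same per-value counts can be sketched in base R, with strsplit() standing in for separate_rows() and table() for count(); summing the table gives the total of 5 the question asks for. This assumes values are separated by a comma and optional spaces.

```r
dt <- data.frame(
  number  = c(1, 2, 3, 1),
  type    = c(5, 6, 8, 5),
  results = c("x, y, z", "a", "x", "x, y"),
  stringsAsFactors = FALSE
)

# Rows that meet both conditions
kept <- dt[dt$number == 1 & dt$type == 5, ]

# One element per comma-separated value, similar to separate_rows()
vals <- unlist(strsplit(kept$results, "\\s*,\\s*"))

# Frequency of each value, similar to count(results)
counts <- table(vals)
counts
#> vals
#> x y z
#> 2 2 1

# Total number of values
sum(counts)
#> [1] 5
```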

DATA

dt <- read.table(text = "number    type    results
1         5       'x, y, z'
2         6       a
3         8       x
1         5       'x, y'",
                 header = TRUE, stringsAsFactors = FALSE)

Answer 3

Here is a method using base R. Split results on the commas, take the length of each resulting vector, then sum these lengths grouped by number.

aggregate(sapply(strsplit(df$results, ","), length), list(df$number), sum)
  Group.1 x
1       1 5
2       2 1
3       3 1
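The aggregate() call above groups by number only, which gives the right answer here because every row with number == 1 also has type == 5. To group on both columns, as the question's condition does, one option is aggregate()'s formula interface; the helper column n_items is introduced here for illustration:

```r
df <- data.frame(
  number  = c(1, 2, 3, 1),
  type    = c(5, 6, 8, 5),
  results = c("x, y, z", "a", "x", "x, y"),
  stringsAsFactors = FALSE
)

# Number of comma-separated values in each cell (hypothetical helper column)
df$n_items <- lengths(strsplit(df$results, ","))

# Sum those counts for every (number, type) combination
by_both <- aggregate(n_items ~ number + type, data = df, FUN = sum)

# Pick out the combination from the question
by_both[by_both$number == 1 & by_both$type == 5, "n_items"]
#> [1] 5
```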

Your data:

df = read.table(text="number    type    results
1         5       'x, y, z'
2         6       'a'
3         8       'x'
1         5       'x, y'",
header=TRUE, stringsAsFactors=FALSE)

3 answers

Answer 1: score 2, posted 2017-12-02 15:26:18
Answer 2: score 1, accepted, posted 2017-12-02 14:17:06
Answer 3: score 1, posted 2017-12-02 14:37:00