How to compute the frequency of strings of different lengths

Question

I'm beginning in R and it's a little bit hard for me sometimes. I have a big data frame of 100000 observations, and in this data frame I have a column id. I need to compute the most frequent id in that column. The problem is that a single string sometimes contains several ids separated by '&'. An example makes it easier:

id             value
1                1
1                2
2&3&4            6
2&5&7&8          1
2&4&5            3
2                3

So, I'm supposed to obtain 2.
There can be up to 20 '&' in a string.

Thanks in advance,

Answer 1

The data seems to be this:

df<-structure(list(id = structure(c(1L, 1L, 3L, 5L, 4L, 2L), .Label = c("1", 
"2", "2&3&4", "2&4&5", "2&5&7&8"), class = "factor"), value = c(1L, 
2L, 6L, 1L, 3L, 3L)), .Names = c("id", "value"), class = "data.frame", row.names = c(NA, 
-6L))
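If you are typing the data in by hand rather than pasting `dput()` output, a minimal sketch is to read the whitespace-separated table from the question with `read.table(text = ...)`. The `colClasses` choice is an assumption on my part: it keeps `id` as character so no factor conversion is needed later.

```r
# Read the table from the question directly; blank lines are skipped by default.
# colClasses keeps id as character (not factor) and value as integer.
df <- read.table(text = "
id      value
1       1
1       2
2&3&4   6
2&5&7&8 1
2&4&5   3
2       3",
  header = TRUE, colClasses = c("character", "integer"))

str(df)
```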

The first step is to get a vector with all the ids:

unlist(strsplit(as.character(df[,1]),'&'))
# [1] "1" "1" "2" "3" "4" "2" "5" "7" "8" "2" "4" "5" "2"
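A variant of the same split, assuming the id column is already character: `fixed = TRUE` tells `strsplit` to treat "&" as a literal string instead of a regular expression. For "&" the result is identical, but skipping the regex engine is usually faster on inputs the size of the 100000-row data frame in the question.

```r
# Example data with id stored as character instead of factor.
df <- data.frame(id = c("1", "1", "2&3&4", "2&5&7&8", "2&4&5", "2"),
                 value = c(1L, 2L, 6L, 1L, 3L, 3L),
                 stringsAsFactors = FALSE)

# Split every id string on the literal "&" and flatten the list into one vector.
all_ids <- unlist(strsplit(df$id, "&", fixed = TRUE))
all_ids
# [1] "1" "1" "2" "3" "4" "2" "5" "7" "8" "2" "4" "5" "2"
```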

Then we get the frequencies:

table(unlist(strsplit(as.character(df[,1]),'&')))

# 1 2 3 4 5 7 8 
# 2 4 1 2 2 1 1
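If you want the full frequency list rather than only the top id, a small sketch is to turn the table into a data frame sorted by count. The column names `id` and `n` are my own choice, not anything required by R.

```r
df <- data.frame(id = c("1", "1", "2&3&4", "2&5&7&8", "2&4&5", "2"),
                 value = c(1L, 2L, 6L, 1L, 3L, 3L),
                 stringsAsFactors = FALSE)

# Count each individual id after splitting on "&".
counts <- table(unlist(strsplit(df$id, "&", fixed = TRUE)))

# Convert the table to a data frame (id, n), most frequent first.
freq <- as.data.frame(counts, stringsAsFactors = FALSE)
names(freq) <- c("id", "n")
freq <- freq[order(-freq$n), ]
freq
```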

And then we display the id with the highest frequency (sorting the table in decreasing order puts it first):

names(sort(table(unlist(strsplit(as.character(df[,1]), '&'))), decreasing = TRUE)[1])
# [1] "2"
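Taking `[1]` after sorting silently drops any other id with the same top count. A minimal sketch that returns all tied ids is below; the function name `most_frequent_ids` and its `sep` argument are hypothetical helpers for illustration. For a single answer without sorting, `which.max()` on the table returns the first id with the maximum count.

```r
# Hypothetical helper: return every id that reaches the maximum frequency.
# as.character() also handles the factor column from the original data.
most_frequent_ids <- function(ids, sep = "&") {
  counts <- table(unlist(strsplit(as.character(ids), sep, fixed = TRUE)))
  names(counts)[counts == max(counts)]
}

df <- data.frame(id = c("1", "1", "2&3&4", "2&5&7&8", "2&4&5", "2"),
                 value = c(1L, 2L, 6L, 1L, 3L, 3L),
                 stringsAsFactors = FALSE)

most_frequent_ids(df$id)
# [1] "2"

# Single top id without a full sort (first one wins on ties).
names(which.max(table(unlist(strsplit(df$id, "&", fixed = TRUE)))))
# [1] "2"
```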

Answer 1 2 2015-12-05 13:15:00