How to count the letters in a string and return the most frequent letter for each row of a data frame in R

Question

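I have a column in a data frame that contains letters describing wind direction. For each row I need to find the most common direction, which means counting the occurrences of each letter and then picking the most frequent one. Here is a sample of the data frame: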

structure(list(Day = c("15", "16", "17", "18", "19", "20"), Month = structure(c(4L, 
4L, 4L, 4L, 4L, 4L), .Label = c("Dec", "Nov", "Oct", "Sep"), class = "factor"), 
    Year = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("2012", 
    "2013", "2014", "2015", "2018", "2019", "2020"), class = "factor"), 
    Time = structure(c(10L, 10L, 10L, 10L, 10L, 10L), .Label = c("1-2pm", 
    "10-11am", "11-12am", "12-1pm", "2-3pm", "3-4pm", "4-5pm", 
    "5-6pm", "7-8am", "8-9am", "9-10am"), class = "factor"), 
    Direction_Abrev = c("S-SE", "S-SE", "SW-S", "W-SE", "W-SW", 
    "SW-S")), row.names = c(NA, 6L), class = "data.frame")

I would like the resulting data frame to look like this:

  Day Month Year  Time Direction_Abrev
1  15   Sep 2013 8-9am              S
2  16   Sep 2013 8-9am              S
3  17   Sep 2013 8-9am              S
4  18   Sep 2013 8-9am           W-SE
5  19   Sep 2013 8-9am              W
6  20   Sep 2013 8-9am              S

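That is, each row should return its most common letter. One complication (see row 4) is that all letters can be equally common. In those cases I would like to return the original value, if possible. Thanks in advance.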

Answer 1

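# Assumes the question's data frame is stored as `dat`.
sapply(dat$Direction_Abrev, function(s) {
  # Split into single characters and drop the "-" separator
  chars <- strsplit(s, "")[[1]]
  counts <- sort(table(chars[chars != "-"]), decreasing = TRUE)
  # On a tie for first place (or a single distinct letter), keep the original string
  if (length(counts) < 2 || counts[1] == counts[2]) s else names(counts)[1]
})
#   S-SE   S-SE   SW-S   W-SE   W-SW   SW-S 
#    "S"    "S"    "S" "W-SE"    "W"    "S"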
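
The call above returns a named character vector rather than an updated data frame. As a minimal sketch of writing the result back into the column, the same counting logic can be wrapped in a helper and applied with vapply. The helper name most_common_letter is hypothetical, and the data frame here is a cut-down copy of the question's data with only two columns.

```r
# Hypothetical helper: most frequent letter in one abbreviation,
# or the original string when first place is tied.
most_common_letter <- function(s) {
  chars <- strsplit(s, "")[[1]]
  chars <- chars[chars != "-"]                # drop the separator
  counts <- sort(table(chars), decreasing = TRUE)
  if (length(counts) < 2 || counts[[1]] == counts[[2]]) s else names(counts)[1]
}

# Cut-down copy of the question's data (assumption: only two columns kept).
dat <- data.frame(
  Day = c("15", "16", "17", "18", "19", "20"),
  Direction_Abrev = c("S-SE", "S-SE", "SW-S", "W-SE", "W-SW", "SW-S"),
  stringsAsFactors = FALSE
)

# vapply enforces one string per row; USE.NAMES = FALSE drops the names
# that sapply would otherwise attach.
dat$Direction_Abrev <- vapply(dat$Direction_Abrev, most_common_letter,
                              character(1), USE.NAMES = FALSE)
dat
```

Using vapply instead of sapply is a design choice, not part of the original answer: it fails loudly if the helper ever returns something other than a single string.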

Answer 2

Here is a base R option using strsplit + intersect (with the data frame stored as df)

transform(
  df,
  Direction_Abrev = unlist(
    ifelse(
      lengths(
        v <- sapply(
          strsplit(Direction_Abrev, "-"),
          function(x) do.call(intersect, strsplit(x, ""))
        )
      ),
      v,
      Direction_Abrev
    )
  )
)

which gives

  Day Month Year  Time Direction_Abrev
1  15   Sep 2013 8-9am               S
2  16   Sep 2013 8-9am               S
3  17   Sep 2013 8-9am               S
4  18   Sep 2013 8-9am            W-SE
5  19   Sep 2013 8-9am               W
6  20   Sep 2013 8-9am               S
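
Note that this approach does not count letters: it looks for the letter shared by the two sides of the "-". For two-part abbreviations like the ones in the question, that appears to give the same answer. A step-by-step sketch on single values may make this clearer. The function name shared_letter is hypothetical, and the sketch assumes exactly two parts per abbreviation.

```r
# Hypothetical helper showing the strsplit + intersect steps for one value.
# Assumes the abbreviation has exactly two parts separated by "-".
shared_letter <- function(s) {
  parts <- strsplit(s, "-")[[1]]    # e.g. "S-SE" -> "S", "SE"
  chars <- strsplit(parts, "")      # one character vector per part
  do.call(intersect, chars)         # letters present on both sides
}

shared_letter("S-SE")  # "S"
shared_letter("W-SE")  # character(0): no shared letter
```

An empty result has length zero, which is what the ifelse(lengths(...), ...) call in the answer uses to fall back to the original value.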
