How to count the letters in a string and return the most frequent letter for each row of a data frame in R

Question

I have a column in a data frame that consists of letters describing wind directions. I need to find the most common direction for each row, which means counting the occurrences of each letter and then selecting the letter that occurs most often. This is an example of the data frame:

structure(list(Day = c("15", "16", "17", "18", "19", "20"), Month = structure(c(4L, 
4L, 4L, 4L, 4L, 4L), .Label = c("Dec", "Nov", "Oct", "Sep"), class = "factor"), 
    Year = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("2012", 
    "2013", "2014", "2015", "2018", "2019", "2020"), class = "factor"), 
    Time = structure(c(10L, 10L, 10L, 10L, 10L, 10L), .Label = c("1-2pm", 
    "10-11am", "11-12am", "12-1pm", "2-3pm", "3-4pm", "4-5pm", 
    "5-6pm", "7-8am", "8-9am", "9-10am"), class = "factor"), 
    Direction_Abrev = c("S-SE", "S-SE", "SW-S", "W-SE", "W-SW", 
    "SW-S")), row.names = c(NA, 6L), class = "data.frame")

I would like the resulting data frame to look like the following:

  Day Month Year  Time Direction_Abrev
1  15   Sep 2013 8-9am              S
2  16   Sep 2013 8-9am              S
3  17   Sep 2013 8-9am              S
4  18   Sep 2013 8-9am           W-SE
5  19   Sep 2013 8-9am              W
6  20   Sep 2013 8-9am              S

That is, each row should contain its most common letter. There is an issue with rows like row 4, where all letters are equally common. In these cases I would like to return the original value, if that is possible. Thanks in advance.

Answer 1

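sapply(dat$Direction_Abrev, function(s) {
  # Split into single characters and drop the "-" separator before counting
  chars <- strsplit(s, "")[[1]]
  counts <- sort(table(chars[chars != "-"]), decreasing = TRUE)
  # Keep the original string when there is no unique most frequent letter
  if (length(counts) < 2 || counts[1] == counts[2]) s else names(counts)[1]
})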
#   S-SE   S-SE   SW-S   W-SE   W-SW   SW-S 
#    "S"    "S"    "S" "W-SE"    "W"    "S"
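
The call above returns a named character vector, not a data frame. To write the result back into the column, the same counting idea can be wrapped in a helper. This is a minimal sketch: most_common_letter is a hypothetical name, and dat is rebuilt here with only the relevant column so the snippet runs on its own. Unlike the code above, this variant returns the letter itself when a string contains only one distinct letter.

```r
# Hypothetical helper: returns the single most frequent letter of `s`,
# or `s` unchanged when several letters tie for the highest count.
most_common_letter <- function(s) {
  chars <- strsplit(s, "")[[1]]
  counts <- table(chars[chars != "-"])   # ignore the "-" separator
  top <- names(counts)[counts == max(counts)]
  if (length(top) == 1) top else s
}

# Stand-in for the question's data frame (assumption: only this column matters)
dat <- data.frame(
  Direction_Abrev = c("S-SE", "S-SE", "SW-S", "W-SE", "W-SW", "SW-S"),
  stringsAsFactors = FALSE
)

# vapply guarantees one string per row; USE.NAMES = FALSE drops the names
dat$Direction_Abrev <- vapply(dat$Direction_Abrev, most_common_letter,
                              character(1), USE.NAMES = FALSE)
dat$Direction_Abrev
# [1] "S"    "S"    "S"    "W-SE" "W"    "S"
```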

Answer 2

Here is a base R option using strsplit + intersect

transform(
  df,
  Direction_Abrev = unlist(
    ifelse(
      lengths(
        v <- sapply(
          strsplit(Direction_Abrev, "-"),
          function(x) do.call(intersect, strsplit(x, ""))
        )
      ),
      v,
      Direction_Abrev
    )
  )
)

which gives

  Day Month Year  Time Direction_Abrev
1  15   Sep 2013 8-9am               S
2  16   Sep 2013 8-9am               S
3  17   Sep 2013 8-9am               S
4  18   Sep 2013 8-9am            W-SE
5  19   Sep 2013 8-9am               W
6  20   Sep 2013 8-9am               S
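
The nested call can be easier to follow when unpacked into steps. The sketch below is an illustration, not the answer's exact code: the names dirs, parts, shared and result are made up, and it tests lengths(shared) == 1 rather than any non-zero length, so a string whose two parts share more than one letter is left unchanged. Note that this approach looks for letters shared by the two parts rather than counting letters, and it assumes each abbreviation has exactly two parts separated by "-".

```r
dirs <- c("S-SE", "W-SE", "W-SW")

# Step 1: split each abbreviation on "-" into its two parts
parts <- strsplit(dirs, "-")

# Step 2: split each part into letters and keep the letters found in both
shared <- lapply(parts, function(x) do.call(intersect, strsplit(x, "")))

# Step 3: use the shared letter when there is exactly one,
# otherwise fall back to the original string
result <- ifelse(
  lengths(shared) == 1,
  vapply(shared, function(z) paste(z, collapse = ""), character(1)),
  dirs
)
result
# [1] "S"    "W-SE" "W"
```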
