How to count letters in a string and return the highest occurring letter for rows in a data frame in R

I have a column in a data frame that consists of letters describing wind directions. I need to find the most common direction for each row, which means counting the occurrences of each letter and then selecting the most frequent one. This is an example of the data frame:

structure(list(Day = c("15", "16", "17", "18", "19", "20"), Month = structure(c(4L, 
4L, 4L, 4L, 4L, 4L), .Label = c("Dec", "Nov", "Oct", "Sep"), class = "factor"), 
    Year = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("2012", 
    "2013", "2014", "2015", "2018", "2019", "2020"), class = "factor"), 
    Time = structure(c(10L, 10L, 10L, 10L, 10L, 10L), .Label = c("1-2pm", 
    "10-11am", "11-12am", "12-1pm", "2-3pm", "3-4pm", "4-5pm", 
    "5-6pm", "7-8am", "8-9am", "9-10am"), class = "factor"), 
    Direction_Abrev = c("S-SE", "S-SE", "SW-S", "W-SE", "W-SW", 
    "SW-S")), row.names = c(NA, 6L), class = "data.frame")

I would like the resulting data frame to look like the following:

  Day Month Year  Time Direction_Abrev
1  15   Sep 2013 8-9am              S
2  16   Sep 2013 8-9am              S
3  17   Sep 2013 8-9am              S
4  18   Sep 2013 8-9am           W-SE
5  19   Sep 2013 8-9am              W
6  20   Sep 2013 8-9am              S

That is, each row should hold its most common letter. There is one issue (as in row 4): sometimes all letters are equally common. In these cases I would like to return the original value, if that is possible. Thanks in advance.

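sapply(dat$Direction_Abrev, function(s) {
  # Split into single characters and drop the "-" separator before counting
  chars <- strsplit(s, "")[[1]]
  counts <- sort(table(chars[chars != "-"]), decreasing = TRUE)
  # Keep the original value on a tie (or when fewer than two distinct letters remain)
  if (length(counts) < 2 || counts[1] == counts[2]) s else names(counts)[1]
})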
#   S-SE   S-SE   SW-S   W-SE   W-SW   SW-S 
#    "S"    "S"    "S" "W-SE"    "W"    "S" 
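
To write the result back into the data frame, the same counting logic can be wrapped in a helper and assigned to the column. This is only a minimal sketch: most_common_letter is a hypothetical name, and dat here is a cut-down stand-in for the example data that contains just the Direction_Abrev column.

```r
# Hypothetical helper: most frequent letter, or the input itself on a tie
most_common_letter <- function(s) {
  chars <- strsplit(s, "")[[1]]
  counts <- sort(table(chars[chars != "-"]), decreasing = TRUE)
  if (length(counts) < 2 || counts[1] == counts[2]) s else names(counts)[1]
}

# Cut-down stand-in for the question's data frame (assumption: only this column matters)
dat <- data.frame(
  Direction_Abrev = c("S-SE", "S-SE", "SW-S", "W-SE", "W-SW", "SW-S"),
  stringsAsFactors = FALSE
)

# vapply enforces one character value per row; USE.NAMES = FALSE drops the input strings as names
dat$Direction_Abrev <- vapply(dat$Direction_Abrev, most_common_letter,
                              character(1), USE.NAMES = FALSE)
dat$Direction_Abrev
```

Using vapply instead of sapply is a design choice rather than a requirement: it fails loudly if a row ever produces something other than a single string.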

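Here is a base R option using strsplit + intersect. For two-part values like these, a letter that appears on both sides of the "-" is the most common one; if no letter is shared, the original value is kept.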

transform(
  df,
  Direction_Abrev = unlist(
    ifelse(
      lengths(
        v <- sapply(
          strsplit(Direction_Abrev, "-"),
          function(x) do.call(intersect, strsplit(x, ""))
        )
      ),
      v,
      Direction_Abrev
    )
  )
)

which gives

  Day Month Year  Time Direction_Abrev
1  15   Sep 2013 8-9am               S
2  16   Sep 2013 8-9am               S
3  17   Sep 2013 8-9am               S
4  18   Sep 2013 8-9am            W-SE
5  19   Sep 2013 8-9am               W
6  20   Sep 2013 8-9am               S
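
To see what the intersect step does, it may help to trace it on single values. The sketch below uses a hypothetical helper name, common_letters, and assumes every value has exactly two parts separated by "-", as in the example data.

```r
# Hypothetical helper: letters shared by the two sides of a "-" separated value
common_letters <- function(s) {
  parts <- strsplit(s, "-")[[1]]            # "S-SE" -> c("S", "SE")
  do.call(intersect, strsplit(parts, ""))   # letters present on both sides
}

common_letters("S-SE")   # "S"
common_letters("W-SW")   # "W"
common_letters("W-SE")   # character(0): nothing is shared
```

A length-zero result makes lengths(v) equal to 0 for that row, which is why ifelse falls back to the original Direction_Abrev value there.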
