How to count letters in a string and return the highest occurring letter for rows in a data frame in R
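I have a column in a data frame that contains letters describing wind direction. For each row I need to find the most common direction, which means counting the occurrences of each letter and then picking the most frequent one. Here is a sample of the data frame: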
structure(list(Day = c("15", "16", "17", "18", "19", "20"), Month = structure(c(4L,
4L, 4L, 4L, 4L, 4L), .Label = c("Dec", "Nov", "Oct", "Sep"), class = "factor"),
Year = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("2012",
"2013", "2014", "2015", "2018", "2019", "2020"), class = "factor"),
Time = structure(c(10L, 10L, 10L, 10L, 10L, 10L), .Label = c("1-2pm",
"10-11am", "11-12am", "12-1pm", "2-3pm", "3-4pm", "4-5pm",
"5-6pm", "7-8am", "8-9am", "9-10am"), class = "factor"),
Direction_Abrev = c("S-SE", "S-SE", "SW-S", "W-SE", "W-SW",
"SW-S")), row.names = c(NA, 6L), class = "data.frame")
I would like the resulting data frame to look like this:
Day Month Year Time Direction_Abrev
1 15 Sep 2013 8-9am S
2 16 Sep 2013 8-9am S
3 17 Sep 2013 8-9am S
4 18 Sep 2013 8-9am W-SE
5 19 Sep 2013 8-9am W
6 20 Sep 2013 8-9am S
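That is, return the most common letter. One problem is that in some rows (such as row 4) all letters are equally common. In those cases I would like to return the original value, if possible. Thanks in advance.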
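sapply(dat$Direction_Abrev, function(s) {
  chars <- strsplit(s, "")[[1]]  # single characters of s
  counts <- sort(table(chars[chars != "-"]), decreasing = TRUE)  # ignore "-"
  # on a tie for the top count, keep the original string
  if (length(counts) < 2 || counts[1] == counts[2]) s else names(counts)[1]
})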
# S-SE S-SE SW-S W-SE W-SW SW-S
# "S" "S" "S" "W-SE" "W" "S"
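To write the result back into the data frame, one option is `vapply()` with `character(1)`, which guarantees exactly one string per row, plus `USE.NAMES = FALSE` to drop the names that `sapply()` attaches. The sketch below is an assumption-laden variant, not the answerer's code: `top_letter` is a hypothetical helper that strips the `-` with `gsub()` first and treats a row as tied when more than one letter shares the maximum count, and only the `Direction_Abrev` column of the sample is rebuilt.

```r
# Hypothetical helper: the most frequent letter in s,
# or s itself when several letters share the top count
top_letter <- function(s) {
  letters_only <- gsub("-", "", s, fixed = TRUE)      # drop the separator
  counts <- table(strsplit(letters_only, "")[[1]])    # count each letter
  top <- names(counts)[counts == max(counts)]         # all letters at the max
  if (length(top) == 1) top else s
}

# Only the relevant column of the question's sample is rebuilt here
dat <- data.frame(
  Direction_Abrev = c("S-SE", "S-SE", "SW-S", "W-SE", "W-SW", "SW-S"),
  stringsAsFactors = FALSE
)

# One string per row, without names, written back into the column
dat$Direction_Abrev <- vapply(dat$Direction_Abrev, top_letter,
                              character(1), USE.NAMES = FALSE)
```

Unlike the `length(counts) < 2` check above, this version returns `"S"` for an input such as `"S-S"`, because a single distinct letter is never a tie.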
Here is a base R option using `strsplit` + `intersect`, which keeps the letter shared by both halves of each direction:
transform(
df,
Direction_Abrev = unlist(
ifelse(
lengths(
v <- sapply(
strsplit(Direction_Abrev, "-"),
function(x) do.call(intersect, strsplit(x, ""))
)
),
v,
Direction_Abrev
)
)
)
This gives
Day Month Year Time Direction_Abrev
1 15 Sep 2013 8-9am S
2 16 Sep 2013 8-9am S
3 17 Sep 2013 8-9am S
4 18 Sep 2013 8-9am W-SE
5 19 Sep 2013 8-9am W
6 20 Sep 2013 8-9am S
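On this sample, the letters shared by both halves coincide with the most frequent letter. The `unlist()` step, however, assumes each row shares at most one letter: an input such as `"SE-SE"` shares two, so `unlist()` would return more values than there are rows. A hedged variant of the same idea, using a hypothetical helper `shared_letter` and `Reduce()` so that any number of `-`-separated parts is accepted, falls back to the original string unless exactly one letter is shared:

```r
# Hypothetical helper: the single letter shared by every "-"-separated part,
# or the original string when zero or several letters are shared
shared_letter <- function(dirs) {
  parts  <- strsplit(dirs, "-", fixed = TRUE)
  common <- lapply(parts, function(p) Reduce(intersect, strsplit(p, "")))
  first  <- vapply(common, function(x) x[1], character(1))  # NA when empty
  ifelse(lengths(common) == 1, first, dirs)
}

dirs <- c("S-SE", "S-SE", "SW-S", "W-SE", "W-SW", "SW-S", "SE-SE")
result <- shared_letter(dirs)
```

Because it always returns one string per input, it can be dropped into the same call, e.g. `transform(df, Direction_Abrev = shared_letter(Direction_Abrev))`.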