One factor level becomes NA when I reorder the levels. Why is this?
I have the following data:
> dataAvg
# A tibble: 20 x 3
# Groups: Date [5]
Date Rate meanNitrogen
<fct> <fct> <dbl>
1 7.16 Rate 1 1.36
2 7.16 Rate 2 1.29
3 7.16 Rate 3 1.40
4 7.16 Rate 4 1.11
5 7.22 Rate 1 1.41
6 7.22 Rate 2 1.34
7 7.22 Rate 3 1.62
8 7.22 Rate 4 1.08
9 7.29 Rate 1 1.38
10 7.29 Rate 2 1.39
11 7.29 Rate 3 1.51
12 7.29 Rate 4 1.14
13 7.8 Rate 1 1.34
14 7.8 Rate 2 1.38
15 7.8 Rate 3 1.38
16 7.8 Rate 4 1.08
17 8.05 Rate 1 1.39
18 8.05 Rate 2 1.35
19 8.05 Rate 3 1.42
20 8.05 Rate 4 1.02
I'm trying to make the following ggplot:
ggplot(dataAvg, aes(x=Date, y=meanNitrogen, group=Rate)) +
geom_bar(stat="identity") +
facet_wrap(.~Rate)
However, Date (a factor) is ordered alphabetically rather than chronologically. To change this, I added the following line of code:
dataAvg$Date <- factor(dataAvg$Date,levels(dataAvg$Date)[c(4,1,2,3,5)])
This is the dput() output before changing the order:
structure(list(Date = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L,
6L), .Label = c("7.1", "7.16", "7.22", "7.29", "7.8", "8.05",
"8.18"), class = "factor"), Rate = structure(c(1L, 2L, 3L, 4L,
1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,
1L, 2L, 3L, 4L), .Label = c("Rate 1", "Rate 2", "Rate 3", "Rate 4"
), class = "factor"), meanNitrogen = c(4.955, 5.005, 5.1075,
4.01, 6.3325, 5.485, 6.1825, 4.2275, 5.195, 4.825, 5.325, 3.765,
5.0225, 4.93, 5.3925, 3.82, 5.2225, 5.34, 5.2025, 4.0225, 4.43,
4.3775, 4.725, 3.7025)), row.names = c(NA, -24L), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), groups = structure(list(Date = structure(1:6, .Label = c("7.1",
"7.16", "7.22", "7.29", "7.8", "8.05", "8.18"), class = "factor"),
.rows = list(1:4, 5:8, 9:12, 13:16, 17:20, 21:24)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE))
This is the output after:
> dput(dataAvg)
structure(list(Date = structure(c(1L, 1L, 1L, 1L, 3L, 3L, 3L,
3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 2L, 2L, 2L, 2L, 6L, 6L, 6L,
6L), .Label = c("7.1", "7.8", "7.16", "7.22", "7.29", "8.05"), class = "factor"),
Rate = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L,
3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("Rate 1",
"Rate 2", "Rate 3", "Rate 4"), class = "factor"), meanNitrogen = c(4.955,
5.005, 5.1075, 4.01, 6.3325, 5.485, 6.1825, 4.2275, 5.195,
4.825, 5.325, 3.765, 5.0225, 4.93, 5.3925, 3.82, 5.2225,
5.34, 5.2025, 4.0225, 4.43, 4.3775, 4.725, 3.7025)), row.names = c(NA,
-24L), groups = structure(list(Date = structure(1:6, .Label = c("7.1",
"7.16", "7.22", "7.29", "7.8", "8.05", "8.18"), class = "factor"),
.rows = list(1:4, 5:8, 9:12, 13:16, 17:20, 21:24)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
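In other cases this has solved the problem, but here I lose the "8.05" date in the ggplot: it is replaced with an "NA" value. I couldn't find a solution searching Stack Overflow or elsewhere. Any help getting rid of the NA would be greatly appreciated. Thanks!
I'm going to make a few suggestions that don't answer your question as written, but that I think could improve the data visualization. See if you agree.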
Use Date as the variable type
Assuming your dates are from 2020 and the current format is month.day, you can convert them with dplyr::mutate:
library(dplyr)
library(ggplot2)
dataAvg %>%
mutate(newDate = as.Date(paste0(Date, ".2020"), "%m.%d.%Y")) %>%
ggplot(aes(newDate, meanNitrogen)) +
geom_line() +
facet_wrap(~Rate)
Result:
Edit: since your focus is comparing rates on a given date, a better line chart would color the lines by Rate instead of using facets.
dataAvg %>%
mutate(newDate = as.Date(paste0(Date, ".2020"), "%m.%d.%Y")) %>%
ggplot(aes(newDate, meanNitrogen)) +
geom_line(aes(color = Rate))
Or, if you think columns are clearer:
dataAvg %>%
mutate(newDate = as.Date(paste0(Date, ".2020"), "%m.%d.%Y")) %>%
ggplot(aes(newDate, meanNitrogen)) +
geom_col(aes(fill = Rate), position = position_dodge())
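All three plots above rely on the same month.day to Date conversion. The sketch below checks it in isolation on a small hypothetical data frame (toy is an illustrative name, and the year 2020 is the same assumption the answer makes); sorting by the converted column gives time order, which is how ggplot2 places Date values on its continuous date scale.

```r
# Hypothetical toy data: Date labels stored as a month.day factor, out of order
toy <- data.frame(Date = factor(c("7.8", "7.16", "8.05", "7.1")))

# Same conversion as in the answers above; the year 2020 is an assumption.
# paste0() uses the factor's labels, e.g. "7.8" -> "7.8.2020"
toy$newDate <- as.Date(paste0(toy$Date, ".2020"), format = "%m.%d.%Y")

# Ordering by the Date column yields chronological order: 7.1, 7.8, 7.16, 8.05
toy[order(toy$newDate), ]
```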
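When you subset levels(dataAvg$Date), you manually pick only the values 1 to 5. But dataAvg$Date has seven levels, so the levels you left out (the sixth and seventh, "8.05" and "8.18") are dropped from the new factor, and every value that belonged to them becomes NA.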
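A minimal base-R sketch of this behaviour, using the seven level labels from the question's dput() output (the vector is a stand-in for dataAvg$Date, with one value per level):

```r
# One value per level, with the same level order as the .Label in the dput() output
dates <- factor(
  c("7.1", "7.16", "7.22", "7.29", "7.8", "8.05", "8.18"),
  levels = c("7.1", "7.16", "7.22", "7.29", "7.8", "8.05", "8.18")
)

# Only five of the seven levels are passed to factor(), so "8.05" and "8.18"
# are no longer valid levels and the values carrying them become NA
reordered <- factor(dates, levels = levels(dates)[c(4, 1, 2, 3, 5)])
reordered
levels(reordered)
```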
You need to do something like this:
dataAvg$Date <- factor(dataAvg$Date,levels(dataAvg$Date)[c(4,1,2,3,5,6,7)])
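that is, include all seven indices, just arranged in the order you want (for the levels in the dput() output above, chronological order would be c(1, 5, 2, 3, 4, 6, 7)).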
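Hard-coded indices break as soon as a new date appears. As a sketch, assuming the labels are month.day in 2020, the level order can instead be computed from the labels themselves with order(), which keeps every level (dates, lv, as_day and chrono are illustrative names):

```r
# Same seven level labels as in the dput() output above
dates <- factor(
  c("7.1", "7.16", "7.22", "7.29", "7.8", "8.05", "8.18"),
  levels = c("7.1", "7.16", "7.22", "7.29", "7.8", "8.05", "8.18")
)

# Assumption: labels are month.day; the year 2020 is only needed for parsing
lv <- levels(dates)
as_day <- as.Date(paste0(lv, ".2020"), format = "%m.%d.%Y")

# order() returns the level indices sorted by actual date, here c(1, 5, 2, 3, 4, 6, 7),
# and because all levels are kept, no value becomes NA
chrono <- factor(dates, levels = lv[order(as_day)])
levels(chrono)
```

The same two lines can be applied to dataAvg$Date directly, since they only look at its levels.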
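As others have said, your best bet is to use a Date object rather than a factor. If you want or need to use factors, consider the functions from the forcats package, which are less likely to drop levels, such as forcats::fct_relevel() (similar to what you are currently doing, but levels you don't name are kept rather than turned into NA) or forcats::fct_reorder() (which reorders the levels by a summary, the median by default, of another variable).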
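A short sketch of both forcats functions on the question's seven level labels (it assumes forcats is installed; the variable names and the month.day-in-2020 parsing are illustrative assumptions). Both end in chronological order without producing any NA:

```r
library(forcats)

# Same seven level labels as in the dput() output above
dates <- factor(
  c("7.1", "7.16", "7.22", "7.29", "7.8", "8.05", "8.18"),
  levels = c("7.1", "7.16", "7.22", "7.29", "7.8", "8.05", "8.18")
)

# fct_relevel(): move "7.8" to just after the first level;
# levels that are not named keep their relative order
relevelled <- fct_relevel(dates, "7.8", after = 1)
levels(relevelled)

# fct_reorder(): order the levels by the median of another variable,
# here the day number parsed from each label (assumed month.day in 2020)
day_num <- as.numeric(as.Date(paste0(as.character(dates), ".2020"), format = "%m.%d.%Y"))
reordered <- fct_reorder(dates, day_num)
levels(reordered)
```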