
How to migrate `base R` location plot to `ggplot2` and avoid `for` loop?

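I'm interested in visualizing the position of certain words within sentences. Suppose I have 500 sentences, each 3 to 5 words long, and want to visualize where the word A occurs in each sentence:

Data: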

set.seed(123)
w1 <- sample(LETTERS[1:3], 1000, replace = TRUE) 
w2 <- sample(LETTERS[1:5], 1000, replace = TRUE)
w3 <- sample(LETTERS[1:6], 1000, replace = TRUE)
w4 <- sample(c(NA,LETTERS[1:7]), 1000, replace = TRUE)
w5 <- sample(c(NA,LETTERS[1:8]), 1000, replace = TRUE)

df <- data.frame(
  position = rep(1:5, each = 1000),                       # position of word in sentence
  word = c(w1, w2, w3, w4, w5)                            # the words in the sentences
)

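I can do this in base R, but the code relies on a very slow for loop and lacks the aesthetic quality of ggplot2. How can I produce the same kind of visualization, faster, in ggplot2?

Here is the code that generates the location plot in base R: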

# Plot dimensions:
x <- rep(1:5, 100)
y <- 1:500

# Plot parameters:
par(mar = c(2, 1.5, 1, 1.5), xpd = TRUE)

# Plot:
plot(y ~ x, type = "n", frame = F, axes = F, ylab="", xlab="", 
     main="Location of word 'A' in sentences", cex.main=0.9)
axis(1, at = 1:5, labels = c("w1", "w2", "w3", "w4", "w5"), cex.axis = 0.9)

# Legend:
legend(2.25, 530, c("A", "other", "NA"), fill=c("blue", "orange", "black"), 
       horiz = T, cex = 0.7, bty = "n", border = "white")

# For loop to print 'A' as color in positions:
for(i in unique(df$position)){
  text(i, 1:500, "__________", cex = 1,
       col = ifelse(df[df$position==i,]$word=="A", "blue", "orange"))
}

For example, use geom_segment and then a conditional aesthetic.

I use ggh4x for the truncated axis.

library(tidyverse)
library(ggh4x)

df <- df %>%
  group_by(position) %>%
  mutate(index = row_number())  # sentence number within each position

ggplot(df, aes( color = word=="A")) +
  geom_segment(aes(x = position-.4, xend = position+.4, 
                   y = index, yend = index),
               key_glyph= "rect") +
  scale_color_manual(name = NULL, 
                     values=c(`TRUE` = "blue", `FALSE` = "orange"),
                     labels = c(`TRUE` = "A", `FALSE` = "other"),
                     na.value="black")+ 
  guides(x = "axis_truncated") +
  scale_x_continuous(breaks = 1:5, labels = paste0("w", 1:5))+
  theme_classic() +
  theme(axis.line.y = element_blank(),
        axis.ticks.y = element_blank(), 
        axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        plot.title = element_text(hjust = .5),
        legend.position = "top") +
  labs( y = NULL, x = NULL, title = "Location of A")
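
The base R plot in the question only draws y positions 1 to 500, while the sample data holds 1000 sentences per position. If the goal is to reproduce that plot exactly, one option is to filter on the index column before plotting. The following is a minimal sketch under that assumption; `df_500` is a hypothetical name, and the data frame is a simplified stand-in for the question's data rather than the original.

```r
library(dplyr)

set.seed(123)
# Simplified stand-in for the question's data: 5 positions x 1000 sentences
df <- data.frame(
  position = rep(1:5, each = 1000),
  word = sample(c(NA, LETTERS[1:8]), 5000, replace = TRUE)
)

df_500 <- df %>%
  group_by(position) %>%
  mutate(index = row_number()) %>%  # sentence number within each position
  ungroup() %>%
  filter(index <= 500)              # assumption: keep only the first 500 sentences

nrow(df_500)          # rows left after filtering
range(df_500$index)   # smallest and largest sentence number kept
```

Passing `df_500` to the ggplot call above in place of `df` should then draw 500 segments per position instead of 1000.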

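Here's a first attempt. (I'm not quite clear: do you only want to show the first 500 of the 1000 sentences?)

My approach here is to first summarize the data into consecutive runs of A / other / NA. That way the plot area is filled exactly, with no need to adjust line thickness, and it should plot faster because fewer elements are drawn.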

library(dplyr)
df_plot <- df %>%
  mutate(A_spots = case_when(word == "A" ~ "A",
                             word != "A" ~ "other",
                             TRUE ~ "NA")) %>%
  group_by(position) %>%
  mutate(col_chg = A_spots != lag(A_spots, default = ""),
         group_num = cumsum(col_chg)) %>%
  ungroup() %>%
  count(position, group_num, A_spots)
  
library(ggplot2)
ggplot(df_plot, aes(position, n, fill = A_spots, group = group_num)) +
  geom_col() +
  scale_x_continuous(name = NULL, breaks = 1:5,   #stolen from @tjebo's answer
                 labels = paste0("w", 1:5))+
  scale_fill_manual(
    values = c("A" = "blue","other" = "orange", "NA" = "black")) +
  labs(title = "Location of word 'A' in sentences") +
  theme_minimal()
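
To get a feel for how much the run summary reduces the number of drawn elements, base R's `rle()` computes the same kind of consecutive runs as the `lag()` / `cumsum()` steps above. This is only a sketch on one simulated column; `word`, `spots`, and `runs` are illustrative names, not objects from the code above.

```r
set.seed(123)
# Stand-in for a single position column of 1000 sentences
word <- sample(c(NA, LETTERS[1:8]), 1000, replace = TRUE)

# Same three-way labelling as A_spots: "A", "other", or "NA" for missing words
spots <- ifelse(is.na(word), "NA", ifelse(word == "A", "A", "other"))

runs <- rle(spots)      # consecutive runs of identical labels
length(runs$lengths)    # number of bars needed for this column
sum(runs$lengths)       # total run length, equal to the number of sentences
```

Each run becomes one stacked bar segment whose height is the run length, so a column needs one rectangle per run rather than one line per sentence.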
