
ggplot - plot an average of categories on the x-axis in R

Good evening,

this is my first question, so please be kind.

I want to analyse a dataset with more than 150 columns and 300 rows in RStudio, but I'm a newbie. My problem is that I want to plot a line or bar chart with ggplot, but I can't put category i on the x-axis together with the average (by gender) of that category, regardless of whether I use plot or ggplot. A second question is how to replace the "." in the title (the column name) of the chart(s).

The main code for this question is attached, along with a picture of a chart made in Excel (as an example). Ideally, my code would create one chart for each main category (the first two digits in the column name) showing its subcategories (the last two digits). As a first step I tried to plot a chart for a single category, but it didn't work.

I would appreciate any feedback or tips. It can't be that hard, but I haven't found anything yet.

Many thanks in advance.

PS: Sandy's comment on this question didn't work for me.

Roh_daten <- data.frame(Age=c(25,22,23,21,21,18),Geschlecht=c("m","w","m","m","m","m"),Test.Kette_01_01 = c(6,5,5,4,5,5),Test.String_01_02=c(2,5,5,3,3,4),Testchar_02_01 = c(0,5,5,4,6,6))
Laufzahl_i <- 1
Farbe_m="blue" # chosen arbitrarily
Farbe_w="red" # chosen arbitrarily

library(ggplot2)
library(stringr)

Links = function(text, num_char) {
  substr(text, 1, num_char)
}
Rechts = function(text, num_char) {
  substr(text, nchar(text) - (num_char-1), nchar(text))
}

for(i in 2:ncol(Roh_daten)) # not 1, because that column is only the ID
{
  #print(colnames(Roh_daten[i]))
  if(i==ncol(Roh_daten)) break()

  #colnames(Roh_daten[i]) <- c(String_in_string_replace(colnames(Roh_daten[i]),"\\.","\\ ","All"))

  if(all.equal(Roh_daten[,i], as.integer(Roh_daten[,i]))==TRUE)
  {
    assign(paste(colnames(Roh_daten[i]),"test_men",sep = "_"),mean(Roh_daten[,i][Roh_daten$Geschlecht == "m"],na.rm = TRUE)) # creates a variable whose name is the pasted string
    assign(paste(colnames(Roh_daten[i]),"test_woman",sep = "_"),mean(Roh_daten[,i][Roh_daten$Geschlecht == "w"],na.rm = TRUE))
    assign(paste(colnames(Roh_daten[i]),"test_m_w",sep = "_"),mean(subset(Roh_daten[,i],Roh_daten$Geschlecht == "m" | Roh_daten$Geschlecht == "w"),na.rm = TRUE))

    if(Links(Rechts(colnames(Roh_daten[i]),5),2) == Links(Rechts(colnames(Roh_daten[i-1]),5),2)){ # only if the main category matches that of the previous column (i-1)
      #print(Links(Rechts(colnames(Roh_daten[i-1]),5),2))
      Laufzahl_i=Laufzahl_i+1
      if(Links(Rechts(colnames(Roh_daten[i]),5),2) == Links(Rechts(colnames(Roh_daten[i+1]),5),2)){ # last element of all those meeting the condition above
      }else{
        #print(c("Es wurde ", Laufzahl_i, " Mal der gleiche Bereich erkannt."))
        Laufzahl_i <- 1

        Var_name_m <-  paste(colnames(Roh_daten[i]),"test_men",sep = "_")
        Var_name_w <-  paste(colnames(Roh_daten[i]),"test_woman",sep = "_")

        plot(get(Var_name_m),t="b",col=Farbe_m,ylim = c(0,6),yaxt="n",main = Links(Var_name_m,str_locate(Var_name_m,"_")-1),ylab="Wichtigkeit")
        text(x=get(Var_name_m),labels = as.character(round(get(Var_name_m),digits = 2)),pos=2,col = Farbe_m)
        text(x=get(Var_name_w),labels = as.character(round(get(Var_name_w),digits = 2)),pos=4,col = Farbe_w)
        axis(2, at = seq(0, 6, by = 0.5), las=2)
        legend(x ="topleft", legend = c("m","w"),col=c(Farbe_m, Farbe_w), bty = "o")
        points(get(Var_name_w),t="b",col=Farbe_w,ylim = c(0,6))

        p <- ggplot(data=Roh_daten[i],aes(x=get(Var_name_m),y=get(Var_name_m))) + #xlab(colnames(Roh_daten[,i]))
          #geom_line(linetype=2) +
          geom_point(size=1,col=Farbe_m) +
          geom_point(size=1,col=Farbe_w,aes(y=get(Var_name_w))) +
          theme(panel.border = element_rect(colour = "black", fill=NA, size=0.5))
          #geom_bar(stat="identity")
          #scale_y_continuous(breaks = seq(1,6,by=1)) 
        p
#ggplot(data=Roh_daten[i],aes(x=get(Var_name_m),y=get(Var_name_m))) + stat_summary(fun.y=mean, geom = "point")
      }
    }

  }else {
    print(paste(colnames(Roh_daten[i])," hat einen Fehler (String)"))
  }
}
p

Excel chart - example

Question 1: plotting the average per gender for each category

I'm not sure this is exactly what you are asking for, but as I understand it, you want to reproduce the plot you made in Excel. Briefly: the average per gender for each category, plotted as a line or bar chart, with the mean values displayed on it.

Based on the example you provided, you can use the dplyr and tidyr libraries to average each column by gender and reshape the result for plotting with ggplot. Here is how to do it step by step:

First, get the average of each column by gender:

library(dplyr)
Roh_daten %>% 
  group_by(Geschlecht) %>% 
  summarise_all(.funs = mean) 

# A tibble: 2 x 5
  Geschlecht   Age Test.Kette_01_01 Test.String_01_02 Testchar_02_01
  <fct>      <dbl>            <dbl>             <dbl>          <dbl>
1 m           21.6                5               3.4            4.2
2 w           22                  5               5              5  
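
As a side note, summarise_all() is superseded in dplyr 1.0.0 and later in favour of across(). Assuming a recent dplyr is installed, the same step could be sketched as follows; the object name Mittelwerte is just my own choice for illustration, and na.rm = TRUE is an addition in case your real data contains missing answers:

```r
library(dplyr)

# Same sample data as in the question
Roh_daten <- data.frame(
  Age = c(25, 22, 23, 21, 21, 18),
  Geschlecht = c("m", "w", "m", "m", "m", "m"),
  Test.Kette_01_01 = c(6, 5, 5, 4, 5, 5),
  Test.String_01_02 = c(2, 5, 5, 3, 3, 4),
  Testchar_02_01 = c(0, 5, 5, 4, 6, 6)
)

# Average every numeric column per gender;
# na.rm = TRUE ignores missing values instead of returning NA
Mittelwerte <- Roh_daten %>%
  group_by(Geschlecht) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))

Mittelwerte
```

Selecting with where(is.numeric) also means that any extra text columns in the full dataset are left out of the averaging rather than producing warnings.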

Next, we want to reshape these data to match the grammar of ggplot2 (in short: a single column for the x values, a single column for the y values, and one column for each grouping variable). You can use the function pivot_longer from tidyr for this:

library(dplyr)
library(tidyr)
Roh_daten %>% 
  group_by(Geschlecht) %>% 
  summarise_all(.funs = mean) %>% 
  pivot_longer(., -c(Geschlecht, Age), names_to = "Variable", values_to = "Value")

# A tibble: 6 x 4
  Geschlecht   Age Variable          Value
  <fct>      <dbl> <chr>             <dbl>
1 m           21.6 Test.Kette_01_01    5  
2 m           21.6 Test.String_01_02   3.4
3 m           21.6 Testchar_02_01      4.2
4 w           22   Test.Kette_01_01    5  
5 w           22   Test.String_01_02   5  
6 w           22   Testchar_02_01      5  

Finally, we can use ggplot2 to get a bar chart like this:

library(dplyr)
library(tidyr)
library(ggplot2)
Roh_daten %>% 
  group_by(Geschlecht) %>% 
  summarise_all(.funs = mean) %>% 
  pivot_longer(., -c(Geschlecht, Age), names_to = "Variable", values_to = "Value") %>%
  ggplot(., aes(x = Variable, y = Value, group = Geschlecht))+
  geom_bar(stat = "identity", aes(fill = Geschlecht), position = position_dodge())+
  theme(legend.position = "top")+
  geom_label(aes(label = Value), position = position_dodge(0.9), vjust = -0.5)+
  ylim(0,5.5)


Or get lines and points like this (the library ggrepel helps to place the labels so that they do not overlap each other):

library(dplyr)
library(tidyr)
library(ggplot2)
library(ggrepel)
Roh_daten %>% 
  group_by(Geschlecht) %>% 
  summarise_all(.funs = mean) %>% 
  pivot_longer(., -c(Geschlecht, Age), names_to = "Variable", values_to = "Value") %>%
  ggplot(., aes(x = Variable, y = Value, color = Geschlecht, group = Geschlecht))+
  geom_point()+
  geom_line()+
  theme(legend.position = "top")+
  geom_label_repel(aes(label = Value), vjust = -0.5)

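
If you want one chart per main category, with its subcategories on the x-axis, a possible sketch is to split the two codes out of the column names and use facet_wrap. This assumes that every question column ends in "_NN_NN" (main category, then subcategory), as in your sample data. The names Lang, Kategorie and Unterkategorie are my own, and I round the labels to two digits because real means will rarely be as tidy as in the example:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same sample data as in the question
Roh_daten <- data.frame(
  Age = c(25, 22, 23, 21, 21, 18),
  Geschlecht = c("m", "w", "m", "m", "m", "m"),
  Test.Kette_01_01 = c(6, 5, 5, 4, 5, 5),
  Test.String_01_02 = c(2, 5, 5, 3, 3, 4),
  Testchar_02_01 = c(0, 5, 5, 4, 6, 6)
)

# Assumption: names look like "<text>_<2 digits>_<2 digits>"
muster <- "^.*_([0-9]{2})_([0-9]{2})$"

Lang <- Roh_daten %>%
  group_by(Geschlecht) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE))) %>%
  pivot_longer(-c(Geschlecht, Age), names_to = "Variable", values_to = "Value") %>%
  mutate(
    Kategorie      = sub(muster, "\\1", Variable),  # first code, e.g. "01"
    Unterkategorie = sub(muster, "\\2", Variable)   # second code, e.g. "02"
  )

# One panel per main category, subcategories on the x-axis
p <- ggplot(Lang, aes(x = Unterkategorie, y = Value, fill = Geschlecht)) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_text(aes(label = round(Value, 2)),
            position = position_dodge(width = 0.9), vjust = -0.5) +
  facet_wrap(~ Kategorie, scales = "free_x") +
  labs(x = "Subcategory", y = "Mean value") +
  theme(legend.position = "top")
p
```

With scales = "free_x", each panel only shows the subcategories that actually exist in that main category.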

Is this the kind of plot you are looking for? If not, can you clarify your question? I did not understand all of your code.

Question 2: replacing dots in column names

For your second question, about replacing the "." in the column names of your dataset, you can use the library rebus:

library(rebus)
gsub(DOT,"-", colnames(Roh_daten))

[1] "Age"               "Geschlecht"        "Test-Kette_01_01"  "Test-String_01_02" "Testchar_02_01"   
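
If you would rather not load an extra package, base R can do the same thing. In this small sketch, fixed = TRUE makes gsub treat "." as a literal dot instead of the regular-expression wildcard. The vector names spalten and titel are my own, and I use a space as the replacement since the goal is a readable chart title:

```r
# Column names as in the question's sample data
spalten <- c("Age", "Geschlecht", "Test.Kette_01_01",
             "Test.String_01_02", "Testchar_02_01")

# fixed = TRUE: match a literal ".", not "any character"
titel <- gsub(".", " ", spalten, fixed = TRUE)
titel
```

It is probably easiest to leave the column names of the data frame untouched and only pass the cleaned string to the plot title, because names containing spaces or hyphens have to be wrapped in backticks when you refer to them in code.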

I hope this answers your questions.
