简体   繁体   中英

Extract top positive and negative values from dataframe and fill them into a formatted text using R

I'm trying to extract material infos whose price increase and decrease the most top 3 base on pct_change column.

Data:

df <- structure(list(material = c("Copper", "Aluminum", "Iron", "Zinc", 
"Nickel", "Silver", "Gold", "Tin"), price = c(17125, 8312, 2228.5, 
2934, 4315, 8178, 4411, 680), pct_change = c(0.025449102, 0.007166746, 
-0.024939838, 0.062470043, -0.043873255, -0.004625122, 0.045031392, 
-0.037508846)), class = "data.frame", row.names = c(NA, -8L))

My expected result will be a paragraph of text as follows:

text <- 'The top 3 commodities that price rise most are: Zinc (6.25%), Gold (4.5%), and Copper (2.54%), 
the top 3 commodities that fall most are: Nickel (-4.39%), Tin (-3.75%) and Iron (-2.49%).'

My trial code works, but not concise, does someone could share other more efficient solutions? Thanks.

top3 <- df %>% 
  arrange(desc(pct_change)) %>% 
  mutate(pct_change=scales::percent(pct_change)) %>% 
  slice_head(n=3)

tail3 <- df %>% 
  arrange(pct_change) %>% 
  mutate(pct_change=scales::percent(pct_change)) %>% 
  slice_head(n=3)

com_name_up1 <- top3$material[1]
com_pct_up1 <- top3$pct_change[1]

com_name_up2 <- top3$material[2]
com_pct_up2 <- top3$pct_change[2]

com_name_up3 <- top3$material[3]
com_pct_up3 <- top3$pct_change[3]

com_name_down1 <- tail3$material[1]
com_pct_down1 <- tail3$pct_change[1]

com_name_down2 <- tail3$material[2]
com_pct_down2 <- tail3$pct_change[2]

com_name_down3 <- tail3$material[3]
com_pct_down3 <- tail3$pct_change[3]

text <- glue('The top 3 commodities that price rose most are: {com_name_up1} ({com_pct_up1}), 
{com_name_up2} ({com_pct_up2}), and {com_name_up3} ({com_pct_up3}), 
the top 3 commodities that fell most are: {com_name_down1} ({com_pct_down1}), 
             {com_name_down2} ({com_pct_down2}) and {com_name_down3} ({com_pct_down3}).')

To be honest I'm not sure if it's actually much shorter but you could look to glue the material and pct_change within the table first.

I've then grouped it up and collapsed the strings

df %>% 
  arrange(desc(pct_change)) %>% 
  mutate(
    t1 = sprintf('%s (%.2f%%)', material, pct_change*100),
    rank1 = case_when(
      row_number() <= 3 ~ 'Top',
      row_number() > n() -3 ~ 'Bot'
    )
  ) %>%
  group_by(rank1) %>%
  summarise(
    t2 = paste(t1, collapse = ', ')
  )

  rank1 t2                                          
  <chr> <chr>                                       
1 Bot   Iron (-2.49%), Tin (-3.75%), Nickel (-4.39%)
2 Top   Zinc (6.25%), Gold (4.50%), Copper (2.54%)  
3 NA    Aluminum (0.72%), Silver (-0.46%)  

Two general suggestions:

  1. glue has a very flexible syntax that allows you to pass any valid R expression into the "{...}" . Utilizing this feature will help shorten your code.
  2. You should abstract out the parts with similar structures and summarize them into a function.

Here is the code

report3 <- function(df, f) {
  df |> 
    f(pct_change, n = 3L) |> 
    dplyr::mutate(pct_change = scales::percent(pct_change)) |> 
    glue::glue_data("{material} ({pct_change})")
}
top3 <- report3(df, dplyr::slice_max)
bot3 <- report3(df, dplyr::slice_min)
text <- glue::glue('The top 3 commodities that price rose most are: \\
                   {top3[[1L]]}, {top3[[2L]]}, and {top3[[3L]]}; \\
                   the top 3 commodities that fell most are: \\
                   {bot3[[1L]]}, {bot3[[2L]]} and {bot3[[3L]]}.')

Output

> text
The top 3 commodities that price rose most are: Zinc (6.2%), Gold (4.5%), and Copper (2.5%); the top 3 commodities that fell most are: Nickel (-4.39%), Tin (-3.75%) and Iron (-2.49%).

Another possible solution (instead of stringr::str_c , you could use the more convenient stringr::str_glue , as @ekoam well suggests):

library(tidyverse)

df <- structure(list(material = c("Copper", "Aluminum", "Iron", "Zinc", 
"Nickel", "Silver", "Gold", "Tin"), price = c(17125, 8312, 2228.5, 
2934, 4315, 8178, 4411, 680), pct_change = c(0.025449102, 0.007166746, 
-0.024939838, 0.062470043, -0.043873255, -0.004625122, 0.045031392, 
-0.037508846)), class = "data.frame", row.names = c(NA, -8L))

top3 <- slice_max(df, pct_change, n = 3)
bottom3 <- slice_min(df, pct_change, n = 3)

str_c("The top 3 commodities that price rise most are: ", 
top3$material[1]," (", round(100*top3$pct_change[1], 2),"%), ",
top3$material[2]," (", round(100*top3$pct_change[2],2),"%), and ", 
top3$material[3]," (", round(100*top3$pct_change[3]),"%), the top 3 
commodities that fall most are: ", bottom3$material[1]," (", 
round(100*bottom3$pct_change[1], 2),"%), ", bottom3$material[2]," (", 
round(100*bottom3$pct_change[2],2),"%), and ", bottom3$material[3]," 
(", round(100*bottom3$pct_change[3]),"%).")

#> [1] "The top 3 commodities that price rise most are: Zinc (6.25%), Gold (4.5%), and Copper (3%), the top 3 commodities that fall most are: Nickel (-4.39%), Tin (-3.75%), and Iron (-2%)."

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM