
Extract values from dataframe and fill them into a template phrase in R

Given a dataset as follows:

df <- structure(list(type = structure(c(2L, 3L, 1L), .Label = c("negative", 
"positive", "zero"), class = "factor"), count = c(10L, 5L, 8L
), percent = c(43.5, 21.7, 34.8)), class = "data.frame", row.names = c(NA, 
-3L))

Out:

      type count percent
1 positive    10    43.5
2     zero     5    21.7
3 negative     8    34.8

I would like to fill the values from the table into the template phrase as follows:

In 2020, we have 10 cities have positive growth, which covers 43.5 % of all cities; 5 cities have zero growth, which covers 21.7 % of all cities; and 8 cities have negative growth, which covers 34.8 % of all cities.

Template:

In 2020, we have {} cities have {} growth, which covers {} % of all cities; {} cities have {} growth, which covers {} % of all cities; and {} cities have {} growth, which covers {} % of all cities.

How could I do that in R?

You can build the sentence with paste0() or sprintf() and replace the placeholders with the respective values from the data frame.
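A minimal sketch of that direct sprintf approach (df is redefined so the snippet runs on its own). Every placeholder is filled by indexing a row explicitly, and the factor column is converted with as.character() so that %s receives the level labels rather than relying on implicit coercion:

```r
df <- structure(list(type = structure(c(2L, 3L, 1L), .Label = c("negative",
"positive", "zero"), class = "factor"), count = c(10L, 5L, 8L
), percent = c(43.5, 21.7, 34.8)), class = "data.frame", row.names = c(NA,
-3L))

# Convert the factor to character so %s gets "positive", "zero", ...
types <- as.character(df$type)

# One argument per placeholder: count (%d), type (%s), percent (%.1f);
# %% produces a literal percent sign
sentence <- sprintf(
  paste0("In 2020, we have %d cities have %s growth, which covers %.1f %% of all cities; ",
         "%d cities have %s growth, which covers %.1f %% of all cities; ",
         "and %d cities have %s growth, which covers %.1f %% of all cities."),
  df$count[1], types[1], df$percent[1],
  df$count[2], types[2], df$percent[2],
  df$count[3], types[3], df$percent[3]
)
sentence
```

This works, but it hard-codes three rows, so every value has to be listed by hand.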

Here is another way, which does not require listing each value from the data frame individually:

string <- 'In 2020, we have %s cities have %s growth, which covers %s %% of all cities; %s cities have %s growth, which covers %s %% of all cities; and %s cities have %s growth, which covers %s %% of all cities.'
do.call(sprintf, c(as.list(c(t(df[c(2, 1, 3)]))), fmt = string))

#[1] "In 2020, we have 10 cities have positive growth, which covers 43.5 % of all cities;  5 cities have zero growth, which covers 21.7 % of all cities; and  8 cities have negative growth, which covers 34.8 % of all cities."

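df[c(2, 1, 3)] reorders the columns so that count comes first and type second. This is needed because each clause of your sentence starts with the count, followed by the type and then the percent. c(t(df[c(2, 1, 3)])) flattens the data frame into a vector row by row, and do.call() passes each element to sprintf as a separate argument. Because t() first converts the data frame to a character matrix (it contains a factor column), the counts are formatted to a common width, which is why the output has two spaces before 5 and 8.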
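If you want to avoid both the column reordering and that padding, a base R sketch can rely on sprintf() being vectorised: it recycles over whole columns and returns one clause per row, which paste() then joins. The helper name fill_template() and its year argument are hypothetical, added only for illustration, and the sketch assumes columns named type, count and percent with at least two rows:

```r
# Hypothetical helper: builds the sentence for a data frame with columns
# type, count and percent (assumes at least two rows)
fill_template <- function(df, year = 2020) {
  # sprintf is vectorised over its arguments: one clause per row of df
  clauses <- sprintf("%d cities have %s growth, which covers %s %% of all cities",
                     df$count, as.character(df$type), df$percent)
  # Prefix the last clause with "and"
  n <- length(clauses)
  clauses[n] <- paste("and", clauses[n])
  paste0("In ", year, ", we have ", paste(clauses, collapse = "; "), ".")
}

df <- structure(list(type = structure(c(2L, 3L, 1L), .Label = c("negative",
"positive", "zero"), class = "factor"), count = c(10L, 5L, 8L
), percent = c(43.5, 21.7, 34.8)), class = "data.frame", row.names = c(NA,
-3L))

fill_template(df)
```

Since nothing is indexed by position, the same call works unchanged for a data frame with more rows.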

I'd recommend the glue package over base R string functions such as paste0() and sprintf() because a) it's more readable and b) the middle part of your template is the same phrase repeated for each row of your data frame, so we can use glue_data() to reduce repetition:

library(glue)

# Example data
df <- structure(list(type = structure(c(2L, 3L, 1L), .Label = c("negative", "positive", "zero"),
        class = "factor"), count = c(10L, 5L, 8L), percent = c(43.5, 21.7, 34.8)),
    class = "data.frame", row.names = c(NA, -3L))

growth <- glue_data(df, "{count} cities have {type} growth, which covers {percent}% of all cities")

# Add "and ..." to the last phrase:
growth[length(growth)] <- glue("and ", growth[length(growth)])

glue("In 2020, we have ", glue_collapse(growth, sep = "; "), ".")
#> In 2020, we have 10 cities have positive growth, which covers 43.5% of all cities; 5 cities have zero growth, which covers 21.7% of all cities; and 8 cities have negative growth, which covers 34.8% of all cities.

Created on 2021-02-24 by the reprex package (v1.0.0)

This also has the advantage of scaling to a data frame with any number of rows.
