Extract values from dataframe and fill them into a template phrase in R
Given a dataset as follows:
df <- structure(list(type = structure(c(2L, 3L, 1L), .Label = c("negative",
"positive", "zero"), class = "factor"), count = c(10L, 5L, 8L
), percent = c(43.5, 21.7, 34.8)), class = "data.frame", row.names = c(NA,
-3L))
Out:
      type count percent
1 positive    10    43.5
2     zero     5    21.7
3 negative     8    34.8
I would like to fill the values from the table into the template phrase as follows:
In 2020, we have 10 cities have positive growth, which covers 43.5 % of all cities; 5 cities have zero growth, which covers 21.7 % of all cities; and 8 cities have negative growth, which covers 34.8 % of all cities.
Template:
In 2020, we have {} cities have {} growth, which covers {} % of all cities; {} cities have {} growth, which covers {} % of all cities; and {} cities have {} growth, which covers {} % of all cities.
How could I do that in R?
You can create a simple sentence with paste0 / sprintf and replace the placeholders with the respective values from the dataframe.
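A minimal sketch of this approach, assuming the df from the question (rebuilt here with data.frame() so the block runs on its own). The variable name sentence is illustrative, and the %d / %.1f format choices are assumptions about how the numbers should be printed:

```r
# Example data, equivalent to the df in the question
df <- data.frame(
  type    = factor(c("positive", "zero", "negative")),
  count   = c(10L, 5L, 8L),
  percent = c(43.5, 21.7, 34.8)
)

# List each value explicitly; "%%" prints a literal percent sign
sentence <- sprintf(
  paste0(
    "In 2020, we have %d cities have %s growth, which covers %.1f %% of all cities; ",
    "%d cities have %s growth, which covers %.1f %% of all cities; ",
    "and %d cities have %s growth, which covers %.1f %% of all cities."
  ),
  df$count[1], as.character(df$type[1]), df$percent[1],
  df$count[2], as.character(df$type[2]), df$percent[2],
  df$count[3], as.character(df$type[3]), df$percent[3]
)
sentence
```

This is easy to read for three rows, but every value has to be listed by hand, so it does not adapt if the number of rows changes.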
This is another way which does not require listing each individual value from the dataframe.
string <- 'In 2020, we have %s cities have %s growth, which covers %s %% of all cities; %s cities have %s growth, which covers %s %% of all cities; and %s cities have %s growth, which covers %s %% of all cities'
do.call(sprintf, c(as.list(c(t(df[c(2, 1, 3)]))), fmt = string))
#[1] "In 2020, we have 10 cities have positive growth, which covers 43.5 % of all cities; 5 cities have zero growth, which covers 21.7 % of all cities; and 8 cities have negative growth, which covers 34.8 % of all cities"
df[c(2, 1, 3)] is used to reorder the columns so that count is the 1st column and type the 2nd. This is needed since your sentence always has the count value first, then type, and percent last.
c(t(df[c(2, 1, 3)])) changes the dataframe to a vector in a row-wise fashion, and its elements are passed to sprintf as separate arguments.
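To see what is handed to sprintf, the intermediate steps can be inspected one at a time. This is only a sketch: the names reordered and values are illustrative, and the trimws() call is an added precaution rather than part of the answer above, since converting a mixed-type data frame to a character matrix can pad numbers to a common width.

```r
# Example data, equivalent to the df in the question
df <- data.frame(
  type    = factor(c("positive", "zero", "negative")),
  count   = c(10L, 5L, 8L),
  percent = c(43.5, 21.7, 34.8)
)

# Step 1: put the columns in placeholder order (count, type, percent)
reordered <- df[c(2, 1, 3)]

# Step 2: t() gives a character matrix with one column per original row;
# c() then reads it column by column, i.e. row by row of the data frame
values <- c(t(reordered))

# Step 3 (precaution, not in the original answer): drop any padding
values <- trimws(values)
values

# Step 4: each element becomes one argument to sprintf
string <- "%s cities have %s growth, which covers %s %% of all cities; %s cities have %s growth, which covers %s %% of all cities; and %s cities have %s growth, which covers %s %% of all cities"
result <- do.call(sprintf, c(as.list(values), fmt = string))
result
```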
I'd recommend using the glue package over the base string functions because a) it's more readable and b) the middle part of your template is the same phrase repeated for each row of your data frame, so we can use glue_data() to reduce repetition:
library(glue)
# Example data
df <- structure(list(type = structure(c(2L, 3L, 1L), .Label = c("negative", "positive", "zero"),
class = "factor"), count = c(10L, 5L, 8L), percent = c(43.5, 21.7, 34.8)),
class = "data.frame", row.names = c(NA, -3L))
growth <- glue_data(df, "{count} cities have {type} growth, which covers {percent}% of all cities")
# Add "and ..." to the last phrase:
growth[length(growth)] <- glue("and ", growth[length(growth)])
glue("In 2020, we have ", glue_collapse(growth, sep = "; "), ".")
#> In 2020, we have 10 cities have positive growth, which covers 43.5% of all cities; 5 cities have zero growth, which covers 21.7% of all cities; and 8 cities have negative growth, which covers 34.8% of all cities.
Created on 2021-02-24 by the reprex package (v1.0.0)
This also has the advantage of scaling to a data frame with any number of rows.
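For comparison, the same row-wise idea can be sketched in base R, since sprintf is vectorised over its arguments. This is not the glue code above but a swapped-in base R equivalent; describe_growth is a hypothetical helper name, and the year argument is an assumption added for illustration:

```r
# Hypothetical helper: builds one phrase per row, then joins them
describe_growth <- function(df, year = 2020) {
  # sprintf is vectorised, so this returns one string per row
  phrases <- sprintf(
    "%s cities have %s growth, which covers %s%% of all cities",
    df$count, as.character(df$type), df$percent
  )
  n <- length(phrases)
  # Prefix the last phrase with "and" when there is more than one row
  if (n > 1) phrases[n] <- paste("and", phrases[n])
  paste0("In ", year, ", we have ", paste(phrases, collapse = "; "), ".")
}

# Works for any number of rows, e.g. a two-row data frame
df2 <- data.frame(
  type    = c("positive", "negative"),
  count   = c(3L, 1L),
  percent = c(75, 25)
)
describe_growth(df2)
```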