Grouping date data with a condition in a data frame in R

I have a CSV file with several variables, as illustrated below (example data only):

Region    crop      product    date_periode
A         aaaa      bilon      2016052q
A         aaaa      mailon     2016021q
B         cccc      drox       2016042q
A         cccc      marob      2015081q
C         dddd      salon      2016062q
C         dddd      dilon      2016071q
D         aaaa      daxon      2015032q
D         aaaa      bayon      2016042q

The dates are half-month periods: for example, 20170502q is the second half of May 2017. I want to group individuals per crop and region: whenever the number of individuals for a date, for a given crop in a given region, is less than 5% of the total number of individuals for that crop in that region, that date should be merged with the adjacent date (the date then becomes a range such as 2016062q-2016071q if those two periods are merged). This should be done for each crop in each region. For example, given this table:

region    crop    date         Number of ID    % of ID
A         aaaa    20170201q     1               1
A         aaaa    20170202q    44              48
A         aaaa    20170301q    30              33
A         aaaa    20170302q    14              15
A         aaaa    20170401q     1               1
A         aaaa    20170402q     1               1
A         aaaa    20170601q     1               1

I would like to end up with this table after the analysis:

region    crop    date                   Number of ID    % of ID
A         aaaa    20170201q-20170202q    45              49
A         aaaa    20170301q              30              33
A         aaaa    20170302q-20170601q    17              18

I don't know if I'm clear enough, but I'm available if you have any questions about the above. Thank you in advance.

Using tidyverse, we can do this as follows:

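library(dplyr)

# Distinct products per Region, crop and period
df %>%
  group_by(Region, crop, date_periode) %>%
  summarise(number = n_distinct(product)) %>%
  ungroup() %>%
  # Attach the distinct-product total per Region and crop
  left_join(
    df %>%
      group_by(Region, crop) %>%
      summarise(number_t = n_distinct(product)) %>%
      ungroup(),
    by = c("Region", "crop")
  ) %>%
  # Share of each period within its Region/crop total
  mutate(Percent = number / number_t)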

I think this is what you are getting at? I am assuming Number is the total number of distinct product values.
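The pipeline above computes the share of each period but does not perform the merging step the question asks for. A minimal base R sketch of that step follows, under assumptions the question does not spell out: a period below the threshold joins the group just before it, a leading group that is still too small is folded into the next group, and the period codes sort chronologically as plain text. The helper name merge_small_periods and its arguments are hypothetical.

```r
# Hypothetical helper: merge periods whose share is below `threshold`
# into a neighbouring group. Assumes `period` sorts chronologically as text.
merge_small_periods <- function(period, n, threshold = 0.05) {
  ord <- order(period)
  period <- period[ord]
  n <- n[ord]
  share <- n / sum(n)

  # A period opens a new group unless it is small and has a predecessor.
  starts_group <- share >= threshold
  starts_group[1] <- TRUE
  group <- cumsum(starts_group)

  # Assumption: a leading group that is still too small joins the next group.
  if (max(group) > 1 && sum(n[group == 1]) / sum(n) < threshold) {
    group[group == 1] <- 2
  }

  # Label each group by its first and last period, and total its counts.
  label <- tapply(period, group, function(p) {
    if (length(p) == 1) p else paste(p[1], p[length(p)], sep = "-")
  })
  total <- tapply(n, group, sum)

  data.frame(
    date = as.vector(label),
    number = as.vector(total),
    percent = round(100 * as.vector(total) / sum(n)),
    stringsAsFactors = FALSE
  )
}

# Sample data from the question (region A, crop aaaa)
periods <- c("20170201q", "20170202q", "20170301q", "20170302q",
             "20170401q", "20170402q", "20170601q")
counts <- c(1, 44, 30, 14, 1, 1, 1)

res <- merge_small_periods(periods, counts)
print(res)
```

On the sample table this gives the three groups from the question, with 45, 30 and 17 individuals. To run it for every region and crop, one option is to call the helper once per group, for instance with dplyr's group_modify() after group_by(Region, crop). A different tie-breaking rule, such as merging into the smaller neighbour, would need a different loop.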
