How to randomly select districts and villages in that district in R?

I have a data set containing the code and name of each district, the code and name of the blocks in that district, and the code and name of the villages in each block.

Based on this, I want to create a data set that randomly selects one block from each district and then randomly takes 10 villages from the selected block.

I have tried using the sample function and the RandomizeR package but could not get it to work.

A sample of the data set (df):

structure(list(district_code = c(1701L, 1701L, 1701L, 1701L, 
1701L, 1701L, 1701L, 1701L, 1701L, 1701L, 1701L, 1701L), district_name = c("morena", 
"morena", "morena", "morena", "morena", "morena", "morena", "morena", 
"morena", "morena", "morena", "morena"), block_code = c(1701001L, 
1701001L, 1701001L, 1701001L, 1701001L, 1701001L, 1701001L, 1701001L, 
1701001L, 1701001L, 1701001L, 1701001L), block_name = c("ambah", 
"ambah", "ambah", "ambah", "ambah", "ambah", "ambah", "ambah", 
"ambah", "ambah", "ambah", "ambah"), village_code = 1701001001:1701001012, 
    village_name = c("badfara", "bichola", "bhandauli", "lallubasai", 
    "kakarari", "rithona", "goonjh", "malbasai", "aroli", "khirenta", 
    "dandoli", "beelpur")), row.names = c(NA, 12L), class = "data.frame")

Second sample (df1):

structure(list(district_code = c(3424L, 3424L, 3424L, 3424L, 
3424L, 3424L, 3424L, 3424L, 3401L, 3401L, 3401L, 3401L, 3401L, 
3402L, 3402L, 3402L, 3402L, 3402L, 3402L, 3402L, 3402L), district_name = c("khunti", 
"khunti", "khunti", "khunti", "khunti", "khunti", "khunti", "khunti", 
"ranchi", "ranchi", "ranchi", "ranchi", "ranchi", "lohardaga", 
"lohardaga", "lohardaga", "lohardaga", "lohardaga", "lohardaga", 
"lohardaga", "lohardaga"), block_code = c(3401020L, 3401020L, 
3401020L, 3401020L, 3401020L, 3401020L, 3401020L, 3401020L, 3401024L, 
3401024L, 3401024L, 3401024L, 3401024L, 3402001L, 3402001L, 3402001L, 
3402001L, 3402001L, 3402001L, 3402001L, 3402001L), block_name = c("torpa", 
"torpa", "torpa", "torpa", "torpa", "torpa", "torpa", "torpa", 
"khelari", "khelari", "khelari", "khelari", "khelari", "lohardaga", 
"lohardaga", "lohardaga", "lohardaga", "lohardaga", "lohardaga", 
"lohardaga", "lohardaga"), panchayat_code = c(3401020009, 3401020010, 
3401020011, 3401020012, 3401020013, 3401020014, 3401020015, 3401020016, 
3401024001, 3401024002, 3401024003, 3401024004, 3401024005, 3402001001, 
3402001002, 3402001003, 3402001004, 3402001005, 3402001006, 3402001007, 
3402001008), panchayat_name = c("marcha", "okra", "sundari", 
"tapkara", "torpa east", "torpa west", "ukrimari", "urikela", 
"churi east", "churi middle", "churi north", "churi south", "churi west", 
"hesal", "hirhi", "manho", "jori", "nigni", "juriya", "harmu", 
"rampur")), row.names = 379:399, class = "data.frame")
> dput(jk_subset[379:409,])
structure(list(district_code = c(3424L, 3424L, 3424L, 3424L, 
3424L, 3424L, 3424L, 3424L, 3401L, 3401L, 3401L, 3401L, 3401L, 
3402L, 3402L, 3402L, 3402L, 3402L, 3402L, 3402L, 3402L, 3402L, 
3402L, 3402L, 3402L, 3402L, 3402L, 3402L, 3402L, 3402L, 3402L
), district_name = c("khunti", "khunti", "khunti", "khunti", 
"khunti", "khunti", "khunti", "khunti", "ranchi", "ranchi", "ranchi", 
"ranchi", "ranchi", "lohardaga", "lohardaga", "lohardaga", "lohardaga", 
"lohardaga", "lohardaga", "lohardaga", "lohardaga", "lohardaga", 
"lohardaga", "lohardaga", "lohardaga", "lohardaga", "lohardaga", 
"lohardaga", "lohardaga", "lohardaga", "lohardaga"), block_code = c(3401020L, 
3401020L, 3401020L, 3401020L, 3401020L, 3401020L, 3401020L, 3401020L, 
3401024L, 3401024L, 3401024L, 3401024L, 3401024L, 3402001L, 3402001L, 
3402001L, 3402001L, 3402001L, 3402001L, 3402001L, 3402001L, 3402001L, 
3402001L, 3402001L, 3402006L, 3402001L, 3402007L, 3402007L, 3402007L, 
3402002L, 3402002L), block_name = c("torpa", "torpa", "torpa", 
"torpa", "torpa", "torpa", "torpa", "torpa", "khelari", "khelari", 
"khelari", "khelari", "khelari", "lohardaga", "lohardaga", "lohardaga", 
"lohardaga", "lohardaga", "lohardaga", "lohardaga", "lohardaga", 
"lohardaga", "lohardaga", "lohardaga", "kairo", "lohardaga", 
"peshrar", "peshrar", "peshrar", "kisko", "kisko"), village_code = c(3401020009, 
3401020010, 3401020011, 3401020012, 3401020013, 3401020014, 3401020015, 
3401020016, 3401024001, 3401024002, 3401024003, 3401024004, 3401024005, 
3402001001, 3402001002, 3402001003, 3402001004, 3402001005, 3402001006, 
3402001007, 3402001008, 3402001009, 3402001010, 3402001011, 3402001012, 
3402001013, 3402002001, 3402002002, 3402002003, 3402002004, 3402002005
), village_name = c("marcha", "okra", "sundari", "tapkara", 
"torpa east", "torpa west", "ukrimari", "urikela", "churi east", 
"churi middle", "churi north", "churi south", "churi west", "hesal", 
"hirhi", "manho", "jori", "nigni", "juriya", "harmu", "rampur", 
"bagha", "arkosa", "tigra", "guri", "bhatdhijri", "siram", "peshrar", 
"rorad", "devdaria", "pakhar")), row.names = 379:409, class = "data.frame")

Example of the data set after using the code:

3404L, 3405L, 3406L, 3407L, 3408L, 3409L, 3410L, 3411L), district_name = c("khunti", 
"ranchi", "lohardaga", "gumla", "simdega", "palamu", "latehar", 
"garhwa", "west singhbhum", "saraikela kharsawan", "east singhbum", 
"dumka"), block_code = c(3401009L, 3401013L, 3402005L, 3403009L, 
3404002L, 3405018L, 3406006L, 3407009L, 3408005L, 3409006L, 3410005L, 
3411009L), block_name = c("khunti", "namkum", "bhandra", "basia", 
"bolba", "tarhasi", "garu", "bhandaria", "tantnagar", "ichagarh", 
"musabani", "masaliya"), village_code = c(3401009002, 3401013020, 
3402005002, 3403009008, 3404002002, 3405006012, 3406006008, 3407009002, 
3408005001, 3409006012, 3410005011, 3411009001), village_name = c("bhandra", 
"sithiyo", "bhandra", "mamarla", "kadopani", "manjhauli 2", "ghasitola", 
"bhandaria", "angardiha", "dewaltand", "ichra (north)", "aamgachi"
)), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"
))

Your example is not really suited to demonstrating a solution, since it contains only one district and one block.

But you can do:

library(dplyr)

# Draw up to n distinct values by position; sample(x, 1) on a single number x would draw from 1:x
pick <- function(x, n) { u <- unique(x); u[sample.int(length(u), min(n, length(u)))] }

df %>%
  group_by(district_code) %>%
  filter(block_code == pick(block_code, 1)) %>%        # one random block per district
  filter(village_code %in% pick(village_code, 10)) %>% # up to 10 villages from that block
  ungroup()
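
Two base R behaviours are worth keeping in mind when sampling codes like this, and the short sketch below illustrates both. The values are made up for illustration and the variable names are not from the question. First, sample() treats a single number n as the range 1:n. Second, ifelse() returns a result as long as its test, so a length-one test keeps only one value.

```r
set.seed(42)

# A district with a single block, as in df (illustrative value)
block_codes <- 1701001L

# sample() sees one number and draws from 1:1701001,
# so the result is not guaranteed to be the block code itself
naive_draw <- sample(block_codes, size = 1)

# Sampling a position and indexing always returns a value that is present
safe_draw <- block_codes[sample.int(length(block_codes), size = 1)]

# ifelse() with a length-one test returns a length-one result,
# so only the first of the 10 sampled villages survives
villages <- 1:12
kept_ifelse <- ifelse(length(villages) == 1, villages, sample(villages, 10))

# A plain if/else keeps the whole sampled vector
kept_if <- if (length(villages) == 1) villages else sample(villages, 10)

length(kept_ifelse)
length(kept_if)
```

If a pipeline returns one village per district where ten were expected, the second behaviour is a likely cause.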

Note: I wasn't entirely sure at which level you want to sample, so here I select one block per district and then 10 villages from that block. You will therefore end up with up to 10 villages (fewer if the block has fewer) from one randomly selected block in each district.
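
Since the sample data has only one district and block, the two-stage draw is easier to check on a small made-up data set. The following is a minimal base R sketch of the same idea, not the answerer's dplyr code. The toy data, the helper name sample_district, and its n_villages argument are all assumptions for illustration, and it assumes one row per village.

```r
set.seed(123)

# Toy data (made up): 2 districts, 2 blocks each, 12 villages per block
toy <- expand.grid(village = 1:12, block = 1:2, district = c(17L, 34L))
toy$block_code   <- toy$district * 100L + toy$block
toy$village_code <- toy$block_code * 100L + toy$village

# Hypothetical helper: pick one random block in a district,
# then up to n_villages random rows (villages) from that block
sample_district <- function(d, n_villages = 10) {
  blocks   <- unique(d$block_code)
  chosen   <- blocks[sample.int(length(blocks), size = 1)]
  in_block <- d[d$block_code == chosen, ]
  rows     <- sample.int(nrow(in_block), size = min(n_villages, nrow(in_block)))
  in_block[rows, ]
}

# Apply the helper per district and stack the pieces into one data frame
result <- do.call(rbind, lapply(split(toy, toy$district), sample_district))
rownames(result) <- NULL

# One block per district, 10 villages from each
table(result$district, result$block_code)
```

Indexing with sample.int() on positions, rather than calling sample() on the codes directly, keeps the draw correct even when a district has a single block.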
