How to generate random postcodes in R
I need help writing R code that assigns random postcodes in a CSV file with a sample size of 5000. A sample of the file is shown below; 2007, 2008, 2009 and so on are years.
ID 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
X1
X2
X3
X4
X5
X6
X7
X8
X9
X10
I have a separate file that holds all the postcodes. A sample of the postcode file is copied below.
BR1 1AA
BR1 1AB
BR1 1AD
BR1 1AE
BR1 1AF
BR1 1AG
BR1 1AH
BR1 1AJ
BR1 1AL
BR1 1AX
BR1 1BA
BR1 1BB
BR1 1BP
BR1 1BQ
BR1 1BS
BR1 1BT
BR1 1BU
BR1 1BW
BR1 1BX
BR1 1BY
BR1 1BZ
BR1 1DA
BR1 1DB
BR1 1DD
BR1 1DE
BR1 1DF
BR1 1DG
BR1 1DH
BR1 1DJ
BR1 1DL
BR1 1DN
BR1 1DP
BR1 1DQ
BR1 1DR
BR1 1DS
BR1 1DT
BR1 1DU
BR1 1DW
BR1 1DX
BR1 1EA
BR1 1EE
BR1 1EG
BR1 1EH
BR1 1EJ
BR1 1EL
BR1 1EN
BR1 1EP
BR1 1ER
BR1 1ES
BR1 1EU
BR1 1EW
BR1 1EX
BR1 1EY
BR1 1EZ
BR1 1GA
BR1 1HA
BR1 1HB
BR1 1HD
BR1 1HE
BR1 1HF
BR1 1HG
BR1 1HH
BR1 1HJ
BR1 1HL
BR1 1HN
BR1 1HP
BR1 1HQ
BR1 1HR
BR1 1HS
BR1 1HT
BR1 1HU
BR1 1HW
BR1 1HX
BR1 1HY
BR1 1HZ
BR1 1JA
BR1 1JB
BR1 1JD
BR1 1JF
BR1 1JG
BR1 1JH
BR1 1JJ
BR1 1JL
BR1 1JN
BR1 1JP
BR1 1JQ
BR1 1JR
BR1 1JS
BR1 1JT
BR1 1JU
BR1 1JW
BR1 1JX
BR1 1JY
BR1 1LA
BR1 1LB
BR1 1LD
BR1 1LE
BR1 1LF
BR1 1LG
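I would like to assign postcodes in the data table according to the distribution below, which gives the number of postcodes lived at between 2007 and 2017: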
% | n |
---|---|
39.7 | 1985 |
32.3 | 1615 |
15.2 | 760 |
6.6 | 330 |
3.6 | 180 |
1.9 | 95 |
0.6 | 30 |
0.2 | 10 |
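The data table has 5000 IDs, and I have to fill in a postcode for each year from 2007 to 2017. 1985 of the records should keep the same postcode throughout 2007 to 2017, but the postcodes should differ from one record to another.
In the second step, select 1615 postcodes and assign them to 1615 records so that the postcode changes once between 2007 and 2017 (so these people lived at two postcodes during the study period). And so on for the remaining rows of the table.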
With the inputs defined as vectors like this:
years <- 2007:2017
target_frequencies <- c(1985L, 1615L, 760L, 330L, 180L, 95L, 30L, 10L)
postcodes <- c("1AA", "1AB", "1AD", "1AE", "1AF", "1AG", "1AH", "1AJ", "1AL", "1AX", "1BA", "1BB", "1BP", "1BQ", "1BS", "1BT", "1BU", "1BW", "1BX", "1BY", "1BZ", "1DA", "1DB", "1DD", "1DE", "1DF", "1DG", "1DH", "1DJ", "1DL", "1DN", "1DP", "1DQ", "1DR", "1DS", "1DT", "1DU", "1DW", "1DX", "1EA", "1EE", "1EG", "1EH", "1EJ", "1EL", "1EN", "1EP", "1ER", "1ES", "1EU", "1EW", "1EX", "1EY", "1EZ", "1GA", "1HA", "1HB", "1HD", "1HE", "1HF", "1HG", "1HH", "1HJ", "1HL", "1HN", "1HP", "1HQ", "1HR", "1HS", "1HT", "1HU", "1HW", "1HX", "1HY", "1HZ", "1JA", "1JB", "1JD", "1JF", "1JG", "1JH", "1JJ", "1JL", "1JN", "1JP", "1JQ", "1JR", "1JS", "1JT", "1JU", "1JW", "1JX", "1JY", "1LA", "1LB", "1LD", "1LE", "1LF", "1LG")
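As a quick sanity check on these inputs, the counts can be summed in base R. This is only a sketch; it assumes that position i of the vector is the number of records with i distinct postcodes, as the description of the first two rows suggests. The variable name `n_per_group` is introduced here for illustration. The counts add up to 5005 rather than 5000 because the percentages are rounded, so the generated table has 5005 rows unless one group is trimmed.

```r
# Counts copied from the table in the question
# (assumption: element i = number of records with i distinct postcodes)
n_per_group <- c(1985L, 1615L, 760L, 330L, 180L, 95L, 30L, 10L)

# Total number of records implied by the counts
total <- sum(n_per_group)
print(total)

# Shares as a percentage of 5000 IDs, rounded to one decimal place
shares <- round(100 * n_per_group / 5000, 1)
print(shares)
```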
I would use purrr to solve this:
library(purrr)
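We can define a helper function that generates random postcodes for one record, taking the number of unique postcodes as its argument:
generate_postcodes <- function(count) {
  # Pick the count - 1 years (2008 onwards) in which the postcode changes
  years_with_new_code <- sort(sample(tail(years, -1), count - 1))
  # Shuffle the postcodes, then index by the number of changes up to each year
  sample(postcodes)[findInterval(years, years_with_new_code) + 1] %>%
    set_names(years)
}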
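The indexing step in the helper may be easier to follow with fixed values. The sketch below uses hand-picked change years and placeholder codes ("A", "B", "C"), both of which are assumptions made purely for illustration. `findInterval()` counts how many change years are less than or equal to each year, and adding 1 turns that count into a position in the shuffled postcode vector.

```r
years <- 2007:2017

# Hypothetical change years, chosen by hand instead of sampled
years_with_new_code <- c(2010, 2015)

# Number of changes that have happened by each year, plus 1
idx <- findInterval(years, years_with_new_code) + 1
print(idx)
# 1 1 1 2 2 2 2 2 3 3 3

# Placeholder codes standing in for the first entries of sample(postcodes)
history <- c("A", "B", "C")[idx]
print(setNames(history, years))
```

The record therefore holds the first code for 2007 to 2009, the second for 2010 to 2014, and the third from 2015 onwards.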
Testing the helper function:
generate_postcodes(2)
# 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
# "1HW" "1HW" "1HW" "1HW" "1HW" "1HW" "1HW" "1HW" "1EX" "1EX" "1EX"
generate_postcodes(6)
# 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
# "1JD" "1JD" "1JU" "1HQ" "1HQ" "1HQ" "1JA" "1GA" "1GA" "1EE" "1EE"
Finally, we can call
imap_dfr(target_frequencies, function(count, code_count) {
  map(seq_len(count), ~ generate_postcodes(code_count))
}) %>%
  .[sample(nrow(.)), ]
which returns a randomly ordered tibble with the desired properties:
# A tibble: 5,005 x 11
`2007` `2008` `2009` `2010` `2011` `2012` `2013` `2014` `2015` `2016` `2017`
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1HW 1EZ 1EZ 1EZ 1EZ 1EZ 1EZ 1EZ 1EZ 1EZ 1EZ
2 1LG 1DG 1DU 1HZ 1HZ 1HZ 1HZ 1HZ 1HZ 1HZ 1HZ
3 1HD 1HD 1HD 1HD 1HD 1HD 1HD 1HD 1HD 1HD 1HD
4 1HF 1HF 1HF 1HF 1HF 1HF 1HF 1HF 1HF 1HF 1GA
5 1JU 1JU 1BP 1BP 1BP 1BP 1BP 1BP 1BP 1BP 1DP
6 1EG 1ER 1ER 1ER 1ER 1ER 1ER 1ER 1ER 1ER 1ER
7 1EL 1EL 1EL 1EL 1EL 1EL 1EL 1EL 1EL 1EL 1EL
8 1DN 1DN 1DN 1DN 1JG 1JG 1JG 1JG 1JG 1JG 1JG
9 1HG 1HG 1HG 1HG 1HG 1HG 1HG 1BQ 1BQ 1BQ 1BQ
10 1ER 1ER 1ER 1ER 1ER 1DW 1DW 1DW 1DW 1JQ 1JQ
# … with 4,995 more rows
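To confirm that a result like this matches the target distribution, one can count the distinct postcodes in each row and tabulate them. The sketch below is a base R re-implementation rather than the purrr pipeline above, so that it runs on its own. The placeholder codes built with `sprintf()`, the seed, and the names `code_counts`, `result` and `observed` are assumptions for illustration.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible

years <- 2007:2017
target_frequencies <- c(1985L, 1615L, 760L, 330L, 180L, 95L, 30L, 10L)
postcodes <- sprintf("P%03d", 1:99)  # placeholder codes, not real postcodes

# Same logic as the helper above, written without the pipe
generate_postcodes <- function(count) {
  change_years <- sort(sample(tail(years, -1), count - 1))
  codes <- sample(postcodes)[findInterval(years, change_years) + 1]
  setNames(codes, years)
}

# One entry per record: how many distinct postcodes that record should have
code_counts <- rep(seq_along(target_frequencies), times = target_frequencies)

# Character matrix with one row per record and one column per year
result <- do.call(rbind, lapply(code_counts, generate_postcodes))
result <- result[sample(nrow(result)), ]  # shuffle the row order

# Distinct postcodes per row, tabulated and compared with the targets
observed <- table(apply(result, 1, function(row) length(unique(row))))
print(observed)
stopifnot(all(as.integer(observed) == target_frequencies))
```

Note that this check covers only the number of postcodes per record. With a pool this small, different records will necessarily share postcodes.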