How do you generate long strings of numbers?

I want to generate some strings of numbers with lots of digits, in this case for ID values in a synthetic dataset.

For short strings of numbers, I'd use sample:

sprintf("%05.f", sample(0:(1e5-1), 18))
##  [1] "54783" "80354" "53607" "99668" "63621" "07121" "15944" "27436" "96837"
## [10] "28751" "95315" "63326" "00981" "15300" "18448" "09885" "63360" "04539"

This doesn't work for longer strings. First the memory requirements get too large, then you can't make numbers big enough. For example, this doesn't work:

sprintf("%020.f", sample(0:(1e20-1), 18))
## Error in 0:(1e+20 - 1) : result would be too long a vector
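
The "can't make numbers big enough" part comes from how R stores numbers. The following is a small base R sketch of the two limits involved (the variable names are only illustrative): integers stop at `.Machine$integer.max`, and doubles represent every integer exactly only up to 2^53, so a 20-digit ID cannot be held exactly in a numeric value.

```r
# Largest value an R integer can hold: 2147483647, i.e. at most 10 digits.
max_int <- .Machine$integer.max
max_int

# Doubles represent every integer exactly only up to 2^53 (16 digits).
exact_limit <- 2^53
format(exact_limit, scientific = FALSE)

# Beyond that, neighbouring integers collapse onto the same double.
collapses_at_limit <- (2^53 + 1) == 2^53
collapses_at_limit   # TRUE

# So 1e20 - 1 is not a different number from 1e20.
collapses_at_1e20 <- (1e20 - 1) == 1e20
collapses_at_1e20    # TRUE
```

This is why the answers below build each ID digit by digit as a character string instead of formatting one large number.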

How do I make strings of numbers with lots of digits?

You can use the stringi package:

library(stringi)
stri_rand_strings(10, 50, pattern = "[0-9]")
#[1] "33163217620361477538822791082750025522246331345665"
#[2] "85105858270154002408385176647161448078668054193081"
#[3] "62417899981033664011261714060242781925235001978704"
#[4] "17731152361720663463691231461493607438220463345863"
#[5] "06316044683426574113640145569673845269595104465896"
#[6] "17058300286927387520323781399768150137786864069558"
#[7] "86204984977415277470013113957915963393339586096213"
#[8] "56382530391794208466245591896055134584746907393458"
#[9] "61740570216902905237145952608961548203505061535222"
#[10] "28713530448562268345804947527043822080897315821103"

The first argument is the length of the resulting vector, the second is the number of characters in each string, and the third is a pattern saying that we want digits only.
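
As a quick illustration of those three arguments, here is a minimal sketch that names them explicitly and then checks the result. It assumes the stringi package is installed; the variable name `ids` is only illustrative.

```r
library(stringi)

# n = how many strings, length = characters per string,
# pattern = character class to draw from (here: digits only).
ids <- stri_rand_strings(n = 5, length = 20, pattern = "[0-9]")

# Every element should be exactly 20 characters long ...
all(nchar(ids) == 20)

# ... and contain nothing but digits.
!any(grepl("[^0-9]", ids))
```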

Sticking with base R, one could generate 1000 strings of 50 digits each like this:

apply(matrix(sample(charToRaw("0123456789"), 50 * 1000, replace = TRUE), nrow = 1000), 1, rawToChar)

A base R alternative:

set.seed(123)
paste0(sample(0:9,50,replace=TRUE),collapse="")
#[1] "27489058549465182039866967552199670472321443112428"

EDIT: As suggested by @docendodiscimus, this can be combined with replicate() to obtain an arbitrary number of such strings:

replicate(10,paste0(sample(0:9,50,replace=TRUE),collapse=""))
# [1] "27489058549465182039866967552199670472321443112428" "04715217836032848874767042363126471498811636317045"
# [3] "53494896419309715954633239101668675687943401822027" "84321352425363357242618766358583725425992396944615"
# [5] "29654832114226073489297603456964502318185616373997" "22525714489869553305800177940671320302062108789107"
# [7] "70776410443470388238821710903962783466694152439326" "19516964381183371044438459723957375912029277122119"
# [9] "91953470363824219340565386331895392614012571877136" "53202887119441522628084764602728369116489047092067"
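
Since the question is about ID values, it may be worth checking that the generated strings are distinct. For 50-digit strings a collision is vanishingly unlikely, but for short strings it is not. The following is a minimal sketch based on the same paste0/sample idea; `make_ids` is a hypothetical helper, not something from the answers above, and it simply redraws duplicates until none remain.

```r
# Hypothetical helper: draw n digit strings of n_digits characters each,
# redrawing any duplicates until all n strings are distinct.
make_ids <- function(n, n_digits) {
  stopifnot(n <= 10^n_digits)  # otherwise n distinct strings cannot exist
  draw <- function(k) {
    vapply(seq_len(k), function(i) {
      paste0(sample(0:9, n_digits, replace = TRUE), collapse = "")
    }, character(1))
  }
  ids <- draw(n)
  while (anyDuplicated(ids) > 0) {
    dup <- duplicated(ids)
    ids[dup] <- draw(sum(dup))
  }
  ids
}

set.seed(123)
ids <- make_ids(50, 2)  # only 100 possible values, so collisions are likely
length(ids)             # 50
anyDuplicated(ids)      # 0, i.e. no duplicates left
```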

And the obligatory competition:

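library(magrittr)        # for %>%
library(stringi)         # for stri_rand_strings()
library(microbenchmark)

# split/vapply version
GNS <- function(nNumbers, nCharsPerNumber)
{
  sample(0:9, nNumbers * nCharsPerNumber, replace = TRUE) %>%
    split(gl(nNumbers, nCharsPerNumber)) %>%
    vapply(paste0, character(1), collapse = "", USE.NAMES = FALSE)
}

# replicate/paste0 version
GNP <- function(nNumbers, nCharsPerNumber) {
  replicate(nNumbers, paste0(sample(0:9, nCharsPerNumber, replace = TRUE), collapse = ""))
}

# stringi version
GST <- function(nNumbers, nCharsPerNumber) {
  stri_rand_strings(nNumbers, nCharsPerNumber, pattern = "[0-9]")
}

# times must be named; a bare 10 would be benchmarked as a fourth expression
microbenchmark(GNS(1000, 100), GNP(1000, 100), GST(1000, 100), times = 10)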

And the scores...

Unit: milliseconds
           expr       min        lq     mean    median        uq       max neval
 GNS(1000, 100) 36.832684 38.918858 40.90260 40.750332 41.374165 46.369622    10
 GNP(1000, 100) 36.808395 39.310571 39.99557 40.094511 40.772055 44.025157    10
 GST(1000, 100)  1.882961  1.923672  2.03537  1.983199  2.166911  2.325648    10

We have a clear winner!

EDIT: adding another base option, and it's even faster.

GSAP <- function(nNumbers, nCharsPerNumber) {
  # one row per string: nrow must be nNumbers, not nCharsPerNumber
  apply(matrix(sample(charToRaw("0123456789"), nNumbers * nCharsPerNumber, replace = TRUE), nrow = nNumbers), 1, rawToChar)
}
Unit: microseconds
            expr       min        lq      mean     median       uq       max
 GSAP(1000, 100)   724.584   739.637   821.435   766.8345   899.06  1030.086
  GNS(1000, 100) 36189.180 38316.406 39739.471 39141.5695 39965.02 44478.450
  GNP(1000, 100) 35777.282 36331.839 38448.665 38575.8945 39725.21 43016.281
  GST(1000, 100)  1863.803  1898.013  1944.472  1918.7110  1975.33  2122.094

EDIT number two: try bigger inputs, and get the code right this time.

(time in seconds)

     expr       min        lq      mean    median        uq       max neval
 GSAP(x, y)  3.906626  3.975160  4.069103  4.049784  4.163262  4.329284    10
  GNS(x, y) 33.645200 33.972587 34.513555 34.406009 35.141313 35.328662    10
  GNP(x, y) 30.833180 31.136971 33.037422 32.193070 33.010896 41.713811    10
  GST(x, y)  1.697303  1.706599  1.731205  1.735127  1.756961  1.763861    10

So GST still wins: its median time is roughly 2.3 times lower than GSAP's, and both are well ahead of GNS and GNP.
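
Timings like these are only comparable if every candidate returns the same kind of result, so a sanity check before benchmarking can help. The following is a minimal sketch: `check_ids` is a hypothetical helper, and the two base R generators are restated (with one row per string in the matrix version) so that the block is self-contained.

```r
# Base R generators, restated from the answers above.
GNP <- function(nNumbers, nCharsPerNumber) {
  replicate(nNumbers,
            paste0(sample(0:9, nCharsPerNumber, replace = TRUE), collapse = ""))
}
GSAP <- function(nNumbers, nCharsPerNumber) {
  digits <- sample(charToRaw("0123456789"), nNumbers * nCharsPerNumber,
                   replace = TRUE)
  apply(matrix(digits, nrow = nNumbers), 1, rawToChar)  # one row per string
}

# Hypothetical helper: TRUE if x holds n strings of n_chars digits each.
check_ids <- function(x, n, n_chars) {
  is.character(x) &&
    length(x) == n &&
    all(nchar(x) == n_chars) &&
    !any(grepl("[^0-9]", x))
}

gnp_ok  <- check_ids(GNP(1000, 100), 1000, 100)
gsap_ok <- check_ids(GSAP(1000, 100), 1000, 100)
c(GNP = gnp_ok, GSAP = gsap_ok)
```

A function that swapped the two dimensions would return 100 strings of 1000 digits here and fail the check.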

Generate individual digits, parcel them out between individual numbers, then collapse the digits together.

library(magrittr)
generateNumberStrings <- function(nNumbers, nCharsPerNumber)
{
  sample(0:9, nNumbers * nCharsPerNumber, replace = TRUE) %>%
    split(gl(nNumbers, nCharsPerNumber)) %>% 
    vapply(paste0, character(1), collapse = "", USE.NAMES = FALSE)
}

generateNumberStrings(18, 20)
##  [1] "06985095513359117867" "95278964413245221928" "75398392571928201881"
##  [4] "00722065797044523279" "24475619649735183646" "29165493966488037145"
##  [7] "34289922968745727406" "82354362380114534171" "84293845597888728670"
## [10] "97570546918892201649" "41421884356741221760" "99306177663904189401"
## [13] "25668966612346726451" "94949806854834288664" "43664073601604613019"
## [16] "25848242347176214032" "80736828777283687373" "83763855757083999312"
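
To see what each stage does, here is the same pipeline on a tiny input, written with intermediate variables instead of %>%. The digits are fixed rather than sampled so that the result is predictable; the variable names are only illustrative.

```r
# Six fixed digits standing in for the output of sample():
# enough for 3 strings of 2 digits each.
digits <- c(4, 2, 0, 7, 9, 9)

# gl(3, 2) labels the positions 1 1 2 2 3 3,
# i.e. which string each digit belongs to.
groups <- gl(3, 2)

# split() parcels the digits out into one vector per string.
parcels <- split(digits, groups)

# paste0(..., collapse = "") glues each parcel into a single string.
ids <- vapply(parcels, paste0, character(1), collapse = "", USE.NAMES = FALSE)
ids
# [1] "42" "07" "99"
```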
