
Generate strings by given regex

How can I generate a string from a given regex in Ruby?

class MyRegex
  def self.generate_string(reg)
    # return generated string
  end
end

When I call

MyRegex.generate_string(/a*/) #it will return random string.

the expected output is something like:

aaa
aa
aaaaa

and so on.

A bit late to the party, but I have created a powerful Ruby gem which solves the original problem:

https://github.com/tom-lord/regexp-examples

/this|is|awesome/.examples #=> ['this', 'is', 'awesome']
/https?:\/\/(www\.)?github\.com/.examples #=> ['http://github.com', 'http://www.github.com', 'https://github.com', 'https://www.github.com']

The short answer is that you can't, as some of the strings could be infinitely long if you allow *, +, or repetitions that are open on the right, e.g. {4,}.

If you want to do it anyway, you have two strategies, both of which start with parsing the regex and building a state machine that represents it.

Then you can either generate a random run through it of maximum length n, which gives you a random string of length at most n. Or you can add an empty transition from every accepting state in your state machine to a terminal state, and simply do a random walk until you hit a terminal state. This gives you a random string that the regex accepts, of arbitrary length, with longer strings being less probable. But please note that there is still a chance, albeit a very, very small one, that this method will never end: as the string grows, the probability of outputting yet another character falls, but just as the string length never reaches infinity, the probability of a new character never reaches zero.
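A minimal sketch of both strategies, assuming a hand-built state machine for /a+b*/ rather than one produced by a real regex parser. The names TRANSITIONS, ACCEPTING, bounded_walk and open_ended_walk are hypothetical and chosen only for illustration. The sketch also assumes that every state can reach an accepting state, and it only allows stopping at accepting states so that the result always matches.

```ruby
# Hypothetical hand-built automaton for /a+b*/; a real solution would
# build this table by parsing the regex.
TRANSITIONS = {
  start: [["a", :as]],
  as:    [["a", :as], ["b", :bs]],
  bs:    [["b", :bs]]
}.freeze
ACCEPTING = %i[as bs].freeze

# Strategy 1: take at most max_length random steps and return the longest
# prefix of the walk that ended in an accepting state (nil if there is none).
def bounded_walk(max_length, rng: Random.new)
  state = :start
  current = +""
  accepted = ACCEPTING.include?(state) ? "" : nil
  max_length.times do
    options = TRANSITIONS.fetch(state)
    break if options.empty?
    char, state = options.sample(random: rng)
    current << char
    accepted = current.dup if ACCEPTING.include?(state)
  end
  accepted
end

# Strategy 2: at every accepting state, stop with probability
# stop_probability; otherwise keep walking. The length is unbounded.
def open_ended_walk(stop_probability: 0.5, rng: Random.new)
  state = :start
  result = +""
  loop do
    options = TRANSITIONS.fetch(state)
    can_stop = ACCEPTING.include?(state)
    break if options.empty? || (can_stop && rng.rand < stop_probability)
    char, state = options.sample(random: rng)
    result << char
  end
  result
end

puts bounded_walk(rand(1..6))
puts open_ended_walk
```

Picking the bound at random, as in bounded_walk(rand(1..6)), varies the output length while still guaranteeing that the walk stops.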

That is almost exactly what the code posted in the comments by @neil-slater does: https://github.com/repeatedly/ruby-string-random

Edit

The OP asks whether it is possible to generate a random string that a given regular expression matches.

A regular expression is a string representation of a regular language. A finite automaton is a decider which encodes a given regular language and can determine whether a given string is part of that language. So basically, regular expression matching works by compiling the regular expression to a finite automaton and using that to see whether it accepts the string. That's matching.
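To make the matching side concrete, here is a small sketch of a deterministic automaton deciding membership for /a+b*/. The table is written by hand for illustration (DFA, DFA_ACCEPTING and dfa_accepts? are hypothetical names), and it follows the textbook model described above rather than how any particular regex engine is implemented.

```ruby
# Hypothetical hand-written DFA for /a+b*/: state => { character => next state }.
DFA = {
  start: { "a" => :as },
  as:    { "a" => :as, "b" => :bs },
  bs:    { "b" => :bs }
}.freeze
DFA_ACCEPTING = %i[as bs].freeze

# Feed the input through the automaton one character at a time.
# A missing transition means the string is rejected immediately.
def dfa_accepts?(input)
  state = :start
  input.each_char do |char|
    state = DFA.fetch(state)[char]
    return false if state.nil?
  end
  DFA_ACCEPTING.include?(state)
end

puts dfa_accepts?("aab")
puts dfa_accepts?("ba")
```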

Now let's look at generation. You can use the same finite automaton to generate strings. However, since a finite automaton, as @sawa correctly pointed out, only works on finite strings, you have to make sure that you only generate a finite string. One way of doing this is to randomly decide a maximum length and then do a random walk of at most that length in the finite automaton. One way of not doing this is the approach both @sawa and I suggested: at each step, either take a transition with some probability or simply stop. This potentially doesn't terminate, because a product of non-zero probabilities only approaches zero but never reaches it.

This answer is not intended to fully answer your question. Its purpose is twofold: (i) to show that it is not impossible, so that jbr's answer is entirely wrong, and (ii) to suggest that, nevertheless, it is not trivial, and you have to work out the complete code by yourself.

Since a full answer would probably not fit in a single answer on a Q&A site like this, I will show code that generates strings matching one fixed regex, and that can produce every string the regex matches:

/a*/

The code is like this:

class Regexp
  def self.generate_string
    rand > 0.5 ? "" : "a#{generate_string}"
  end
end

Each time you run Regexp.generate_string, a random string that matches /a*/ is generated. The string can be of arbitrary length, and the longer the string, the lower the probability that it is generated.
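With a stop probability of 1/2 at every step, a string of length k comes out with probability (1/2)^(k+1). A possible variant, sketched here under a hypothetical name (generate_a_star is not part of Ruby's Regexp class), makes the stop probability explicit and adds a length cap, so that termination is guaranteed at the cost of never producing strings longer than the cap.

```ruby
# Hypothetical helper for the fixed regex /a*/.
# Appends "a" until a random draw says stop or max_length is reached.
def generate_a_star(stop_probability: 0.5, max_length: 20, rng: Random.new)
  result = +""
  result << "a" while result.length < max_length && rng.rand >= stop_probability
  result
end

5.times { puts generate_a_star.inspect }
```

The cap and the default values are assumptions made for this example; choosing them is a trade-off between covering long strings and bounding the running time.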
