
How can I implement Markov's algorithm with variables and markers?

I've been trying to implement Markov's algorithm, but I've only had partial success. The algorithm is fairly simple and can be found here.

However, my project has an added difficulty: I have to use rules that include markers and variables.

A variable stands for any letter in the alphabet. A marker is a character that is used only as a reference point for moving the variables around (it has no real value of its own).

This example duplicates every character in a string:

Alphabet: {a,b,c}

Markers: {M}

Variables: {x}

Rule 1: Mx -> xxM

Rule 2: xM -> x

Rule 3: x -> Mx

input: abc

abc //We apply rule 3

Mabc //We apply rule 1

aaMbc //We apply rule 1

aabbMc //We apply rule 1

aabbccM //We apply rule 2

aabbcc

This is my recursive function implementing a Markov algorithm. It only works with plain string rules, for example: Rule 1: "apple" -> "orange", Input: "apple".

public static String markov(String input, LinkedList<Rule> rules) {
    for (Rule rule : rules) {
        if (!input.equals(input.replace(rule.getFrom(), rule.getTo()))) { //If the rule matches a substring
            if (rule.isTerminating()) { //If the rule is terminating
                input = input.replaceFirst(Pattern.quote(rule.getFrom()), rule.getTo());
                System.out.println(input); //Replace the first instance
                return input; //return and end the cycle
            } else {
                input = input.replaceFirst(Pattern.quote(rule.getFrom()), rule.getTo());
                System.out.println(input);
                return markov(input, rules); //Start looking again for matching rules
            }
        }
    }
    return input;
}

I can't figure out how to work variables and markers into this logic. Could someone explain the best way to implement them? Any advice is welcome.

If the question doesn't comply with SO guidelines please let me know why in the comments so I don't repeat the mistake.

Thank You!


I think the easiest way to do this is using Java regular expressions. Once you get your head around those, then the following rules should work for your example:

Rule 1: "M([a-c])" -> "$1$1M"
Rule 2: "([a-c])M" -> "$1" (terminating)
Rule 3: "([a-c])"  -> "M$1"
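If you would rather keep writing rules in the question's notation (Mx -> xxM), the translation into a regex and a replacement string can be automated. The following is only a sketch under some assumptions: the class and method names are made up for illustration, every symbol is a single character, and markers are treated as ordinary literal characters. A variable that occurs twice on the left-hand side is turned into a back-reference so that both occurrences must match the same letter.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VariableRuleCompiler {

    // Hypothetical helper: returns { regex, replacement } for a rule written
    // with variables, e.g. compile("Mx", "xxM", "x", "abc").
    static String[] compile(String from, String to, String variables, String alphabet) {
        // Group that matches any single symbol of the alphabet: (a|b|c)
        StringBuilder anySymbol = new StringBuilder("(");
        for (int i = 0; i < alphabet.length(); i++) {
            if (i > 0) anySymbol.append('|');
            anySymbol.append(Pattern.quote(String.valueOf(alphabet.charAt(i))));
        }
        anySymbol.append(')');

        // Left-hand side: variables become capturing groups, everything else is literal.
        Map<Character, Integer> groupOf = new HashMap<>();
        StringBuilder regex = new StringBuilder();
        for (char c : from.toCharArray()) {
            if (variables.indexOf(c) >= 0) {
                Integer group = groupOf.get(c);
                if (group == null) {
                    groupOf.put(c, groupOf.size() + 1);
                    regex.append(anySymbol);
                } else {
                    regex.append('\\').append(group); // same variable, same letter
                }
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }

        // Right-hand side: variables become $n, everything else is literal.
        StringBuilder replacement = new StringBuilder();
        for (char c : to.toCharArray()) {
            Integer group = groupOf.get(c);
            if (group != null) {
                replacement.append('$').append(group);
            } else {
                replacement.append(Matcher.quoteReplacement(String.valueOf(c)));
            }
        }
        return new String[] { regex.toString(), replacement.toString() };
    }

    public static void main(String[] args) {
        String[] rule1 = compile("Mx", "xxM", "x", "abc");
        System.out.println("Mabc".replaceFirst(rule1[0], rule1[1]));
    }
}
```

The generated regex is more verbose than the hand-written "M([a-c])" because each symbol is quoted individually, but it should behave the same way for this example.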

Note that you need a couple of tweaks to your current method to make this work...

replace takes a literal string as its first parameter, whereas replaceFirst treats it as a regex, so:

replace: if (!input.equals(input.replace(rule.getFrom(), rule.getTo()))) {
with:    if (!input.equals(input.replaceFirst(rule.getFrom(), rule.getTo()))) {

You are wrapping rule.getFrom() in Pattern.quote, which turns the whole pattern into a literal string, so the capturing groups would never be interpreted as groups. So:

replace: input = input.replaceFirst(Pattern.quote(rule.getFrom()), rule.getTo());
with:    input = input.replaceFirst(rule.getFrom(), rule.getTo());

At that point, you have a bit of duplication in the code calling replaceFirst twice, so you could stick that in a temp variable the first time and reuse it:

String next = input.replaceFirst(rule.getFrom(), rule.getTo());
if (!input.equals(next)) {
  ...
  input = next;
  ...
}
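Putting those tweaks together, a regex-based version might look roughly like the sketch below. The Rule class here is a hypothetical stand-in for yours (I don't know what yours really looks like), and I've swapped the recursion for a loop so that long derivations can't overflow the stack. It also uses Matcher.find() to test for a match rather than comparing strings, which is a design choice of mine, not something your code requires.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexMarkov {

    // Hypothetical stand-in for the question's Rule class.
    static final class Rule {
        final Pattern from;        // left-hand side, compiled as a regex
        final String to;           // replacement, may contain $1, $2, ...
        final boolean terminating;

        Rule(String fromRegex, String to, boolean terminating) {
            this.from = Pattern.compile(fromRegex);
            this.to = to;
            this.terminating = terminating;
        }

        static Rule of(String fromRegex, String to) {
            return new Rule(fromRegex, to, false);
        }

        static Rule terminal(String fromRegex, String to) {
            return new Rule(fromRegex, to, true);
        }
    }

    static String markov(String input, List<Rule> rules) {
        String current = input;
        boolean applied = true;
        while (applied) {
            applied = false;
            for (Rule rule : rules) {
                Matcher matcher = rule.from.matcher(current);
                if (matcher.find()) {
                    // Rewrite only the leftmost match of the first matching rule.
                    current = matcher.replaceFirst(rule.to);
                    System.out.println(current);
                    if (rule.terminating) {
                        return current;
                    }
                    applied = true;
                    break; // start again from the first rule
                }
            }
        }
        return current; // no rule matched
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                Rule.of("M([a-c])", "$1$1M"),
                Rule.terminal("([a-c])M", "$1"),
                Rule.of("([a-c])", "M$1"));
        markov("abc", rules);
    }
}
```

With the three rules from the example, the printed steps should match the derivation in the question: Mabc, aaMbc, aabbMc, aabbccM, aabbcc.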

As you are currently quoting the entire rule.getFrom() string, I'm guessing you have had problems with regular expression special characters before. If so, you'll need to escape them individually when creating the rules. I really don't want to get into regular expressions here, as it is a huge area and is completely separate from the Markov algorithm. If you are having problems with them, please do some research online (e.g. Regular Expressions and Capturing Groups), or ask a separate question here focusing on the specific regular expression problem.
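For rules that are meant to be plain strings, one option is to escape both sides when the rule is applied. The snippet below is just an illustration (the class and method names are invented): Pattern.quote makes the left-hand side literal, and Matcher.quoteReplacement stops $ and \ in the right-hand side from being read as group references.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralRuleEscaping {

    // Hypothetical helper: applies a plain-string rule once, with no regex
    // interpretation of either side.
    static String applyLiteralOnce(String input, String from, String to) {
        return input.replaceFirst(Pattern.quote(from), Matcher.quoteReplacement(to));
    }

    public static void main(String[] args) {
        // "." is replaced as a literal dot, and "$" is inserted as a literal dollar sign.
        System.out.println(applyLiteralOnce("a.b axb", ".", "$"));
    }
}
```

Rules that use capturing groups should not go through this helper, since quoting would disable the groups.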

Note that you can still combine these with the normal rules. For example, after changing the marker character from M to # (so that M can be used in the alphabet), these rules:

"A"             -> "apple"
"B"             -> "bag"
"S"             -> "shop"
"T"             -> "the"
"the shop"      -> "my brother"
"#([a-zA-Z .])" -> "$1$1#"
"([a-zA-Z .])#" -> "$1" (terminating)
"([a-zA-Z .])"  -> "#$1"

Would convert:

from: I bought a B of As from T S.
to:   II  bboouugghhtt  aa  bbaagg  ooff  aapppplleess  ffrroomm  mmyy  bbrrootthheerr..

Hope this helps.
