Match repeatable string as a “whole word” in Lua 5.1

My environment:

  • Lua 5.1
  • Absolutely no libraries with a native component (like a C .so/.dll) can be used
  • I can run arbitrary pure Lua 5.1 code, but I can't access os or the other packages that would give access to the native filesystem, shell commands or anything like that, so all functionality must be implemented in pure Lua.
  • I've already managed to pull in LuLPeg. I can probably pull in other pure Lua libraries.

I need to write a function that returns true if the input string matches an arbitrary sequence of letters and numbers as a whole word that repeats one or more times, and may have punctuation at the beginning or end of the entire matching substring. I use "whole word" in the same sense as the PCRE word boundary \b.

To demonstrate the idea, here's an incorrect attempt using the re module of LuLPeg; it seems to work with negative lookaheads but not negative lookbehinds:

function containsRepeatingWholeWord(input, word)
    return re.match(input:gsub('[%a%p]+', ' %1 '), '%s*[^%s]^0{"' .. word .. '"}+[^%s]^0%s*') ~= nil
end

Here are example strings and the expected return value (the quotes are syntactical as if typed into the Lua interpreter, not literal parts of the string; this is done to make trailing/leading spaces obvious):

  • input: " one !tvtvtv! two", word: tv, return value: true
  • input: "I'd", word: d, return value: false
  • input: "tv", word: tv, return value: true
  • input: " tvtv! ", word: tv, return value: true
  • input: " epon ", word: nope, return value: false
  • input: " eponnope ", word: nope, return value: false
  • input: "atv", word: tv, return value: false

If I had a full PCRE regex library I could do this quickly, but I don't because I can't link to C, and I haven't found any pure Lua implementations of PCRE or similar.

I'm not certain whether LPeg is flexible enough (used directly or through its re module) to do what I want, but I'm pretty sure the built-in Lua string functions can't, because Lua patterns can't apply a quantifier to a sequence of characters: (tv)+ does not work with Lua's built-in string:match function and similar.
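As a quick illustration of that limitation (a minimal sketch, not part of the attempt above): in a Lua pattern a quantifier applies only to a single character class, so a + placed after a capture group is treated as a literal plus sign. The function name try_group_plus is made up for this example.

```lua
-- Hypothetical helper: tries to use '+' on a capture group, as one would in PCRE.
function try_group_plus(s)
    -- In Lua patterns the '+' after ')' is a literal character, not a quantifier.
    return s:match("^(tv)+$")
end

-- A quantifier on a single character class works as expected.
function try_class_plus(s)
    return s:match("^a+$")
end

print(try_group_plus("tvtv"))  -- nil: the group is not repeated
print(try_group_plus("tv+"))   -- tv: the '+' matched a literal plus sign
print(try_class_plus("aaa"))   -- aaa
```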

Interesting resources I've been scouring to try to figure out how to do this, to no avail:

I think the pattern doesn't work reliably because the %s*[^%s]^0 part matches an optional series of spacing characters followed by non-spacing characters, and then it tries to match the reduplicated word and fails. After that, it doesn't go backwards or forwards in the string and try to match the reduplicated word at another position. The semantics of LPeg and re are very different from those of most regular expression engines, even for things that look similar.
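A rough analogy in plain Lua patterns, offered as a sketch rather than as LPeg code: an anchored attempt tries exactly one starting position, and a search has to retry that attempt at each position. The names match_at and search_anywhere are hypothetical, and pat is assumed to be a Lua pattern.

```lua
-- Anchored attempt: tries only at position `init`, much like a single LPeg match.
function match_at(s, pat, init)
    -- A leading '^' anchors string.find at `init` instead of scanning forward.
    return s:find("^" .. pat, init) ~= nil
end

-- Search: repeat the anchored attempt at every position until one succeeds.
function search_anywhere(s, pat)
    for init = 1, #s do
        if match_at(s, pat, init) then
            return init
        end
    end
    return nil
end

print(match_at(" one tv", "tv", 1))      -- false: position 1 is a space
print(search_anywhere(" one tv", "tv"))  -- 6
```

In LPeg the retry loop has to be written into the pattern itself, which is what the grammars in the versions that follow do.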

Here's a re-based version. The pattern has a single capture (the reduplicated word), so if the reduplicated word was found, matching returns a string rather than a number.

function f(str, word)
    local patt = re.compile([[
        match_global <- repeated / ( [%s%p] repeated / . )+
        repeated <- { %word+ } (&[%s%p] / !.) ]],
        { word = word })
    return type(patt:match(str)) == 'string'
end

It is somewhat complex because the vanilla re module has no way to generate an lpeg.B (look-behind) pattern.

Here's an lpeg version using lpeg.B. LuLPeg also works here.

local lpeg = require 'lpeg'
lpeg.locale(lpeg)

local function is_at_beginning(_, pos)
    return pos == 1
end

function find_reduplicated_word(str, word)
    local type, _ENV = type, math
    local B, C, Cmt, P, V = lpeg.B, lpeg.C, lpeg.Cmt, lpeg.P, lpeg.V
    local non_word = lpeg.space + lpeg.punct
    local patt = P {
        (V 'repeated' + 1)^1,
        repeated = (B(non_word) + Cmt(true, is_at_beginning))
                * C(P(word)^1)
                * #(non_word + P(-1))
    }
    return type(patt:match(str)) == 'string'
end

for _, test in ipairs {
    { 'tvtv', true },
    { ' tvtv', true },
    { ' !tv', true },
    { 'atv', false },
    { 'tva', false },
    { 'gun tv', true },
    { '!tv', true },
} do
    -- Lua 5.1 has a global unpack; table.unpack only exists from Lua 5.2 on.
    local str, expected = (table.unpack or unpack)(test)
    local result = find_reduplicated_word(str, 'tv')
    if result ~= expected then
        print(result)
        print(('"%s" should%s match but did%s')
            :format(str, expected and "" or "n't", expected and "n't" or ""))
    end
end

Lua patterns are powerful enough.
No LPEG is needed here.

This is your function

function f(input, word)
   return (" "..input:gsub(word:gsub("%%", "%%%%"), "\0").." "):find"%s%p*%z+%p*%s" ~= nil
end

This is a test of the function

for _, t in ipairs{
   {input = " one !tvtvtv! two", word = "tv", return_value = true},
   {input = "I'd", word = "d", return_value = false},
   {input = "tv", word = "tv", return_value = true},
   {input = "   tvtv!  ", word = "tv", return_value = true},
   {input = " epon ", word = "nope", return_value = false},
   {input = " eponnope ", word = "nope", return_value = false},
   {input = "atv", word = "tv", return_value = false},
} do
   assert(f(t.input, t.word) == t.return_value)
end
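For readers who find the one-liner dense, here is the same idea spelled out step by step. This is a sketch under a few assumptions: the names escape_pattern and contains_repeated_word are invented here, the helper escapes every Lua-pattern magic character rather than only %, and the input is assumed not to contain NUL bytes already.

```lua
-- Hypothetical helper: escape every Lua-pattern magic character in `s`.
local function escape_pattern(s)
    -- The parentheses drop gsub's second return value (the match count).
    return (s:gsub("[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0"))
end

function contains_repeated_word(input, word)
    -- Step 1: replace each literal occurrence of `word` with a NUL byte.
    local marked = input:gsub(escape_pattern(word), "\0")
    -- Step 2: pad with spaces so both string edges count as boundaries.
    local padded = " " .. marked .. " "
    -- Step 3: whitespace, optional punctuation, one or more NULs (%z),
    -- optional punctuation, whitespace.
    return padded:find("%s%p*%z+%p*%s") ~= nil
end

print(contains_repeated_word(" one !tvtvtv! two", "tv"))  -- true
print(contains_repeated_word("atv", "tv"))                -- false
```

Escaping only matters if word can contain characters such as . or -, which the question rules out; it is included so the sketch does not misbehave on other inputs.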
