
How does this detab function in Lua work?

-- Converts tabs to spaces
function detab(text)
    local tab_width = 4
    local function rep(match)
        local spaces = -match:len()
        print("match:"..match)
        while spaces<1 do spaces = spaces + tab_width end
        print("Found "..spaces.." spaces")
        return match .. string.rep(" ", spaces)
    end
    text = text:gsub("([^\n]-)\t", rep)
    return text
end


str='\tthisisa\tstring' -- the original post used literal tab characters here
--thiis is a      string

print("length: "..str:len())
print(detab(str))
print(str:gsub("\t","    "))

I have this piece of code from markdown.lua that converts tabs to spaces (as its name suggests).
What I have managed to figure out is that it scans the string from the beginning until it finds a tab and passes the matched substring to the 'rep' function. It does this repeatedly until there are no more matches.
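The scanning described above can be checked directly. The sketch below runs the same pattern through `string.gmatch`, which walks the string the same way `gsub` does, and collects every capture that `rep` would receive. The helper name `captures_before_tabs` is hypothetical, and the test string assumes the whitespace in `str` was two tab characters.

```lua
-- Hypothetical helper: collect the captures that gsub would pass to rep.
-- "([^\n]-)\t" lazily takes the shortest run of non-newline characters
-- that is followed by a tab; the tab itself is matched but not captured.
local function captures_before_tabs(text)
  local found = {}
  for capture in text:gmatch("([^\n]-)\t") do
    found[#found + 1] = capture
  end
  return found
end

-- Assumes the question's string held two tab characters.
local caps = captures_before_tabs("\tthisisa\tstring")
for i, c in ipairs(caps) do
  print(i, "[" .. c .. "]", #c)
end
```

The first capture is empty because the string starts with a tab, and the second is `thisisa` (7 characters). The trailing `string` has no tab after it, so it is never passed to `rep`.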
My problem is figuring out what the rep function does, especially in the while loop.
Why does the loop stop at 1?
Why does it count up?
Surprisingly, it works out the right number of spaces; how exactly is a mystery to me.
If you compare its output with the output from the last gsub replacement, you'll find that they are different.
Detab maintains the alignment of the characters while the gsub replacement doesn't. Why is that?
Bonus question: when I switch on whitespace display in SciTE, I can see that the tab before the 't' is longer than the tab before the third 's'. Why are they different?

From analyzing the rep function, this is what it appears to be doing. First, it takes the length of the match string passed in and makes it negative (i.e. multiplies it by -1). The while loop then keeps adding tab_width to spaces until spaces becomes positive.
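To make the loop concrete, this sketch replays it and records every value `spaces` takes for a few match lengths, with `tab_width = 4` as in the question. The function name `trace_loop` is hypothetical.

```lua
-- Sketch: replay rep's while loop and record every value of `spaces`.
local tab_width = 4 -- same value as in the question's detab

local function trace_loop(match_len)
  local spaces = -match_len        -- start at minus the match length
  local steps = { spaces }
  while spaces < 1 do              -- same condition as in rep
    spaces = spaces + tab_width    -- climb one tab width at a time
    steps[#steps + 1] = spaces
  end
  return steps
end

for _, n in ipairs({ 0, 1, 4, 7 }) do
  print(n, table.concat(trace_loop(n), " -> "))
end
```

For a 7-character match, `spaces` goes -7, -3, 1, and 1 is the padding. The cutoff is `spaces < 1` rather than `spaces < 0` so the result is never 0: a match whose length is already a multiple of `tab_width` (including an empty match, or a length of 4, which goes -4, 0, 4) still gets a full `tab_width` of spaces, so the tab never disappears.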

It might be easier to visualize this using a number line:

<--|----|-------|----|----|----|----|----|----|----|----|--->
  -n      -spaces             -2   -1    0    1    2    n

In essence, the loop is working out how many tab_widths it takes to climb from -n past the cutoff before it "overflows". Here it uses the step from 0 to 1 as the cutoff point. After the loop, spaces holds how far it overshot, which is always a value from 1 to tab_width.

In fact, the while loop mimics a mathematical operation you might know as modulo. In other words, the inner rep function can be rewritten like this:

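local function rep(match)
  -- % binds tighter than binary -, so this is tab_width - (match:len() % tab_width),
  -- which is always a value from 1 to tab_width
  local spaces = tab_width - match:len() % tab_width
  return match .. string.rep(" ", spaces)
end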
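A quick way to convince yourself the two forms agree is to compare them for a range of match lengths. This sketch does that for lengths 0 to 40 with `tab_width = 4`; the names `loop_spaces` and `modulo_spaces` are hypothetical.

```lua
-- Sketch: check that the modulo form of rep matches the original loop.
local tab_width = 4

local function loop_spaces(len)     -- the question's while loop
  local spaces = -len
  while spaces < 1 do spaces = spaces + tab_width end
  return spaces
end

local function modulo_spaces(len)   -- the rewritten version
  return tab_width - len % tab_width
end

local mismatches = 0
for len = 0, 40 do
  if loop_spaces(len) ~= modulo_spaces(len) then
    mismatches = mismatches + 1
    print("differs at length " .. len)
  end
end
print("mismatches: " .. mismatches)
```

Both forms produce a value in the range 1 to `tab_width` that is congruent to `-len` modulo `tab_width`, so the check finds no differences.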

This differs from the outer str:gsub("\t", "    "), which indiscriminately substitutes every tab character with 4 spaces. In the detab function, on the other hand, the number of spaces that replaces a tab depends on the length of the matched capture.
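The difference is easy to see by running both on the question's string. This sketch assumes the whitespace in `str` was two tabs, and uses a condensed `detab` built on the modulo form of `rep` rather than the exact markdown.lua code.

```lua
-- Sketch comparing tab-stop expansion with a plain four-space replacement.
local tab_width = 4

local function detab(text)          -- condensed version using the modulo form
  local function rep(match)
    return match .. string.rep(" ", tab_width - #match % tab_width)
  end
  return (text:gsub("([^\n]-)\t", rep)) -- parentheses drop gsub's count
end

local str = "\tthisisa\tstring"
local aligned = detab(str)
local naive = (str:gsub("\t", "    "))

print("[" .. aligned .. "]")
print("[" .. naive .. "]")
-- 0-based column where "string" begins; a multiple of 4 means a tab stop
print(aligned:find("string", 1, true) - 1, naive:find("string", 1, true) - 1)
```

With detab, `string` starts at column 12, a multiple of 4, so it sits on a tab stop. With the plain replacement it starts at column 15, which is why the alignment is lost.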

For example, with tab_width = 4:
matching length is 1, replace tab with 3 spaces
matching length is 2, replace tab with 2 spaces
matching length is 3, replace tab with 1 space
matching length is 4, replace tab with 4 spaces
...
matching length is n, replace tab with tab_width - (n % tab_width) spaces
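The list above can be generated rather than written out. This sketch prints the padding for match lengths 0 to 9 with `tab_width = 4`, showing that it cycles 4, 3, 2, 1 and never reaches 0.

```lua
-- Sketch: regenerate the length -> padding list for tab_width = 4.
local tab_width = 4
local rows = {}
for n = 0, 9 do
  local spaces = tab_width - n % tab_width
  rows[#rows + 1] = spaces
  print(string.format("matching length %d -> %d space(s)", n, spaces))
end
```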

To answer the bonus question: tab characters align to tab stops, and here the tab stops are every eight columns. In the source line, str=' occupies columns 1 to 5, so the first tab starts on column 6 and needs to pad three columns to reach the next tab stop. After thisisa, the second tab starts on column 16, so it only needs to be one column wide.
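The editor's arithmetic can be reproduced with the same tab-stop rule that detab uses. This sketch assumes a tab size of 8 (the answer's figure) and ASCII text; `tab_display_widths` is a hypothetical helper, and the source line is written with `\t` escapes in place of its literal tabs.

```lua
-- Sketch: how many columns each tab occupies when stops are every `tabsize`
-- columns. Assumes a single line of ASCII text (one byte per column).
local function tab_display_widths(line, tabsize)
  local widths = {}
  local col = 0                            -- 0-based display column
  for ch in line:gmatch(".") do
    if ch == "\t" then
      local w = tabsize - col % tabsize    -- distance to the next tab stop
      widths[#widths + 1] = w
      col = col + w
    else
      col = col + 1
    end
  end
  return widths
end

local w = tab_display_widths("str='\tthisisa\tstring'", 8)
print(w[1], w[2])
```

The first tab is drawn 3 columns wide and the second 1 column wide, matching the explanation above.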

The loop stops when spaces becomes a positive number because it has been adding spaces in tab_width increments until there are enough to reach past the matched text. Combining that many spaces with the matched text produces a string that is padded out to the next tab stop.
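One detail worth spelling out: each capture only contains the text since the previous tab, not since the start of the line. This still works because the earlier text has already been padded to a tab stop, so the capture's length modulo tab_width equals the current column modulo tab_width. The sketch below checks this on an assumed multi-line input, again using a condensed `detab` with the modulo form of `rep`.

```lua
-- Sketch: several tabs per line, plus a newline that captures cannot cross.
local tab_width = 4

local function detab(text)
  return (text:gsub("([^\n]-)\t", function(match)
    return match .. string.rep(" ", tab_width - #match % tab_width)
  end))
end

local out = detab("a\tbcdef\tg\nxy\tz")
print(out)
```

The captures are `a`, `bcdef` and `xy`, each measured from the previous tab or line start. In the output, `g` lands on column 12 and `z` on column 4 (0-based), both tab stops.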

That's also why the gsub differs. The gsub isn't treating tabs as tabstop characters but rather as four spaces. So the second tab doesn't pad to the tabstop but instead expands to four spaces.
