
emacs syntax highlight numbers not part of words (with regex?)

I've moved to Emacs recently, and I'm used to (and like) having numbers highlighted. A quick hack I found puts the following in my .emacs:

(add-hook 'after-change-major-mode-hook
      '(lambda () (font-lock-add-keywords 
                   nil 
                   '(("\\([0-9]+\\)" 
                      1 font-lock-warning-face prepend)))))

This gives a good start, i.e. any digit is highlighted. However, I am a complete beginner with regex and would ideally like the following behaviour:

  • Also highlight the decimal point if it's part of a float, e.g. 12.34
  • Do not highlight any part of a number that is next to or part of a word, e.g. in foo11, ba11r and 11spam none of the '1's should be highlighted
  • Allow an 'e' between two integers, to support scientific notation (not required, bonus credit)

Unfortunately this looks very much like a 'do this for me' question, which I am loath to post, but I have failed thus far to make any decent progress myself.

About as far as I have got is discovering [^a-zA-Z][0-9]+[^a-zA-Z] to match anything but a letter either side (eg an equals sign), but all this does is include the adjacent symbol in the highlighting. I am not sure how to tell it 'only highlight the numbers if there isn't a letter on either side'.

Of course, I can't imagine regex is the way to go for complicated syntax highlighting, so any good ideas for number highlighting in Emacs are also welcome.

Any help very much appreciated. (In case it makes any difference, this is for use when Python coding.)

Start by going to your *scratch* buffer and typing in some test text. Put some numbers in there, some identifiers that contain numbers, some numbers with missing parts (like .e12 ), etc. These will be our test cases and will let us experiment rapidly. Now run M-x re-builder to enter the regex builder mode, which lets you try out any regex against the text of the current buffer to see what it matches. This is a very handy mode; you'll be able to use it all the time. Just note that because Emacs Lisp requires you to put regexes into strings, you must double up all of your backslashes. You're already doing that correctly, but I'm not going to double them up here.
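If the doubled backslashes get in the way while experimenting, re-builder can be told to accept plain regex syntax. This is only a sketch of one possible setup, assuming you want to type regexes in re-builder exactly as they are written in this answer:

```elisp
;; Sketch: make re-builder read regexps in plain string syntax.
(require 're-builder)

;; `reb-re-syntax' selects the input syntax.  The default, `read',
;; expects the doubled backslashes of a Lisp string; `string' does not.
(setq reb-re-syntax 'string)
```

With this setting you type \b[0-9]+\b into re-builder; the doubled form is still needed once the regex goes into your .emacs.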

So, limiting the match to numbers that are not part of identifiers is pretty easy. \b matches word boundaries, so putting one at either end of your regex will make it match only a whole word.

You can match floats just by adding a period to the character class you started with, so that it becomes [0-9.] . Unfortunately, that can match a period all on its own; what we really want is [0-9]*\.?[0-9]+ , which will match zero or more digits followed by an optional period followed by one or more digits.
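A quick way to check that behaviour from Lisp is sketched below. The names my-float-regexp and my-float-string-p are made up for this illustration, and the anchors are added only so that the whole string has to match:

```elisp
;; The float regexp from the text; in Lisp source each backslash is doubled.
(defconst my-float-regexp "[0-9]*\\.?[0-9]+")

(defun my-float-string-p (s)
  "Return non-nil if the whole string S matches `my-float-regexp'.
Hypothetical helper, only for experimenting."
  ;; The two extra pieces anchor the match to the start and end of S.
  (string-match-p (concat "\\`" my-float-regexp "\\'") s))

(my-float-string-p "12.34")  ; non-nil
(my-float-string-p ".5")     ; non-nil
(my-float-string-p "7")      ; non-nil
(my-float-string-p ".")      ; nil: a lone period is rejected
(my-float-string-p "12.")    ; nil: a digit must follow the period
```

The last case shows a limitation: a literal with a trailing period such as 12. , which Python accepts, is not matched as a whole.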

A leading sign can be matched with [-+]? , so that gets us negative numbers.

To match exponents we need an optional group: \(...\)? , and since we are only using this for highlighting, and don't actually need to separate out the content of the group, we can use the non-capturing form \(?:...\)? , which will save the regex matcher a little time. Inside the group we will need to match an "e" ( [eE] ), an optional sign ( [-+]? ), and one or more digits ( [0-9]+ ).

Putting it all together: [-+]?\b[0-9]*\.?[0-9]+\(?:[eE][-+]?[0-9]+\)?\b . Note that I've put the optional sign before the first word boundary, because the "+" and "-" characters create a word boundary.
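The combined regex can be tried out from Lisp before it goes into a font-lock rule. The following is a sketch only: my-number-regexp and my-number-matches are hypothetical names, and the python-mode symbol in the last form is an assumption based on the question.

```elisp
(defconst my-number-regexp
  "[-+]?\\b[0-9]*\\.?[0-9]+\\(?:[eE][-+]?[0-9]+\\)?\\b"
  "The combined regexp, with backslashes doubled for Lisp.")

(defun my-number-matches (text)
  "Return every substring of TEXT matched by `my-number-regexp'.
Hypothetical helper.  It searches a temporary buffer with the default
syntax table, in which letters and digits are word characters."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let (found)
      (while (re-search-forward my-number-regexp nil t)
        (push (match-string 0) found))
      (nreverse found))))

(my-number-matches "x = -12.34e-5 + foo11 + 11spam + 42")
;; => ("-12.34e-5" "42")

;; One way to install it (the mode symbol is an assumption):
(font-lock-add-keywords
 'python-mode
 `((,my-number-regexp . font-lock-warning-face)))
```

The digits in foo11 and 11spam are skipped because there is no word boundary between a letter and a digit.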

First of all, lose the add-hook and the lambda . The font-lock-add-keywords call doesn't need either. If you want this only for python-mode , pass the mode symbol as the first argument instead of nil .

Second, there are two main ways to highlight only the digits and not the characters next to them.

  1. Add a grouping construct around the digits. The numbers in the font-lock-keywords forms correspond to the groups, so this would be '(("\\([^a-zA-Z]\\([0-9]+\\)[^a-zA-Z]\\)" 2 font-lock-warning-face prepend)) . The outer grouping is rather useless here, though, so this can be simplified to '(("[^a-zA-Z]\\([0-9]+\\)[^a-zA-Z]" 1 font-lock-warning-face prepend)) .

  2. Just use the beginning-of-symbol and end-of-symbol backslash constructs. Then the regexp looks like this: \_<[0-9]+\_> . We can highlight the whole match here, so the group number is 0: '(("\\_<[0-9]+\\_>" 0 font-lock-warning-face prepend)) . (Without the prepend flag, the shorter form ("\\_<[0-9]+\\_>" . font-lock-warning-face) works and needs no group number.) As a variation, you could use the beginning-of-word and end-of-word constructs ( \< and \> ), but you probably don't want to highlight numbers adjacent to underscores or whatever other characters, if any, python-mode has in the symbol syntax class.
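The difference between word and symbol boundaries can be seen with a small experiment. This is a sketch: my-matches is a hypothetical helper, and it sets up its own syntax table in which the underscore is a symbol constituent, which is assumed to mirror python-mode.

```elisp
(defun my-matches (regexp text)
  "Return every substring of TEXT matched by REGEXP.
Hypothetical helper.  It uses a fresh syntax table in which the
underscore is a symbol constituent (an assumption about `python-mode')."
  (with-temp-buffer
    (set-syntax-table (make-syntax-table))
    (modify-syntax-entry ?_ "_")
    (insert text)
    (goto-char (point-min))
    (let (found)
      (while (re-search-forward regexp nil t)
        (push (match-string 0) found))
      (nreverse found))))

;; Word boundaries: the 11 in foo_11 still matches, because the
;; underscore is not a word character.
(my-matches "\\<[0-9]+\\>" "foo_11 bar 22 x33 44")
;; => ("11" "22" "44")

;; Symbol boundaries: foo_11 is a single symbol, so its digits are skipped.
(my-matches "\\_<[0-9]+\\_>" "foo_11 bar 22 x33 44")
;; => ("22" "44")
```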

And lastly, there's probably no need for prepend . The numbers are likely all unhighlighted before this, and if you consider possible interaction with other minor modes like whitespace , you'd better choose append , or just omit this element entirely.

End result:

(font-lock-add-keywords nil '(("\\_<[0-9]+\\_>" . font-lock-warning-face)))
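With nil as the first argument, font-lock-add-keywords only changes the buffer that is current when the form is evaluated. For a .emacs file, a variant that follows the earlier advice about passing the mode symbol might look like this (a sketch, assuming the highlighting is wanted in python-mode only):

```elisp
;; Register the keyword for every `python-mode' buffer instead of
;; only the buffer that is current when this form runs.
(font-lock-add-keywords
 'python-mode
 '(("\\_<[0-9]+\\_>" . font-lock-warning-face)))
```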

