Syntax highlighting in JEditorPane in Java

Question

I want to perform syntax highlighting in a JEditorPane. Highlighting works when an XML tag sits on a single line, but it does not work when the tag is split across two or more lines. Below is the code I am using for syntax highlighting. How can I make it handle multi-line tags? Thanks.

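import java.awt.Color;
import java.awt.Graphics;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.Element;
import javax.swing.text.PlainDocument;
import javax.swing.text.PlainView;
import javax.swing.text.Segment;
import javax.swing.text.Utilities;

public class XmlView extends PlainView {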

    private static HashMap<Pattern, Color> patternColors;
    private static String TAG_PATTERN = "(</?[A-Za-z\\-_0-9]*)\\s?>?";
    private static String TAG_END_PATTERN = "(/>)";
    private static String TAG_ATTRIBUTE_PATTERN = "\\s(\\w*)\\=";
    private static String TAG_ATTRIBUTE_VALUE = "[a-z\\-]*\\=(\"[^\"]*\")";
    private static String TAG_COMMENT = "(<\\!--[\\w * \\S]*-->)";
    private static String TAG_CDATA = "(<\\!\\[CDATA\\[.*\\]\\]>)";

    static {
        // NOTE: the order is important!
        patternColors = new LinkedHashMap<Pattern, Color>();
        patternColors.put(Pattern.compile(TAG_PATTERN), new Color(163, 21, 21));
        patternColors.put(Pattern.compile(TAG_CDATA), Color.GRAY);
        patternColors.put(Pattern.compile(TAG_ATTRIBUTE_PATTERN), new Color(127, 0, 127));
        patternColors.put(Pattern.compile(TAG_END_PATTERN), new Color(63, 127, 127));
        patternColors.put(Pattern.compile(TAG_ATTRIBUTE_VALUE), new Color(42, 0, 255));
        patternColors.put(Pattern.compile(TAG_COMMENT), new Color(0, 128, 0));
    }

    public XmlView(Element element) {

        super(element);

        // Set tabsize to 4 (instead of the default 8)
        getDocument().putProperty(PlainDocument.tabSizeAttribute, 4);
    }

    @Override
    protected int drawUnselectedText(Graphics graphics, int x, int y, int p0,
            int p1) throws BadLocationException {

        Document doc = getDocument();
        String text = doc.getText(p0, p1 - p0);

        Segment segment = getLineBuffer();

        SortedMap<Integer, Integer> startMap = new TreeMap<Integer, Integer>();
        SortedMap<Integer, Color> colorMap = new TreeMap<Integer, Color>();

        // Match all regexes on this snippet, store positions
        for (Map.Entry<Pattern, Color> entry : patternColors.entrySet()) {

            Matcher matcher = entry.getKey().matcher(text);

            while (matcher.find()) {
                startMap.put(matcher.start(1), matcher.end());
                colorMap.put(matcher.start(1), entry.getValue());
            }
        }

        // TODO: check the map for overlapping parts

        int i = 0;

        // Colour the parts
        for (Map.Entry<Integer, Integer> entry : startMap.entrySet()) {
            int start = entry.getKey();
            int end = entry.getValue();

            if (i < start) {
                graphics.setColor(Color.black);
                doc.getText(p0 + i, start - i, segment);
                x = Utilities.drawTabbedText(segment, x, y, graphics, this, i);
            }

            graphics.setColor(colorMap.get(start));
            i = end;
            doc.getText(p0 + start, i - start, segment);
            x = Utilities.drawTabbedText(segment, x, y, graphics, this, start);
        }

        // Paint possible remaining text black
        if (i < text.length()) {
            graphics.setColor(Color.black);
            doc.getText(p0 + i, text.length() - i, segment);
            x = Utilities.drawTabbedText(segment, x, y, graphics, this, i);
        }

        return x;
    }

}

Answer 1

Your regexes for tags, comments and CDATA sections need to be split in two:

Pattern TAG_START     = Pattern.compile("</?[\\w-]+");
Pattern TAG_END       = Pattern.compile("/?>");
Pattern COMMENT_START = Pattern.compile("<!--");
Pattern COMMENT_END   = Pattern.compile("-->");
Pattern CDATA_START   = Pattern.compile("<!\\[CDATA\\[");
Pattern CDATA_END     = Pattern.compile("\\]\\]>");

Whenever you get a match on one of the *_START patterns, you set a flag indicating that you're in a different mode. For example, a match on TAG_START puts you in TAG mode, meaning you're inside a tag. Each mode comes with its own set of patterns, some shared with other modes, some mode-specific.

For example, in the default mode you look for the *_START patterns listed above, along with whatever other patterns are appropriate. In TAG mode you look for attribute/value pairs and the TAG_END pattern, which don't make sense outside a tag. And you always look for the TAG_END pattern first, to make sure you really are still in a tag. (Or whichever *_END pattern applies to the mode you're in.)
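The mode switching described above can be sketched roughly as follows. This is only an illustration, not a drop-in replacement for the view: the class name XmlModeScanner, the Mode enum, the helper at() and the two attribute patterns are assumptions made for the example, and tokens are collected as plain strings instead of being painted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XmlModeScanner {

    // Hypothetical mode names; DEFAULT means "outside any tag, comment or CDATA section".
    enum Mode { DEFAULT, TAG, COMMENT, CDATA }

    static final Pattern TAG_START     = Pattern.compile("</?[\\w-]+");
    static final Pattern TAG_END       = Pattern.compile("/?>");
    static final Pattern COMMENT_START = Pattern.compile("<!--");
    static final Pattern COMMENT_END   = Pattern.compile("-->");
    static final Pattern CDATA_START   = Pattern.compile("<!\\[CDATA\\[");
    static final Pattern CDATA_END     = Pattern.compile("\\]\\]>");
    // Illustrative attribute patterns (assumptions of this sketch).
    static final Pattern ATTR_NAME     = Pattern.compile("[\\w:-]+(?=\\s*=)");
    static final Pattern ATTR_VALUE    = Pattern.compile("\"[^\"]*\"|'[^']*'");

    /** Tries pattern p exactly at offset pos; on success the matcher holds the match. */
    static boolean at(Matcher m, Pattern p, int pos) {
        m.usePattern(p);
        m.region(pos, m.regionEnd());
        return m.lookingAt();
    }

    /**
     * Scans text starting in the given mode, appends "KIND:lexeme" entries
     * to out, and returns the mode in effect at the end of the text.
     */
    static Mode scan(CharSequence text, Mode mode, List<String> out) {
        Matcher m = TAG_START.matcher(text);
        int pos = 0;
        while (pos < text.length()) {
            String kind = null;
            Mode next = mode;
            if (mode == Mode.DEFAULT) {
                if (at(m, COMMENT_START, pos))    { kind = "COMMENT_START"; next = Mode.COMMENT; }
                else if (at(m, CDATA_START, pos)) { kind = "CDATA_START";   next = Mode.CDATA; }
                else if (at(m, TAG_START, pos))   { kind = "TAG_START";     next = Mode.TAG; }
            } else if (mode == Mode.TAG) {
                // Check the end pattern first, to confirm we are still inside the tag.
                if (at(m, TAG_END, pos))          { kind = "TAG_END";       next = Mode.DEFAULT; }
                else if (at(m, ATTR_NAME, pos))   { kind = "ATTR_NAME"; }
                else if (at(m, ATTR_VALUE, pos))  { kind = "ATTR_VALUE"; }
            } else if (mode == Mode.COMMENT) {
                if (at(m, COMMENT_END, pos))      { kind = "COMMENT_END";   next = Mode.DEFAULT; }
            } else {
                if (at(m, CDATA_END, pos))        { kind = "CDATA_END";     next = Mode.DEFAULT; }
            }
            if (kind == null) {   // nothing of interest here: plain text, skip one char
                pos++;
                continue;
            }
            out.add(kind + ":" + m.group());
            mode = next;
            pos = m.end();
        }
        return mode;
    }

    public static void main(String[] args) {
        List<String> tokens = new ArrayList<>();
        Mode afterFirstLine = scan("<a", Mode.DEFAULT, tokens);       // tag left open
        Mode afterSecondLine = scan(" b=\"1\"/>", afterFirstLine, tokens);
        System.out.println(tokens + " " + afterFirstLine + " " + afterSecondLine);
    }
}
```

Because scan() takes the starting mode as a parameter and returns the final one, the second line of a split tag can be scanned in TAG mode, which is exactly what the single-line regexes in the question cannot do.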

Since modes can persist across line boundaries, this means you either have to save some state between painting one line and painting the next (complicated), or scan the whole document every time you paint a line (slow). And whichever approach you take, performance is heavily dependent on the quality of the regexes. For example, your regex:
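To make the "save some state" option more concrete, here is a minimal sketch that walks the lines once and records which mode each line starts in. The names LineStartModes, advance and modesAtLineStart are hypothetical, and the delimiter handling is deliberately simplified (for instance, a > inside an attribute value would end TAG mode early); a real view would also have to recompute the cached modes from the edited line onward whenever the document changes.

```java
import java.util.ArrayList;
import java.util.List;

class LineStartModes {

    // Hypothetical mode names, mirroring the modes discussed above.
    enum Mode { DEFAULT, TAG, COMMENT, CDATA }

    /** Returns the mode in effect after scanning one line that began in the given mode. */
    static Mode advance(Mode mode, String line) {
        int pos = 0;
        while (pos < line.length()) {
            if (mode == Mode.DEFAULT) {
                if (line.startsWith("<!--", pos)) {
                    mode = Mode.COMMENT;
                    pos += 4;
                } else if (line.startsWith("<![CDATA[", pos)) {
                    mode = Mode.CDATA;
                    pos += 9;
                } else if (line.charAt(pos) == '<') {
                    mode = Mode.TAG;
                    pos++;
                } else {
                    pos++;
                }
            } else {
                // Inside a construct: only its own end delimiter matters.
                String end = mode == Mode.COMMENT ? "-->"
                           : mode == Mode.CDATA   ? "]]>"
                           : ">";
                int i = line.indexOf(end, pos);
                if (i < 0) {
                    return mode;          // still open at the end of this line
                }
                mode = Mode.DEFAULT;
                pos = i + end.length();
            }
        }
        return mode;
    }

    /** Computes, for every line, the mode the painter should start that line in. */
    static List<Mode> modesAtLineStart(List<String> lines) {
        List<Mode> modes = new ArrayList<>();
        Mode mode = Mode.DEFAULT;
        for (String line : lines) {
            modes.add(mode);
            mode = advance(mode, line);
        }
        return modes;
    }
}
```

With such a cache, painting line n only needs the stored mode for line n plus a scan of that one line, instead of a rescan from the top of the document.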

"(<\\!--[\\w * \\S]*-->)"

...will initially consume everything from the <!-- to the end of the document, only to have to backtrack potentially a very long way. Also, if there are two or more comments, it will end up matching from the beginning of the first one to the end of the last one. For both of those reasons, I would write it like this:

"<!--[^-]*+(?>-(?!->)[^-]*+)*+-->"

Notice the use of possessive quantifiers (*+) and atomic groups ((?>...)). They aren't necessary for correctness, but they make the regex much more efficient, which is going to be especially important in this project.
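The difference between the two comment regexes can be checked with a small test. The sketch below is an illustration only: the class name CommentRegexDemo and the helper findAll are made up for the example, and the second pattern is written in unrolled-loop form, with [^-]*+ repeated inside the atomic group so that a lone hyphen in the comment body is handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentRegexDemo {

    // The comment regex from the question.
    static final Pattern GREEDY = Pattern.compile("(<\\!--[\\w * \\S]*-->)");

    // Unrolled-loop form with possessive quantifiers and an atomic group.
    static final Pattern UNROLLED = Pattern.compile("<!--[^-]*+(?>-(?!->)[^-]*+)*+-->");

    /** Collects every match of p in input, in order. */
    static List<String> findAll(Pattern p, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String line = "<!-- a --><b/><!-- c-d -->";
        // The greedy pattern swallows both comments and the tag between them.
        System.out.println(findAll(GREEDY, line));
        // The unrolled pattern stops at the first "-->" of each comment.
        System.out.println(findAll(UNROLLED, line));
    }
}
```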

One more thing: if you're going to use find() for this, you should also add \\G (the end-of-last-match anchor) to the beginning of every regex, as Friedl did in a regex from his book.
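A short experiment shows what \\G changes. In this sketch (class name, pattern set and the helper scannedPrefix are assumptions for illustration), the same alternation is run with and without the anchor over input that contains text no alternative recognizes. Without \\G, find() silently skips ahead to the next place a pattern matches; with \\G, matching stops at the first unrecognized character, so the scanner knows exactly where it lost track.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchoredFindDemo {

    // Whitespace, a tag start, or a tag end; nothing else is recognized.
    static final Pattern UNANCHORED = Pattern.compile("(?:\\s+|</?[\\w-]+|/?>)");

    // Same alternatives, but each match must begin where the previous one ended.
    static final Pattern ANCHORED = Pattern.compile("\\G(?:\\s+|</?[\\w-]+|/?>)");

    /** Repeats find() until it fails and returns the end offset of the last match. */
    static int scannedPrefix(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int end = 0;
        while (m.find()) {
            end = m.end();
        }
        return end;
    }

    public static void main(String[] args) {
        String input = "<a> junk <b>";
        // Unanchored: skips over "junk" and reaches the end of the input.
        System.out.println(scannedPrefix(UNANCHORED, input));
        // Anchored: stops right before "junk".
        System.out.println(scannedPrefix(ANCHORED, input));
    }
}
```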

Answer 2

You might need to use the Pattern.MULTILINE flag?

For example:

Pattern.compile(TAG_PATTERN, Pattern.MULTILINE)


solution1
3 ACCEPTED 2010-11-11 11:55:39

solution2
0 2010-11-11 07:05:50
