
How do I look for a custom start tag using Jericho in Java?

As the title says, I'm trying to match a non-standard StartTagType of the form <foo:bar ...>.

How would I do this with Jericho?

Edit:

I have created the following custom StartTagType:

PrimoResultStartTagType primoSTT = new PrimoResultStartTagType("search", "<sear:DOC", ">", EndTagType.NORMAL, false, true, true);

...and:

class PrimoResultStartTagType extends StartTagType {

    protected PrimoResultStartTagType(String arg0, String arg1, String arg2, EndTagType arg3, boolean arg4, boolean arg5, boolean arg6) {
        super(arg0, arg1, arg2, arg3, arg4, arg5, arg6);
    }

    @Override
    protected Tag constructTagAt(Source arg0, int arg1) {
        return null;
    }

}

However, when I call source.getAllElements(...), I get no matches.

Maybe this will help:

Example HTML:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
    <title>StartTagType (Jericho HTML Parser 3.1)</title>
</head>

<body>

<span>simple tag</span>

<test:name>custom tag</test:name>

</body>

</html>

And sample code:

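import java.io.IOException;
import java.net.URL;
import java.util.List;

import net.htmlparser.jericho.Element;
import net.htmlparser.jericho.Source;

public class Main {

    public static void main(String[] args) throws IOException {
        // Load the example HTML above (saved as test.html) from the classpath.
        URL url = Main.class.getClassLoader().getResource("test.html");
        Source source = new Source(url);

        // The namespaced name "test:name" is passed directly as the element name;
        // this example does not define a custom StartTagType.
        List<Element> elementList = source.getAllElements("test:name");
        for (Element element : elementList) {
            System.out.println("Custom tag content: " + element.getContent().toString());
        }
    }

}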

Output:

Custom tag content: custom tag
