How to get HTML element names from a website URL

Question

I want to get the HTML element names and attribute names of a page without hard-coding them (and without using document.getElementsByTag("*") or document.select("*")).

Is there any way to get HTML element names dynamically using Apache Tika? If so, could you please provide a sample?

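    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Attributes;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    // Fetch and parse the page (get() throws IOException)
    Document doc = Jsoup.connect("http://seenyc.co/").get();
    Elements elements = doc.getAllElements();
    for (Element ele : elements) {
        String tagName = ele.tagName();         // element name, e.g. "div"
        Attributes attributes = ele.attributes(); // all attributes of this element
        System.out.println(tagName);
        System.out.println(attributes);
    }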
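On the Tika part of the question: Tika's parsers report a document's structure as SAX events to the org.xml.sax.ContentHandler passed to Parser.parse(InputStream, ContentHandler, Metadata, ParseContext), so element and attribute names can be collected in startElement without hard-coding any of them. The sketch below is a minimal, self-contained illustration under one assumption: instead of Tika, it drives the handler with the JDK's built-in SAX parser, which only accepts well-formed XHTML. The class name TagNameCollector and the sample markup are hypothetical. The same handler could in principle be passed to a Tika parser; note that Tika normalizes HTML to XHTML and, by default, maps or drops some elements, so check your Tika version (as far as I know, setting IdentityHtmlMapper as the HtmlMapper in the ParseContext keeps them all).

```java
import java.io.StringReader;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records every element name and the attribute names seen on it
class TagNameCollector extends DefaultHandler {
    // element name -> distinct attribute names used on that element (both sorted)
    private final Map<String, Set<String>> names = new TreeMap<>();

    // Non-namespace-aware parsers may leave local names empty, so fall back to the qualified name
    private static String nameOf(String localName, String qName) {
        return (localName == null || localName.isEmpty()) ? qName : localName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        Set<String> attrs = names.computeIfAbsent(nameOf(localName, qName), k -> new TreeSet<>());
        for (int i = 0; i < atts.getLength(); i++) {
            attrs.add(nameOf(atts.getLocalName(i), atts.getQName(i)));
        }
    }

    public Map<String, Set<String>> getNames() {
        return names;
    }

    // Assumption: the JDK SAX parser stands in for Tika, so the input must be well-formed XHTML
    static Map<String, Set<String>> collect(String xhtml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TagNameCollector handler = new TagNameCollector();
        parser.parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.getNames();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><a href=\"/\" class=\"nav\">Home</a><p id=\"x\">Hi</p></body></html>";
        // prints {a=[class, href], body=[], html=[], p=[id]}
        System.out.println(collect(xhtml));
    }
}
```

Real-world pages are rarely well-formed XML, which is why a lenient parser such as Tika's or jsoup's is normally put in front of a handler like this.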

Answer 1

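    import java.util.HashSet;
    import java.util.Set;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    Set<String> allTags = new HashSet<>();
    Document doc = Jsoup.connect("http://seenyc.co/").get();
    Elements elements = doc.getAllElements();
    for (Element ele : elements) {
        allTags.add(ele.tagName()); // a Set keeps only one copy of each name
    }
    // allTags now holds every distinct tag name found on the page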

Is this what you wanted?
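The answer collects only tag names, while the question also asked for attribute names. A minimal sketch, assuming jsoup is on the classpath, extends the same getAllElements() loop to map each distinct tag name to the attribute names used on it; Element.attributes() is iterable, so no attribute name is hard-coded. The class name TagAttributeNames and the sample markup are hypothetical, and Jsoup.parse on a string stands in for Jsoup.connect(...).get() to avoid a network call.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class TagAttributeNames {
    // Maps each distinct tag name to the distinct attribute names used on it
    static Map<String, Set<String>> collect(Document doc) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (Element ele : doc.getAllElements()) {
            Set<String> attrs = result.computeIfAbsent(ele.tagName(), k -> new TreeSet<>());
            // Attributes is Iterable<Attribute>, so every attribute name is discovered at runtime
            for (Attribute attr : ele.attributes()) {
                attrs.add(attr.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Inline markup instead of a live URL; jsoup adds html/head/body around it
        Document doc = Jsoup.parse("<p id=\"x\"><a href=\"/\">Home</a></p>");
        System.out.println(collect(doc));
    }
}
```

Because getAllElements() includes the Document itself, the map also contains an entry for the pseudo-tag #root, which can be skipped if unwanted.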

Answer 1: 2, 2014-03-07 12:22:27