
How to extract textual content from an SGML DTD using Perl?

I want to extract all the content from a DTD using Perl, but I'm not sure of the best way to go about it. I know there are modules for working with XML, but I don't know whether any exist for this kind of work with SGML, or whether I should try to write a regular expression instead.

I'm new to both SGML and Perl, and I have little experience with regex beyond very simple pattern matching.

You have 2 options here:

  • Use the old perlSGML distribution, which I have used in the (remote!) past. Since it is Perl, it should still run on a modern perl.

  • Convert your SGML to XML using osx, which is part of OpenSP and is available for at least Debian/Ubuntu (the package is called opensp) and most likely other platforms, then use XML tools like XML::LibXML or XML::Twig.

There are a lot more XML tools than SGML tools these days, but of course you may lose some information, since DTDs are slightly simpler in XML than in SGML.
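
As a rough illustration of the second option, here is a minimal sketch of the XML::LibXML side. It assumes you have already run something like `osx input.sgml > output.xml` (both file names are placeholders) and that the text you are after ends up in the converted XML. The helper name `text_chunks` is made up for this example and is not part of any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: return every non-blank text node of a parsed
# document, in document order, with surrounding whitespace trimmed.
sub text_chunks {
    my ($doc) = @_;
    my @chunks;
    for my $node ($doc->findnodes('//text()')) {
        my $text = $node->nodeValue;
        $text =~ s/^\s+|\s+$//g;              # trim leading/trailing whitespace
        push @chunks, $text if length $text;  # skip whitespace-only nodes
    }
    return @chunks;
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    # Assumption: the file is the XML that osx produced from your SGML.
    my $doc = XML::LibXML->load_xml(location => $ARGV[0]);
    binmode STDOUT, ':encoding(UTF-8)';
    print "$_\n" for text_chunks($doc);
}
```

Saved as, say, extract_text.pl, it would be run as `perl extract_text.pl output.xml` and print one text chunk per line. XML::Twig would do the job as well; XML::LibXML is used here only because a single XPath expression keeps the example short.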
