
Prevent adding first line when using htmlParse() from 'XML' package

I have a problem when running htmlParse() on an XHTML document.

When the document is loaded into R (as an 'externalptr'), I can see that one line has been added at the top of the file:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">

I don't want this line to appear, because it breaks my application. I would like to suppress it within the htmlParse() call itself, rather than having to delete it manually from every XHTML file I process.

Any suggestions? I've tried changing some of the parameters passed to htmlParse(), but so far I have not found one that does this.

If it helps, here are the first lines of the XHTML I parse:

<?xml version="1.0" encoding="utf-8" ?>
<html dir="ltr" xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xml:lang="es">
<head>
<meta charset="utf-8" />

I tried taking the root node with xmlRoot() and then saving it with saveXML(), passing the prefix <?xml version="1.0" encoding="utf-8" ?> as a parameter.

There was also an encoding problem, but that's another story. It didn't work on Windows; on Ubuntu it finally worked.
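
For readers following along, the xmlRoot() / saveXML() attempt described above might look roughly like the sketch below. It is only an illustration under assumptions: the inline `xhtml` string is a made-up stand-in for the real file, the names `xml_decl` and `out_file` are hypothetical, and the XML declaration is written out explicitly with writeLines() instead of relying on how saveXML() handles its `prefix` argument for this kind of object.

```r
library(XML)

# Hypothetical sample input standing in for the real XHTML file.
xhtml <- paste0(
  '<?xml version="1.0" encoding="utf-8" ?>\n',
  '<html dir="ltr" xmlns="http://www.w3.org/1999/xhtml">',
  '<head><meta charset="utf-8" /></head>',
  '<body><p>Hola</p></body></html>'
)

# Parse with the HTML parser, as in the question.
doc <- htmlParse(xhtml, asText = TRUE, encoding = "UTF-8")

# Serialise only the root <html> node rather than the whole document,
# so the DOCTYPE line that belongs to the document is left out.
root <- xmlRoot(doc)
body <- saveXML(root)

# Write the XML declaration by hand, followed by the serialised root node.
xml_decl <- '<?xml version="1.0" encoding="utf-8" ?>'
out_file <- tempfile(fileext = ".xhtml")  # hypothetical output location
writeLines(c(xml_decl, body), out_file)
```

The point of the sketch is that the DOCTYPE is attached to the document object, not to the root element, so serialising the root node alone avoids it. Whether this behaves the same on Windows, given the encoding problem mentioned above, is not something the sketch addresses.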

Thank you all.

