
Android parsing HTML entities using DOM parser for RSS feed

I am using the Google Books API for an Android app that I am building. This is a sample of the XML it returns:

<dc:description>This trilogy includes &amp;quot; The Hitchhiker&amp;#39;s Guide to the Galaxy&amp;quot; , &amp;quot; TheRestaurant at the End of the Universe&amp;quot; , &amp;quot; Life, Universe and Everything&amp;quot; and &amp;quot; So Long ...</dc:description>
<dc:format>590 pages</dc:format>
<dc:format>book</dc:format>

And this is part of the code I'm using to extract the description:

// Look up the first <dc:description> element once, then read its text.
Node node = entry.getElementsByTagName( "dc:description" ).item( 0 );
if ( node != null ) {
  Element d = ( Element ) node;
  // Note: this reads only the FIRST child node of the element.
  b.setDescription( d.getFirstChild( ).getNodeValue( ) );
}

The problem is that when I pass the result to Html.fromHtml(str), the text is cut off at the first HTML entity, so in this example it shows only:

This trilogy includes

When I run the same code outside of Android it works fine and at least returns the full string with the escaped entities, i.e.:

This trilogy includes &quot; The Hitchhiker&#39;s Guide to the Galaxy&quot; , &quot; TheRestaurant at the End of the Universe&quot; , &quot; Life, Universe and Everything&quot; and &quot; So Long ...

If I then paste this string into the app manually, Html.fromHtml() works fine, so I guess the problem is Android's implementation of the parser.

A similar problem is described in the question "Android decoding html in xml file". I have tried setting validation on the factory to false, and because this is an RSS feed I cannot declare an HTML root element (as suggested in that post).

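In the end I did not take the description data from Google, but I think the problem can be solved by calling normalize() on the document element. I ran into a similar problem with another API and that fixed it.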
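The suggestion above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: the class name `DescriptionReader`, the helper `readDescription`, and the `<entry>` wrapper with its namespace URI are all made up for the example. It assumes the truncation happens because the parser splits the element's text into several adjacent text nodes at each entity, so that `getFirstChild()` sees only the first piece. `normalize()` merges adjacent text nodes, and `getTextContent()` (DOM Level 3) concatenates all text under the element, so either one alone may already be enough.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DescriptionReader {

    // Hypothetical helper: returns the text of the first <dc:description>
    // element in the given XML, or null if there is none.
    static String readDescription(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // Merge adjacent text nodes everywhere below the document element.
        doc.getDocumentElement().normalize();

        Node node = doc.getElementsByTagName("dc:description").item(0);
        if (node == null) {
            return null;
        }
        // Concatenates every text node under the element, so the result
        // does not depend on how the parser split the text.
        return node.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Made-up wrapper element around a shortened version of the sample.
        String xml = "<entry xmlns:dc=\"http://purl.org/dc/terms\">"
                + "<dc:description>Includes &amp;quot;The Hitchhiker&amp;#39;s"
                + " Guide&amp;quot; ...</dc:description>"
                + "</entry>";
        System.out.println(readDescription(xml));
    }
}
```

The returned string still contains the escaped entities (`&quot;`, `&#39;`), which is the form that was reported to work when passed to Html.fromHtml().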
