NSXMLParser: not retrieving correct data if there are special characters or italics

I am using NSXMLParser to parse XML from a URL. Some of the elements contain special characters in their text, and some contain italics.

  • Please find below the XML element with italic tags in its text:
 <name>Verify Settings<i>i</i>patch level</name> 

NSXMLParser breaks the text and gives the output: Verify Settings

Is there any way to parse italic text inside an element?

  • Please find the XML with special characters below:
 <impact> In 2003, the ¿shared APPL_TOP¿ architecture was introduced, which allowed the sharing of a single APPL_TOP, however the tech stack · Reduced disk space requirements · Reduced maintenance · Reduced administrative costs · Reduced patching down time · Less complex to add additional nodes, making scalability easier · Complexity of instance reduced · Easier backups · Easier cloning</impact> 

It breaks the text and gives the output: e costs ·Reduced patching down time ·Less complex to add additional nodes, making scalability easier ·Complexity of instance reduced ·Easier backups ·Easier cloning

Any suggestions on how to parse italic tags in the text and special characters using NSXMLParser?


Here is my foundCharacters code:

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    if (!self.currentStringValue) {
        // currentStringValue is an NSMutableString instance variable
        self.currentStringValue = [[NSMutableString alloc] init];
    }
    [self.currentStringValue appendString:string];
}

Both of these look less like XML parsing problems than XML generation problems. How are you generating this XML? It looks like manually generated XML, as opposed to something generated by a proper XML library.

Look at your XML from the parser's perspective: how is NSXMLParser supposed to know that the <i> is HTML inside the <name> element, and not a new XML tag itself? If this is indeed what the XML looks like, you really should just fix your web service.

For example, with the italics, the problem is that the <i> looks like a new element name. Generally that should be represented either as:

<name>Verify Settings&lt;i&gt;i&lt;/i&gt;patch level</name>

Or as:

<name><![CDATA[Verify Settings<i>i</i>patch level]]></name>

This encoding of the name property is generally done by the API that does the XML encoding in the web service. Generally you don't need to do anything to get this behavior. But if your web service is manually creating its own XML, that could give you the sort of output that you describe in your original question.
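To illustrate why either representation solves the problem on the client side, here is a minimal sketch of a command-line program that parses both forms. The `NameCollector` class and `ParseName` function are hypothetical names made up for this example, and the sketch assumes ARC and a document with a single <name> element. Escaped text is delivered through foundCharacters (possibly in several pieces), while a CDATA section may be delivered through foundCDATA as UTF-8 bytes, so the sketch handles both.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical delegate for this sketch: collects the text of one <name> element.
@interface NameCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *text;
@end

@implementation NameCollector

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary<NSString *, NSString *> *)attributeDict
{
    if ([elementName isEqualToString:@"name"]) {
        self.text = [NSMutableString string];
    }
}

// Escaped text (&lt;i&gt; and so on) arrives here, possibly in several pieces.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    [self.text appendString:string];   // no-op while text is nil
}

// CDATA sections arrive here as raw bytes; Apple documents them as UTF-8.
- (void)parser:(NSXMLParser *)parser foundCDATA:(NSData *)CDATABlock
{
    NSString *chunk = [[NSString alloc] initWithData:CDATABlock
                                            encoding:NSUTF8StringEncoding];
    if (chunk) {
        [self.text appendString:chunk];
    }
}

@end

// Hypothetical helper: parses an XML string and returns the collected <name> text.
static NSString *ParseName(NSString *xml)
{
    NSData *data = [xml dataUsingEncoding:NSUTF8StringEncoding];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    NameCollector *collector = [[NameCollector alloc] init];
    parser.delegate = collector;   // the parser does not retain its delegate
    [parser parse];
    return [collector.text copy];
}

int main(void)
{
    @autoreleasepool {
        NSLog(@"%@", ParseName(@"<name>Verify Settings&lt;i&gt;i&lt;/i&gt;patch level</name>"));
        NSLog(@"%@", ParseName(@"<name><![CDATA[Verify Settings<i>i</i>patch level]]></name>"));
    }
    return 0;
}
```

Something like `clang -fobjc-arc -framework Foundation main.m` should build it on a Mac. Both inputs should produce the same string with the <i> markup preserved as literal text, which the app can then render as HTML if it wants the italics.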

On the second example, I would have thought that the characters in the XML must conform to the encoding declared in the <?xml ...> declaration, e.g.:

<?xml version="1.0" encoding="ISO-8859-1"?>

What does your <?xml ...> declaration say? Do the characters in your text fall within the encoding listed there?
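One way to check this is to look at the raw bytes before handing them to the parser. The following is only a diagnostic sketch: `InspectXMLData` is a hypothetical helper, and the sample bytes in `main` are invented (a document that declares UTF-8 but contains a single Latin-1 middle dot byte, 0xB7) to show what a mismatch looks like. In your app you would pass the NSData you downloaded instead.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: logs the start of the document (where the <?xml ...>
// declaration lives) and reports whether the bytes are valid UTF-8.
static void InspectXMLData(NSData *data)
{
    // Latin-1 maps every byte to a character, so this decoding cannot fail;
    // it is used here only to display the declaration.
    NSUInteger prefixLength = MIN(data.length, (NSUInteger)100);
    NSData *prefix = [data subdataWithRange:NSMakeRange(0, prefixLength)];
    NSString *head = [[NSString alloc] initWithData:prefix
                                           encoding:NSISOLatin1StringEncoding];
    NSLog(@"Start of document: %@", head);

    // initWithData:encoding: returns nil when the bytes are not valid
    // in the requested encoding.
    NSString *asUTF8 = [[NSString alloc] initWithData:data
                                             encoding:NSUTF8StringEncoding];
    NSLog(@"Valid UTF-8: %@", asUTF8 ? @"yes" : @"no");
}

int main(void)
{
    @autoreleasepool {
        // Invented sample: declares UTF-8 but contains the Latin-1 byte 0xB7.
        const char bytes[] =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            "<impact>Reduced disk space \xB7 Reduced maintenance</impact>";
        NSData *data = [NSData dataWithBytes:bytes length:sizeof(bytes) - 1];

        InspectXMLData(data);

        // Parsing without a delegate is enough to surface any parser error.
        NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
        if (![parser parse]) {
            NSLog(@"Parse error: %@", parser.parserError);
        }
    }
    return 0;
}
```

If the declaration and the actual bytes disagree, the real fix is still on the web service side: either declare the encoding the service actually emits, or emit the encoding it declares.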


Looking at your revised foundCharacters, the new rendition is much better. The previous rendition suffered from a problem, insofar as it assumed that foundCharacters would be called only once for any given pair of <name> and </name> tags. That is not necessarily the case. Your latest rendition correctly creates currentStringValue if it needs to, and then appends to it. That is the correct approach, consistent with the examples in the Apple documentation. You might want to do that only if you're parsing one of the elementName types that you care about (e.g. <name>), but with that minor caveat, this new rendition looks much better.
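As a rough sketch of that caveat, the delegate below starts a buffer only when a <name> element opens, appends every foundCharacters chunk to it, and stores the result when the element closes. `NamesDelegate`, its `names` array, and the sample XML are hypothetical and exist only for this example; your own delegate will track whichever elements you care about.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical delegate: accumulates text only while inside a <name> element.
@interface NamesDelegate : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableArray<NSString *> *names;
@property (nonatomic, strong) NSMutableString *currentStringValue; // nil outside <name>
@end

@implementation NamesDelegate

- (instancetype)init
{
    self = [super init];
    if (self) {
        _names = [NSMutableArray array];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary<NSString *, NSString *> *)attributeDict
{
    if ([elementName isEqualToString:@"name"]) {
        self.currentStringValue = [NSMutableString string];
    }
}

// May be called several times per element, so always append.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    [self.currentStringValue appendString:string]; // no-op outside <name>
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
    if ([elementName isEqualToString:@"name"] && self.currentStringValue) {
        [self.names addObject:[self.currentStringValue copy]];
        self.currentStringValue = nil;  // stop collecting until the next <name>
    }
}

@end

int main(void)
{
    @autoreleasepool {
        // Invented sample document for illustration only.
        NSString *xml = @"<items><name>First</name><other>ignored</other>"
                        @"<name>Second &amp; third</name></items>";
        NSXMLParser *parser =
            [[NSXMLParser alloc] initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
        NamesDelegate *delegate = [[NamesDelegate alloc] init];
        parser.delegate = delegate;
        if ([parser parse]) {
            NSLog(@"%@", delegate.names);
        } else {
            NSLog(@"Parse error: %@", parser.parserError);
        }
    }
    return 0;
}
```

Because the buffer is created and cleared only on <name> boundaries, text from unrelated elements is dropped, and an entity such as &amp; that splits the text into several callbacks still ends up in one string.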
