Decoding HTML characters in Objective-C takes forever and uses 100% CPU

I am using this piece of code to decode a few HTML special characters in XML data:

    +(NSString *)getNSStringFormHTMLString:(NSString *)html {
        if(SYSTEM_VERSION_GREATER_THAN_OR_EQUAL_TO(@"7.0"))
        {
            NSMutableAttributedString* attrDisplayString = [[NSMutableAttributedString alloc] initWithData:[html dataUsingEncoding:NSUnicodeStringEncoding]options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType} documentAttributes:nil error:nil];
            return [attrDisplayString string];
        }
        return html;
    }

This code comes in handy when I call it only a few times.

My use case, however, is parsing a messy XML document that contains a lot of encoded characters, like this:

    <filter name="Added_Time">
        <displayname><![CDATA[Added&#x20;Time]]></displayname>
        <value display="Dec - 2013"><![CDATA[Added_Time&#x3a;Dec&#x20;-&#x20;2013]]></value>
        <value display="Feb - 2014"><![CDATA[Added_Time&#x3a;Feb&#x20;-&#x20;2014]]></value>
        <value display="Mar - 2014"><![CDATA[Added_Time&#x3a;Mar&#x20;-&#x20;2014]]></value>
        <value display="Apr - 2014"><![CDATA[Added_Time&#x3a;Apr&#x20;-&#x20;2014]]></value>
        <value display="Sep - 2014"><![CDATA[Added_Time&#x3a;Sep&#x20;-&#x20;2014]]></value>
        <value display="Nov - 2014"><![CDATA[Added_Time&#x3a;Nov&#x20;-&#x20;2014]]></value>
    </filter>

Since the code above is called repeatedly to decode every key and value, CPU usage hits 100% and parsing seems to go on forever (it does eventually finish after 2 or 3 minutes).

Apparently, getNSStringFormHTMLString: is the costly operation.

Please help! I need a way to do a similar task that doesn't consume so much time.

The best thing I can come up with is to try an alternative library.

This answer has a custom solution made up of two files and written in C, so it should be fairly quick. I'd try running the string through its decode step first and then making your getNSStringFormHTMLString: call.

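    +(NSString *)getNSStringFormHTMLString:(NSString *)html {
        if(SYSTEM_VERSION_GREATER_THAN_OR_EQUAL_TO(@"7.0"))
        {
            NSMutableData* data = [[html dataUsingEncoding:NSUTF8StringEncoding] mutableCopy];
            if (data == nil) return html;
            // The C decoder reads a NUL-terminated string, and NSData does not add the terminator.
            [data appendBytes:"\0" length:1];
            // data.length is now strlen + 1; the decoded text is not expected to be longer than the input.
            char* output = malloc(data.length);
            if (output == NULL) return html;
            // Decode from the NSData bytes into the separate output buffer.
            decode_html_entities_utf8(output, data.bytes);
            NSString* converted = [NSString stringWithUTF8String:output];
            free(output);
            if (converted == nil) return html;

            NSMutableAttributedString* attrDisplayString = [[NSMutableAttributedString alloc] initWithData:[converted dataUsingEncoding:NSUnicodeStringEncoding]options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType} documentAttributes:nil error:nil];
            return [attrDisplayString string];
        }
        return html;
    }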

I have no idea whether this will be quicker, since it still makes a copy when converting to a mutable NSData object.

If you find it's still slow, you can try replacing the NSData code by converting the contents of the html string directly into a buffer with - (BOOL)getBytes:(void *)buffer maxLength:(NSUInteger)maxBufferCount usedLength:(NSUInteger *)usedBufferCount encoding:(NSStringEncoding)encoding options:(NSStringEncodingConversionOptions)options range:(NSRange)range remainingRange:(NSRangePointer)leftover;
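The sample XML in the question only seems to contain numeric character references such as &#x20; and &#x3a;. If that holds for all of your data, the C decode step could be as small as the sketch below. This is not the code from the linked answer: decode_numeric_refs and put_utf8 are hypothetical names made up for illustration, named entities like &amp;amp; are deliberately left untouched, and the buffer is assumed to be a writable, NUL-terminated UTF-8 string.

```c
#include <stddef.h>

/* Writes code point cp as UTF-8 into out and returns the number of bytes
   written, or 0 if cp is not a valid, non-NUL Unicode scalar value. */
static size_t put_utf8(unsigned long cp, char *out)
{
    unsigned char *o = (unsigned char *)out;
    if (cp == 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    if (cp < 0x80) { o[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        o[0] = (unsigned char)(0xC0 | (cp >> 6));
        o[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        o[0] = (unsigned char)(0xE0 | (cp >> 12));
        o[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        o[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    o[0] = (unsigned char)(0xF0 | (cp >> 18));
    o[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    o[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    o[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: decodes &#NNN; and &#xHHH; references in place in a
   single pass and returns the new string length. Anything that is not a
   well-formed numeric reference is copied through unchanged. */
size_t decode_numeric_refs(char *s)
{
    char *from = s;  /* read position */
    char *to = s;    /* write position, never ahead of from */

    while (*from) {
        if (from[0] == '&' && from[1] == '#') {
            int hex = (from[2] == 'x' || from[2] == 'X');
            const char *start = from + 2 + hex;
            const char *p = start;
            unsigned long cp = 0;

            for (;; ++p) {
                int d;
                if (*p >= '0' && *p <= '9') d = *p - '0';
                else if (hex && *p >= 'a' && *p <= 'f') d = *p - 'a' + 10;
                else if (hex && *p >= 'A' && *p <= 'F') d = *p - 'A' + 10;
                else break;
                /* Stop accumulating once out of range so cp cannot overflow. */
                if (cp <= 0x10FFFF) cp = cp * (hex ? 16u : 10u) + (unsigned long)d;
            }

            if (p != start && *p == ';') {
                /* The UTF-8 form is never longer than the reference itself,
                   so writing at 'to' cannot overrun unread input. */
                size_t n = put_utf8(cp, to);
                if (n > 0) {
                    to += n;
                    from = (char *)p + 1;
                    continue;
                }
            }
        }
        *to++ = *from++;
    }
    *to = '\0';
    return (size_t)(to - s);
}
```

Called on a writable copy of Added_Time&#x3a;Dec&#x20;-&#x20;2013 from the sample XML, this leaves Added_Time:Dec - 2013 in the buffer. Whether it actually beats the NSAttributedString route on your data is something you would have to measure.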
