
How to “normalize” a URL by replacing its special characters

A URL can contain special characters such as * ? & ~ : /

and soon, if not already, accented characters.

What I'd like is to convert ANY URL into its nearest equivalent in pure ASCII characters, THEN replace any remaining special characters with a _

I've tried the following, inspired by many examples from around the net, but it does not work (for example, with this code the character "é" is not converted to "e" in @"http://www.mélange.fr/~fermer.php?aa=10&ee=13"):

NSMutableCharacterSet *charactersToKeep = [NSMutableCharacterSet alphanumericCharacterSet];
[charactersToKeep addCharactersInString:@"://&=~?"];
NSCharacterSet* charactersToRemove = [charactersToKeep invertedSet];
myNSString = [[[myNSString decomposedStringWithCanonicalMapping] componentsSeparatedByCharactersInSet:charactersToRemove] componentsJoinedByString:@""];

That is only the first step; afterwards I will still have to replace the remaining special characters with _

How can I achieve this?

As an example (and only as an example), I'd like to convert:

http://www.mélange.fr/~fermer.php?aa=10&ee=13

to

http___www.melange.fr__fermer_php_aa_10_ee_13

without, of course, having to check each possible special or accented character one by one.

Two thoughts:

  1. To replace accented characters with unaccented ones, there are a couple of candidates:

    • You can use CFStringTransform :

       NSMutableString *mutableString = [string mutableCopy];
       CFStringTransform((__bridge CFMutableStringRef)mutableString, NULL, kCFStringTransformStripCombiningMarks, NO);

    • You could use dataUsingEncoding:allowLossyConversion:

       NSData *data = [string dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
       NSString *result = [[NSString alloc] initWithData:data encoding:NSASCIIStringEncoding];

      Characters it doesn't know what to do with become ?, and it sometimes replaces one character with several (e.g. © with (C)), which you may or may not want.

  2. Once you have done this international character conversion, it looks like you want to replace any character that is neither alphanumeric nor a period with an underscore, which you could do with stringByReplacingOccurrencesOfString and a regular expression:

      NSString *result = [string stringByReplacingOccurrencesOfString:@"[^a-z0-9\\.]"
                                                           withString:@"_"
                                                              options:NSRegularExpressionSearch | NSCaseInsensitiveSearch
                                                                range:NSMakeRange(0, [string length])];

    There are lots of permutations of this regular expression that will accomplish the same thing, but hopefully you get the idea.
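
Putting the two steps together, here is a minimal sketch applied to the URL from the question. The helper name SanitizedStringFromURLString is made up for illustration and is not part of Foundation. The sketch assumes the CFStringTransform route for step 1, and that periods should be kept, as in the regular expression above.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API): strips accents, then replaces
// every character that is not an ASCII letter, a digit, or a period with "_".
static NSString *SanitizedStringFromURLString(NSString *urlString) {
    // Step 1: remove combining marks, so "é" becomes "e".
    NSMutableString *mutableString = [urlString mutableCopy];
    CFStringTransform((__bridge CFMutableStringRef)mutableString, NULL,
                      kCFStringTransformStripCombiningMarks, NO);

    // Step 2: replace everything outside [a-z0-9.] (case-insensitive) with "_".
    return [mutableString stringByReplacingOccurrencesOfString:@"[^a-z0-9.]"
                                                    withString:@"_"
                                                       options:NSRegularExpressionSearch | NSCaseInsensitiveSearch
                                                         range:NSMakeRange(0, mutableString.length)];
}

int main(void) {
    @autoreleasepool {
        NSString *input = @"http://www.mélange.fr/~fermer.php?aa=10&ee=13";
        NSLog(@"%@", SanitizedStringFromURLString(input));
    }
    return 0;
}
```

Because this pattern keeps periods, the result for the sample URL should be http___www.melange.fr__fermer.php_aa_10_ee_13, which differs from the example in the question only in the period of fermer.php. If that period should also become an underscore, you would need an additional rule, since a single character class cannot tell it apart from the periods in the host name.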


 