Double quote string manipulation

I have some input data from ASCII files that use double quotes to enclose strings, but that also use double quotes inside those strings. For example:

"Reliable" "Africa" 567.87 "Bob" "" "" "" "S 05`56'21.844"" "No Shift"

Notice the double quote used in the coordinate.

So I have been using:

valList = shlex.split(line)

But shlex gets confused by the double quote used as the seconds mark in the coordinate.
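For reference, a minimal sketch of what goes wrong on the sample line, assuming Python 3 and the default POSIX mode of `shlex.split`. The variable names are only illustrative. Because the line contains an odd number of double quotes, the parser runs off the end of the line while still inside a quoted string:

```python
import shlex

# Sample line from above; the coordinate field contains an unescaped inner quote.
line = '''"Reliable" "Africa" 567.87 "Bob" "" "" "" "S 05`56'21.844"" "No Shift"'''

try:
    tokens = shlex.split(line)
    error = None
except ValueError as exc:
    # shlex reaches the end of the input with a quote still open.
    tokens = None
    error = str(exc)

print(tokens, error)
```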

I've been doing a find and replace of '\"\"' with '\\\"\"'. This of course also turns empty strings into \"", so I then do a second find and replace (this time with surrounding spaces) of ' \\\"\" ' with ' \"\" '. Not exactly the most efficient way of doing it!

Any suggestions on handling this double quote in the coordinate?

I would do it this way:

I would treat this line of text as a record in a CSV file. According to RFC 4180:

  1. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:

    "aaa","b""bb","ccc"

Then all you would need to do is add another " to your coordinates, so the field would look like "S 05`56'21.844""" (note the extra quote). Then you can use the standard csv module to break the line apart and extract the necessary information.

    >>> from io import StringIO
    >>> import csv
    >>>
    >>> test = '''"Reliable" "Africa" 567.87 "Bob" "" "" "" "S 05`56'21.844""" "No Shift"'''
    >>> test_obj = StringIO(test)
    >>> reader = csv.reader(test_obj, delimiter=' ', quotechar='"', quoting=csv.QUOTE_ALL)
    >>> for row in reader:
    ...     print(row)
    ...

The output would be:

['Reliable', 'Africa', '567.87', 'Bob', '', '', '', 'S 05`56\'21.844"', 'No Shift']
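The extra quote does not have to be added by hand. The sketch below is one possible way to do it, not part of the answer above: the helper `double_inner_quotes` is a hypothetical name, and it assumes that a quote delimiting a field always touches whitespace or a line boundary, so any quote with a non-space character on both sides is treated as an inner quote and doubled. A field whose inner quote sits next to a space would need different handling.

```python
import csv
import re


def double_inner_quotes(line):
    # Hypothetical helper: a quote with non-whitespace on both sides is
    # assumed to be inside a field, so it is doubled as RFC 4180 requires.
    return re.sub(r'(?<=\S)"(?=\S)', '""', line)


line = '''"Reliable" "Africa" 567.87 "Bob" "" "" "" "S 05`56'21.844"" "No Shift"'''

fixed = double_inner_quotes(line)
# csv.reader accepts any iterable of lines, so a one-element list is enough.
row = next(csv.reader([fixed], delimiter=' ', quotechar='"'))
print(row)
```

With the sample line this yields the same nine fields as shown above, including the three empty strings.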

I'm not good with regexes, but this non-regex suggestion might help ...

INPUT = ('"Reliable" "Africa" 567.87 "Bob" "" "" "" "S 05`56'
         "'"
         '21.844"" "No Shift"')


def main(text):
    output = text

    # Placeholder for the quotes that delimit a field.
    # It is assumed never to occur in the data itself.
    surrounding_quote_symbol = '<!>'

    # A quote at the very start or end of the line delimits a field.
    if text.startswith('"'):
        output = '%s%s' % (surrounding_quote_symbol, output[1:])
    if text.endswith('"'):
        output = '%s%s' % (output[:-1], surrounding_quote_symbol)

    # So does a quote that sits next to a space.
    output = output.replace('" ', '%s ' % surrounding_quote_symbol)
    output = output.replace(' "', ' %s' % surrounding_quote_symbol)
    print("Stage 1:", output)

    # Any quote still left is inside a field: escape it with a backslash,
    # then restore the delimiting quotes.
    output = output.replace('"', '\\"')
    output = output.replace(surrounding_quote_symbol, '"')

    return output

if __name__ == "__main__":
    output = main(INPUT)
    print("End results:", output)
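Assuming the goal of this approach is a line in which the inner quote is escaped with a backslash, `shlex.split` from the question should then be able to parse it. A minimal check under that assumption, using a hand-written escaped sample line rather than the output of the function above (Python 3, default POSIX mode):

```python
import shlex

# Hypothetical sample: the inner quote of the coordinate is already escaped as \".
# A raw string is used so that the backslash reaches shlex unchanged.
escaped = r'''"Reliable" "Africa" 567.87 "Bob" "" "" "" "S 05`56'21.844\"" "No Shift"'''

val_list = shlex.split(escaped)
print(val_list)
```

Inside double quotes, POSIX-mode shlex treats `\"` as a literal quote, and the backtick and apostrophe are kept as ordinary characters.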
