
Can't delete "\r\n" from a string

I have a string like this:

la lala 135 1039 921\r\n

And I can't remove the \r\n.

Initially this string was a bytes object, but then I converted it to a string.

I tried .strip("\r\n") and .replace("\r\n", ""), but neither changed anything...

>>> my_string = "la lala 135 1039 921\r\n"
>>> my_string.rstrip()
'la lala 135 1039 921'
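Note that rstrip() with no argument strips all trailing whitespace, including spaces and tabs. If you only want to drop the line terminator, you can pass the characters explicitly. A minimal sketch, assuming the string contains a real carriage return and line feed (not a literal backslash sequence); the trailing " \t" is added to the sample purely for illustration:

```python
my_string = "la lala 135 1039 921 \t\r\n"

# No argument: every trailing whitespace character goes, including " \t"
print(repr(my_string.rstrip()))        # 'la lala 135 1039 921'

# Explicit set: only trailing CR and LF characters are removed
print(repr(my_string.rstrip("\r\n")))  # 'la lala 135 1039 921 \t'
```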

An alternative is to simply slice off the end, which works well for the bytes-to-string situation:

>>> my_string = b"la lala 135 1039 921\r\n"
>>> my_string = my_string.decode("utf-8")
>>> my_string = my_string[0:-2]
>>> my_string
'la lala 135 1039 921'
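Slicing with [0:-2] drops the last two characters unconditionally, even if they are not \r\n. A slightly safer sketch removes the terminator while the data is still bytes, and only if it is present, then decodes. bytes.removesuffix() requires Python 3.9 or later; the variable names raw and text are just for illustration:

```python
raw = b"la lala 135 1039 921\r\n"

# Remove the trailing CRLF only if it is actually there (Python 3.9+)
text = raw.removesuffix(b"\r\n").decode("utf-8")
print(repr(text))  # 'la lala 135 1039 921'

# Bytes without the terminator pass through unchanged
print(repr(b"no terminator".removesuffix(b"\r\n").decode("utf-8")))  # 'no terminator'
```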

Or, if you prefer, a regex solution:

import re
my_string = re.sub(r'\r\n', '', my_string)  # removes every CR+LF pair, anywhere in the string
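That pattern removes every CRLF pair anywhere in the string. If you only want to remove it at the end, you can anchor the pattern with \Z. The sketch below also accepts the four-character literal text (backslash, r, backslash, n), in case the backslashes are real characters in your string; the helper name strip_crlf is hypothetical:

```python
import re

# Matches either a real CR+LF pair or the literal text \r\n, only at the very end
_TRAILING_CRLF = re.compile(r"(\r\n|\\r\\n)\Z")

def strip_crlf(s: str) -> str:
    """Remove one trailing CRLF (real or literal backslash form), if present."""
    return _TRAILING_CRLF.sub("", s)

print(repr(strip_crlf("la lala 135 1039 921\r\n")))    # 'la lala 135 1039 921'
print(repr(strip_crlf("la lala 135 1039 921\\r\\n")))  # 'la lala 135 1039 921'
print(repr(strip_crlf("a\r\nb")))                      # 'a\r\nb' (interior pair kept)
```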

The issue is that the string contains literal backslashes followed by letters: the four characters \, r, \, n, rather than a carriage return and a line feed. Normally, when written in a string literal such as .strip("\r\n"), these are interpreted as escape sequences, with "\r" representing a carriage return (0x0D in the ASCII table) and "\n" representing a line feed (0x0A).
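To see the difference, compare lengths: the escape sequence "\r\n" is two characters, while the literal text is four. One common way to end up with the literal form (an assumption here, since the question doesn't show the conversion code) is calling str() on a bytes object instead of .decode(), which returns the bytes' repr:

```python
data = b"la lala 135 1039 921\r\n"

print(len("\r\n"))    # 2: carriage return + line feed
print(len("\\r\\n"))  # 4: backslash, 'r', backslash, 'n'

# str() on bytes returns the repr, so CR/LF become literal backslash sequences
wrong = str(data)
print(wrong)                      # b'la lala 135 1039 921\r\n'
print(wrong.endswith("\\r\\n'"))  # True

# decode() gives back the actual characters
right = data.decode("utf-8")
print(repr(right))    # 'la lala 135 1039 921\r\n'
```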

Because Python interprets a backslash as the beginning of an escape sequence, you need to follow it with another backslash to signify that you mean a literal backslash. Therefore, the calls need to be .strip("\\r\\n") and .replace("\\r\\n", "").
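Applied to the string from the question (written here with doubled backslashes so that it really contains the literal sequence), the escaped .replace() call works. A raw string literal, r"\r\n", is an equivalent way to spell the same four characters:

```python
my_string = "la lala 135 1039 921\\r\\n"  # ends with literal backslash-r-backslash-n

# The unescaped pattern looks for real CR/LF characters, so nothing changes
print(my_string.replace("\r\n", "") == my_string)  # True

# Doubling the backslashes matches the literal text
print(repr(my_string.replace("\\r\\n", "")))  # 'la lala 135 1039 921'

# A raw string spells the same four characters
print(r"\r\n" == "\\r\\n")  # True
```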

Note: you really don't want to use .strip() here. It treats its argument as a set of characters, so it will remove any backslashes and any letters "r" and "n" from both ends of the string, not just the trailing sequence. .replace() is a little better in that it matches the whole sequence, but it will also match \r\n in the middle of the string, not just at the end. The most straightforward way to remove the sequence is the conditional given below.
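A quick illustration with made-up sample strings: .strip() also eats the leading "r" and trailing "n" of the word itself, and .replace() removes the sequence from the middle as well:

```python
s = "rain\\r\\n"  # 'rain' followed by the literal text \r\n

# strip() removes any '\\', 'r' or 'n' from both ends,
# so the leading 'r' and the trailing 'n' of 'rain' disappear too
print(repr(s.strip("\\r\\n")))  # 'ai'

t = "a\\r\\nb\\r\\n"
# replace() removes every occurrence, including the one in the middle
print(repr(t.replace("\\r\\n", "")))  # 'ab'
```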

You can see the list of escape sequences Python supports in the String and Byte Literals subsection of the Lexical Analysis section in the Python Language Reference.

For what it's worth, I would not use .strip() to remove the sequence. .strip() removes every leading and trailing character that appears in its argument (it treats the argument as a set of characters rather than a pattern to match). .replace() would be a better choice, or simply use slice notation to remove the trailing "\\r\\n" from the string when you detect it's present:

if s.endswith("\\r\\n"):
    s = s[:-4]
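On Python 3.9 and later, str.removesuffix() does the same check-and-slice in one call and leaves the string untouched when the suffix is absent:

```python
s = "la lala 135 1039 921\\r\\n"

# Equivalent to: if s.endswith("\\r\\n"): s = s[:-4]   (Python 3.9+)
s = s.removesuffix("\\r\\n")
print(repr(s))  # 'la lala 135 1039 921'

# No suffix, no change
print(repr("no suffix".removesuffix("\\r\\n")))  # 'no suffix'
```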

'\r\n' is also a standard line delimiter for .splitlines(), so this can work too, provided the string contains a real carriage return and line feed:

>>> s = "la lala 135 1039 921\r\n"
>>> type(s)
<class 'str'>
>>> t = ''.join(s.splitlines())
>>> t
'la lala 135 1039 921'
>>> type(t)
<class 'str'>
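Two caveats: joining the lines also merges any line breaks in the middle of the string, and splitlines() only splits on real line-break characters, so it does nothing for the literal backslash sequence:

```python
# Interior line breaks are merged as well, not just the trailing one
print(repr(''.join("a\r\nb\r\n".splitlines())))  # 'ab'

# Literal backslash sequences are not line breaks, so nothing is split off
print("la lala 135 1039 921\\r\\n".splitlines())  # ['la lala 135 1039 921\\r\\n']
```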

You could also determine the length of the string (say, 20 characters) and then truncate it to 18, either regardless of what the last two characters are or after verifying that they are the expected characters. Sometimes it helps to compare the ASCII values first. In pseudo-logic:

If the last character in the string is a tab, CR, LF or similar, shorten the string by one. Repeat until the last character no longer matches tab, CR, LF, etc.
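The pseudo-logic above could be sketched in Python as follows, assuming the string holds real control characters (not the literal backslash text). The set of ending characters (tab, CR, LF) and the function names are assumptions for illustration; in practice s.rstrip("\t\r\n") does the same as the loop in one call:

```python
# ASCII codes of the characters treated as "ending" characters: tab, CR, LF
ENDING_CODES = {9, 13, 10}

def trim_endings(s: str) -> str:
    """Shorten s by one character while its last character is tab, CR or LF."""
    while s and ord(s[-1]) in ENDING_CODES:
        s = s[:-1]
    return s

def drop_crlf_if_present(s: str) -> str:
    """Truncate by two characters only after checking they are CR and LF."""
    if len(s) >= 2 and s[-2:] == "\r\n":
        return s[:-2]
    return s

s = "la lala 135 1039 921\r\n"
print(repr(trim_endings(s)))                  # 'la lala 135 1039 921'
print(repr(drop_crlf_if_present(s)))          # 'la lala 135 1039 921'
print(trim_endings(s) == s.rstrip("\t\r\n"))  # True
```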
