
Why can't I replace a target string with a regular expression in Python?

I want to replace the \n\t+ in the string "2005-03-08\n\t\t\t\t\t10派3元(含税)\n\t\t\t\t\t". Why can't I get it to work?

str1="2005-03-08\n\t\t\t\t\t10派3元(含税)\n\t\t\t\t\t"
str2=str1.replace("\n\t+","")
str2
'2005-03-08\n\t\t\t\t\t10派3元(含税)\n\t\t\t\t\t'

Why can't I get 2005-03-0810派3元(含税) as the result?

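Your code isn't doing a regular expression replacement. It calls the built-in str.replace method, which treats its first argument as a literal string. So "\n\t+" is searched for as a newline, a tab, and a literal plus sign. That sequence never occurs in str1, so nothing is replaced.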

There are two reasonable fixes:

  1. You can stick with string replacement and call it once per character (but note that this removes all tabs, not only ones that follow newlines):

     str2 = str1.replace("\n", "").replace("\t", "")
  2. You can import the re module and do your intended replacement:

     import re
     str2 = re.sub(r"\n\t+", "", str1)
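To make the difference between the two fixes concrete, here is a minimal sketch that runs both on the string from the question and on a second, hypothetical input (`sample`, not from the original post) containing a tab that does not follow a newline. The variable names are illustrative only.

```python
import re

# String from the question: each newline is followed by a run of tabs.
str1 = "2005-03-08\n\t\t\t\t\t10派3元(含税)\n\t\t\t\t\t"

# str.replace matches literally: newline, tab, then a '+' character.
# That sequence is not in str1, so the string comes back unchanged.
literal = str1.replace("\n\t+", "")

# re.sub reads '+' as "one or more of the preceding tab".
by_regex = re.sub(r"\n\t+", "", str1)

# Chained replace removes every newline and every tab, wherever they occur.
by_replace = str1.replace("\n", "").replace("\t", "")

# Hypothetical input with a tab that does NOT follow a newline.
sample = "a\tb\n\t\tc"
regex_sample = re.sub(r"\n\t+", "", sample)                   # keeps the tab between a and b
replace_sample = sample.replace("\n", "").replace("\t", "")   # removes that tab as well

print(literal == str1)         # True
print(by_regex == by_replace)  # True
print(repr(regex_sample))      # 'a\tbc'
print(repr(replace_sample))    # 'abc'
```

For the string in the question both fixes give the same result. They only diverge when a tab appears somewhere other than directly after a newline.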

The main reason is that str.replace() is looking for the literal text '\n\t+' (a newline, a tab, and a plus sign), which is not found in the string. Also, your ideal output has every \n and \t removed, whereas your pattern only targets the tabs that come directly after a \n. Try this code:

>>> str1="2005-03-08\n\t\t\t\t\t10派3元(含税)\n\t\t\t\t\t"
>>> ideal = "2005-03-0810派3元(含税)" #Just to check if they are the same
>>> str2 = str1.replace('\n', '').replace('\t', '')
>>> str2
'2005-03-0810\xe6\xb4\xbe3\xe5\x85\x83(\xe5\x90\xab\xe7\xa8\x8e)' #The encoded statement
>>> print str2
2005-03-0810派3元(含税)
>>> str2==ideal
True
>>> 
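The session above appears to come from Python 2 (it uses the `print` statement, and the repr shows the UTF-8 bytes of the string). As a rough sketch, the same check in Python 3 might look like this. There the string is Unicode, so the repr shows the characters directly:

```python
str1 = "2005-03-08\n\t\t\t\t\t10派3元(含税)\n\t\t\t\t\t"
ideal = "2005-03-0810派3元(含税)"  # expected result, used only for comparison

# Remove every newline and every tab with two literal replacements.
str2 = str1.replace("\n", "").replace("\t", "")

print(repr(str2))     # '2005-03-0810派3元(含税)'
print(str2 == ideal)  # True
```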

You could do

str2 = ''.join(s.strip() for s in str1.splitlines())

(although this will also remove leading and trailing whitespace, spaces included, from every line).
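A small sketch of that approach and its caveat follows. The second input, `padded`, is a made-up example (not from the question) chosen to show spaces being stripped at line edges while spaces inside a line survive.

```python
# String from the question.
str1 = "2005-03-08\n\t\t\t\t\t10派3元(含税)\n\t\t\t\t\t"

# Split on line boundaries, strip whitespace from both ends of each piece, rejoin.
joined = "".join(s.strip() for s in str1.splitlines())
print(joined)  # 2005-03-0810派3元(含税)

# Hypothetical input: spaces at line edges are stripped too, interior spaces are kept.
padded = "  left pad\n\tmiddle  \nend "
joined_padded = "".join(s.strip() for s in padded.splitlines())
print(repr(joined_padded))  # 'left padmiddleend'
```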
