Replace escape sequence characters in a string in Python 3.x
I have used the following code to replace the escaped characters in a string. I first split the string on \n and then used re.sub(), but I still don't know what I am missing; the code is not working as expected. I am a newbie at Python, so please don't judge if there are optimisation problems. Here is my code:
#import sys
import re
String = "1\r\r\t\r\n2\r\r\n3\r\r\r\r\n\r\n\r4\n\r"
splitString = String.split('\n')
replacedStrings = []
i=0
for oneString in splitString:
    #oneString = oneString.replace(r'^(.?)*(\\[^n])+(.?)*$', "")
    oneString = re.sub(r'^(.?)*(\\[^n])+(.?)*$', "", oneString)
    print(oneString)
    replacedStrings.insert(i, oneString)
    i += 1
print(replacedStrings)
My aim here is: I need only the values (without the escape sequences) as the split strings.
My approach here is:
1. Split the string on \n, which gives me a list of separate strings.
So basically, I am through with steps 1 and 2, but I am currently stuck at step 3. Following is my output:
1
2
3
4
['1\r\r\t\r', '2\r\r', '3\r\r\r\r', '\r', '\r4', '\r']
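The likely reason the re.sub() call changes nothing: inside a normal Python string literal, \r, \t and \n are each a single control character, not a backslash followed by a letter, so a pattern containing \\ (a literal backslash) never finds a match. A small sketch to check this on the first piece produced by the split:

```python
import re

piece = "1\r\r\t\r"  # first element produced by String.split('\n')

# Each escape sequence in the literal became one control character
print(len(piece))               # 5
print([ord(c) for c in piece])  # [49, 13, 13, 9, 13]

# r'\\[^n]' looks for a real backslash, which the string does not contain
print(re.search(r'\\[^n]', piece))  # None

# Matching the control characters themselves does work
print(re.sub(r'[\r\t]', '', piece))  # 1
```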
You might find it easier to use re.findall here with the simple pattern \S+:
import re

text = "1\r\r\t\r\n2\r\r\n3\r\r\r\r\n\r\n\r4\n\r"
output = re.findall(r'\S+', text)
print(output)
['1', '2', '3', '4']
This approach isolates and matches every "island" of one or more non-whitespace characters.
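One thing to keep in mind with \S+: an ordinary space is whitespace too, so a value that itself contains a space gets split apart. A quick check (the sample string here is made up for illustration):

```python
import re

sample = "A: B\r\r\nC\r\n"  # hypothetical input whose first value contains a space

# \S+ breaks on the space inside "A: B" as well as on \r and \n
print(re.findall(r'\S+', sample))  # ['A:', 'B', 'C']
```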
Edit:
Based on your new input data, we can try matching on the pattern [^\r\n\t]+:
import re

text = "jkahdjkah \r\r\t\r\nA: B\r\r\nA : B\r\r\r\r\n\r\n\r4\n\r"
output = re.findall(r'[^\r\n\t]+', text)
print(output)
['jkahdjkah ', 'A: B', 'A : B', '4']
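Note that [^\r\n\t]+ keeps ordinary spaces, including the trailing one in 'jkahdjkah '. If those should go as well, here is a minimal sketch that strips each match and drops anything left empty (the helper name extract_values is hypothetical, chosen just for illustration):

```python
import re

def extract_values(text):
    """Return the runs of text between \\r, \\n and \\t characters,
    with surrounding spaces removed and empty results dropped."""
    values = []
    for match in re.findall(r'[^\r\n\t]+', text):
        cleaned = match.strip()
        if cleaned:  # skip runs that were only spaces
            values.append(cleaned)
    return values

text = "jkahdjkah \r\r\t\r\nA: B\r\r\nA : B\r\r\r\r\n\r\n\r4\n\r"
print(extract_values(text))  # ['jkahdjkah', 'A: B', 'A : B', '4']
```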
re.sub isn't really the right tool for the job here. What would be on the table is split or re.findall, because you want to repeatedly match and isolate certain parts of your text. re.sub is useful for taking a string and transforming it into something else. It can be used to extract text, but it does not work so well when there are multiple matches.
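To illustrate the difference: re.sub returns one transformed string, while re.findall returns a list of matches. re.sub can still be useful as a preprocessing step, for example deleting every \r and \t first and then splitting on \n. This is a sketch of one possible combination, not the only way to do it:

```python
import re

text = "1\r\r\t\r\n2\r\r\n3\r\r\r\r\n\r\n\r4\n\r"

# re.sub transforms the whole string and gives back a single str
cleaned = re.sub(r'[\r\t]+', '', text)
print(repr(cleaned))  # '1\n2\n3\n\n4\n'

# Splitting the cleaned string on \n and dropping empty pieces gives the values
values = [part for part in cleaned.split('\n') if part]
print(values)  # ['1', '2', '3', '4']
```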
You were almost there. I would just use str.strip() to remove the whitespace characters (\r, \t, \n, spaces) at the start and end of each string, and skip the pieces that end up empty:
String = "1\r\r\t\r\n2\r\r\n3\r\r\r\r\n\r\n\r4\n\r"
splitString = String.split('\n')
replacedStrings = []
for oneString in splitString:
    s = oneString.strip()  # remove leading/trailing \r, \t and spaces
    if s != '':  # skip pieces that contained only whitespace
        print(s)
        replacedStrings.append(s)
print(replacedStrings)
The output will look like:
1
2
3
4
['1', '2', '3', '4']
For "jkahdjkah \r\r\t\r\nA: B\r\r\nA : B\r\r\r\r\n\r\n\r4\n\r", the output will be ['jkahdjkah', 'A: B', 'A : B', '4'].
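A more compact variant of the same strip-and-filter idea uses str.splitlines(), which treats \r, \n and \r\n all as line boundaries, so there is no need to split on \n alone. A sketch (the function name clean_lines is hypothetical):

```python
def clean_lines(text):
    # splitlines() breaks on \r, \n and \r\n; strip() removes leftover
    # tabs and spaces; the condition drops lines that end up empty
    return [line.strip() for line in text.splitlines() if line.strip()]

print(clean_lines("1\r\r\t\r\n2\r\r\n3\r\r\r\r\n\r\n\r4\n\r"))
# ['1', '2', '3', '4']
print(clean_lines("jkahdjkah \r\r\t\r\nA: B\r\r\nA : B\r\r\r\r\n\r\n\r4\n\r"))
# ['jkahdjkah', 'A: B', 'A : B', '4']
```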
I have found one more method, and it seems to work fine. It might not be as optimised as the other answers, but it's just another way:
import re
String = "jhgdf\r\r\t\r\nA : B\r\r\nA : B\r\r\r\r\n\r\n\rA: B\n\r"
# Split on every run of \r, \t and \n characters
splitString = re.split(r'[\r\t\n]+', String)
# Leading/trailing separators produce empty strings; drop all of them
splitString = [s for s in splitString if s]
print(splitString)
I added it here so that people running into the same trouble as me might consider this approach too.
Following is the output that I got after using the above code:
['jhgdf', 'A : B', 'A : B', 'A: B']
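One caveat with splitting this way: re.split leaves an empty string at each end where the input starts or ends with a separator, and list.remove("") deletes only the first empty string it finds. A sketch that drops every empty piece instead (the sample input with a leading \r is made up for illustration):

```python
import re

sample = "\rjhgdf\r\nA : B\n\r"  # hypothetical input that starts and ends with separators

parts = re.split(r'[\r\t\n]+', sample)
print(parts)  # ['', 'jhgdf', 'A : B', '']

# Keep only non-empty pieces, however many empty ones there are
values = [p for p in parts if p]
print(values)  # ['jhgdf', 'A : B']
```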
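Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.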