Split text at line end: ignoring inline \n

I have some text with START and END tags, something like:

SOURCE = '''
Text with \n \n and some more # an so ..

other text to be ignored
START
docu \n this text \n I need includive the capital start and end
but do not split \n \n only split at the actuall end of the line
END

gfsdfgadgfg \n\n\n \n
5 635634
START
similar # to the above I need \n all of this in the split line
but do not split \n \n only split at the actuall end of the line
END


more text to ignore
'''

I hope to parse it into something like:

parts_splitted_by_actual_end_of_line = {
'Part1_lines' : 
['START',
'docu \n this text \n I need includive the capital start and end',
'but do not split \n \n only split at the actuall end of the line',
'END'],

'Part2_lines' : 
['START',
'similar # to the above I need \n all of this in the split line',
'but do not split \n \n only split at the actuall end of the line',
'END'],
}

I can find the START and END tags with string find and extract the text between them.

But I'm completely stuck on how to split the lines while keeping the \n within each line.

Any suggestions would be really appreciated.

You want to use a raw string. Add an r prefix before your string literal, like this:

SOURCE = r'''Insert text here\n'''

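In a raw string, each \n stays as a literal backslash followed by n instead of becoming a newline character, so splitting the text into lines only breaks at the actual line ends.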
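As a minimal sketch of how this fits the question, assume the text is defined as a raw string and that START and END each sit alone on their own line. The function name `extract_parts`, the shortened sample text, and the `PartN_lines` key format are illustrative and not from the original post:

```python
# Shortened stand-in for the question's text; the r prefix keeps \n as two characters.
SOURCE = r'''
other text to be ignored \n \n
START
docu \n this text \n keep the markers
but do not split \n \n inside the line
END
more text to ignore
START
second block \n still one line
END
'''


def extract_parts(text):
    """Collect START..END blocks, splitting only at real line breaks."""
    parts = {}
    current = None  # lines of the block being collected, or None outside a block
    for line in text.splitlines():
        if line.strip() == 'START':
            current = [line]          # open a new block, keeping the marker
        elif current is not None:
            current.append(line)
            if line.strip() == 'END':
                # close the block and store it under the next PartN_lines key
                parts['Part%d_lines' % (len(parts) + 1)] = current
                current = None
    return parts


parts = extract_parts(SOURCE)
```

Because the inline \n sequences are plain backslash-plus-n characters here, `splitlines()` never sees them as line boundaries; lines outside a START/END pair are simply skipped.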

To unescape it afterwards (probably after your split), decode the string like this (Python 2):

string = string.decode('string_escape')
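On Python 3, `str` has no `decode` method and the `string_escape` codec is gone, so the line above will not run there. One possible equivalent, offered as a sketch rather than the answer's own method, goes through bytes and the `unicode_escape` codec; the helper name `unescape` is made up for this example:

```python
def unescape(text):
    """Turn literal escape sequences such as backslash + n into real characters."""
    # unicode_escape decodes bytes, so encode first; backslashreplace keeps
    # characters outside Latin-1 representable as escapes that decode back.
    return text.encode('latin-1', 'backslashreplace').decode('unicode_escape')


line = r'docu \n this text'   # backslash + n, as it comes out of the raw string
unescaped = unescape(line)    # now contains a real newline character
```

Apply this only after splitting; unescaping first would reintroduce the newlines you were trying not to split on.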
