How to derive a string for the newline characters in a platform-independent way and use it in a regular expression pattern?

I have a question about how to represent the newline characters as a string in Python. I thought I could use the built-in function repr to do this, so I tried to verify the approach by running the following code:

import os

lineBreakAsStr = repr(os.linesep)
print(f'lineBreakAsStr = {lineBreakAsStr}') # line 4
print(lineBreakAsStr == '\\r\\n')           # line 5

I expected line 5 to print True if repr can successfully convert the value of os.linesep to a string. But on my Windows 7 PC, line 4 prints lineBreakAsStr = '\r\n' and line 5 prints False.

Can anyone explain why? And how should I derive the string that stands for the newline characters from the value of os.linesep and put it in a regular expression pattern, instead of using a fixed string like '\\r\\n'?

Below is a code snippet that demonstrates what I want to do. (I would prefer to use the code in line 13 rather than the code in line 14, but the code in line 13 does not work. It has to be modified in some way to find the substring I am looking for.):

import os, re

def f(pattern, data):
  p =  re.compile(pattern)
  m = p.search(data)
  if m is not None:
    print(m.group())
  else:
    print('Not match.')

dataSniffedInConsole = ('procd: - init -\\\\r\\\\nPlease press Enter '
                        'to activate this console.\\\\r\\\\n')
lineBreakAsStr = repr(os.linesep)   # line 13
# lineBreakAsStr = '\\\\\\\\r\\\\\\\\n' # line 14

pattern = rf'Please press Enter to activate this console.{lineBreakAsStr}'

f(pattern, dataSniffedInConsole)

Using repr puts quotes around the string: the result is the text of a Python string literal, quote characters included. Those quotes are probably causing your issue.

>>> newline = repr(os.linesep)
>>> print(newline)
'\r\n'
>>> newline == "'\\r\\n'"
True
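
To make the quotes visible, here is a small sketch. It hard-codes '\r\n' in a variable named linesep (my stand-in for os.linesep on Windows, since os.linesep is '\n' on Linux and macOS) so that it behaves the same on any platform:

```python
# Assumption: linesep stands in for os.linesep on Windows.
linesep = '\r\n'

as_repr = repr(linesep)

print(len(linesep))              # 2: carriage return, line feed
print(len(as_repr))              # 6: quote, backslash, r, backslash, n, quote
print(as_repr[0], as_repr[-1])   # the surrounding quote characters
print(as_repr == '\\r\\n')       # False, because of the quotes
print(as_repr == "'\\r\\n'")     # True
```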

A quick fix for your problem is to remove the quotes:

>>> newline = repr(os.linesep).strip("'")
>>> print(newline)
\r\n
>>> newline == "'\\r\\n'"
False
>>> newline == "\\r\\n"
True
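
Be aware of how the re module treats the stripped string once it is inside a pattern. The sketch below is only an illustration, with made-up sample data and the separator hard-coded as '\r\n'. Used as-is, the four characters \r\n are read by re as escapes for a real carriage return and line feed. Passing them through re.escape first makes the backslashes literal, which is what you need if your data contains the backslash-letter text rather than real control characters:

```python
import re

linesep = '\r\n'  # assumption: Windows separator; os.linesep differs per platform
escaped = repr(linesep).strip("'")   # 4 characters: \ r \ n

raw_data = 'to activate this console.\r\n'        # real CR and LF (sample data)
escaped_data = 'to activate this console.\\r\\n'  # backslash-letter text (sample data)

# re reads the text \r\n as escapes for CR and LF.
as_escapes = re.compile(r'console\.' + escaped)
# re.escape makes the backslashes literal, so the pattern matches the text itself.
as_literal = re.compile(r'console\.' + re.escape(escaped))

print(as_escapes.search(raw_data) is not None)      # True
print(as_escapes.search(escaped_data) is not None)  # False
print(as_literal.search(raw_data) is not None)      # False
print(as_literal.search(escaped_data) is not None)  # True
```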

I recommend finding a way to read the raw data from the console rather than a representation of it. The raw data will be much easier to process.
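
As a rough sketch of what that could look like, assume the raw data contains real carriage return and line feed characters (the raw string below is invented for illustration). A pattern ending in \r?\n then matches both Unix and Windows line endings without consulting os.linesep at all:

```python
import os
import re

# Hypothetical raw console output containing real CR and LF characters.
raw = 'procd: - init -\r\nPlease press Enter to activate this console.\r\n'

# Escape the fixed text so the final '.' is matched literally.
message = re.escape('Please press Enter to activate this console.')

# Matches both LF and CRLF line endings.
portable = re.compile(message + r'\r?\n')
match = portable.search(raw)
print(repr(match.group()))

# Alternative tied to the machine running the script; whether it matches
# depends on the platform, so no expected output is shown.
local = re.compile(message + re.escape(os.linesep))
print(local.search(raw) is not None)
```

Note that os.linesep describes the machine running the script, not the device that produced the data, so the \r?\n form is usually the safer choice when the data comes from somewhere else.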
