Why does my Python regular expression not match \r\n for newline in Windows?

I'm still fairly new to Python and am having trouble with one of my regular expressions. I've researched this online and tried lots of things in Python, but am stuck. Since I'm using Windows, I'm expecting \r\n to match a line break in a text file, because that's how lines are terminated in Windows. But what I'm finding is that only \n matches. Why is that?

Here's my code (using \r\n, which doesn't match):

import re

filename = 'C:\\Users\\jason\\OneDrive\\Documents\\LTspice_my_work\\example_ac_analysis_2.raw'
with open(filename, 'r') as f:
    content = f.read()
    print(content)
    pattern3 = r'Variables:\r\n(.*)Values:' 
    print("Here's what matches:")
    text = re.search( pattern3,content,re.DOTALL).group(1)
    print(text)

which returns:

Command: Linear Technology Corporation LTspice XVII
Variables:
        0       frequency       frequency
        1       V(v1)   voltage
        2       V(vout) voltage
        3       I(C1)   device_current
        4       I(R1)   device_current
        5       I(V1)   device_current
Values:
0               1.000000000000000e+000,0.000000000000000e+000
        2.000000000000000e+000,0.000000000000000e+000
        1.998028025380720e+000,-6.276990166202591e-002
        3.943949238559487e-007,1.255398033240518e-005
        -3.943949238559341e-007,-1.255398033240518e-005
        -3.943949238559568e-007,-1.255398033240518e-005
1               3.162277660168380e+000,0.000000000000000e+000
        2.000000000000000e+000,0.000000000000000e+000
        1.980453705393099e+000,-1.967499214255068e-001
        3.909258921380289e-006,3.934998428510137e-005
        -3.909258921380277e-006,-3.934998428510137e-005
        -3.909258921380287e-006,-3.934998428510137e-005


Here's what matches:
Traceback (most recent call last):

  File "C:\Users\jason\OneDrive\Documents\Python\Python_scripts\example_ltspice_pytool.py", line 176, in <module>
    text = re.search( pattern3,content,re.DOTALL).group(1)

AttributeError: 'NoneType' object has no attribute 'group'

But when I use only \n, I get the match I'm looking for with this code:

import re

filename = 'C:\\Users\\jason\\OneDrive\\Documents\\LTspice_my_work\\example_ac_analysis_2.raw'
with open(filename, 'r') as f:
    content = f.read()
    print(content)
    pattern3 = r'Variables:\n(.*)Values:' 
    print("Here's what matches:")
    text = re.search( pattern3,content,re.DOTALL).group(1)
    print(text)

which returns:


Command: Linear Technology Corporation LTspice XVII
Variables:
        0       frequency       frequency
        1       V(v1)   voltage
        2       V(vout) voltage
        3       I(C1)   device_current
        4       I(R1)   device_current
        5       I(V1)   device_current
Values:
0               1.000000000000000e+000,0.000000000000000e+000
        2.000000000000000e+000,0.000000000000000e+000
        1.998028025380720e+000,-6.276990166202591e-002
        3.943949238559487e-007,1.255398033240518e-005
        -3.943949238559341e-007,-1.255398033240518e-005
        -3.943949238559568e-007,-1.255398033240518e-005
1               3.162277660168380e+000,0.000000000000000e+000
        2.000000000000000e+000,0.000000000000000e+000
        1.980453705393099e+000,-1.967499214255068e-001
        3.909258921380289e-006,3.934998428510137e-005
        -3.909258921380277e-006,-3.934998428510137e-005
        -3.909258921380287e-006,-3.934998428510137e-005


Here's what matches:
        0       frequency       frequency
        1       V(v1)   voltage
        2       V(vout) voltage
        3       I(C1)   device_current
        4       I(R1)   device_current
        5       I(V1)   device_current

Thanks for your help in advance!

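When you open a file in text mode (the default), \r\n is automatically converted to \n as the file is read, so you don't have to worry about which operating system you are on.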
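As a rough illustration of that translation, the sketch below writes a few bytes with Windows-style line endings to a temporary file and reads them back twice. The sample contents and the temporary file are made up for the example and only stand in for the .raw file from the question.

```python
import os
import tempfile

# Hypothetical sample data with Windows-style (CRLF) line endings.
raw_bytes = b"Variables:\r\n\t0\tfrequency\tfrequency\r\nValues:\r\n"

# Write in binary mode so the bytes land on disk exactly as given.
fd, path = tempfile.mkstemp(suffix=".raw")
with os.fdopen(fd, "wb") as f:
    f.write(raw_bytes)

try:
    # Default text mode (newline=None): line endings are translated to '\n'.
    with open(path, "r", encoding="ascii") as f:
        translated = f.read()

    # Binary mode: no translation, the bytes come back untouched.
    with open(path, "rb") as f:
        untouched = f.read()
finally:
    os.remove(path)

print(repr(translated))  # no '\r' left in the string
print(repr(untouched))   # the original b'\r\n' endings are still there
```

Printing with repr() makes the difference visible, since a plain print() renders both kinds of line ending the same way in most consoles.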

Python, by default, processes text files in universal newlines mode. Quoting from the docs:

newline controls how line endings are handled. It can be None, '', '\n', '\r', and '\r\n'. It works as follows:

  • When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.

So in short, your strings don't have \r in them by the time you receive them. If you want them to keep the \r, change your open call to add newline='' (the csv module requires this, because line endings are part of the CSV dialect, and it needs the original, untranslated endings to process the input correctly).
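Applied to the regular expression from the question, this might look like the sketch below. The sample bytes, the temporary file, the extract helper, and the \r?\n variant of the pattern are all assumptions added for illustration and are not part of the original code. The helper also checks for a failed match instead of calling .group(1) directly, which avoids the AttributeError shown in the traceback.

```python
import os
import re
import tempfile

# Hypothetical sample standing in for the .raw file from the question.
raw_bytes = (
    b"Variables:\r\n"
    b"\t0\tfrequency\tfrequency\r\n"
    b"\t1\tV(v1)\tvoltage\r\n"
    b"Values:\r\n"
)

fd, path = tempfile.mkstemp(suffix=".raw")
with os.fdopen(fd, "wb") as f:
    f.write(raw_bytes)

try:
    # newline='' keeps the original line endings, so '\r\n' is still present.
    with open(path, "r", encoding="ascii", newline="") as f:
        content_untranslated = f.read()
    # Default text mode translates every line ending to '\n'.
    with open(path, "r", encoding="ascii") as f:
        content_translated = f.read()
finally:
    os.remove(path)

crlf_pattern = r"Variables:\r\n(.*)Values:"        # pattern from the question
portable_pattern = r"Variables:\r?\n(.*)Values:"   # accepts '\n' or '\r\n'


def extract(pattern, text):
    """Return group 1 of the first match, or None if nothing matches."""
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1) if match else None


crlf_on_untranslated = extract(crlf_pattern, content_untranslated)
crlf_on_translated = extract(crlf_pattern, content_translated)
portable_on_translated = extract(portable_pattern, content_translated)

print(repr(crlf_on_untranslated))    # matches, captured text keeps '\r\n'
print(repr(crlf_on_translated))      # None: no '\r' left to match
print(repr(portable_on_translated))  # matches, captured text uses '\n'
```

If the exact line endings don't matter for the parsing, keeping the default mode and matching on \n (or \r?\n) is usually the simpler choice; newline='' is mainly useful when the original endings must be preserved.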
