Python multiline regex works in shell but not in program

I'm trying to find and replace a multiline pattern in a JSON feed. Basically, I'm looking for a line ending in "}," followed by a line containing just "}".

Example input would be:

s = """
              "essSurfaceFreezePoint":    "1001",
              "essSurfaceBlackIceSignal": "4"
              },
            }
          }
"""

and I want to find:

"""
              },
            }
"""

and replace it with:

"""
              }
            }
"""

I've tried the following:

pattern = re.compile(r'^ *},\n^ *}$',flags=re.MULTILINE)
pattern.findall(feedStr)

This works in the Python shell. However, when I run the same search in my Python program, it finds nothing. I'm using the full JSON feed in the program. Perhaps the feed has different line terminators when it is read over the network.
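One way to test the line-termination guess is to print the text with repr() and compare a strict pattern against one that tolerates an optional carriage return and trailing blanks. The following is a minimal Python 3 sketch. `sample_unix` and `sample_dos` are made-up strings standing in for the feed, and the `tolerant` pattern is an illustration, not a confirmed fix for the real data.

```python
import re

# Hypothetical samples; the real feed's line endings are not known here.
sample_unix = '  "a": "4"\n  },\n }\n}\n'
sample_dos = sample_unix.replace("\n", "\r\n")

# The pattern from the question: only matches bare "\n" line endings.
strict = re.compile(r'^ *},\n^ *}$', flags=re.MULTILINE)

# Allows tabs or spaces around the braces and an optional "\r" before "\n".
tolerant = re.compile(r'^[ \t]*},[ \t]*\r?\n[ \t]*}[ \t]*\r?$',
                      flags=re.MULTILINE)

for name, text in (("unix", sample_unix), ("dos", sample_dos)):
    # repr() makes "\r" and "\n" visible instead of rendering them.
    print(name, repr(text))
    print("  strict matches:  ", len(strict.findall(text)))
    print("  tolerant matches:", len(tolerant.findall(text)))
```

If the strict pattern finds nothing while the tolerant one does, the input most likely uses "\r\n" line endings or carries trailing whitespace.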

The feed is at:

http://hardhat.ahmct.ucdavis.edu/tmp/test.json

If anyone can point out why this works in the shell but not in the program, I'd greatly appreciate it. Is there a better way to formulate the regular expression so that it works in both?

Thanks for any advice.

=====================================================================================

To make this clearer, I'm adding my test code here. Note that I'm now including the regular expression provided by Ahosan Karim Asik. This regex works in the live demo link below, but doesn't quite work for me in a Python shell. It also doesn't work against the real feed.

Thanks again for any assistance.

import urllib2
import json
import re

if __name__ == "__main__":
    # wget version of real feed:
    # url = "http://hardhat.ahmct.ucdavis.edu/tmp/test.json"
    # Short text, for milepost and brace substitution test:
    url = "http://hardhat.ahmct.ucdavis.edu/tmp/test.txt"
    request = urllib2.urlopen(url)
    rawResponse = request.read()
    # print("Raw response:")
    # print(rawResponse)

    # Find extra comma after end of records:
    p1 = re.compile('(}),(\r?\n *})')
    l1 = p1.findall(rawResponse)
    print("Brace matches found:")
    print(l1)

    # Check milepost:
    #p2 = re.compile('( *\"milepost\": *\")')
    p2 = re.compile('( *\"milepost\": *\")([0-9]*\.?[0-9]*)\r?\n')
    l2 = p2.findall(rawResponse)
    print("Milepost matches found:")
    print(l2)

    # Do brace substitutions:
    subst = "\1\2"
    response = re.sub(p1, subst, rawResponse)

    # Do milepost substitutions:
    subst = "\1\2\""
    response = re.sub(p2, subst, response)
    print(response)
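One detail in the substitution step above is worth checking: in a normal (non-raw) Python string literal, "\1" is the control character U+0001, not a backreference to group 1. The sketch below is Python 3 rather than the Python 2 of the test code, and `raw_response` is a made-up string standing in for the downloaded feed. It shows the difference between the two spellings.

```python
import re

# Made-up stand-in for the downloaded feed text (assumption).
raw_response = '    "x": "4"\r\n    },\r\n  }\r\n}\r\n'

# Same idea as p1 above, written as a raw string.
p1 = re.compile(r'(}),(\r?\n *})')

# Non-raw replacement: Python turns "\1\2" into chr(1) + chr(2)
# before the re module ever sees it.
wrong = p1.sub("\1\2", raw_response)

# Raw replacement: the re module receives the backslashes and
# re-inserts groups 1 and 2, dropping only the comma.
right = p1.sub(r"\1\2", raw_response)

print(repr(wrong))
print(repr(right))
```

Writing every pattern and every replacement as a raw string avoids this class of surprise in both Python 2 and Python 3.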

Try this:

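import re

# Group 1 is the closing brace before the comma; group 2 is the newline
# plus the closing brace on the next line.
p = re.compile(r'(^ *}),(\n^ *})$', re.MULTILINE)
test_str = u" \"essSurfaceFreezePoint\": \"1001\",\n \"essSurfaceBlackIceSignal\": \"4\"\n },\n }\n }"
# Python's re module uses \1 and \2 for backreferences, not $1 and $2.
subst = r"\1\2"

result = re.sub(p, subst, test_str)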

live demo
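Since the goal is a feed that parses as JSON, another option is to remove any comma that is followed only by whitespace and a closing brace or bracket, then hand the result to json.loads as a check. This is a minimal Python 3 sketch under stated assumptions: `broken` is a made-up snippet, "station" is a hypothetical key, and the pattern would also touch a matching comma inside a string value, so it is a rough cleanup rather than a real JSON repair tool.

```python
import json
import re

# Made-up snippet with a trailing comma and "\r\n" line endings (assumption).
broken = (
    '{\r\n'
    '  "station": {\r\n'
    '    "essSurfaceFreezePoint": "1001",\r\n'
    '    "essSurfaceBlackIceSignal": "4"\r\n'
    '  },\r\n'
    '}'
)

# "\s" covers spaces, tabs, "\r" and "\n", so no MULTILINE anchors are needed.
trailing_comma = re.compile(r',(\s*[}\]])')
fixed = trailing_comma.sub(r'\1', broken)

# json.loads raises ValueError if the text is still not valid JSON.
data = json.loads(fixed)
print(data["station"]["essSurfaceBlackIceSignal"])
```

Because the pattern does not rely on "^" or "$", it behaves the same regardless of how the lines are terminated.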
