Search for a delimited string in a file - Python

I have the following read.json file:

{:{"JOL":"EuXaqHIbfEDyvph%2BMHPdCOJWMDPD%2BGG2xf0u0mP9Vb4YMFr6v5TJzWlSqq6VL0hXy07VDkWHHcq3At0SKVUrRA7shgTvmKVbjhEazRqHpvs%3D-%1E2D%TL/xs23EWsc40fWD.tr","LAPTOP":"error"}

and this Python script:

import re

shakes = open("read.json", "r")
needed = open("needed.txt", "w")
for text in shakes:
    if re.search('JOL":"(.+?).tr', text):
        print(text, end='', file=needed)

I want it to find what's between two strings (JOL":" and .tr) and print only that. But all it does is print all the text in read.json.

You're calling re.search, but you're not doing anything with the returned match except checking that there is one. Instead, you're printing the original text, so of course you get the whole line.

The solution is simple: store the result of re.search in a variable so you can use it. For example:

for text in shakes:
    match = re.search('JOL":"(.+?).tr', text)
    if match:
        # group(1) is only the part captured by (.+?)
        print(match.group(1), file=needed)

In your example, the match is JOL":"EuXaqHIbfEDyvph%2BMHPdCOJWMDPD%2BGG2xf0u0mP9Vb4YMFr6v5TJzWlSqq6VL0hXy07VDkWHHcq3At0SKVUrRA7shgTvmKVbjhEazRqHpvs%3D-%1E2D%TL/xs23EWsc40fWD.tr, and the first (and only) group in it is EuXaqHIbfEDyvph%2BMHPdCOJWMDPD%2BGG2xf0u0mP9Vb4YMFr6v5TJzWlSqq6VL0hXy07VDkWHHcq3At0SKVUrRA7shgTvmKVbjhEazRqHpvs%3D-%1E2D%TL/xs23EWsc40fWD, which is (I think) what you're looking for.
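To see the difference between the whole match and the group, here is a minimal check on a shortened, made-up line with the same shape as read.json. It uses a raw string with the . escaped, so only a literal dot can come before tr; otherwise it is the same pattern.

```python
import re

# Shortened, made-up stand-in for a line of read.json
line = '{:{"JOL":"abc%2Bdef/xyz.tr","LAPTOP":"error"}'

match = re.search(r'JOL":"(.+?)\.tr', line)
if match:
    print(match.group(0))  # whole match, delimiters included: JOL":"abc%2Bdef/xyz.tr
    print(match.group(1))  # only the (.+?) group: abc%2Bdef/xyz
```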

A few side notes, though:

First, . is a special character in a regex: it matches any single character. So your pattern actually matches any character followed by tr, not specifically .tr. To match a literal dot, escape it with a backslash. (And once you start putting backslashes into a regex, use a raw string literal.) So use: r'JOL":"(.+?)\.tr'
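To see why the escape matters, consider a made-up value in which tr appears, after some other character, before the real .tr. The unescaped pattern stops at the first "any character followed by tr", while the escaped one waits for a literal dot:

```python
import re

# Made-up value: "xtr" occurs before the real ".tr" delimiter
line = '{"JOL":"abcxtrdef.tr","LAPTOP":"error"}'

loose = re.search('JOL":"(.+?).tr', line)     # '.' matches any character
strict = re.search(r'JOL":"(.+?)\.tr', line)  # '\.' matches only a literal dot

print(loose.group(1))   # abc
print(strict.group(1))  # abcxtrdef
```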

Second, this makes a lot of assumptions about the data that probably aren't warranted. What you really want here is not "everything between JOL":" and .tr" but "the value associated with the key 'JOL' in the JSON object". The only problem is that this isn't quite a JSON object, because of the : prefixed to it. Hopefully you know where the data came from, and therefore what format it's actually in. For example, if you know it's a sequence of colon-prefixed JSON objects, one per line, the right way to parse each line is:

import json

d = json.loads(text[1:])  # skip the leading ':' and parse the rest as JSON
if 'JOL' in d:
    print(d['JOL'], file=needed)
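The sample shown in the question actually starts with {: rather than a single :, so here is a slightly more defensive sketch of the JSON approach. It assumes (this is not confirmed by the question) that the real object begins at the first {" on each line and that anything before it is junk; extract_jol and copy_jol_values are hypothetical helper names. The same code also handles the plain colon-prefixed format described above.

```python
import json


def extract_jol(line):
    """Return the value stored under 'JOL' on one line, or None.

    Assumes the real JSON object starts at the first '{"' on the line and
    that anything before it (a ':' or the '{:' seen in the sample) is junk.
    """
    start = line.find('{"')
    if start == -1:
        return None  # no JSON object on this line
    data = json.loads(line[start:])  # a trailing newline is fine for json
    return data.get('JOL')


def copy_jol_values(src_path, dst_path):
    """Write every 'JOL' value found in src_path to dst_path, one per line."""
    with open(src_path) as shakes, open(dst_path, 'w') as needed:
        for text in shakes:
            value = extract_jol(text)
            if value is not None:
                print(value, file=needed)
```

For example, extract_jol('{:{"JOL":"abc.tr","LAPTOP":"error"}') returns 'abc.tr', and copy_jol_values('read.json', 'needed.txt') processes the whole file. Unlike the regex, this returns the complete value, including the trailing .tr, and json.loads raises json.JSONDecodeError on any line that still isn't valid JSON.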

Finally, make sure the name you print to is the file object you actually opened. In an earlier version of your code, nothing was named needed: you opened a file called 'needed.txt' but named the file object love. If your real code has a similar bug, needed may refer to some completely different file, so you could be overwriting that file over and over, then looking in needed.txt and seeing nothing change each time.
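One way to make this kind of mix-up harder to write is to open both files in a single with statement, so each file object's name sits right next to the filename it belongs to. Here is a sketch of the regex version in that style, assuming Python 3; iter_values and extract_between are hypothetical helper names, and keeping the per-line logic in iter_values lets you try it on a plain list of strings.

```python
import re

# Raw string with the dot escaped, so only a literal ".tr" ends the match
PATTERN = re.compile(r'JOL":"(.+?)\.tr')


def iter_values(lines):
    """Yield the captured group from every line that PATTERN matches."""
    for text in lines:
        match = PATTERN.search(text)
        if match:
            yield match.group(1)


def extract_between(src_path, dst_path):
    """Write every captured value from src_path to dst_path, one per line."""
    # Each name is bound next to the file it refers to, and both files are
    # closed (flushing the output) when the with block ends
    with open(src_path) as shakes, open(dst_path, 'w') as needed:
        for value in iter_values(shakes):
            print(value, file=needed)
```

Calling extract_between('read.json', 'needed.txt') would then replace the whole original script.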

If you know that your starting and ending delimiters each appear only once, you can ignore the fact that it's JSON. In that case, split on the starting characters (JOL":"), take the second element of the resulting list ([1]), then split that on the ending characters (.tr) and take the first element ([0]).

>>> text = '{:{"JOL":"EuXaqHIbfEDyvph%2BMHPdCOJWMDPD%2BGG2xf0u0mP9Vb4YMFr6v5TJzWlSqq6VL0hXy07VDkWHHcq3At0SKVUrRA7shgTvmKVbjhEazRqHpvs%3D-%1E2D%TL/xs23EWsc40fWD.tr","LAPTOP":"error"}'
>>> text.split('JOL":"')[1].split('.tr')[0]
'EuXaqHIbfEDyvph%2BMHPdCOJWMDPD%2BGG2xf0u0mP9Vb4YMFr6v5TJzWlSqq6VL0hXy07VDkWHHcq3At0SKVUrRA7shgTvmKVbjhEazRqHpvs%3D-%1E2D%TL/xs23EWsc40fWD'
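One caveat with the chained split: if a line doesn't contain JOL":" at all, split(...)[1] raises IndexError, and if .tr is missing, split('.tr')[0] silently returns the rest of the line. Here is a small sketch using str.partition, which returns an empty separator when it finds nothing, so both cases become explicit; between is a hypothetical helper name.

```python
def between(text, start, end):
    """Return the text between the first start delimiter and the next end.

    Returns None if either delimiter is missing. Like the split version,
    this assumes the delimiters don't also occur inside the value itself.
    """
    _, found_start, rest = text.partition(start)
    if not found_start:
        return None  # start delimiter not in text
    value, found_end, _ = rest.partition(end)
    if not found_end:
        return None  # end delimiter not found after start
    return value


line = '{:{"JOL":"abc%2Bdef/xyz.tr","LAPTOP":"error"}'
print(between(line, 'JOL":"', '.tr'))                  # abc%2Bdef/xyz
print(between('{"LAPTOP":"error"}', 'JOL":"', '.tr'))  # None
```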

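Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you republish this page, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.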

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM