
Can't find string in text file

Given a list of item numbers, I am trying to search through a text file containing a list of recent item numbers and identify any that are already in this recent list. I then want to add any items that weren't already in the recent list.

My code is below; it just doesn't seem to find anything in the text file. Why isn't it working?

def filter_recent_items(items):
    recentitems = []
    with open('last 600 items.txt', 'r+') as f:
        for item in items:
            if item['ID'] in f:
                print 'In! --', item['ID']
            else:
                recentitems.append(item['ID'])
                print 'Out ---', item['ID']
        for item in recentitems:
            f.write("%s\n" % item)


items = [ {'ID': 1}, {'ID': 'test2'} ]     
filter_recent_items(items)

For example, my text file is:

test2

test1

1

but the above code returns

Out --- 1
Out --- test2

The problem is in how you're checking for the existence of the specified text. In your code, f is a file object, used for reading from and writing to a file. So when you check whether

str in f

it's not checking what you think it is. (See below for details.)

Instead, you need to read in the lines of the file, then iterate through those lines and check each one for the string you need. For example:

with open('last 600 items.txt', 'r+') as f:
    lines = f.readlines()
    for line in lines:
        # check within each line for the presence of the items,
        # e.g. compare line.rstrip('\n') against each item ID
        pass

In the above code excerpt, f.readlines() uses the file object to read the contents of the file and returns a list of strings, which are the lines within the file.
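Putting that together, a minimal sketch of the whole function might look like the following. It is written for Python 3 (print is a function). The path parameter, the return value, and the conversion of every ID to str are assumptions added for illustration, not part of the original code.

```python
def filter_recent_items(items, path='last 600 items.txt'):
    """Append IDs missing from the file at `path`; return the IDs that were added."""
    with open(path, 'r+') as f:
        # Strip the trailing newline so 'test2\n' compares equal to 'test2'.
        recent = {line.rstrip('\n') for line in f.readlines()}
        new_ids = []
        for item in items:
            item_id = str(item['ID'])  # assumption: IDs are compared as text
            if item_id in recent:
                print('In! --', item_id)
            else:
                print('Out ---', item_id)
                new_ids.append(item_id)
                recent.add(item_id)  # avoid writing the same new ID twice
        # readlines() left the position at end of file, so these writes append.
        for item_id in new_ids:
            f.write('%s\n' % item_id)
    return new_ids
```

This sketch assumes the file already ends with a newline; if it does not, the first appended ID would be glued onto the last existing line.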

EDITED (credit to Peter Wood)

Python Membership Details

In Python, when you use the syntax x in y, it checks for two things:

Case 1: It first checks whether y has a __contains__ method. If so, it returns the result of y.__contains__(x).

Case 2: If, however, y does not have a __contains__ method but does define the __iter__ method, Python instead uses that method to iterate over the contents of y and returns True if at any point one of the values being iterated over equals x. Otherwise, it returns False.
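The two cases can be seen with a pair of toy classes. This is only an illustrative sketch for Python 3; the class names WithContains and IterOnly are made up for the example.

```python
class WithContains:
    """Defines __contains__, so `x in obj` calls it directly (Case 1)."""
    def __contains__(self, x):
        return x == "test2"


class IterOnly:
    """No __contains__; `x in obj` falls back to iterating (Case 2)."""
    def __init__(self, values):
        self.values = values

    def __iter__(self):
        return iter(self.values)


case1 = "test2" in WithContains()
# Case 2 compares x for equality against each value produced by __iter__.
case2_hit = "test2\n" in IterOnly(["test1\n", "test2\n"])
case2_miss = "test2" in IterOnly(["test1\n", "test2\n"])
print(case1, case2_hit, case2_miss)  # True True False
```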

If we use your code as the example, at a certain point it is checking the truth of the statement "test2" in f. Here f is an object of type file (see the Python File Object Description). File objects belong to Case 2, i.e. they don't have __contains__, but they do have __iter__.

So the code will go through each line and see whether your input string is equal to any of the lines in the file. And since each line ends with the character \n, the comparison is never going to be True for your strings.

To elaborate: while "test2" in "test2\n" would return True, the test that's actually being performed here is "test2" == "test2\n", which is False.

You can test how this works on your file by hand. For example, if we want to see whether "test2" in f should return True:

with open(filename) as f:
    x = iter(f)
    while True:
        try:
            line = next(x)  # next() works in Python 2.6+ and Python 3
        except StopIteration:
            # the iterator is exhausted: no more lines
            break
        print(line)
        print(line == "test2")

You'll notice that it prints out each line (including the newline at the end) and that the result of line == "test2" is always False.

If, however, we were to try "test2\n" in f, the result would be True.
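The same behaviour can be reproduced without touching the disk. The sketch below is for Python 3 and uses io.StringIO as a stand-in for the opened text file, which is an assumption made so the example is self-contained.

```python
import io

f = io.StringIO("test2\ntest1\n1\n")  # stand-in for the opened text file

without_newline = "test2" in f   # compares "test2" == "test2\n", and so on
f.seek(0)                        # the first test consumed the stream; rewind
with_newline = "test2\n" in f
f.seek(0)
# Stripping the newline from each line makes the plain string match.
stripped = "test2" in (line.rstrip("\n") for line in f)

print(without_newline, with_newline, stripped)  # False True True
```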

End Edit

As others have said, if "somestring" in f will always fail. f is a file object which, when you iterate over it, produces lines of text. One or more of those lines might contain your text, so instead you could do:

if any("targetstring" in line for line in f):
    pass  # success

This saves memory compared with the f.read() or f.readlines() approaches, which both load the whole file into memory before doing anything.
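One thing to keep in mind is that "targetstring" in line is a substring test, so "1" would also match a line such as "test1". If whole-line matches are wanted, a sketch along these lines may help (Python 3; contains_line is a hypothetical helper name, and io.StringIO stands in for the real file):

```python
import io


def contains_line(f, target):
    """Return True if any line of f, minus its trailing newline, equals target."""
    return any(line.rstrip("\n") == target for line in f)


data = "test2\ntest1\n"
# The substring test finds "1" inside "test1\n".
substring_hit = any("1" in line for line in io.StringIO(data))
# The whole-line comparison does not, because no line is exactly "1".
exact_hit = contains_line(io.StringIO(data), "1")
print(substring_hit, exact_hit)  # True False
```

Like the any() expression above, this stops reading as soon as a match is found and never holds more than one line in memory.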

@PeterWood points out in the comments that some of your target strings aren't actually strings. You should see to that, too. all(isinstance(item["ID"], str) for item in items) should be True.
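As a rough illustration of that check in Python 3, using the items list from the question (converting each ID with str() is one possible fix and is an assumption here, not something the question specifies):

```python
items = [{'ID': 1}, {'ID': 'test2'}]

# The integer 1 is not a str, so the check fails for the original data.
all_strings = all(isinstance(item["ID"], str) for item in items)

# Build copies of the items with every ID converted to text.
normalized = [{**item, "ID": str(item["ID"])} for item in items]
all_strings_after = all(isinstance(item["ID"], str) for item in normalized)

print(all_strings, all_strings_after)  # False True
```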

Print out your data store, f. First of all, I expect that you have embedded newline characters that prevent the items from matching: "1" doesn't match "1\n". Second, note that with open gives you a file object that behaves like a generator, not a list or tuple. You can't scan it multiple times, and you don't have the data from it until you iterate through it somehow.

You need code to get all the elements into memory, such as:

content = f.read().split("\n")
for item in items:
    if item["ID"] in content:
        # the ID is already in the file
        print('In! -- %s' % item["ID"])
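Both points can be checked in a small self-contained sketch. It is written for Python 3 and uses io.StringIO in place of the real file; the extra "test3" item and the str() conversion are assumptions added for illustration. It also uses splitlines() rather than split("\n"), which avoids the empty string that split("\n") leaves after a trailing newline.

```python
import io

f = io.StringIO("test2\ntest1\n1\n")  # stand-in for the opened file

first_pass = list(f)    # consumes every line
second_pass = list(f)   # nothing left: the file object is exhausted after one pass

f.seek(0)               # rewind before reading again
content = set(f.read().splitlines())  # all lines in memory, newlines removed

items = [{"ID": 1}, {"ID": "test2"}, {"ID": "test3"}]
# Compare IDs as text so the integer 1 matches the line "1".
missing = [str(item["ID"]) for item in items if str(item["ID"]) not in content]
print(len(first_pass), len(second_pass), missing)  # 3 0 ['test3']
```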
