How to search through only the first column of a delimited text file using Python

I need to search the first column of a pipe-delimited ('|') .txt file containing 10 million rows using Python. The first column contains a phone number. I would like to output the entire row for a given phone number.

The file is a 5 GB .txt file, and I am unable to open it in either MS Excel or MS Access. So I want to write Python code that searches the file and prints the entire row matching a particular phone number. The phone number is in the first column. I wrote some code, but it searches the entire file and is very slow. I just want to search the first column, and my search term is the phone number.

f = open("F:/.../master.txt", "rt")   # open file master.txt
for line in f:                        # check each line in the file handle f
    if '999995555' in line:           # if a particular phone number is found
        print(line)                   # print the entire row
f.close()                             # close file

I expect the entire row to be printed on screen when the first column contains the phone number I am searching for, but it takes a long time because I don't know how to restrict the search to that column.

You are on the right track. Since it is a 5 GB file, you want to avoid allocating 5 GB of RAM for this. Reading one line at a time does that, and .readline() is designed for exactly this scenario (a big file). Iterating over the file object, as your loop does, also reads lazily, so the main thing to change is checking only the start of each line instead of the whole line.

Something like the following should do the trick. Note that .readline() returns '' at the end of the file and '\n' for empty lines. The .strip() call is there to remove the '\n' that .readline() leaves at the end of each line read from the file (it also trims any other leading or trailing whitespace).

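def search_file_line_prefix(path, search_prefix):
    """Yield each line of the file at `path` that starts with `search_prefix`."""
    with open(path, 'r') as file_handle:
        while True:
            line = file_handle.readline()   # read a single line at a time
            if line == '':                  # '' means end of file
                break
            if line.startswith(search_prefix):
                yield line.strip()          # drop the trailing newline

# 'file_path' and 'phone number' are placeholders for the real values
for result in search_file_line_prefix('file_path', 'phone number'):
    print(result)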
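
One caveat with a plain prefix check: a search for '999995555' would also match a row whose first column is a longer number that begins with the same digits. A minimal sketch of an exact first-column match follows, assuming the delimiter is '|' and that the phone number should equal the first field exactly. The function name find_rows_by_first_column and its delimiter parameter are hypothetical, introduced only for illustration.

```python
def find_rows_by_first_column(path, phone_number, delimiter='|'):
    """Yield rows whose first delimited field equals phone_number exactly.

    Assumption: fields are separated by `delimiter` and the phone number
    is the first field of each row.
    """
    with open(path, 'r') as file_handle:
        for line in file_handle:  # file objects yield one line at a time
            # partition() splits only at the first delimiter, so the rest
            # of the row is never split into fields.
            first_field = line.partition(delimiter)[0].strip()
            if first_field == phone_number:
                yield line.rstrip('\n')
```

It can be called the same way as the function above, for example by looping over find_rows_by_first_column('file_path', '999995555') and printing each row. The whole file is still read once from disk, so for repeated lookups it may be worth loading the data into a database or building an index instead.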
