Python search a file for text using input from another file

I'm new to Python and programming, and I need some help with a Python script. There are two files, each containing email addresses (more than 5000 lines). The input file contains the email addresses that I want to search for in the data file, which also contains email addresses. I then want to write the matches to a file or display them on the console. I searched for scripts and was able to modify one, but I'm not getting the desired results. Can you please help me?

dfile1 (50K lines)
yyy@aaa.com
xxx@aaa.com
zzz@aaa.com


ifile1 (10K lines)
ccc@aaa.com
vvv@aaa.com
xxx@aaa.com
zzz@aaa.com

Output file
xxx@aaa.com
zzz@aaa.com



datafile = 'C:\\Python27\\scripts\\dfile1.txt'
inputfile = 'C:\\Python27\\scripts\\ifile1.txt'

with open(inputfile, 'r') as f:
    names = f.readlines()

outputlist = []

with open(datafile, 'r') as fd:
    for line in fd:
        name = fd.readline()
        if name[1:-1] in names:
            outputlist.append(line)
        else:
            print "Nothing found"
    print outputlist

New Code

with open(inputfile, 'r') as f:
    names = f.readlines()
outputlist = []

with open(datafile, 'r') as f:
    for line in f:
        name = f.readlines()
        if name in names:
            outputlist.append(line)
        else:
            print "Nothing found"
    print outputlist

mitan8 explains the problem you have, but this is what I would do instead:

with open(inputfile, "r") as f:
    # a set gives fast membership tests
    names = set(i.strip() for i in f)

with open(datafile, "r") as f:
    for name in f:
        if name.strip() in names:
            # strip the trailing newline so print does not add a blank line
            print(name.strip())

This avoids reading the larger data file into memory.

If you want to write to an output file, you could use this for the second with statement:

with open(datafile, "r") as i, open(outputfile, "w") as o:
    for name in i:
        if name.strip() in names:
            o.write(name)
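
For reference, the same approach can be wrapped in a small function and tried on the sample data from the question. This is only a sketch in Python 3 syntax; the name filter_matches and its parameters are made up for illustration and do not come from the question or the answer.

```python
def filter_matches(input_lines, data_lines):
    """Yield each stripped data line that also appears in input_lines."""
    # Build the lookup set once from the (smaller) input file.
    wanted = set(line.strip() for line in input_lines)
    for line in data_lines:
        # Data lines are consumed one at a time, never loaded all at once.
        if line.strip() in wanted:
            yield line.strip()


# Sample data from the question, used here in place of real files.
input_lines = ["ccc@aaa.com\n", "vvv@aaa.com\n", "xxx@aaa.com\n", "zzz@aaa.com\n"]
data_lines = ["yyy@aaa.com\n", "xxx@aaa.com\n", "zzz@aaa.com\n"]

matches = list(filter_matches(input_lines, data_lines))
print(matches)  # ['xxx@aaa.com', 'zzz@aaa.com']
```

Any iterable of lines can be passed in, so open file objects should work in place of the two lists.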

Maybe I'm missing something, but why not use a pair of sets?

#!/usr/local/cpython-3.3/bin/python

data_filename = 'dfile1.txt'
input_filename = 'ifile1.txt'

with open(input_filename, 'r') as input_file:
    input_addresses = set(email_address.rstrip() for email_address in input_file.readlines())

with open(data_filename, 'r') as data_file:
    data_addresses = set(email_address.rstrip() for email_address in data_file.readlines())

print(input_addresses.intersection(data_addresses))

I think you can remove name = fd.readline(), since the for loop already gives you the line. The extra call reads another line on top of the one the for loop reads on each iteration. Also, I think name[1:-1] should be name, since you don't want to strip the first and last characters when searching. The with statement automatically closes the files it opened.
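
To see the effect of the extra readline() call, here is a small Python 3 sketch. It uses io.StringIO as a stand-in for the data file and made-up addresses, both of which are assumptions for the demo rather than part of the question.

```python
import io

# io.StringIO stands in for the data file (an assumption for this demo).
fd = io.StringIO("a@x.com\nb@x.com\nc@x.com\nd@x.com\n")

from_loop = []
from_readline = []
for line in fd:
    from_loop.append(line.strip())
    # The extra call consumes the next line, so the loop never sees it.
    from_readline.append(fd.readline().strip())

print(from_loop)      # ['a@x.com', 'c@x.com']
print(from_readline)  # ['b@x.com', 'd@x.com']
```

Each line ends up in only one of the two variables, so half of the addresses are never compared under the name the code expects.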

PS: How I'd do it:

with open("dfile1") as dfile, open("ifile") as ifile:
    lines = "\n".join(set(dfile.read().splitlines()) & set(ifile.read().splitlines()))
print(lines)
with open("ofile", "w") as ofile:
    ofile.write(lines)

In the above solution, I'm basically taking the intersection (the elements that are part of both sets) of the lines of the two files to find the common lines.
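
As a quick check of what the & operator does here, the sketch below applies it to the sample addresses from the question and contrasts it with | (union). The variable names are chosen only for this illustration.

```python
# Sample addresses from the question, already split into lines.
dfile_lines = {"yyy@aaa.com", "xxx@aaa.com", "zzz@aaa.com"}
ifile_lines = {"ccc@aaa.com", "vvv@aaa.com", "xxx@aaa.com", "zzz@aaa.com"}

common = dfile_lines & ifile_lines  # intersection: lines present in both sets
either = dfile_lines | ifile_lines  # union: lines present in at least one set

print(sorted(common))  # ['xxx@aaa.com', 'zzz@aaa.com']
print(len(either))     # 5
```

Sets are unordered, so the result is sorted here only to get a stable order when printing.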

I think your issue stems from the following:

name = fd.readline()
if name[1:-1] in names:

name[1:-1] slices each email address so that you skip the first and last characters. While it might be good in general to skip the last character (a newline '\n'), when you load the list of names with

with open(inputfile, 'r') as f:
    names = f.readlines()

you are including the newlines. So don't slice the lines you read from the other file at all, i.e.

if name in names:
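
The mismatch can be checked directly. The sketch below (Python 3 syntax) uses two addresses from the question as stand-ins for what readlines() would return, which is an assumption about the file contents.

```python
names = ["xxx@aaa.com\n", "zzz@aaa.com\n"]  # what f.readlines() returns
name = "xxx@aaa.com\n"                      # a line read from the other file

sliced = name[1:-1]
print(repr(sliced))    # 'xx@aaa.com': first character and newline are both gone
print(sliced in names)  # False
print(name in names)    # True, because both sides still carry the newline

# Stripping both sides is sturdier, e.g. when the last line has no newline.
stripped_names = {n.strip() for n in names}
print(name.strip() in stripped_names)  # True
```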

Here's what I would do:

names = []
with open(inputfile) as f:
    for line in f:
        names.append(line.rstrip("\n"))

myEmails = set(names)

# read the data file (not the output file) and write matches to emails.txt
with open(datafile) as fd, open("emails.txt", "w") as output:
    for line in fd:
        c = line.rstrip("\n")
        if c in myEmails:
            print(c)  # for console
            output.write(c + "\n")  # for writing to file
