Read lines in one file and find all strings starting with 4-letter strings listed in another txt file
I have 2 txt files (file_a.txt and file_b.txt).
file_a.txt contains a long list of 4-letter combinations (one combination per line):
aaaa
bcsg
aacd
gdee
aadw
hwer
etc.
file_b.txt contains a list of letter combinations of various lengths (some with spaces):
aaaibjkes
aaleoslk
abaaaalkjel
bcsgiweyoieotpwe
csseiolskj
gaelsi asdas
aaaloiersaaageehikjaaa
hwesdaaadf wiibhuehu
bcspwiopiejowih
gdeaes
aaailoiuwegoiglkjaaake
etc.
I am looking for a Python script that would let me produce output files like the following:
File aaai.txt:
aaaibjkes
aaailoiersaaageehikjaaa
aaailoiuwegoiglkjaaake
etc.
File bcsi.txt:
bcspwiopiejowih
bcsiweyoieotpwe
etc.
I'm sorry, I'm a newbie. Can someone point me in the right direction, please? So far I've only got:
#I presume I will have to use regex at some point
import re
file1 = open('file_a.txt', 'r').readlines()
file2 = open('file_b.txt', 'r').readlines()
#Should I look into findall()?
I hope this helps:
# get every item in your second file into a list
with open('file_b.txt', 'r') as file2:
    mylist = file2.readlines()

# read each line in the first file (iterating avoids skipping every other line)
with open('file_a.txt', 'r') as file1:
    for line in file1:
        # strip the trailing newline so the string can be matched and used as a file name
        searchStr = line.strip()
        if not searchStr:
            continue
        # find the lines in your second file that start with this string
        exists = [s for s in mylist if s.startswith(searchStr)]
        if exists:
            # if any line matches, create a file for it
            with open(searchStr + '.txt', 'w') as fileNew:
                fileNew.writelines(exists)
What you can do is open both files and go through them line by line using for loops.
You can have two for loops. The first one reads file_a.txt, which you only need to read through once. The second reads through file_b.txt and looks for the string at the start of each line.
To do so, you can use .find() to search for the string. Since the match has to be at the start of the line, the returned index should be 0.
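The check described above can be tried in isolation before wiring it into the file loops. The following is a minimal sketch; the helper names starts_with_find and starts_with are made up for illustration and are not part of the answer. It also shows str.startswith, which expresses the same test more directly.

```python
def starts_with_find(line, prefix):
    # str.find returns the index of the first occurrence, or -1 if absent,
    # so a result of 0 means the line begins with the prefix.
    return line.find(prefix) == 0


def starts_with(line, prefix):
    # str.startswith performs the same test without scanning the whole line.
    return line.startswith(prefix)


if __name__ == "__main__":
    samples = ["bcsgiweyoieotpwe", "abaaaalkjel", "aaaibjkes"]
    for prefix in ("bcsg", "aaaa"):
        for line in samples:
            print(prefix, line, line.find(prefix), starts_with(line, prefix))
```

Note that "abaaaalkjel" does contain "aaaa", but not at index 0, so it is correctly rejected by both helpers. A plain substring test (prefix in line) would accept it.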
file_a = open("file_a.txt", "r")
file_b = open("file_b.txt", "r")

for a_line in file_a:
    # This result value will be written into your new file
    result = ""
    # This is what we will search with
    search_val = a_line.strip("\n")
    print("---- Using " + search_val + " from file_a to search. ----")
    for b_line in file_b:
        print("Searching file_b using " + b_line.strip("\n"))
        if b_line.strip("\n").find(search_val) == 0:
            result += b_line
    print("---- Search ended ----")
    # Set the read pointer to the start of the file again
    file_b.seek(0, 0)

    if result:
        # Write the contents of "result" into a file named after "search_val"
        # (append mode: re-running the script adds to an existing file)
        with open(search_val + ".txt", "a") as f:
            f.write(result)

file_a.close()
file_b.close()
Test Cases:
I am using the test cases from your question:
file_a.txt
aaaa
bcsg
aacd
gdee
aadw
hwer
file_b.txt
aaaibjkes
aaleoslk
abaaaalkjel
bcsgiweyoieotpwe
csseiolskj
gaelsi asdas
aaaloiersaaageehikjaaa
hwesdaaadf wiibhuehu
bcspwiopiejowih
gdeaes
aaailoiuwegoiglkjaaake
The program produces a single output file, bcsg.txt, containing bcsgiweyoieotpwe, as expected.
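The nested loops above re-read file_b.txt once for every line of file_a.txt. If every entry in file_a.txt really is exactly four letters long, as the question states, a single pass over file_b.txt is enough. The sketch below is one possible alternative, not the answer's method; group_by_prefix, write_groups and the prefix_len parameter are hypothetical names introduced here.

```python
import os
from collections import defaultdict


def group_by_prefix(prefixes, lines, prefix_len=4):
    """Group lines by their first prefix_len characters.

    Only lines whose leading characters appear in prefixes are kept.
    Assumes every prefix has exactly prefix_len characters.
    """
    wanted = {p.strip() for p in prefixes if p.strip()}
    groups = defaultdict(list)
    for line in lines:
        text = line.rstrip("\n")
        key = text[:prefix_len]
        if key in wanted:
            groups[key].append(text)
    return dict(groups)


def write_groups(groups, directory="."):
    # Write one "<prefix>.txt" file per group, one matching line per row.
    for key, matches in groups.items():
        path = os.path.join(directory, key + ".txt")
        with open(path, "w") as out:
            out.write("\n".join(matches) + "\n")


if __name__ == "__main__":
    with open("file_a.txt") as fa, open("file_b.txt") as fb:
        write_groups(group_by_prefix(fa, fb))
```

Looking up the first four characters in a set means each line of file_b.txt is examined once, regardless of how long file_a.txt is. If the prefixes could have different lengths, this shortcut would not apply and a startswith check per prefix would be needed instead.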
Try this:
with open("a.txt", "r") as f1:
    # drop newlines and skip blank lines, which would otherwise match everything
    file1 = [word.replace("\n", "") for word in f1 if word.strip()]
with open("b.txt", "r") as f2:
    file2 = [word.replace("\n", "") for word in f2]

data = []
data_dict = {}
# collect [prefix, line] pairs for every line that starts with the prefix
for short_word in file1:
    data += [[short_word, w] for w in file2 if w.startswith(short_word)]
# group the matching lines by prefix
for single_data in data:
    if single_data[0] in data_dict:
        data_dict[single_data[0]].append(single_data[1])
    else:
        data_dict[single_data[0]] = [single_data[1]]
# items() replaces the Python 2-only iteritems()
for key, val in data_dict.items():
    with open(key + ".txt", "w") as out:
        out.write("\n".join(val))
    print(key + ".txt created")