Read lines from one file and find all strings starting with the 4 letters listed in another txt file

Question

I have 2 txt files (file_a.txt and file_b.txt).

file_a.txt contains a long list of 4-letter combinations (one combination per line):

aaaa
bcsg
aacd
gdee
aadw
hwer
etc.

file_b.txt contains a list of letter combinations of various lengths (some with spaces):

aaaibjkes
aaleoslk
abaaaalkjel
bcsgiweyoieotpwe
csseiolskj
gaelsi asdas
aaaloiersaaageehikjaaa
hwesdaaadf wiibhuehu
bcspwiopiejowih
gdeaes
aaailoiuwegoiglkjaaake
etc.

I am looking for a Python script that would allow me to do the following:

read file_a.txt line by line
take each 4-letter combination (e.g. aaai)
read file_b.txt and find all the various-length letter combinations starting with that 4-letter combination (e.g. aaaibjkes, aaailoiersaaageehikjaaa, aaailoiuwegoiglkjaaake etc.)
write the results of each search to a separate txt file named after the 4-letter combination.

File aaai.txt:

aaaibjkes 
aaailoiersaaageehikjaaa
aaailoiuwegoiglkjaaake
etc.

File bcsi.txt:

bcspwiopiejowih
bcsiweyoieotpwe
etc.

I'm sorry, I'm a newbie. Can someone point me in the right direction, please? So far I've got only:

#I presume I will have to use regex at some point
import re

file1 = open('file_a.txt', 'r').readlines()
file2 = open('file_b.txt', 'r').readlines()

#Should I look into findall()?

Answer 1

I hope this helps:

# Read every line of the second file into a list
with open('file_b.txt', 'r') as file2:
    mylist = file2.readlines()

# Read each line of the first file (iterating avoids skipping every other line)
with open('file_a.txt', 'r') as file1:
    for line in file1:
        # Strip the trailing newline so it does not become part of the prefix
        searchStr = line.strip()
        if not searchStr:
            # Skip blank lines; an empty prefix would match everything
            continue
        # Keep only the lines of the second file that start with this prefix
        exists = [s for s in mylist if s.startswith(searchStr)]
        if exists:
            # If there are matches, write them to a file named after the prefix
            with open(searchStr + '.txt', 'w') as fileNew:
                fileNew.writelines(exists)

Answer 2

What you can do is open both files and go through them line by line using for loops.

You can use two for loops, one inside the other. The outer loop reads file_a.txt, which you only need to read through once. The inner loop reads through file_b.txt and looks for the string at the start of each line.

To do so, you can use .find() to search for the string. Since the match has to be at the start of the line, the returned index should be 0.

file_a = open("file_a.txt", "r")
file_b = open("file_b.txt", "r")

for a_line in file_a:
    # This result value will be written into your new file
    result = ""
    # This is what we will search with
    search_val = a_line.strip("\n")
    if not search_val:
        # Skip blank lines; an empty string is found at index 0 of every line
        continue
    print("---- Using " + search_val + " from file_a to search. ----")
    for b_line in file_b:
        print("Searching file_b using " + b_line.strip("\n"))
        # find() returns 0 only when the line starts with search_val
        if b_line.strip("\n").find(search_val) == 0:
            result += b_line
    print("---- Search ended ----")
    # Set the read pointer to the start of the file again
    file_b.seek(0, 0)

    if result:
        # Write the contents of "result" into a file named after "search_val"
        with open(search_val + ".txt", "a") as f:
            f.write(result)

file_a.close()
file_b.close()
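
As a side note on the .find() == 0 check used above: a minimal sketch comparing it with str.startswith(), which expresses the same test more directly. The two helper names below are hypothetical and only serve the illustration.

```python
def starts_with_find(line, prefix):
    # find() returns the index of the first occurrence, or -1 if absent;
    # index 0 means the line begins with the prefix
    return line.find(prefix) == 0


def starts_with_method(line, prefix):
    # startswith() checks only the beginning of the string
    return line.startswith(prefix)


for sample in ["bcsgiweyoieotpwe", "abaaaalkjel", "gdeaes"]:
    # Both checks should agree for every sample line
    print(sample, starts_with_find(sample, "bcsg"), starts_with_method(sample, "bcsg"))
```

Both give the same answer here; startswith() simply avoids scanning the rest of the line when the prefix is not at the start.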

Test Cases:

I am using the test cases from your question:

file_a.txt

aaaa
bcsg
aacd
gdee
aadw
hwer

file_b.txt

aaaibjkes
aaleoslk
abaaaalkjel
bcsgiweyoieotpwe
csseiolskj
gaelsi asdas
aaaloiersaaageehikjaaa
hwesdaaadf wiibhuehu
bcspwiopiejowih
gdeaes
aaailoiuwegoiglkjaaake

As expected, the program produces one output file, bcsg.txt, containing bcsgiweyoieotpwe.
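
To double-check that result without touching the file system, here is a small in-memory sketch that runs the same prefix test on the data listed above. The function name match_prefixes is an assumption made for this illustration and is not part of the script above.

```python
def match_prefixes(prefixes, lines):
    # Map each prefix to the lines that start with it,
    # keeping only the prefixes that have at least one match
    matches = {}
    for prefix in prefixes:
        found = [line for line in lines if line.startswith(prefix)]
        if found:
            matches[prefix] = found
    return matches


prefixes = ["aaaa", "bcsg", "aacd", "gdee", "aadw", "hwer"]
lines = [
    "aaaibjkes", "aaleoslk", "abaaaalkjel", "bcsgiweyoieotpwe",
    "csseiolskj", "gaelsi asdas", "aaaloiersaaageehikjaaa",
    "hwesdaaadf wiibhuehu", "bcspwiopiejowih", "gdeaes",
    "aaailoiuwegoiglkjaaake",
]

print(match_prefixes(prefixes, lines))
```

With this data only bcsg has a match, which is why a single output file is created.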

Answer 3

Try this:

# Read both files, strip the trailing newlines and skip blank lines
with open("file_a.txt", "r") as f1:
    file1 = [word.rstrip("\n") for word in f1 if word.strip()]
with open("file_b.txt", "r") as f2:
    file2 = [word.rstrip("\n") for word in f2 if word.strip()]

# Collect [prefix, matching line] pairs
data = []
data_dict = {}
for short_word in file1:
    data += [[short_word, w] for w in file2 if w.startswith(short_word)]

# Group the matching lines by prefix
for single_data in data:
    if single_data[0] in data_dict:
        data_dict[single_data[0]].append(single_data[1])
    else:
        data_dict[single_data[0]] = [single_data[1]]

# Write one file per prefix; items() works on both Python 2 and Python 3
for key, val in data_dict.items():
    with open(key + ".txt", "w") as out:
        out.write("\n".join(val) + "\n")
    print(key + ".txt created")
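
A possible variation, offered only as a sketch: if every entry in file_a.txt really is exactly 4 letters long (an assumption taken from the question), the lines of file_b.txt can be grouped in a single pass by looking at their first 4 characters. The function name group_by_prefix and the width parameter are hypothetical. This is a different technique from the pairwise startswith() comparison above, not a restatement of it.

```python
from collections import defaultdict


def group_by_prefix(prefixes, lines, width=4):
    # Set of wanted prefixes for fast membership tests
    wanted = set(prefixes)
    groups = defaultdict(list)
    for line in lines:
        # The first `width` characters decide which group a line belongs to
        key = line[:width]
        if key in wanted:
            groups[key].append(line)
    return dict(groups)


groups = group_by_prefix(
    ["aaai", "bcsg"],
    ["aaaibjkes", "bcsgiweyoieotpwe", "aaailoiuwegoiglkjaaake", "gdeaes"],
)
for key, val in groups.items():
    print(key, val)
```

Each group could then be written to key + ".txt" in the same way as in the loop above. The benefit is that file_b.txt is scanned once, regardless of how many prefixes file_a.txt contains.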


Answer 1: score 1, 2016-05-30 11:05:50
Answer 2: score 0, accepted, 2016-05-30 10:51:35
Answer 3: score 0, 2016-05-30 11:29:57