Read lines from one file and find all strings that start with the 4 letters listed in another txt file

Question

I have 2 txt files (a and b).

file_a.txt contains a long list of 4-letter combinations (one combination per line):

aaaa
bcsg
aacd
gdee
aadw
hwer
etc.

file_b.txt contains a list of letter combinations of various lengths (some with spaces):

aaaibjkes
aaleoslk
abaaaalkjel
bcsgiweyoieotpwe
csseiolskj
gaelsi asdas
aaaloiersaaageehikjaaa
hwesdaaadf wiibhuehu
bcspwiopiejowih
gdeaes
aaailoiuwegoiglkjaaake
etc.

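I am looking for a python script that would allow me to do the following:

read file_a.txt line by line
take each 4-letter combination (e.g. aaai)
read file_b.txt and find all the letter combinations of various lengths that start with those 4 letters (e.g. aaaibjkes, aaailoiersaaageehikjaaa, aaailoiuwegoiglkjaaake etc.)
print the results of each search to a separate txt file named after the 4-letter combination.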

File aaai.txt:

aaaibjkes 
aaailoiersaaageehikjaaa
aaailoiuwegoiglkjaaake
etc.

File bcsi.txt:

bcspwiopiejowih
bcsiweyoieotpwe
etc.

Sorry, I'm a newbie. Can someone point me in the right direction? So far I have only got:

#I presume I will have to use regex at some point
import re

file1 = open('file_a.txt', 'r').readlines()
file2 = open('file_b.txt', 'r').readlines()

#Should I look into findall()?

Answer 1

Hope this helps:

# get every line of the second file into a list
with open('file_b.txt', 'r') as file2:
    mylist = file2.readlines()

# read each line in the first file
with open('file_a.txt', 'r') as file1:
    for line in file1:
        # strip the trailing newline so the value can be used for matching and as a file name
        searchStr = line.strip()
        if not searchStr:
            continue  # skip blank lines
        # find the lines in the second file that start with this string
        exists = [s for s in mylist if s.startswith(searchStr)]
        if exists:
            # if there are matches, create a file named after the search string
            with open(searchStr + '.txt', 'w') as fileNew:
                for match in exists:
                    fileNew.write(match)
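
Because the question asks for strings that start with the 4 letters, it can help to see how a substring test differs from a prefix test. The following is only a small in-memory sketch; the two sample strings come from the question, and the variable names are made up for illustration.

```python
# Sample values taken from the question's file_a.txt and file_b.txt.
prefix = "aaaa"
line = "abaaaalkjel"

# Substring test: true when the prefix occurs anywhere in the line.
contains = prefix in line

# Prefix test: true only when the line begins with the prefix.
starts = line.startswith(prefix)

print(contains, starts)
```

Here the substring test succeeds while the prefix test does not, so a substring test would put this line into aaaa.txt even though it does not begin with aaaa.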

Answer 2

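What you can do is open both files and run through both of them line by line using for loops.

You can have two for loops: the first one reads file_a.txt, since you will only read through it once. The second one reads through file_b.txt and looks for the string at the start of each line.

To do so, you will have to use .find() to search for the string. Since the string must be at the start, the returned value should be 0.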

file_a = open("file_a.txt", "r")
file_b = open("file_b.txt", "r")

for a_line in file_a:
    # This result value will be written into your new file
    result = ""
    # This is what we will search with
    search_val = a_line.strip("\n")
    if not search_val:
        # An empty string is "found" at index 0 of every line, so skip blank lines
        continue
    print("---- Using " + search_val + " from file_a to search. ----")
    for b_line in file_b:
        print("Searching file_b using " + b_line.strip("\n"))
        if b_line.strip("\n").find(search_val) == 0:
            result += b_line
    print("---- Search ended ----")
    # Set the read pointer to the start of the file again
    file_b.seek(0, 0)

    if result:
        # Write the contents of "result" into a file named after "search_val"
        with open(search_val + ".txt", "a") as f:
            f.write(result)

file_a.close()
file_b.close()

Test case:

I used the test case from your question:

file_a.txt

aaaa
bcsg
aacd
gdee
aadw
hwer

file_b.txt

aaaibjkes
aaleoslk
abaaaalkjel
bcsgiweyoieotpwe
csseiolskj
gaelsi asdas
aaaloiersaaageehikjaaa
hwesdaaadf wiibhuehu
bcspwiopiejowih
gdeaes
aaailoiuwegoiglkjaaake

The program produces one output file, bcsg.txt, as it is supposed to, with bcsgiweyoieotpwe inside.
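
The .find() check can also be tried on the same test data without touching the disk. This is a sketch and not part of the original answer: the helper name matches_for is hypothetical, and the two lists simply restate the test case.

```python
# The test case from the question, held in memory instead of in files.
prefixes = ["aaaa", "bcsg", "aacd", "gdee", "aadw", "hwer"]
lines = [
    "aaaibjkes",
    "aaleoslk",
    "abaaaalkjel",
    "bcsgiweyoieotpwe",
    "csseiolskj",
    "gaelsi asdas",
    "aaaloiersaaageehikjaaa",
    "hwesdaaadf wiibhuehu",
    "bcspwiopiejowih",
    "gdeaes",
    "aaailoiuwegoiglkjaaake",
]


def matches_for(prefix, candidates):
    """Return the candidates in which str.find() locates prefix at index 0."""
    # find() returns -1 when the prefix is absent and a positive index when
    # it occurs later in the string, so only 0 means "at the start".
    return [c for c in candidates if c.find(prefix) == 0]


results = {p: matches_for(p, lines) for p in prefixes}
# Keep only the prefixes that matched something, i.e. the files that would be written.
non_empty = {p: m for p, m in results.items() if m}
print(non_empty)
```

With this data only bcsg has a match, which agrees with the single output file described above.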

Answer 3

Try this:

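# Read both files and strip the trailing newlines (blank lines in file_a are skipped)
with open("file_a.txt", "r") as f1:
    file1 = [word.replace("\n", "") for word in f1 if word.strip()]
with open("file_b.txt", "r") as f2:
    file2 = [word.replace("\n", "") for word in f2]

data = []
data_dict = {}
# Collect [short_word, matching line] pairs
for short_word in file1:
    data += [[short_word, w] for w in file2 if w.startswith(short_word)]

# Group the matching lines by their short word
for single_data in data:
    if single_data[0] in data_dict:
        data_dict[single_data[0]].append(single_data[1])
    else:
        data_dict[single_data[0]] = [single_data[1]]

# Write one file per short word (items() replaces Python 2's iteritems())
for key, val in data_dict.items():
    with open(key + ".txt", "w") as out:
        out.write("\n".join(val))
    print(key + ".txt created")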
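
If every entry in file_a.txt is exactly 4 characters long, as the question describes, the grouping could also be done in a single pass over file_b.txt by looking up the first 4 characters of each line in a set. This is an alternative sketch rather than this answer's method: group_by_prefix and write_groups are hypothetical names, and the fixed width of 4 is an assumption.

```python
from collections import defaultdict
from pathlib import Path


def group_by_prefix(prefixes, lines, width=4):
    """Group lines by their first `width` characters, keeping known prefixes only."""
    # Assumption: every wanted prefix is exactly `width` characters long.
    wanted = {p.strip() for p in prefixes if p.strip()}
    groups = defaultdict(list)
    for line in lines:
        text = line.rstrip("\n")
        key = text[:width]
        if key in wanted:
            groups[key].append(text)
    return dict(groups)


def write_groups(groups, out_dir="."):
    """Write each group to <prefix>.txt inside out_dir, one match per line."""
    out_path = Path(out_dir)
    for key, matches in groups.items():
        (out_path / (key + ".txt")).write_text("\n".join(matches) + "\n")


# Small in-memory demonstration using values from the question.
demo = group_by_prefix(
    ["bcsg\n", "hwer\n"],
    ["bcsgiweyoieotpwe\n", "bcspwiopiejowih\n"],
)
print(demo)
```

To use it on the real files, the two open file objects can be passed in directly, for example group_by_prefix(open("file_a.txt"), open("file_b.txt")), followed by write_groups on the result. The set lookup avoids rescanning file_b.txt for every prefix, which may matter when file_a.txt is long.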

Solution 1
1 2016-05-30 11:05:50

Solution 2
0 Accepted 2016-05-30 10:51:35

Solution 3
0 2016-05-30 11:29:57
