
Read lines in one file and find all strings starting with 4-letter strings listed in another txt file

I have two txt files, file_a.txt and file_b.txt.

file_a.txt contains a long list of 4-letter combinations (one combination per line):

aaaa
bcsg
aacd
gdee
aadw
hwer
etc.

file_b.txt contains a list of letter combinations of various lengths (some with spaces):

aaaibjkes
aaleoslk
abaaaalkjel
bcsgiweyoieotpwe
csseiolskj
gaelsi asdas
aaaloiersaaageehikjaaa
hwesdaaadf wiibhuehu
bcspwiopiejowih
gdeaes
aaailoiuwegoiglkjaaake
etc.

I am looking for a Python script that would allow me to do the following:

  1. read file_a.txt line by line
  2. take each 4-letter combination (e.g. aaai)
  3. read file_b.txt and find all the various-length letter combinations starting with the 4-letter combination (e.g. aaai bjkes, aaai loiersaaageehikjaaa, aaai loiuwegoiglkjaaaike, etc.)
  4. write the results of each search to a separate txt file named after the 4-letter combination.

File aaai.txt:

aaaibjkes 
aaailoiersaaageehikjaaa
aaailoiuwegoiglkjaaake
etc.

File bcsi.txt:

bcspwiopiejowih
bcsiweyoieotpwe
etc.

I'm sorry, I'm a newbie. Can someone point me in the right direction, please? So far I've only got:

#I presume I will have to use regex at some point
import re

file1 = open('file_a.txt', 'r').readlines()
file2 = open('file_b.txt', 'r').readlines()

#Should I look into findall()?

I hope this helps:

# Read every line of the second file into a list
with open('file_b.txt', 'r') as file2:
    mylist = file2.readlines()

# Read each line of the first file exactly once
with open('file_a.txt', 'r') as file1:
    for line in file1:
        # Strip the trailing newline so it ends up neither in the match nor in the file name
        searchStr = line.strip()
        if not searchStr:
            # Skip blank lines; an empty string would match every line
            continue
        # Keep the lines of the second file that start with this string
        exists = [s for s in mylist if s.startswith(searchStr)]
        if exists:
            # At least one line matched, so create a file named after the string
            with open(searchStr + '.txt', 'w') as fileNew:
                fileNew.writelines(exists)

What you can do is open both files and go through them line by line using for loops.

You can have two for loops. The outer one reads file_a.txt, since you only need to go through it once. The inner one reads through file_b.txt for each line of file_a.txt and looks for the string at the start of each line.

To do so, you can use .find() to search for the string. Since it has to be at the start of the line, the returned index should be 0.
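As a small illustration of that check: str.find() returns the index of the first occurrence (or -1 if there is none), so a result of 0 means the line begins with the search string. The sample string is taken from the question; starts_with_prefix is a hypothetical helper name used only for this sketch. str.startswith() expresses the same test more directly.

```python
def starts_with_prefix(line, prefix):
    """Return True when `line` begins with `prefix`, using the find() == 0 check."""
    return line.find(prefix) == 0


line = "bcsgiweyoieotpwe"
print(starts_with_prefix(line, "bcsg"))  # True: "bcsg" is found at index 0
print(starts_with_prefix(line, "iwey"))  # False: "iwey" is found, but at index 4
print(line.startswith("bcsg"))           # True: same test without find()
```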

file_a = open("file_a.txt", "r")
file_b = open("file_b.txt", "r")

for a_line in file_a:
    # This result value will be written into your new file
    result = ""
    # This is what we will search with
    search_val = a_line.strip("\n")
    print("---- Using " + search_val + " from file_a to search. ----")
    for b_line in file_b:
        print("Searching file_b using " + b_line.strip("\n"))
        # find() returns 0 only when b_line begins with search_val
        if b_line.strip("\n").find(search_val) == 0:
            result += (b_line)
    print("---- Search ended ----")
    # Set the read pointer to the start of the file again
    file_b.seek(0, 0)

    if result:
        # Write the contents of "results" into a file with the name of "search_val"
        with open(search_val + ".txt", "a") as f:
            f.write(result)

file_a.close()
file_b.close()
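A possible variation on the loop above, offered as a sketch rather than a replacement: it reads file_b.txt into memory once instead of rewinding it with seek() for every line of file_a.txt, and it uses startswith() instead of find(). The function name write_matches and its out_dir parameter are assumptions made for this example, and it assumes file_b.txt is small enough to hold in memory.

```python
import os


def write_matches(prefix_path, lines_path, out_dir="."):
    """Write one <prefix>.txt per prefix that has matches; return the prefixes written."""
    # Load the candidate lines once so the file does not have to be rewound per prefix
    with open(lines_path) as f:
        candidates = [line.rstrip("\n") for line in f]

    written = []
    with open(prefix_path) as f:
        for raw in f:
            prefix = raw.strip()
            if not prefix:
                continue  # skip blank lines; an empty prefix would match everything
            matches = [c for c in candidates if c.startswith(prefix)]
            if matches:
                # "w" overwrites any earlier output; the code above appends with "a"
                with open(os.path.join(out_dir, prefix + ".txt"), "w") as out:
                    out.write("\n".join(matches) + "\n")
                written.append(prefix)
    return written
```

Calling write_matches("file_a.txt", "file_b.txt") would write the output files into the current directory and return the list of 4-letter combinations that produced a file.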

Test Cases:

I am using the test cases from your question:

file_a.txt

aaaa
bcsg
aacd
gdee
aadw
hwer

file_b.txt

aaaibjkes
aaleoslk
abaaaalkjel
bcsgiweyoieotpwe
csseiolskj
gaelsi asdas
aaaloiersaaageehikjaaa
hwesdaaadf wiibhuehu
bcspwiopiejowih
gdeaes
aaailoiuwegoiglkjaaake

The program produces an output file bcsg.txt, as it is supposed to, with bcsgiweyoieotpwe inside.
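To see why bcsg.txt is the only file produced for this sample, the same prefix test can be run in memory on the data listed above. This is only a quick check, not part of the program; the variable names are made up for the sketch. None of the other entries in file_a.txt (aaaa, aacd, gdee, aadw, hwer) is the beginning of any line in file_b.txt.

```python
prefixes = ["aaaa", "bcsg", "aacd", "gdee", "aadw", "hwer"]
lines = [
    "aaaibjkes",
    "aaleoslk",
    "abaaaalkjel",
    "bcsgiweyoieotpwe",
    "csseiolskj",
    "gaelsi asdas",
    "aaaloiersaaageehikjaaa",
    "hwesdaaadf wiibhuehu",
    "bcspwiopiejowih",
    "gdeaes",
    "aaailoiuwegoiglkjaaake",
]

# Map each prefix to the lines that start with it, then drop the empty results
matches = {p: [line for line in lines if line.startswith(p)] for p in prefixes}
non_empty = {p: m for p, m in matches.items() if m}
print(non_empty)  # {'bcsg': ['bcsgiweyoieotpwe']}
```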

Try this:

f1 = open("a.txt","r").readlines()
f2 = open("b.txt","r").readlines()
file1 = [word.replace("\n","") for word in f1]
file2 = [word.replace("\n","") for word in f2]

data = []
data_dict ={}
for short_word in file1:
    data += ([[short_word,w] for w in file2 if w.startswith(short_word)])

for single_data in data:
    if single_data[0] in data_dict:
        data_dict[single_data[0]].append(single_data[1])
    else:
        data_dict[single_data[0]]=[single_data[1]]

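# items() works on both Python 2 and 3; iteritems() exists only on Python 2
for key, val in data_dict.items():
    # Write the matches for this key, one per line, and close the file afterwards
    with open(key + ".txt", "w") as out:
        out.write("\n".join(val))
    print(key + ".txt created")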
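If every entry in the first file really is exactly four letters long, as the question describes, the grouping step could also be done in a single pass over the second file by looking up the first four characters of each word in a set. This is a different technique from the nested comparison above, sketched under that fixed-width assumption; group_by_prefix and its width parameter are hypothetical names.

```python
from collections import defaultdict


def group_by_prefix(prefixes, words, width=4):
    """Group words by their first `width` characters, keeping only known prefixes."""
    wanted = set(prefixes)  # set membership makes each lookup cheap
    groups = defaultdict(list)
    for word in words:
        key = word[:width]
        if key in wanted:
            groups[key].append(word)
    return dict(groups)


result = group_by_prefix(
    ["bcsg", "aaai"],
    ["aaaibjkes", "bcsgiweyoieotpwe", "gdeaes", "aaailoiuwegoiglkjaaake"],
)
print(result)
```

The returned dictionary has the same shape as data_dict above, so the file-writing loop can be reused unchanged.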


