Comparing the lines in multiple txt files

I am writing a Python program to find duplicate lines in the txt files in a folder.

My folder structure is:

f1--> review.txt
f2--> review.txt
f3--> review.txt

and so on (f1 represents a folder name).

I want to find which lines appear again in another txt file. For example, if the first line of "f1/review.txt" is I want to eat an apple, I want to know in which other files I want to eat an apple appears again. I want a more efficient way to do this: I am writing a lot of loops and the code keeps getting bigger.

My approach so far (for two files):

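import os

# find_error, src_line_num, curr_file_num, line_num and res come from the
# rest of the program and are not shown here.
for root, dirs, files in os.walk('root'):
    for file in files:
        if file == "review.txt":
            with open(os.path.join(root, file), "r") as auto:
                lines = auto.readlines()
            for line in lines:
                # Re-read the reference file for every line of the current file.
                with open("root/f1/review.txt", "r") as f:
                    src_lines = f.readlines()
                for src_line in src_lines:
                    src_sent = find_error(src_line, src_line_num + ':    ')
                    curr_sent = find_error(line, curr_file_num + ':    ')
                    if src_sent == curr_sent:
                        res.append([line_num])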

EDIT (could someone please format the code above?)

Txt file content (if it helps in any way):

classes/CadenceMyProfileController1.cls:6:    Avoid really long classes (lines of code)
/data/public/pmd/repo/src/1/src/classes/CadenceMyProfileController1.cls:6:    Missing ApexDoc comment
/data/public/pmd/repo/src/1/src/classes/CadenceMyProfileController1.cls:6:    The class 'CadenceMyProfileController1' has a Standard Cyclomatic Complexity of 2 (Highest = 174).
/data/public/pmd/repo/src/1/src/classes/CadenceMyProfileController1.cls:6:    The class 'CadenceMyProfileController1' has a total cyclomatic complexity of 422 (highest 215).
/data/public/pmd/repo/src/1/src/classes/CadenceMyProfileController1.cls:6:    This class has too many public methods and attributes
/data/public/pmd/repo/src/1/src/classes/CadenceMyProfileController1.cls:7:    Avoid really long classes (lines of code)
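
Judging only from the sample above, each line seems to have the form path, line number, then the message after some spaces. A minimal sketch of splitting such a line, so that only the message part gets compared, could look like this. The pattern and the name split_report_line are assumptions made for illustration; they are not taken from the find_error helper used in the question.

```python
import re

# Assumed layout, inferred from the sample: "<path>:<line number>:<spaces><message>"
LINE_RE = re.compile(r"^(?P<path>.*?):(?P<lineno>\d+):\s+(?P<message>.*)$")


def split_report_line(line):
    """Return (path, line_number, message), or None if the line does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    return match.group("path"), int(match.group("lineno")), match.group("message")
```

Comparing only the third element would treat the same message reported at different line numbers as a repeat; whether that is wanted depends on the use case.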

I don't know if this is really shorter than yours, but since you want to compare lines you have to touch every one of them, so we have to go through all the lines. My solution checks every line of the other files against all lines of the given file and replaces each matching line with "Changed".

But be careful with the invisible "\n": this solution treats two lines as different when one is the last line of a file (with no trailing newline) and the other is somewhere in the middle. If you need those to count as equal, I can change that. Hope it helps in some way:

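def check_and_edit(file_path, path_list):
    # Lines of the reference file, each still ending in "\n" (except possibly the last).
    with open(file_path, "r") as f:
        lines = f.readlines()
    for path in path_list:
        # "r+" lets us read the file and then overwrite it in place.
        with open(path, "r+") as f:
            target_lines = f.readlines()
            changes = False
            for i, target in enumerate(target_lines):
                if target in lines:
                    target_lines[i] = "Changed \n"
                    changes = True
            if changes:
                # Rewind, write the edited lines, and cut off any leftover old content.
                f.seek(0)
                f.writelines(target_lines)
                f.truncate()

# file_path is the reference file; path_list holds the other files to check.
check_and_edit(file_path, path_list)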
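
If the goal is only to report which files a line shows up in again, rather than to edit the files, a different technique may avoid the nested loops: read every review.txt once and build a dictionary from line text to the files containing it. The sketch below is one way to do that, not the approach above. The function name find_repeated_lines and its parameters are made up for illustration, and it assumes the one-folder-per-file layout from the question. Stripping the trailing "\n" also sidesteps the last-line caveat mentioned earlier.

```python
import os
from collections import defaultdict


def find_repeated_lines(root_dir, file_name="review.txt"):
    """Map each line to the sorted list of files it occurs in (2+ files only)."""
    seen = defaultdict(set)  # line text -> set of file paths containing it
    for folder, _dirs, files in os.walk(root_dir):
        if file_name in files:
            path = os.path.join(folder, file_name)
            with open(path, "r") as handle:
                for line in handle:
                    text = line.rstrip("\n")  # ignore the trailing newline
                    if text:  # skip empty lines
                        seen[text].add(path)
    # Keep only the lines that appear in more than one file.
    return {text: sorted(paths) for text, paths in seen.items() if len(paths) > 1}
```

Calling find_repeated_lines("root") would return something like a dictionary whose keys are the repeated lines and whose values are the paths they occur in. Each file is read once, and the dictionary lookup replaces the inner loop over the other file's lines.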
