
Compare two files and remove the words from the second file in Python

I'm trying to compare two files and get the difference using a function.

The first file (engwrds.txt) contains English words, one after the other, and the second file (ws.txt) is a text file of web-scraped text. What I want to achieve is to compare the two files, remove the words from ws.txt, and write them to a different file.

The web-scraped file contains both single words and full sentences, whereas the other file contains only words placed one after the other.

I tried the following code, but it creates a blank output file.

with open('ws.txt', 'r', encoding='utf-8') as file1:
    with open('engwrds.txt', 'r', encoding='utf-8') as file2:
        same = set(file1).intersection(file2)

same.discard('\n')

with open('output_file.txt', 'w', encoding='utf-8') as file_out:
    for line in same:
        file_out.write(line)

Then I tried this one, which doesn't print any output at all.

from pathlib import Path

with open('engwrds.txt', 'r', encoding='utf-8') as fin:
    exclude = set(line.rstrip() for line in fin)

with fileinput.input('ws.txt', inplace=True) as f:
    for line in f:
        if not exclude.intersection(Path(line.rstrip()).parts):
            print(line, end='')

The following code also doesn't print any output.

with open('op11-Copy1.txt', 'r') as file1:
    with open('commonwords.txt', 'r') as file2:
        dif = set(file1).difference(file2)
        
dif.discard('\n')
        
with open('diff.txt', 'w') as file_out:
    for line in dif:
        file_out.write(line)

Can you please explain the mistakes I'm making here? I referred to multiple examples, but I can't figure out the issue. Ideally, I want to come up with a function that achieves this task.

This is what the ws.txt file looks like.
[image]

This is what the engwrds.txt file looks like.
[image]

The output file looks like this.
[image]

Just read your files into separate variables and compare them. For example:

Suppose that the file ws.txt (the scraped file) contains:

your world is beautiful

And the file engwrds.txt contains these words (one after the other):

while world want wild

Read each one into its own variable:

with open('engwrds.txt', 'r', encoding='utf-8') as file:
    engwrds = file.read()

with open('ws.txt', 'r', encoding='utf-8') as file:
    ws = file.read()

From here, engwrds and ws are strings, so you can compare them in many different ways:

differences = set(engwrds.split()).symmetric_difference(set(ws.split()))
print(differences)

Output: {'beautiful', 'is', 'want', 'while', 'wild', 'your'}
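
Note that symmetric_difference returns the words that appear in exactly one of the two files, so it mixes words from both. If what you need is only the words of ws.txt that are not in engwrds.txt (or only the words the two files share), a one-sided difference or an intersection may be closer to the goal. A small sketch using the same sample contents as above, with the strings written inline instead of read from the files:

```python
# Same sample contents as in the example above, inlined for illustration.
engwrds = "while world want wild"
ws = "your world is beautiful"

english_words = set(engwrds.split())
scraped_words = set(ws.split())

# Words that occur in ws.txt but not in engwrds.txt.
only_in_scraped = scraped_words - english_words

# Words that occur in both files.
in_both = scraped_words & english_words

# Sets are unordered, so sort before printing to get a stable result.
print(sorted(only_in_scraped))  # ['beautiful', 'is', 'your']
print(sorted(in_both))          # ['world']
```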

Obviously, this comparison only works if your words are separated by whitespace, but from here you will have a better idea of how to solve the problem.
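
Since the question asks for a function, here is one possible sketch. It assumes the goal is a copy of ws.txt with every word listed in engwrds.txt removed, that matching should ignore case, and that punctuation attached to a word should be ignored for the lookup. The function names and the output file name are made up for illustration. It also hints at why the attempts in the question come out empty: set(file1) holds whole lines (including the trailing newline), so a sentence line never equals a single-word line.

```python
import re
from pathlib import Path


def strip_listed_words(text, exclude):
    """Return text with every word found in exclude removed, line by line.

    exclude is expected to be a set of lower-case words.
    """
    kept_lines = []
    for line in text.splitlines():
        kept = []
        for token in line.split():
            # Drop leading/trailing punctuation and lower-case before the lookup,
            # so that "World!" matches "world".
            bare = re.sub(r'^\W+|\W+$', '', token).lower()
            if bare not in exclude:
                kept.append(token)
        kept_lines.append(' '.join(kept))
    return '\n'.join(kept_lines)


def remove_words_from_file(text_path, words_path, out_path):
    """Write a copy of text_path to out_path without the words in words_path."""
    words = Path(words_path).read_text(encoding='utf-8').split()
    exclude = {word.lower() for word in words}
    text = Path(text_path).read_text(encoding='utf-8')
    Path(out_path).write_text(strip_listed_words(text, exclude) + '\n',
                              encoding='utf-8')
```

With the files from the question, a call could look like remove_words_from_file('ws.txt', 'engwrds.txt', 'output_file.txt'). Comparing word by word rather than line by line is the key difference from the attempts in the question; the rest (case handling, punctuation handling) can be adjusted to taste.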

I suggest you go through this answer: "Compare two different files line by line in python".

I wanted to add this as a comment, but I was not able to.
