
Tag all English words in multiple text files in the same directory

I am trying to modify the code below so that it applies to multiple text files in the same directory. It currently fails with the error "NameError: name 'output' is not defined". Can you suggest improvements to the code?

import re

def replaceenglishwords(filename):
    mark_pattern = re.compile("\\*CHI:.*")
    word_pattern = re.compile("([A-Za-z]+)")

    for line in filename:
    # Split into possible words
        parts = line.split()

        if mark_pattern.match(parts[0]) is None:
            output.write()
            continue

        # Got a CHI line
        new_line = line
        for word in parts[1:]:
            matches = word_pattern.match(word)
            if matches:
                old = f"\\b{word}\\b"
                new = f"{matches.group(1)}@s:eng"
                new_line = re.sub(old, new, new_line, count=1)
            output.write(new_line)

import glob
for file in glob.glob('*.txt'):
    outfile = open(file.replace('.txt', '-out.txt'), 'w', encoding='utf8')
    for line in open(file, encoding='utf8'):
        print(replaceenglishwords(line), '\n', end='', file=outfile)
    outfile.close()

replaceenglishwords needs two parameters, one for the file you are reading and one for the file where you write your results: replaceenglishwords(filename, output). The NameError occurs because output is used inside the function but is never defined there or passed in. It also looks like your function already reads the input file line by line by itself, so the calling loop should hand it the open file rather than individual lines.
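As a rough sketch of what the two-parameter version could look like (not necessarily what you had in mind): the word-by-word re.sub from the question is swapped here for a single substitution per line, and the question's empty output.write() call and repeated writes inside the inner loop are dropped so each line is written exactly once. CHI_LINE and ENGLISH_WORD are names made up for this example. It assumes words are separated by whitespace and that the input is not already tagged.

```python
import io
import re

# Hypothetical names for this sketch; only replaceenglishwords is from the question
CHI_LINE = re.compile(r"\*CHI:")
# A run of ASCII letters at the start of a whitespace-delimited token
ENGLISH_WORD = re.compile(r"(?<!\S)[A-Za-z]+")


def replaceenglishwords(infile, output):
    """Copy infile to output, tagging English words on *CHI: lines."""
    for line in infile:
        if CHI_LINE.match(line):
            # "*CHI:" itself starts with "*", so the pattern leaves it alone
            line = ENGLISH_WORD.sub(r"\g<0>@s:eng", line)
        # Write every line exactly once, whether it was tagged or not
        output.write(line)


# Usage with in-memory files standing in for real ones
source = io.StringIO("*CHI:\t我 要 apple .\n*MOT:\tokay .\n")
result = io.StringIO()
replaceenglishwords(source, result)
print(result.getvalue(), end="")
```

Because output is now an argument, the function works with any object that has a write method, which also makes it easy to try out without touching real files.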

Now you can open both files in your loop and pass them to replaceenglishwords:

for file in glob.glob('*.txt'):
    # The with statement closes both files, even if an error occurs
    with open(file, encoding='utf8') as textfile, \
            open(file.replace('.txt', '-out.txt'), 'w', encoding='utf8') as outfile:
        replaceenglishwords(textfile, outfile)
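One caveat with the loop above: on a second run, '*.txt' also matches the -out.txt files written by the first run. A minimal sketch that skips them is below. process_directory and copy_lines are hypothetical helpers invented for this example; transform stands for any function that takes (infile, outfile), such as a two-parameter replaceenglishwords.

```python
import tempfile
from pathlib import Path


def process_directory(directory, transform):
    """Run transform(infile, outfile) on every .txt file in directory."""
    written = []
    # sorted() collects the matches first, so new output files are not picked up mid-loop
    for path in sorted(Path(directory).glob("*.txt")):
        # Skip results of an earlier run so they are not processed again
        if path.name.endswith("-out.txt"):
            continue
        out_path = path.with_name(path.stem + "-out.txt")
        with open(path, encoding="utf8") as infile, \
                open(out_path, "w", encoding="utf8") as outfile:
            transform(infile, outfile)
        written.append(out_path)
    return written


def copy_lines(infile, outfile):
    # Placeholder transform for the demonstration: copies the file unchanged
    for line in infile:
        outfile.write(line)


# Usage in a temporary directory, so no real files are touched
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("*CHI:\thello\n", encoding="utf8")
    process_directory(tmp, copy_lines)
    process_directory(tmp, copy_lines)  # the second run ignores a-out.txt
    print(sorted(p.name for p in Path(tmp).iterdir()))
```

Passing the transform in as an argument keeps the directory handling separate from the tagging logic, so either part can be changed or tested on its own.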

