
How to Find a String in a Text File And Replace Each Time With User Input in a Python Script?

I am new to Python, so excuse my ignorance.

Currently, I have a text file with some words marked with double angle brackets, like <<word>>.

My goal is to build a script which runs through a text file containing such marked words. Each time the script finds such a word, it asks the user what to replace it with.

For example, if I had a text file:

Today was a <<feeling>> day.

The script would run through the text file, so the console output would be:

Running script...
feeling? great
Script finished.

And generate a text file which would say:

Today was a great day.

Advice?

Edit: Thanks for the great advice. I have made a script that works for the most part like I wanted. Just one thing: now I am working on the case where I have multiple variables with the same name (for instance, "I am <<feeling>>. Bob is also <<feeling>>."). The script should prompt "feeling?" only once and fill in all the variables with the same name.

Thanks so much for your help again.

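import re
# Read the whole input file into memory.
with open('in.txt') as infile:
    text = infile.read()
# Match <<name>> and capture the name between the brackets.
search = re.compile(r'<<([^>]*)>>')
# Prompt once per match. raw_input is Python 2; on Python 3 use input().
text = search.sub(lambda m: raw_input(m.group(1) + '? '), text)
with open('out.txt', 'w') as outfile:
    outfile.write(text)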

To open a file and loop through it:

Use raw_input to get input from the user.

Now, put this together and update your question if you run into problems :-)
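For reference, here is a rough sketch of how those two hints might fit together. It assumes Python 3, where input() replaces raw_input(). The names fill_file and ask are made up for illustration, and the pattern is the <<word>> marker from the question.

```python
import re

# Matches <<name>> and captures the name between the brackets.
PLACEHOLDER = re.compile(r'<<([^>]*)>>')


def fill_file(in_path, out_path, ask=input):
    """Copy in_path to out_path, asking the user to fill each <<name>>.

    `ask` is a hypothetical hook: it defaults to the built-in input(),
    but any callable that takes a prompt string and returns text works.
    """
    with open(in_path) as infile, open(out_path, 'w') as outfile:
        for line in infile:  # loop through the file line by line
            new_line = PLACEHOLDER.sub(lambda m: ask(m.group(1) + '? '), line)
            outfile.write(new_line)
```

Calling fill_file('in.txt', 'out.txt') would then prompt once for every marker it meets, in file order.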

I understand you want advice on how to structure your script, right? Here's what I would do:

  1. Read the file at once and close it (I personally don't like to keep file objects open, especially if my filesystem is remote).
  2. Use a regular expression (phihag has suggested one in his answer, so I won't repeat it) to match the pattern of your placeholders. Find all of your placeholders and store them in a dictionary as keys.
  3. For each word in the dictionary, ask the user with raw_input (not just input), and store the answers as values in the dictionary.
  4. When done, go through your text and substitute every instance of a given placeholder (key) with the user's word (value). This can also be done with a regex.

The reason for using a dictionary is that a given placeholder could occur more than once, and you probably don't want to make the user repeat the entry over and over again...
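A minimal sketch of steps 2 to 4, assuming Python 3 (so input() rather than raw_input()) and the <<name>> pattern from phihag's answer. The function name fill_once and the ask parameter are hypothetical, chosen only for this example.

```python
import re

# Matches <<name>> and captures the name between the brackets.
PLACEHOLDER = re.compile(r'<<([^>]*)>>')


def fill_once(text, ask=input):
    """Replace every <<name>> in text, prompting only once per distinct name."""
    # Step 2: collect the placeholder names as dictionary keys.
    # dict.fromkeys drops duplicates and keeps first-seen order (Python 3.7+).
    answers = dict.fromkeys(PLACEHOLDER.findall(text))
    # Step 3: ask the user once for each distinct name.
    for name in answers:
        answers[name] = ask(name + '? ')
    # Step 4: substitute each placeholder with the stored answer.
    return PLACEHOLDER.sub(lambda m: answers[m.group(1)], text)
```

With the text "I am <<feeling>>. Bob is also <<feeling>>." this asks "feeling? " a single time and reuses the answer for both occurrences.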

Basically the same solution as that offered by @phihag, but in script form:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

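import argparse
import re

# Matches <<name>> and captures the name between the brackets.
pattern = r'<<([^>]*)>>'

def user_replace(match):
    # raw_input is Python 2; on Python 3 use input() instead.
    return raw_input('%s? ' % match.group(1))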


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('infile', type=argparse.FileType('r'))
    parser.add_argument('outfile', type=argparse.FileType('w'))
    args = parser.parse_args()

    matcher = re.compile(pattern)

    for line in args.infile:
        new_line = matcher.sub(user_replace, line)
        args.outfile.write(new_line)

    args.infile.close()
    args.outfile.close()

if __name__ == '__main__':
    main()

Usage: python script.py input.txt output.txt

Note that this script does not account for non-ASCII file encoding.
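One possible way to handle encodings, sketched under the assumption of Python 3: take the paths as plain strings and open them with an explicit encoding. The --encoding option and the ask parameter are not part of the script above; they are hypothetical additions for illustration.

```python
import argparse
import re

# Matches <<name>> and captures the name between the brackets.
PLACEHOLDER = re.compile(r'<<([^>]*)>>')


def main(argv=None, ask=input):
    parser = argparse.ArgumentParser()
    parser.add_argument('infile')
    parser.add_argument('outfile')
    # Hypothetical option: the encoding used for both files.
    parser.add_argument('--encoding', default='utf-8')
    args = parser.parse_args(argv)

    # Open both files in text mode with the requested encoding.
    with open(args.infile, encoding=args.encoding) as src, \
            open(args.outfile, 'w', encoding=args.encoding) as dst:
        for line in src:
            dst.write(PLACEHOLDER.sub(lambda m: ask(m.group(1) + '? '), line))
```

To run it as a script, call main() under the usual if __name__ == '__main__': guard.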

Try something like this:

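myfile = "in.txt"    # input path (example name)
outfile = "out.txt"  # output path (example name)

# Read the whole file as a list of lines.
with open(myfile, "r") as infile:
    lines = infile.readlines()

outlines = []
for line in lines:
    index = line.find("<<")
    if index >= 0:  # find() returns -1 when "<<" is absent; 0 is a valid match
        word = line[index+2:line.find(">>")]
        answer = raw_input(word + "? ")  # Python 2; use input() on Python 3
        outlines.append(line.replace("<<" + word + ">>", answer))
    else:
        outlines.append(line)

# Write to the file object, not to the path string.
with open(outfile, "w") as output:
    for line in outlines:
        output.write(line)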

Disclaimer: I haven't actually run this, so it might not work, but it looks about right and is similar to something I've done in the past.

How it works:

  • It reads the file in as a list where each element is one line of the file.
  • It builds the output list of lines. It iterates through the lines in the input, checking whether the string << exists. If it does, it extracts the word between the << and >> brackets and uses it as the prompt for a raw_input query. It takes the input from that query and replaces the word inside the arrows (and the arrows themselves) with the input. It then appends the resulting line to the list. If it doesn't see the arrows, it simply appends the line unchanged.
  • After running through all the lines, it writes them to the output file. You can make this whatever file you want.

Some issues:

  1. As written, this will work for only one arrow statement per line. So if you had <<firstname>> <<lastname>> on the same line, it would ignore the lastname portion. Fixing this wouldn't be too hard - you could turn the index check into a while loop and keep the replacement lines inside it. Just remember to update the index again if you do that!
  2. It iterates through the list three times. You could likely reduce this to two, but if you have a small text file this shouldn't be a huge problem.
  3. It could be sensitive to encoding - I'm not entirely sure about that, however. Worst case, you need to cast to a string.

Edit: Moved the +2 to fix the broken if statement.
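For issue 1, here is an untested-in-production sketch of the while-loop idea, assuming Python 3 (input() instead of raw_input()). The helper name fill_line and the ask parameter are hypothetical. It scans forward from the end of each replaced marker, so text typed by the user is never searched again.

```python
def fill_line(line, ask=input):
    """Replace every <<word>> in one line using only str.find and slicing."""
    pieces = []
    pos = 0  # where the next search starts
    while True:
        start = line.find('<<', pos)
        end = line.find('>>', start + 2) if start != -1 else -1
        if start == -1 or end == -1:
            # No further complete marker: keep the rest of the line as-is.
            pieces.append(line[pos:])
            break
        word = line[start + 2:end]
        pieces.append(line[pos:start])   # text before the marker
        pieces.append(ask(word + '? '))  # the user's replacement
        pos = end + 2                    # update the index past '>>'
    return ''.join(pieces)
```

In the loop over lines, outlines.append(fill_line(line)) would then replace the whole if/else block.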
