
Replacing multiple words in a string from a file

For this project, we were given a text file that looks like this:

r:are
y:why
u:you
ttyl:talk to you later
l8:late
brb:be right back
lol:laughing out loud
bbl:be back later
...etc...

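The idea is to write a program that translates a sentence from text speak into normal speech. I used the .replace method, but it gives me results I don't understand.

I'm using Python 3.4.0 on Windows 8.

Here is my current code: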

def main():
    sentence={}
    sentence=input("enter a sentence to translate\n")
    slang_file = open('slang.txt', 'r')
    for line in slang_file:
        slangword,unslang=line.split(":")
        if slangword in sentence:
            sentence = sentence.replace(slangword, unslang)
    print(sentence)
main()

This is my output:

>>> 
enter a sentence to translate
here r some problems. wuts wrong
heare
e are
some pare
oblems. wyou
ts ware
ong
>>> 

Any help or pointers would be great.

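The idea is to detect whole words.
The problem with your current code is that it replaces letters inside words, which is not what you want.
I'm not a Python expert, so you can probably improve this code.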

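def main():
    sentence = input("enter a sentence to translate\n")
    with open('slang.txt', 'r') as slang_file:
        for line in slang_file:
            # strip the trailing newline before splitting "slang:meaning"
            slangword, unslang = line.strip().split(":")
            # only replace when the slang appears as a whole word
            if slangword in sentence.split(" "):
                # note: a longer word in the same sentence that ends or
                # starts with the slang is still changed by these calls
                sentence = sentence.replace(slangword+" ", unslang+" ")
                sentence = sentence.replace(" "+slangword, " "+unslang)
    print(sentence)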
main()

deslang = {}
with open('slang.txt', 'r') as f:
    for line in f:
        slang, unslang = line.strip().split(':')
        deslang[slang] = unslang

sentence = input('Enter sentence to translate: ')
for word in deslang:
    # str.replace returns a new string, so assign the result back
    sentence = sentence.replace(word, deslang[word])
print(sentence)

Enter sentence to translate: y r u l8?
why are you late?

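The basic problems are:

1. You should split the sentence into words before replacing; otherwise a
replacement may match part of a word, which is not what you want.
2. str.replace replaces every occurrence of the substring in the string, not just whole words.

For example, when the "r" replacement runs in your code, the original sentence is: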

here r some problem.

Every "r" in it is replaced, which changes it to:

hearee are some pareoblem.
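
The difference between substring replacement and whole-word replacement can be checked directly. This is a minimal sketch; the small hard-coded dict is an assumption standing in for slang.txt:

```python
# Hypothetical in-memory mapping standing in for slang.txt
slang = {"r": "are", "u": "you"}

sentence = "here r some problem."

# str.replace works on substrings, so the "r" inside other words is hit too
replaced = sentence.replace("r", slang["r"])
print(replaced)    # hearee are some pareoblem.

# Splitting first compares whole words only
words = [slang.get(word, word) for word in sentence.split()]
whole_word = " ".join(words)
print(whole_word)  # here are some problem.
```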

The solution is simple, as follows:

def main():
    sentence = input("enter a sentence to translate\n")
    slang_dict = {}

    # read the mapping once; strip() removes the trailing newline
    with open('slang.txt', 'r') as slang_file:
        for line in slang_file:
            slangword, unslang = line.strip().split(":")
            slang_dict[slangword] = unslang

    # translate word by word so only whole words are replaced
    result = ""
    for item in sentence.split():
        if item in slang_dict:
            result += slang_dict[item]
        else:
            result += item
        result += " "
    print(result)

There are also a few minor issues:

1. Don't initialize sentence with {}: that makes it a dict,
while it is actually a string.
2. Use a local dict to store the mapping from slang.txt, since it may be used
repeatedly and re-reading the file each time wastes time.
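
Both points can be combined by loading the mapping once and reusing it. This is only a sketch: `load_slang` and `translate` are hypothetical helper names, and `io.StringIO` stands in for the real slang.txt so the example runs on its own.

```python
import io

def load_slang(lines):
    """Build a slang -> meaning dict from 'slang:meaning' lines."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # split on the first colon only, in case the meaning contains one
        slang, meaning = line.split(":", 1)
        mapping[slang] = meaning
    return mapping

def translate(sentence, mapping):
    """Replace whole words found in the mapping; leave other words alone."""
    return " ".join(mapping.get(word, word) for word in sentence.split())

# io.StringIO stands in for open('slang.txt'); any iterable of lines works
fake_file = io.StringIO("r:are\ny:why\nu:you\nl8:late\n")
slang = load_slang(fake_file)

print(translate("y r u l8", slang))              # why are you late
print(translate("here r some problems", slang))  # here are some problems
```

With a real file, passing the object returned by open('slang.txt') to `load_slang` should play the same role as `fake_file` here.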

If you are going to do any kind of natural language processing, it pays to learn the re module early:

import re

def main():
    slang_file = [line.strip().split(":") for line in open("slang.txt")]
    slang = {k:v for k, v in slang_file}
    sentence = input("enter a sentence to translate\n")
    print(
        re.sub(r"\w+", lambda m: slang.get(m.group(0), m.group(0)), sentence)
    )

main()

Here it is explained in detail:

import re

def main():
    # open the input file
    slang_file = open("slang.txt")

    # using a normal list instead of list comprehension
    tmp_list = []

    # iterating over the file object yields one line at a time
    for line in slang_file:

        # strip the line of linefeeds, carriage returns and spaces
        line = line.strip()

        # split the line in two parts and save to our list
        tmp_list.append(line.split(":"))

    # add each item to a dictionary
    slang = {}

    # key is what you want to find
    # value is what you want to replace it with
    for key, value in tmp_list:
        slang[key] = value

    # get the sentence to translate
    sentence = input("enter a sentence to translate\n")

    # in a regular expression, \w matches any letter, digit or underscore
    # \w+ matches one or more consecutive such characters

    # the second argument is normally a replace statement
    # however this is where the lambda function is helpful
    # m takes the match object for \w+
    # the matched text is retrieved by m.group()
    # which we then use as a key for the slang dictionary to get the replacement
    # the second m.group() is there to be returned when the key is not in slang
    print(
        re.sub(r"\w+", lambda m: slang.get(m.group(), m.group()), sentence)
    )
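
One practical benefit of the regex version is that punctuation next to a word does not get in the way, because \w+ matches only the word characters. A small check, with a hard-coded dict assumed in place of slang.txt and `deslang` as a hypothetical function name:

```python
import re

# Hypothetical mapping; the real program would read it from slang.txt
slang = {"r": "are", "y": "why", "u": "you", "l8": "late"}

def deslang(sentence):
    # for each run of word characters, look it up and fall back to itself
    return re.sub(r"\w+", lambda m: slang.get(m.group(), m.group()), sentence)

print(deslang("y r u l8?"))
# why are you late?
print(deslang("here r some problems. wuts wrong"))
# here are some problems. wuts wrong
```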

