Need to match string to line in file in Python

This is my first time asking something here. I have a text file of names, one name per line, that I read into a list. I then copy that list twice: the first copy removes the trailing \n characters, and the second lowercases every entry. Then I ask the user for a search term, convert their input to lowercase, and search the lowercase list for it. I take the index of the match and use it to display the original-case version of that item back to the user (so they can type, for example, anivia and get back Anivia). This is working fine, but I'm sure my code is pretty bad. What I would like to do is add specific abbreviations for some of the names in the list file and accept those abbreviations as input, while still displaying the full name. For example, the user enters "mumu" and the program sees that the list has Amumu - mumu, which refers to Amumu. How could I go about accepting that abbreviation? There are other cases too, like mf for Miss Fortune or kha for Kha'Zix. I thought of having a second file contain the list of abbreviations, but that seems wasteful and I'm sure there's a better way. Here is my bad code so far:

f = open("champions.txt") #open the file
list = f.readlines() #load each line into the list
#print list
list2 = [item.rstrip('\n') for item in list] #remove trailing newlines in copy list
list3 = [item.lower() for item in list2] #make it all lowercase

print "-------\n", list2 #print the list with no newlines just to check

print "which champ" #ask user for input
value = raw_input().lower() #get the input and make it lowercase
if value in list3: #check list for value and then print back a message using that index but from the non-lowercase list
    pos = list3.index(value)
    print "list contains", list2[pos]
else: #if the input doesn't match an item print an error message
    print "error"

I'll put all of this into a function in my main file once it works the way I need. Basically, I want to change some of the lines in my text file to include valid alternate abbreviations, accept those as input, and still display the full name back to the user. For example, one of the lines in my secondary text file that holds the abbreviations is:

Kog'Maw - kogmaw, kog, km

How can I simplify what I have and add that functionality? I'm not really sure where to start; I'm pretty new to Python and programming in general. Thank you for any help you can provide, and sorry for such a long post.

OK, here's a revised answer that assumes there's one file containing names and abbreviations in the format shown in the question (for example, Kog'Maw - kogmaw, kog, km).

Essentially, it builds a lookup table that maps every abbreviation in the file, plus the lowercase form of the name itself, to the name at the beginning of each line.

lookup = {}
with open("champions.txt") as f:
    for line in f:
        line = line.strip()
        if not line: continue  # skip any blank lines

        parts = line.split('-', 1)  # name, then optional abbreviations
        name = parts[0].strip()
        lookup[name.lower()] = name
        if len(parts) == 2:  # any alternative names given?
            for item in parts[1].split(','):
                # lowercase so it matches the lowercased user input
                lookup[item.strip().lower()] = name

print 'lookup table:'
for alt_name, real_name in sorted(lookup.items()):
    print '{}: {}'.format(alt_name, real_name)
print

while True:
    print "which champ (Enter to quit): "  # ask user for input
    value = raw_input().lower()  # get the input and make it lowercase
    if not value: break

    real_name = lookup.get(value)
    if real_name:
        print 'found:', value, '-->', real_name
    else:
        print 'error: no match for', value
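
Since the question mentions eventually moving this into a function, the same lookup-table idea can be sketched as two small functions. This is only an illustration, not the answer's original code: it assumes Python 3 (where print is a function), the names build_lookup and find_champion are made up here, and the sample lines stand in for the contents of champions.txt.

```python
def build_lookup(lines):
    """Map every lowercase name and abbreviation to the display name."""
    lookup = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Expected format: "Name - abbr1, abbr2" (abbreviations optional)
        parts = line.split('-', 1)
        name = parts[0].strip()
        lookup[name.lower()] = name
        if len(parts) == 2:
            for abbr in parts[1].split(','):
                abbr = abbr.strip().lower()
                if abbr:  # ignore empty entries such as a trailing comma
                    lookup[abbr] = name
    return lookup


def find_champion(lookup, user_input):
    """Return the display name for user_input, or None if nothing matches."""
    return lookup.get(user_input.strip().lower())


# Hypothetical sample lines standing in for champions.txt.
sample_lines = [
    "Anivia\n",
    "Amumu - mumu\n",
    "\n",
    "Kog'Maw - kogmaw, kog, km\n",
    "Miss Fortune - mf\n",
]
lookup = build_lookup(sample_lines)
print(find_champion(lookup, "MuMu"))
print(find_champion(lookup, "km"))
print(find_champion(lookup, "teemo"))
```

Passing an iterable of lines rather than a filename keeps the parsing easy to try without a file; with a real file you could pass the open file object itself, since iterating over it yields lines.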

First, you should use descriptive names. So, instead of list3, call it lower_names, and so on.

Second, you could replace the in operator and the index call with a single index call. A call to some_list.index(item_which_does_not_exist) raises a ValueError saying that the item is not in the list. The most "Pythonic" way to do this is to try to get the index and, if that fails, handle the exception and do something else.

So you could replace the if part with this:

try:
    pos = list3.index(value)
except ValueError:
    print 'error'
else:
    print 'everything is ok. there was no exception raised'
    print 'list contains', list2[pos]

It is often said in the Python philosophy that it is better to ask for forgiveness than for permission. :)

Another important thing, assuming you only want to match a lowercase name to its "real" name: what you need here is a dictionary. A dictionary maps a key to a value, and here you want each lowercase name (the key) to map to the real name (the value). It can be defined this way (I see you are familiar with one-liners):

name_map = {item.lower(): item for item in (line.strip() for line in f)}

So, instead of using readlines, you can loop over the file directly. That's some extra syntactic sugar in Python.

And then you can do something like value in name_map or real_name = name_map[value].
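
As a rough illustration of those two lookups, here is the same kind of dictionary built from an in-memory list instead of a file. Python 3 syntax is assumed, and the sample names are placeholders.

```python
names = ["Anivia", "Amumu", "Miss Fortune"]  # placeholder data, not read from a file

# Lowercase name (key) -> name as originally written (value).
name_map = {item.lower(): item for item in names}

value = "anivia"
if value in name_map:            # membership test on the keys
    real_name = name_map[value]  # direct lookup; raises KeyError if the key is missing
    print("list contains", real_name)

# dict.get returns None (or a default you pass) instead of raising.
print(name_map.get("teemo"))
```

Both the membership test and the lookup are hash-based, so neither has to scan the whole collection the way list.index does.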

As for the extra functionality, I'd go for your second option, which is name - nickname1,nickname2. So what you need to do is: read each line, split it on the dash - (or whatever other character will not be used in the names), then split the second part on the commas to get each nickname on its own. Wrapping up:

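name_map = {}
nick_map = {}
for line in f:
    line = line.strip()
    if not line:
        continue  # skip blank lines
    parts = line.split('-', 1)  # split on the first dash only
    name = parts[0].strip()
    name_map[name.lower()] = name
    if len(parts) == 2:  # nicknames are optional
        nicks = {n.strip().lower(): name for n in parts[1].split(',')}
        nick_map.update(nicks)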

# To check a name:
if value in name_map:
    print 'exact match:', name_map[value]
elif value in nick_map:
    print 'nickname match:', nick_map[value]
else:
    print 'no match for', value

You could do the equivalent with try/except/else clauses, but that would add too much nesting, which is not recommended.
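
To make the nesting concern concrete, here is a sketch of both styles side by side. It assumes Python 3 and two small hand-written dictionaries; the function names resolve_flat and resolve_nested are invented for this comparison.

```python
# Hand-written sample maps (hypothetical data).
name_map = {"amumu": "Amumu", "kog'maw": "Kog'Maw"}
nick_map = {"mumu": "Amumu", "kog": "Kog'Maw", "km": "Kog'Maw"}


def resolve_flat(value):
    """if/elif/else version: every case sits at the same indentation level."""
    if value in name_map:
        return "exact", name_map[value]
    elif value in nick_map:
        return "nickname", nick_map[value]
    else:
        return "none", None


def resolve_nested(value):
    """try/except version: each fallback adds another level of nesting."""
    try:
        return "exact", name_map[value]
    except KeyError:
        try:
            return "nickname", nick_map[value]
        except KeyError:
            return "none", None


for query in ("amumu", "km", "teemo"):
    print(query, resolve_flat(query), resolve_nested(query))
```

Both functions return the same results; the difference is only in how deep the code has to indent as more fallbacks are added.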
