Finding the rating of words using Python

This is my program. It displays the value only if I give the complete name; for example, if I type eng, it shows only eng with its value.

import re
sent = "eng"
#sent=raw_input("Enter word")
#regex = re.compile('(^|\W)sent(?=(\W|$))')
for line in open("sir_try.txt").readlines():
    if sent == line.split()[0].strip():
        k = line.rsplit(',',1)[0].strip()
        print k

Contents of sir_try.txt:

gene name        utr length
ensbta                  24
ensg1                   12
ensg24                  30
ensg37                  65
enscat                  22
ensm                    30

What I actually want is to search for the highest value in the text file rather than search by word. The program should delete from the text file all values for the same word that are lower than the maximum; for the text above, it should delete 12 and 30 for ensg. Then it should find the minimum value among the utr values and display it with its name. What people are answering is something I have already done, as I mentioned before showing my program.

Instead of if sent ==, try if sent in (line.split()[0].strip()):

That checks whether the value of sent (for example, ensg) occurs anywhere in the argument, which here is line.split()[0].strip().
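As a minimal sketch of the difference between the two tests (written for Python 3, while the snippets in this thread are Python 2; the sample lines are copied from the question's file, and the variable names exact and partial are only illustrative):

```python
# Sample lines copied from sir_try.txt in the question.
lines = [
    "ensbta                  24",
    "ensg1                   12",
    "ensg24                  30",
    "ensg37                  65",
]
sent = "ensg"

# Exact comparison: no name is exactly "ensg", so nothing matches.
exact = [line.split()[0] for line in lines if sent == line.split()[0]]

# Substring test: every name that contains "ensg" matches.
partial = [line.split()[0] for line in lines if sent in line.split()[0]]

print(exact)    # []
print(partial)  # ['ensg1', 'ensg24', 'ensg37']
```

Note that in also matches in the middle of a name; if only prefixes should match, str.startswith may be the better fit.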

If you're still trying to take only the highest value, I would create a variable value and then do something along the lines of:

length = int(line.split()[1])  # compare as a number, not as a string
if length > value:
    value = length

Test that out and let us know how it works for you.
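Putting the two suggestions together, a rough sketch might look like the following. It is written for Python 3, and the function name highest_match is hypothetical, not something from the question. The lengths are converted with int() before comparing, because as strings "100" would sort below "65".

```python
def highest_match(lines, sent):
    """Return (name, length) for the matching line with the largest length, or None."""
    best = None
    for line in lines:
        parts = line.split()
        # Skip blank lines and the header, which do not look like "name number".
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        name, length = parts[0], int(parts[1])
        if sent in name and (best is None or length > best[1]):
            best = (name, length)
    return best


# Sample lines copied from sir_try.txt in the question.
sample = [
    "gene name        utr length",
    "ensbta                  24",
    "ensg1                   12",
    "ensg24                  30",
    "ensg37                  65",
    "enscat                  22",
    "ensm                    30",
]
print(highest_match(sample, "ensg"))  # ('ensg37', 65)
```

With a real file, the list could be replaced by the lines read from sir_try.txt.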

Please try this:

file=open("sir_try.txt","r")
list_line=file.readlines()
file.close()
all_text=""

dic={}
sent="ensg"
temp_list=[]
for line in list_line:
    all_text=all_text+line
    name= line.rsplit()[0].strip()
    score=line.rsplit()[1].strip()
    dic[name]=score
for i in dic.keys():
    if sent in i:
        temp_list.append(dic[i])
hiegh_score=max(temp_list, key=int)  # compare scores numerically, not as strings

def check(index):
    reverse_text=all_text[index+1::-1]
    index2=reverse_text.find("\n")
    if sent==reverse_text[:index2+1][::-1][1:len(sent)+1]:
        return False
    else:
        return True

list_to_min=dic.values()
for i in temp_list:
    if i!=hiegh_score:
        index=all_text.find(str(i))
        while check(index):
            index=all_text.find(str(i),index+len(str(i)))
        all_text=all_text[0:index]+all_text[index+len(str(i)):]
        list_to_min.remove(str(i))
#write all text to "sir_try.txt"
file2=open("sir_try.txt","w")
file2.write(all_text)
file2.close()
min_score= min(list_to_min)
for j in dic.keys():
    if min_score==dic[j]:
        print "min score is :"+str(min_score)+" for person "+j

The check function works around a bug in the solution. To explain, suppose your file is:

gene name        utr length
ali                     12
ali87                   30
ensbta                  24
ensg1                   12
ensg24                  30
ensg37                  65
enscat                  22
ensm                    30

Without check, the program would delete the score on the ali line, because it removes the first occurrence of 12 it finds in the text, even though ali does not match the search word.
Adding the check function solves this.
This version is my final answer.
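An alternative that sidesteps this problem is to work line by line instead of searching the whole text for a number. The following is only a sketch under some assumptions: it is Python 3, the name prune_lines is hypothetical, and it drops the whole line of a lower-scoring match rather than just the number, which may or may not be what the question intends.

```python
def prune_lines(lines, sent):
    """Keep, among names containing `sent`, only the line(s) with the highest value.

    Returns (kept_lines, smallest), where smallest is the (name, value) pair
    with the lowest value among the remaining data lines, or None.
    """
    parsed = []
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            parsed.append((line, parts[0], int(parts[1])))
        else:
            parsed.append((line, None, None))  # header or blank line

    matched = [value for _, name, value in parsed if name is not None and sent in name]
    highest = max(matched) if matched else None

    kept_lines, kept_rows = [], []
    for line, name, value in parsed:
        if name is None:
            kept_lines.append(line)  # keep header and blank lines untouched
        elif sent not in name or value == highest:
            kept_lines.append(line)
            kept_rows.append((name, value))

    smallest = min(kept_rows, key=lambda row: row[1]) if kept_rows else None
    return kept_lines, smallest


# Sample lines copied from the file shown above.
sample = [
    "gene name        utr length",
    "ali                     12",
    "ali87                   30",
    "ensbta                  24",
    "ensg1                   12",
    "ensg24                  30",
    "ensg37                  65",
    "enscat                  22",
    "ensm                    30",
]
kept, smallest = prune_lines(sample, "ensg")
print("\n".join(kept))
print("min score is %d for %s" % (smallest[1], smallest[0]))
```

Because each line is parsed on its own, the 12 on the ali line is never confused with the 12 on the ensg1 line.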

import operator
f = open('./sir_try.txt', 'r')
f = f.readlines()
del f[0]

gene = {}
matched_gene = {}

for line in f:
    words = line.strip().split(' ')
    words = [word for word in words if not word == '']
    gene[words[0]] = words[1]

# getting user input
user_input = raw_input('Enter gene name: ')
for gene_name, utr_length in gene.iteritems():
    if user_input in gene_name:
        matched_gene[gene_name] = utr_length
m = max(matched_gene.iteritems(), key=lambda item: int(item[1]))[0]  # compare lengths as integers
print m, matched_gene[m]  # expected answer

# code to remove redundant gene names as per requirement

for key in matched_gene.keys():
    if not key == m:
        matched_gene.pop(key)
for key in gene.keys():
    if user_input in key:
        gene.pop(key)

final_gene = dict(gene.items() + matched_gene.items())
out = open('./output.txt', 'w')
out.write('gene name' + '\t\t' + 'utr length' + '\n\n')
for key, value in final_gene.iteritems():
    out.write(key + '\t\t\t\t' + value + '\n')
out.close()

Output:

Enter gene name: ensg
ensg37 65

To find the name (first column) associated with the maximum value (second column), first split each line at the whitespace between name and value. Then find the maximum with the built-in max() function, using the value column as the sorting criterion. The corresponding name is then easy to read off.

Example:

file_content = """
gene name        utr length
ensbta                  24
ensg1                   12
ensg24                  30
ensg37                  65
enscat                  22
ensm                    30
"""

# split lines at whitespace
l = [line.split() for line in file_content.splitlines()]

# skip headline and empty lines
l = [line for line in l if len(line) == 2]

print l

# find the maximum of second column
max_utr_length_tuple = max(l, key=lambda x: int(x[1]))  # compare as integers

print max_utr_length_tuple

print max_utr_length_tuple[0]

The output is:

$ python test.py
[['ensbta', '24'], ['ensg1', '12'], ['ensg24', '30'], ['ensg37', '65'], ['enscat', '22'], ['ensm', '30']]
['ensg37', '65'] 
ensg37

Short and sweet:

In [01]: t=file_content.split()[4:]
In [02]: b=((zip(t[0::2], t[1::2])))
In [03]: max(b, key=lambda x: int(x[1]))
Out[03]: ('ensg37', '65')

Since you have tagged your question regex, here's something you would want to see; it's the only answer (at the moment) that uses regex!

import re

sent = 'ensg' # your sequence
# regex that will "filter" the lines containing value of sent  
my_re = re.compile(r'(.*?%s.*?)\s+?(\d+)' % sent)

with open('stack.txt') as f:
    lines = f.read() # get data from file

filtered = my_re.findall(lines) # "filter" your data
print filtered

# get the desired (tuple with maximum "utr length")
max_tuple = max(filtered, key=lambda x: int(x[1]))  # compare lengths as integers
print max_tuple

Output:

[('ensg1', '12'), ('ensg24', '30'), ('ensg37', '65')]
('ensg37', '65')
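A slightly more defensive variant of the same idea could look like the sketch below. It is Python 3, the function name max_by_regex is hypothetical, and the pattern is one possible choice rather than the only one. It escapes the search word so that regex metacharacters in it are treated literally, anchors the pattern to whole lines, and compares the lengths as integers.

```python
import re


def max_by_regex(text, sent):
    """Return the (name, length) match with the largest length, or None if nothing matches."""
    # re.escape keeps any regex metacharacters in `sent` literal.
    pattern = re.compile(
        r"^(\S*%s\S*)[ \t]+(\d+)[ \t]*$" % re.escape(sent),
        re.MULTILINE,
    )
    matches = pattern.findall(text)
    # Compare as integers so that, for example, 100 ranks above 65.
    return max(matches, key=lambda m: int(m[1])) if matches else None


# Sample text copied from the question's file.
text = """gene name        utr length
ensbta                  24
ensg1                   12
ensg24                  30
ensg37                  65
enscat                  22
ensm                    30
"""
print(max_by_regex(text, "ensg"))  # ('ensg37', '65')
```

The lengths stay strings in the returned tuple, matching the output shown above; only the comparison uses int().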
