How to remove low values from a TSV file

I have this TSV file:

    kind        10
    men        9
    number        8
    animated        7
    blade        6
    jolly        5
    manage        4
    move         3
    complete        2
    meat        1

I would like to remove every word whose number is less than 5.

So only these should remain:

    Output: 
    kind        10
    men        9
    number        8
    animated        7
    blade        6
    jolly        5

I would like to do this in Python. I was thinking I could load the file into a list, look at each number, and remove the entry if the number is less than 5. But I'm not sure how to do that.

Something like this:

    new_file = open(the_file,encoding="utf-8")
    data = new_file.readlines()
    new_list = []
    for values in data:
        if values > 5:
            new_list.append(values)

Welcome to the community. As others have suggested, you may very well use pandas. If you want to use the csv module, you can do something like the following:

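import csv

# Read the whole file into a list of rows (each row is a list of strings).
with open("example.tsv", newline="") as tsv_file:
    rows = list(csv.reader(tsv_file, delimiter="\t"))

# Build a new list instead of removing items while iterating, which skips rows.
read_tsv = [row for row in rows if float(row[1]) >= 5]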

That reads the file into a list of lists, each row being one list. If the value is always going to be the second field, you can read it like that and filter the unwanted rows out of the bigger list. Hope it helped!

EDIT: I'm sorry, I just saw your comment. Try what I edited into this post instead (I made a couple of assumptions about the CSV reader that were wrong, but I think it is fixed now). Hope it works.
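
To finish the csv-module approach, the filtered rows still have to be written back out. Here is a minimal sketch, assuming the number is always in the second column and the file has no header row. The function name `filter_tsv`, its parameters, and the file paths are hypothetical and not part of the original answer:

```python
import csv


def filter_tsv(src_path, dst_path, threshold=5):
    """Copy rows whose second column is >= threshold from src_path to dst_path.

    Hypothetical helper: assumes a tab-separated file with no header row.
    Returns the number of rows kept.
    """
    with open(src_path, newline="", encoding="utf-8") as src:
        # Skip blank or short rows, then compare the second field numerically.
        rows = [row for row in csv.reader(src, delimiter="\t")
                if len(row) >= 2 and float(row[1]) >= threshold]
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst, delimiter="\t").writerows(rows)
    return len(rows)
```

Calling `filter_tsv("example.tsv", "filtered.tsv")` would write the surviving rows to a second file, which leaves the original untouched in case the threshold needs adjusting.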

If you're working with this kind of file, I would really look into Pandas. It's basically Excel on steroids.

The code would look more or less like this:

import pandas as pd

df = pd.read_csv('file.tsv', sep='\t')
df = df.loc[df['column_name'] > 4]
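
One thing to watch for: the sample file in the question has no header row, and `read_csv` treats the first line as column names by default. Here is a sketch that handles this, assuming column names `word` and `value` (hypothetical, chosen for illustration) and using an in-memory string in place of the real file:

```python
import io

import pandas as pd

# Stand-in for the real file; pass a path such as 'file.tsv' instead.
sample = io.StringIO("kind\t10\nmen\t9\njolly\t5\nmanage\t4\nmeat\t1\n")

# header=None stops pandas from consuming the first data row as a header.
df = pd.read_csv(sample, sep="\t", header=None, names=["word", "value"])

# Keep only rows whose value is at least 5.
filtered = df[df["value"] >= 5]

# Serialise back to TSV text without the index or a header line.
result = filtered.to_csv(sep="\t", index=False, header=False)
print(result)
```

Passing a file path as the first argument of `to_csv` would write the result to disk instead of returning a string.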

First you should read in the file. This will give you a list with each line from the file:

with open('test.txt', 'rt') as file:
    content = file.readlines()

Now it's enough to check the integer made from the last two characters of each line (after stripping the newline character). Compare it to 5 and write the lines that pass back to the file, like this:

with open('test.txt', 'wt') as file:
    file.writelines([c for c in content if int(c.strip()[-2:]) >= 5])
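
The slice `[-2:]` only reads the number correctly while it has at most two digits. A slightly more general sketch splits each line on whitespace and converts the last field. The helper name `keep_high_values` and the `threshold` parameter are hypothetical:

```python
def keep_high_values(lines, threshold=5):
    """Return the lines whose last whitespace-separated field is >= threshold.

    Hypothetical helper: blank lines and lines whose last field is not an
    integer are dropped.
    """
    kept = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            value = int(fields[-1])
        except ValueError:
            continue  # last field is not a number
        if value >= threshold:
            kept.append(line)
    return kept


# Example with a three-digit value, which the two-character slice would misread.
content = ["kind\t10\n", "big\t104\n", "jolly\t5\n", "move\t3\n"]
print(keep_high_values(content))
```

The returned list keeps the original line endings, so it can be passed straight to `file.writelines(...)`.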
