
Filter list by length of words

I am trying to filter a list of words, one word per line, by word length (between 4 and 8 characters). So if the input file has:

  • hello
  • communication
  • be
  • dog
  • test

The output file is:

  • hello
  • test

So I have this code:

dir = "lower.lst"
dict = open(dir, 'r').readlines()
f=open('dictionary','w')
for word in dict:
  if len(word)>=4 & len(word)<=8:
    f.write(word)
f.close()
print(len(dict))

print(f)

But the output file keeps all the words. By the way, is there a more efficient way to do this?

  • Use the with statement to close files automatically (even when exceptions are encountered).
  • & in Python is the bitwise AND operator; use and for boolean logic.
  • You don't actually need and, because comparisons can be chained (len(word)>=4 and len(word)<=8 is equivalent to 4 <= len(word) <= 8).
  • In your question you use .readlines() and here I use for line in fin:. Either way the resulting strings end in newline characters, so your length measurements will be off by one. I correct for this by stripping the line before taking the length (len(line.strip())). (With and in place of &, your code would have omitted 'be' but kept 'dog', because it's really 'dog\n', which has a length of 4.)
  • You said your code kept all of the words. That follows from operator precedence: & binds more tightly than comparisons, so len(word)>=4 & len(word)<=8 is evaluated as len(word) >= (4 & len(word)) <= 8. Since 4 & n is either 0 or 4, and is 4 only when n is at least 4, that chain is true for every line, including 'communication\n' and 'be\n'. Extra spaces can also skew the count ('be  \n' has a length of 5 because of the two spaces), which is another reason to strip the line first.

with open('lower.lst', 'r') as fin, open('dictionary', 'w') as fout:
    for line in fin:
        if 4 <= len(line.strip()) <= 8:
            fout.write(line)
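
To see why the original condition kept every word, it may help to run both versions on the sample data. This is only a minimal sketch: buggy_keep and fixed_keep are illustrative names that do not appear in the question, and the lines list stands in for what reading lower.lst would return. Python gives & higher precedence than comparison operators, so the original test becomes a chained comparison around 4 & len(word).

```python
def buggy_keep(word):
    # Parsed as: len(word) >= (4 & len(word)) <= 8
    # 4 & n is always 0 or 4, so both comparisons hold for any line.
    return len(word) >= 4 & len(word) <= 8


def fixed_keep(word):
    # Strip the trailing newline, then use a chained comparison.
    return 4 <= len(word.strip()) <= 8


# Stand-in for the lines read from the input file (newlines included).
lines = ["hello\n", "communication\n", "be\n", "dog\n", "test\n"]

print([w.strip() for w in lines if buggy_keep(w)])  # every word is kept
print([w.strip() for w in lines if fixed_keep(w)])  # ['hello', 'test']
```

Writing the bounds as a chained comparison avoids the precedence trap entirely, since there is no boolean operator left to get wrong.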

There is more than one way to do this.

  1. With the built-in filter() function

See the Python documentation for filter().

Let's suppose you have a list of strings called data, then:

data = ['hello', 'communication', 'be', 'dog', 'test']
# filter() returns a lazy iterator in Python 3, so wrap it in list() to print the result
filtered_list = list(filter(lambda x: 4 <= len(x) <= 8, data))
print(filtered_list)

This prints:

['hello', 'test']

You can change the lambda function to filter on different conditions. filter() keeps every element for which the function returns True.
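
As a rough illustration of swapping the condition, the sketch below passes two different predicates to filter(). The helper length_between is a hypothetical name introduced here for illustration; it is not part of the standard library or of the original answer.

```python
def length_between(lo, hi):
    """Return a predicate that accepts words whose length is in [lo, hi].

    Hypothetical helper, used only to show that filter() takes any callable.
    """
    def predicate(word):
        return lo <= len(word) <= hi
    return predicate


data = ['hello', 'communication', 'be', 'dog', 'test']

# Same length rule as the question, but with the bounds as parameters.
print(list(filter(length_between(4, 8), data)))          # ['hello', 'test']

# A different condition: keep only words starting with 'c'.
print(list(filter(lambda w: w.startswith('c'), data)))   # ['communication']
```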

  2. With a list comprehension

This is probably the shortest way to achieve this. You just need:

filtered_list = [x for x in data if 4 <= len(x) <= 8]

A list comprehension lets you choose which elements go into the list you are building. Here's an example implementation:

s = """
hello
communication
be
dog
test
"""

lst = [elm for elm in s.split() if (len(elm) >= 4 and len(elm) <= 8)]

print(lst)

Output:

['hello', 'test']

Is this what you're looking for? Here I open the files with context managers (the with statement), and I use and instead of &, as noted in the comments.

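with open("lower.lst", "r") as f:
    # strip() removes the trailing newline so it is not counted in the length
    o = [word for word in f if 4 <= len(word.strip()) <= 8]

with open("outfile.lst", "w") as f:
    # write() only accepts a string; writelines() writes each kept line
    f.writelines(o)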

It's a bit tough to know if this will format exactly to your intentions in the outfile.
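
One way to check the output format is to run the filter against a throwaway file and read the result back. The following is a sketch under assumptions: filter_words_file is a hypothetical helper name, the sample contents mirror the question's input, and the files live in a temporary directory rather than the real lower.lst. It also passes a generator expression to writelines(), so lines are streamed instead of being collected in a list first, which touches on the efficiency question.

```python
import tempfile
from pathlib import Path


def filter_words_file(src, dst, min_len=4, max_len=8):
    """Copy the lines of src whose stripped length is in [min_len, max_len] to dst."""
    with open(src, "r") as fin, open(dst, "w") as fout:
        # Each kept line still carries its own newline, so the output
        # stays one word per line.
        fout.writelines(
            line for line in fin if min_len <= len(line.strip()) <= max_len
        )


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "lower.lst"
    dst = Path(tmp) / "outfile.lst"
    src.write_text("hello\ncommunication\nbe\ndog\ntest\n")
    filter_words_file(src, dst)
    result = dst.read_text()

print(repr(result))  # 'hello\ntest\n'
```

Note that if the last line of the input has no trailing newline and is kept, it is written without one as well.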

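Your code should work if you replace & with and, and strip each line before measuring it, i.e.:

words = open("lower.lst", 'r').readlines()  # renamed from dict to avoid shadowing the built-in
with open('dictionary','w') as f:
    for word in words:
        # strip() drops the trailing newline so it does not count toward the length
        if len(word.strip())>=4 and len(word.strip())<=8:
            f.write(word)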


 