
How to extract words with a repeated letter, without counting one specific letter? (Python)

I have a file containing many words, some of which contain the same letter more than once. I have already learned how to catch these words.

Now I would like the script to ignore repeats of the letter "a".

My file (test.txt):

abac
test
testtest
dog
cat
one
doog
helo
hello
abaa
abba

My code:

li = []
for string in open("test.txt", 'r', encoding='utf-8'):
    count = 0
    for qq in range(0, len(string)):
        if count == 1:
            break
        for zz in range(qq + 1, len(string)):
            if string[qq] == string[zz]:
                count = 1
            if count == 1:
                li.append(string.replace("\n", ""))
                break
print(li)

Result:

['test', 'testtest', 'doog', 'hello', 'abaa', 'abba']

I want "a" alone to be allowed to repeat in a word without the word being extracted.

For example, I expect "abaa" not to appear in the result, because "a" is the only letter repeated in it.

If "a" and another letter are both repeated, the word should still be extracted.

If you don't want to catch a repeated "a", filter it out in the if condition:

if string[qq] == string[zz] and string[qq] and string[qq] != "a":
    count = 1

print(li)

But if you don't mind, your program could be improved.

Firstly, "and string[qq]" has no effect: a one-character string is never empty, so it always evaluates to True.
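A quick way to check that truthiness claim is sketched below; the sample characters are picked only for illustration.

```python
# Any non-empty string is truthy, so `and string[qq]` never filters anything.
samples = ["a", "z", "\n", " "]
truthy = [bool(ch) for ch in samples]
print(truthy)      # [True, True, True, True]

# Only the empty string is falsy, and indexing a string never yields it.
print(bool(""))    # False
```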

Secondly, your count could be a boolean (unless you plan to extend the program to allow a different number of repeats):

letter_repeated = False
if (...):
    letter_repeated = True
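Put together, the nested loops with a boolean flag might look like the sketch below. The function name has_repeat and the ignored parameter are hypothetical, introduced only for this illustration, and the word list is a subset of the sample file.

```python
def has_repeat(word, ignored="a"):
    """Return True if any letter other than `ignored` occurs more than once."""
    letter_repeated = False
    for qq in range(len(word)):
        if word[qq] == ignored:
            continue  # repeats of the ignored letter do not count
        for zz in range(qq + 1, len(word)):
            if word[qq] == word[zz]:
                letter_repeated = True
                break
        if letter_repeated:
            break
    return letter_repeated

words = ["abac", "test", "dog", "abaa", "abba"]
print([w for w in words if has_repeat(w)])  # ['test', 'abba']
```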

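And as a bonus, Python has a Counter (in the collections module) that does most of what you want:

from collections import Counter

li = []
max_count = 1
with open("test.txt", "r", encoding="utf-8") as f:
    for line in f:
        word = line.strip()  # drop the trailing newline before counting
        c = Counter(word)  # you can modify that counter, e.g. by removing "a"
        # skip blank lines; most_common(1) returns [(letter, count)]
        if c and c.most_common(1)[0][1] > max_count:
            li.append(word)

print(li)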
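The comment about modifying the counter could be realised as in the sketch below, which drops "a" before looking for repeats. The helper name repeated_letters is an assumption made for this example, and the words are given inline rather than read from the file so that the snippet runs on its own.

```python
from collections import Counter

def repeated_letters(word, ignored="a"):
    """Return the letters (other than `ignored`) that occur more than once."""
    c = Counter(word)
    c.pop(ignored, None)  # remove the ignored letter if it is present
    return [letter for letter, n in c.items() if n > 1]

words = ["abac", "testtest", "doog", "hello", "abaa", "abba"]
li = [w for w in words if repeated_letters(w)]
print(li)  # ['testtest', 'doog', 'hello', 'abba']
```

Returning the list of repeated letters rather than a plain True/False makes it easy to see why a word was extracted.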

Simply skip over any "a" in the outer loop, since we don't care whether "a" repeats.

li = []
for string in open("test.txt", 'r', encoding='utf-8'):
    count = 0
    for qq in range(0, len(string)):
        if count == 1:
            break
        if string[qq] != "a":
            for zz in range(qq + 1, len(string)):
                if string[qq] == string[zz]:
                    count = 1
                if count == 1:
                    li.append(string.replace("\n", ""))
                    break
print(li)

Note: I have only just seen your edit. Your output is different now and includes "abac", as it should. The code above, however, does not include it.

Python has a Counter object that would be useful here:

from collections import Counter
words = '''
abac
test
testtest
dog
cat
one
doog
helo
hello
abaa
abba
'''
li = []
for word in words.split():  # split() already strips whitespace
    letter_count = Counter(word)
    del letter_count['a']  # remove a from counter (no error if it is absent)
    # more letters than distinct letters means something other than a is repeating
    if sum(letter_count.values()) > len(letter_count):
        li.append(word)

print(li)

Output

['test', 'testtest', 'doog', 'hello', 'abba']
