How to extract words with a repeated letter, without counting one specific letter? (Python)

I have a file containing many words, some of which contain the same letter more than once. I've already learned how to catch these words.

Now I would like repeats of the letter "a" not to be counted by the script.

My file (test.txt):

abac
test
testtest
dog
cat
one
doog
helo
hello
abaa
abba

My code:

li = []
for string in open("test.txt", 'r', encoding='utf-8'):
    count = 0
    for qq in range(0, len(string)):
        if count == 1:
            break
        for zz in range(qq + 1, len(string)):
            if string[qq] == string[zz]:
                count = 1
            if count == 1:
                li.append(string.replace("\n", ""))
                break
print(li)

Result:

['test', 'testtest', 'doog', 'hello', 'abaa', 'abba']

I am trying to make "a" the only letter that is allowed to repeat in a word. If "a" is repeated and another letter is repeated too, the word should still be extracted.

I expect the word "abaa" not to appear in the result, because in this word only "a" is repeated. No other letter is repeated.

If both "a" and another letter are repeated, then the word should be extracted.

If you don't want to catch a repeated "a", then filter it out with an if:

if string[qq] == string[zz] and string[qq] and string[qq] != "a":
    count = 1

print(li)

But if you don't mind, your program could be improved.

Firstly, "and string[qq]" has no effect: string[qq] is always a non-empty one-character string, so it always evaluates to True.

Secondly, your count could be a boolean (unless you plan to extend the program to allow a different number of repeats):

letter_repeated = False
if (...):
    letter_repeated = True
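
To make the boolean idea concrete, here is a minimal sketch of the nested-loop check wrapped in a function. The name has_repeat_other_than and its ignored parameter are made up for illustration and do not come from the question; the sample words are a subset of the question's file.

```python
def has_repeat_other_than(word, ignored="a"):
    """Return True if any letter except `ignored` occurs more than once."""
    letter_repeated = False
    for i in range(len(word)):
        if word[i] == ignored:
            continue  # repeats of the ignored letter do not count
        for j in range(i + 1, len(word)):
            if word[i] == word[j]:
                letter_repeated = True
                break
        if letter_repeated:
            break  # one repeated letter is enough
    return letter_repeated


# Sample words taken from the question's file (hypothetical subset)
words = ["abac", "test", "dog", "hello", "abaa", "abba"]
li = [w for w in words if has_repeat_other_than(w)]
print(li)
```

With a flag instead of a counter, the two "if count == 1" checks in the original loop turn into plain "if letter_repeated" checks, and the word is appended in exactly one place.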

And as a bonus, Python has a Counter (in the collections module), which generally does what you want:

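from collections import Counter

li = []
max_count = 1
for string in open("test.txt", "r", encoding="utf-8"):
    word = string.strip()  # drop the trailing newline before counting
    c = Counter(word)  # you can modify that counter, e.g. by removing "a"
    # most_common(1) is empty for a blank line, so check that c is not empty
    if c and c.most_common(1)[0][1] > max_count:
        li.append(word)

print(li)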
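
The comment about modifying the counter can be spelled out as follows. This is only a sketch: the helper name max_repeat_ignoring is hypothetical, and it assumes that "not counting a" means dropping the "a" entry before looking at the most common letter.

```python
from collections import Counter


def max_repeat_ignoring(word, ignored="a"):
    """Highest count of any letter in `word`, not counting `ignored`."""
    c = Counter(word)
    c.pop(ignored, None)    # forget the ignored letter; no error if absent
    top = c.most_common(1)  # list with at most one (letter, count) pair
    return top[0][1] if top else 0


print(max_repeat_ignoring("abaa"))
print(max_repeat_ignoring("abba"))
```

A word would then be kept when max_repeat_ignoring(word) > max_count, which with max_count = 1 rejects "abaa" and keeps "abba".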

Simply skip over any instance of "a" in the first loop, as we don't care whether the "a"s repeat.

li = []
for string in open("test.txt", 'r', encoding='utf-8'):
    count = 0
    for qq in range(0, len(string)):
        if count == 1:
            break
        if string[qq] != "a":
            for zz in range(qq + 1, len(string)):
                if string[qq] == string[zz]:
                    count = 1
                if count == 1:
                    li.append(string.replace("\n", ""))
                    break
print(li)

Note: I only just saw your edit. Your expected output is different: it now includes "abac", as it should. The code above, however, does not include it.

Python has a Counter object that would be useful here:

from collections import Counter
words = '''
abac
test
testtest
dog
cat
one
doog
helo
hello
abaa
abba
'''
li = []
for word in words.split():  # split() already strips whitespace and newlines
    letter_count = Counter(word)
    del letter_count['a']  # ignore "a"; Counter does not raise if the key is missing
    # keep the word if any remaining letter occurs more than once
    if any(n > 1 for n in letter_count.values()):
        li.append(word)

print(li)

Output:

['test', 'testtest', 'doog', 'hello', 'abba']
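
As an alternative that none of the answers above use, the same rule can be written as a regular expression with a backreference. This is a sketch under the assumption that the rule is "some letter other than a occurs at least twice"; the name REPEATED_NON_A is made up for illustration.

```python
import re

# ([^a]) captures one character that is not "a";
# .*\1 then requires the same character to appear again later in the word.
REPEATED_NON_A = re.compile(r"([^a]).*\1")

# The words from the question's file
words = "abac test testtest dog cat one doog helo hello abaa abba".split()
li = [w for w in words if REPEATED_NON_A.search(w)]
print(li)
```

For this input the list contains test, testtest, doog, hello and abba. "abac" and "abaa" are skipped because "a" is the only letter they repeat.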
