Regex to check whether a string has at least one and at most 3 words and more than one hashtag in Python

Question

s1 = 'Makeupby Antonia #makeup #makeupartist #makeupdolls #abhcosmetics'
s2 = 'Makeupby Antonia asia #makeup #makeupartist #makeupdolls'
s3 = 'Makeupby Antonia'
s4 = '#makeup #makeupartist #makeupdolls #abhcosmetics'  
s5 = 'Makeupby Antonia asia america #makeup #makeupartist'

The regex should match only s1 and s2, because they contain between one and three normal words and more than one hashtag.

I am able to select normal words using \b(?<![#])[\w]+
and
I am able to select hashtags using [#]{1}\w+
but when I combine the expressions it does not work.

How can I build a final regex from these individual regexes that also tracks the counts?

Answer 1

The sane solution

Split the text into words and count how many of them start with a hash sign.

def check(text):
    words = text.split()

    num_hashtags = sum(word.startswith('#') for word in words)
    num_words = len(words) - num_hashtags

    return 1 <= num_words <= 3 and num_hashtags > 1

>>> [check(text) for text in [s1,s2,s3,s4]]
[True, True, False, False]
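If the two patterns from the question are to be reused, a minimal sketch of the same counting idea is to run each one through re.findall and compare the lengths of the results. The names WORD_RE, HASHTAG_RE and check_with_findall are illustrative only and do not come from the question.

```python
import re

# The two patterns from the question, compiled once.
WORD_RE = re.compile(r'\b(?<![#])[\w]+')   # a word not directly preceded by '#'
HASHTAG_RE = re.compile(r'[#]{1}\w+')      # '#' followed by word characters

def check_with_findall(text):
    # Count the matches of each pattern separately instead of combining them.
    num_words = len(WORD_RE.findall(text))
    num_hashtags = len(HASHTAG_RE.findall(text))
    return 1 <= num_words <= 3 and num_hashtags > 1

samples = [
    'Makeupby Antonia #makeup #makeupartist #makeupdolls #abhcosmetics',
    'Makeupby Antonia asia #makeup #makeupartist #makeupdolls',
    'Makeupby Antonia',
    '#makeup #makeupartist #makeupdolls #abhcosmetics',
    'Makeupby Antonia asia america #makeup #makeupartist',
]
results = [check_with_findall(text) for text in samples]
print(results)  # [True, True, False, False, False]
```

This keeps each pattern simple and leaves the counting to Python, which is the same design choice as the split-based version above.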

The regex solution

import re

def check(text):
    pattern = r'(?=.*\b(?<!#)\w+\b)(?!(?:.*\b(?<!#)\w+\b){4})(?:.*#){2}'
    return bool(re.match(pattern, text))

I'm purposely not going to explain that regex because I don't want you to use it. The confusion you're probably feeling should be a strong sign that this is bad code.

Answer 2

If I understood your question correctly, and if you can assume that words always come before tags, you can use r'^(\w+ ){1,3}#\w+ #\w+':

import re

for s in ('Makeupby Antonia #makeup #makeupartist #makeupdolls #abhcosmetics',
          'Makeupby Antonia asia #makeup #makeupartist #makeupdolls',
          'Makeupby Antonia',
          '#makeup #makeupartist #makeupdolls #abhcosmetics',
          'Makeupby Antonia asia america #makeup #makeupartist',):
    # One to three space-terminated words, then at least two hashtags
    print(bool(re.search(r'^(\w+ ){1,3}#\w+ #\w+', s)), s, sep=': ')

This outputs:

True: Makeupby Antonia #makeup #makeupartist #makeupdolls #abhcosmetics
True: Makeupby Antonia asia #makeup #makeupartist #makeupdolls
False: Makeupby Antonia
False: #makeup #makeupartist #makeupdolls #abhcosmetics
False: Makeupby Antonia asia america #makeup #makeupartist
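To illustrate the assumption that words always come before tags, here is a small sketch comparing the pattern with a split-based count on a string where words and hashtags are interleaved. The names PATTERN, check_split and mixed are made up for this example, and the interleaved string is not one of the inputs from the question.

```python
import re

PATTERN = re.compile(r'^(\w+ ){1,3}#\w+ #\w+')

def check_split(text):
    # Order-independent check: count tokens that start with '#'.
    tokens = text.split()
    num_hashtags = sum(token.startswith('#') for token in tokens)
    num_words = len(tokens) - num_hashtags
    return 1 <= num_words <= 3 and num_hashtags > 1

# Hypothetical input: two words and two hashtags, interleaved.
mixed = 'Antonia #makeup asia #makeupartist'

print(bool(PATTERN.search(mixed)))  # False: the second hashtag does not directly follow the first
print(check_split(mixed))           # True: two normal words and two hashtags
```

So the pattern is only suitable when the input is known to have the shape "words first, then hashtags, separated by single spaces".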

Answer 3

There is probably a lot of room for optimization (maybe with dependencies or fewer loops), but here's a non-regex solution as discussed in the comments:

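s_list = [s1, s2, s3, s4]

def hashtag_words(string_list):
    # Split on any whitespace so repeated spaces do not produce empty "words"
    words = [s.split() for s in string_list]
    # Count the tokens that start with '#' in each string
    hashcounts = [sum(word.startswith("#") for word in wordlist) for wordlist in words]
    # Every token that is not a hashtag counts as a normal word
    normcounts = [len(wordlist) - hashcount for wordlist, hashcount in zip(words, hashcounts)]
    # Keep strings with more than one hashtag and one to three normal words
    sel_strings = [s for s, h, n in zip(string_list, hashcounts, normcounts) if h > 1 and 1 <= n <= 3]
    return sel_strings

hashtag_words(s_list)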

>['Makeupby Antonia #makeup #makeupartist #makeupdolls #abhcosmetics',
 'Makeupby Antonia asia #makeup #makeupartist #makeupdolls']

Answer 1: score 4, 2018-07-02 21:59:54
Answer 2: score 1, 2018-07-02 22:04:54
Answer 3: score 0, 2018-07-02 21:59:24
