How would I isolate specific words that contain a certain character?

Question

So I'm creating an analytics bot for my EPQ that counts the number of times a specific hashtag is used. How would I go about checking if a word in a string of other words contains a #?

Answer 1

A first approach can check if a string has a substring using in, and gather a count for each unique word using a dictionary:

texts = ["it's friday! #TGIF", "My favorite day! #TGIF"]
counts = {}

for text in texts:
    for word in text.split(" "):
        if "#" not in word:
            continue
        if word not in counts:
            counts[word] = 0
        counts[word] += 1

print(counts)
# {'#TGIF': 2}

This could be improved further by:

using str.casefold() to normalize text with different casings
using regex to ignore certain characters, e.g. '#tgif!' should be parsed as '#tgif'
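
Both improvements can be combined in one pass. The following is a minimal sketch rather than part of the original answer: the function name count_hashtags is hypothetical, and it assumes a hashtag is a # followed by one or more word characters (\w+), so trailing punctuation such as ! is dropped.

```python
import re
from collections import Counter

# Assumption: a hashtag is '#' followed by letters, digits, or underscores.
HASHTAG_RE = re.compile(r"#\w+")


def count_hashtags(texts):
    """Count hashtags across texts, ignoring case and trailing punctuation."""
    counts = Counter()
    for text in texts:
        # casefold() first so '#TGIF' and '#tgif' end up under the same key
        for tag in HASHTAG_RE.findall(text.casefold()):
            counts[tag] += 1
    return counts


texts = ["it's friday! #TGIF", "My favorite day! #tgif!"]
print(count_hashtags(texts))
# Counter({'#tgif': 2})
```

If hashtags in your data can contain other characters (hyphens, for example), the pattern would need adjusting.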

Answer 2

You already have a decent answer, so it really just comes down to what kind of data you want to end up with. Here's another solution, using Python's re module on the same data:

import re

texts = ["it's friday! #TGIF #foo", "My favorite day! #TGIF"]

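# Raw string avoids an invalid-escape warning for \w; the result is kept
# in finds so it can be counted afterwards
finds = [re.findall(r'#(\w+)', text) for text in texts]
print(finds)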

Regex takes some getting used to. The '#(\w+)' 'captures' (with the parentheses) the 'word' (\w+) after any hash characters ('#'). It results in a list of hashtags for each 'document' in the dataset:

[['TGIF', 'foo'], ['TGIF']]
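
To make the effect of the parentheses concrete, here is a small illustrative comparison (not from the original answer) of the same pattern with and without a capturing group. With a group, re.findall returns only the captured part; without one, it returns the whole match, including the #.

```python
import re

text = "it's friday! #TGIF #foo"

# With a capturing group, findall returns only what the group matched.
with_group = re.findall(r"#(\w+)", text)
print(with_group)     # ['TGIF', 'foo']

# Without a group, findall returns the full match, '#' included.
without_group = re.findall(r"#\w+", text)
print(without_group)  # ['#TGIF', '#foo']
```

Which form is preferable depends on whether you want the # kept in your final counts.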

Then you could get the total counts with this trick:

from collections import Counter
from itertools import chain

Counter(chain.from_iterable(finds))

Yielding this dictionary-like thing:

Counter({'TGIF': 2, 'foo': 1})
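
As a brief illustration of what "dictionary-like" means here, a Counter can be read like a dict, returns 0 for keys it has never seen, and can rank entries with most_common. The sketch below rebuilds the result above by hand so it runs on its own.

```python
from collections import Counter

# Same contents as the result shown above, rebuilt so this runs standalone
counts = Counter({"TGIF": 2, "foo": 1})

print(counts["TGIF"])         # 2
print(counts["missing"])      # 0, no KeyError for an unseen hashtag
print(counts.most_common(1))  # [('TGIF', 2)]
```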

Answer 3

test = " if a word in a string of other words contains a #"
if "#" in test:
    print("yes")

Answer 1
1 2022-06-21 23:28:01

Answer 2
1 2022-06-21 23:48:26

Answer 3
0 2022-06-21 23:13:31
