Count the number of substrings between two markers in a list
I want to count how many times each user name is mentioned in the following list and store the counts in a dictionary.
For example, this is the dialogue list:
dialogue = ["This is great! RT @user14: Can you believe this?",
"That's right RT @user22: The dodgers are destined to win the west!",
"This is about things @user14, how could you",
"RT @user11: The season is looking great!"]
I want my output to be {user14:2, user22:1, user11:1}
I started writing the following to build a list of names, intending to count the names and write the counts to a dictionary, but I am not sure how to proceed:
user_name = [x.split('@')[1].split(':')[:-1] for x in dialogue]
A regex is probably the best approach, since it copes with whatever characters follow the user name:
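from collections import defaultdict
import re
result = defaultdict(int)
for item in dialogue:
    # The lookbehind matches the word characters right after '@'
    match = re.search(r'(?<=@)\w+', item)
    if match:  # skip items that contain no mention
        result[match.group(0)] += 1
print(dict(result))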
Gives:
{'user14': 2, 'user22': 1, 'user11': 1}
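The loop above records at most one mention per item. If a single item can mention several users, a variant along these lines counts each of them. This is only a sketch: the function name `count_mentions` is illustrative and not part of the original answer, and it assumes user names consist of word characters only.

```python
import re
from collections import defaultdict

def count_mentions(items):
    """Count every @name mention across all items (hypothetical helper)."""
    counts = defaultdict(int)
    for item in items:
        # findall returns every run of word characters that follows an '@'
        for user in re.findall(r'@(\w+)', item):
            counts[user] += 1
    return dict(counts)

dialogue = ["This is great! RT @user14: Can you believe this?",
            "That's right RT @user22: The dodgers are destined to win the west!",
            "This is about things @user14, how could you",
            "RT @user11: The season is looking great!"]
print(count_mentions(dialogue))  # {'user14': 2, 'user22': 1, 'user11': 1}
```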
In a single pass, with a collections.Counter object and the re.findall function:
from collections import Counter
import re
...
# Join with a space so a name ending one item cannot merge with the next item's first word
uname_counts = Counter(re.findall(r'@(\w+)', ' '.join(dialogue)))
print(dict(uname_counts)) # {'user14': 2, 'user22': 1, 'user11': 1}
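If "frequency" in the question means each user's share of all mentions (an assumption, since the question only shows raw counts), the Counter can be normalized as in the sketch below. The name `frequencies` is illustrative.

```python
from collections import Counter
import re

dialogue = ["This is great! RT @user14: Can you believe this?",
            "That's right RT @user22: The dodgers are destined to win the west!",
            "This is about things @user14, how could you",
            "RT @user11: The season is looking great!"]

uname_counts = Counter(re.findall(r'@(\w+)', ' '.join(dialogue)))
total = sum(uname_counts.values())

# Share of all mentions per user; the result is an empty dict when nothing matched
frequencies = {user: count / total for user, count in uname_counts.items()}

print(uname_counts.most_common(1))  # [('user14', 2)]
print(frequencies)  # {'user14': 0.5, 'user22': 0.25, 'user11': 0.25}
```

Counter.most_common is handy when only the most-mentioned users matter, while the plain dict keeps the counts in first-seen order.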