Regular expression to extract @name symbols from a tweet

Question

I would like to use a regular expression to extract only @patrick and @michelle from the following sentence:

@patrick  @michelle we having diner @home tonight do you want to join?

Note: @home should not be included in the result, because it is neither at the beginning of the sentence nor followed by another @name.

Any solutions, tips, or comments would be really appreciated.

Answer 1

/(?:(?:@\S+\s+)+|^)@\S+/g

It first matches either one or more occurrences of "@" followed by non-space characters and then whitespace, or the start of the line; it then matches another "@" followed by one or more non-space characters.
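
A minimal Python sketch of how this pattern behaves on the sample tweet, assuming Python's re module handles the pattern the same way (the variable names are illustrative, not from the answer). The whole run of leading names comes back as a single match, so it is split on whitespace afterwards.

```python
import re

# Pattern from the answer above, written as a raw string.
MENTION_RUN = re.compile(r"(?:(?:@\S+\s+)+|^)@\S+")

tweet = "@patrick  @michelle we having diner @home tonight do you want to join?"

# Each match is a whole run of consecutive @names, not a single name.
runs = MENTION_RUN.findall(tweet)

# Split every run on whitespace to get the individual names.
names = [name for run in runs for name in run.split()]
print(names)  # ['@patrick', '@michelle']
```

@home is skipped here because it is neither at the start of the line nor followed by another @name.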

Note that on Twitter it is common for an @name to be preceded by RT, or to appear in the middle or at the end of a tweet, e.g. http://twitter.com/ceetee/statuses/9874073403. Basically, you can't tell whether an @name is really a name using only a regex, or even a parser. The best bet is to check whether http://twitter.com/name returns a 404 or not.

Answer 2

Well, at first I thought this failed because I looked at the groups that are returned:

>>> import re
>>> tweet = '@patrick  @michelle we having diner @home tonight do you want to join?'
>>> tw = re.compile(r"^((@\w*)\s+)*")
>>> tw.findall(tweet)
[('@michelle ', '@michelle')]
>>> tw.match(tweet).groups()
('@michelle ', '@michelle')

Note that a repeated group only keeps the last value it matched. But if you just grab group(), you get the whole matched string:

>>> tw.match(tweet).group()
'@patrick  @michelle '
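
As a sketch of that idea (assuming names consist of word characters only, and using illustrative variable names), the matched prefix can be split on whitespace to recover every leading name, since the capture groups alone would only give the last one.

```python
import re

tweet = "@patrick  @michelle we having diner @home tonight do you want to join?"

# Non-capturing group: only the whole matched prefix is of interest.
leading = re.compile(r"^(?:@\w+\s+)*")

prefix = leading.match(tweet).group()  # the run of names at the start
names = prefix.split()                 # split() also discards the extra spaces
print(names)  # ['@patrick', '@michelle']
```

If the tweet does not start with an @name, the prefix is an empty string and the list of names is empty.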

For grins, I'll try pyparsing:

>>> from pyparsing import Word, printables, OneOrMore
>>> atName = Word("@",printables)
>>> OneOrMore(atName).parseString(tweet).asList()
['@patrick', '@michelle']

Answer 3

Try this regular expression:

/^\s*@(\w+)\s+@(\w+)/

\s denotes whitespace characters and \w denotes word characters.
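
A small Python sketch of what this pattern captures, assuming the sample tweet from the question (variable names are illustrative). Because the pattern spells out two @name groups, it appears to fit exactly two leading names; a tweet with a single leading name does not match.

```python
import re

# Pattern from the answer above, without the surrounding slashes.
two_names = re.compile(r"^\s*@(\w+)\s+@(\w+)")

tweet = "@patrick  @michelle we having diner @home tonight do you want to join?"

m = two_names.match(tweet)
first, second = m.groups()  # the names without the leading "@"
print(first, second)

# Only one leading name: the pattern requires a second @name, so no match.
single = two_names.match("@patrick are you coming?")
print(single)
```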

Answer 4

As long as the tweet starts with an @name and continues with more of them, this will do it. I tested it in PowerShell, so other regex engines may behave a bit differently. This should also catch n names at the beginning of the line:

"^((@\w+)\s)+"

Answer 5

Maybe something like this, although you will have to split whatever is in the match group on whitespace to extract the multiple IDs.

/^\s*(@\w+\s+)*\s+.*$/

Answer 6

You have tagged your post c#, so I assume you can use the .NET Regex implementation. Using .NET, the following regex will do:

(?<![^@]\w+\s+)(@\w+)

This will match any word starting with @ that is not preceded by a word without @. Note that "dinner @home @8pm" will still break it, though.

See here for more details.

Answer 7

For PHP:

/^\s*@(\w+)\s+@(\w+)/

Thanks, KennyM.

In Python:

import re

msg = '@patrick  @michelle we having diner @home tonight do you want to join?'
# Raw string, so the backslashes reach the regex engine unchanged
re.findall(r'(?:(?:@\S+\s+)+|^)@\S+', msg)

This works with 1 or n @names at the beginning of the sentence.

Thank you all for the quick replies.

Answer 8

In Perl, you can exploit the /g match-more-than-once modifier combined with the \G zero-width where-we-left-off assertion and list context, thus:

my $str = '@patrick  @michelle we having diner @home tonight do you want to join?';
my @matches = ($str =~ m/\G(\@\w+)\s*/g);

print join(', ', @matches) . "\n";

This should be robust across any number of initial @-strings.
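
For readers working in Python, roughly the same where-we-left-off scan can be sketched with the pos argument of a compiled pattern's match method. This is an approximation of the Perl idiom rather than a direct equivalent, and the function name leading_mentions is made up for illustration.

```python
import re

# One @name followed by optional whitespace, as in the Perl pattern above.
_MENTION = re.compile(r"(@\w+)\s*")

def leading_mentions(text):
    """Collect consecutive @names from the start of text (hypothetical helper)."""
    names = []
    pos = 0
    while True:
        # match() with pos only succeeds if a name starts exactly at pos,
        # which plays the role of Perl's \G assertion here.
        m = _MENTION.match(text, pos)
        if m is None:
            break
        names.append(m.group(1))
        pos = m.end()
    return names

tweet = "@patrick  @michelle we having diner @home tonight do you want to join?"
print(leading_mentions(tweet))
```

The scan stops at the first token that is not an @name, so @home later in the tweet is never reached.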

Answer 9

For Python, check out http://github.com/BonsaiDen/AtarashiiFormat
It will also give you the links and the tags.

And beware of using a simple regex: you will end up with a big mess, as I did before I converted the Twitter Text Java Library.

Answer 10

For C# I would do as follows:

@([A-Za-z0-9-_&;]+)
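
A quick Python check of this pattern, as a sketch only: the hyphen is moved to the end of the character class to keep it unambiguous, which is an adjustment made here and not part of the answer. Note that the pattern is not anchored, so it appears to pick up every @name in the tweet, including @home, which the question wanted excluded.

```python
import re

# Same character set as the answer, with "-" placed last (assumption: equivalent).
any_mention = re.compile(r"@([A-Za-z0-9_&;-]+)")

tweet = "@patrick  @michelle we having diner @home tonight do you want to join?"

found = any_mention.findall(tweet)
print(found)  # ['patrick', 'michelle', 'home']
```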

10 solutions

Answer 1
4 Accepted 2010-03-02 13:22:06

Answer 2
1 2010-03-02 13:31:55

Answer 3
0 2010-03-02 13:19:28

Answer 4
0 2010-03-02 13:20:59

Answer 5
0 2010-03-02 13:21:43

Answer 6
0 2010-03-02 13:31:51

Answer 7
0 2010-03-02 13:46:32

Answer 8
0 2010-03-04 17:43:36

Answer 9
0 2010-03-27 09:47:14

Answer 10
0 2012-03-09 18:08:43
