Find the last word in a tweepy tweet response in Python

Question

I am receiving a stream of tweets in Python and want to extract the last word, or at least know where to reference it.

For example, given

NC don’t like working together www.linktowtweet.org

it should return

 together

Answer 1

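I am not familiar with tweepy, so I am assuming you have the data in a Python string; there may well be a better answer.

That said, given a string in Python, extracting the last word is simple.

Solution 1

Use str.rfind(' '). The idea is to find the space just before the last word. Here is an example.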

text = "NC don’t like working together"
text = text.rstrip() # Remove any whitespace at the end, which would otherwise confuse the algorithm.
last_word = text[text.rfind(' ')+1:] # Output every character *after* the space.
print(last_word)

Note: if the string contains no words, last_word will be an empty string.
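The edge cases are easy to check by hand. The sketch below wraps the snippet in a helper (the name last_word_rfind is made up for this illustration) and runs it on a few inputs. When the string contains no space, str.rfind returns -1, so the slice starts at index 0 and the whole string comes back.

```python
def last_word_rfind(text):
    # Hypothetical helper wrapping the rfind approach shown above.
    text = text.rstrip()                 # Drop trailing whitespace.
    return text[text.rfind(' ') + 1:]    # rfind gives -1 if there is no space.

for sample in ["NC don't like working together", "together", "together   ", "", "   "]:
    print(repr(sample), "->", repr(last_word_rfind(sample)))
# "NC don't like working together" -> 'together'
# 'together' -> 'together'
# 'together   ' -> 'together'
# '' -> ''
# '   ' -> ''
```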

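As written, this assumes all words are separated by spaces. To handle newlines and tabs as well, use str.replace to convert them to spaces. The whitespace characters in Python are \t\n\x0b\x0c\r and the space itself, but I assume only newlines and tabs will turn up in Twitter messages.

See also: string.whitespace

So a complete example, wrapped as a function, would be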

def last_word(text):
    text = text.replace('\n', ' ') # Replace newlines with spaces.
    text = text.replace('\t', ' ') # Replace tabs with spaces.
    text = text.rstrip(' ') # Remove trailing spaces.
    return text[text.rfind(' ')+1:]

print(last_word("NC don’t like working together")) # Outputs "together".
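If other whitespace characters do turn up, the two replace calls can be generalized. The following is a minimal sketch, not part of the original answer: it builds a translation table from string.whitespace so that every whitespace character becomes a plain space before the rfind step. The names WHITESPACE_TO_SPACE and last_word_any_whitespace are made up for this illustration.

```python
import string

# Map every character in string.whitespace to a plain space.
WHITESPACE_TO_SPACE = str.maketrans({ch: " " for ch in string.whitespace})

def last_word_any_whitespace(text):
    # Normalize all whitespace to spaces, then drop trailing spaces.
    text = text.translate(WHITESPACE_TO_SPACE).rstrip(" ")
    # Everything after the last remaining space is the last word.
    return text[text.rfind(" ") + 1:]

print(last_word_any_whitespace("NC don't like\tworking\ntogether\r\n"))  # together
```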

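For basic parsing, this is probably still the best option. For bigger problems, there are better approaches.

Solution 2

Regular expressions

These are a more flexible way of handling strings in Python. Regular expressions, often called regex, use a small language of their own to specify parts of a text.

For example, .*\s(\S+) specifies the last word in a string.

Here it is again with a longer explanation.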

.*               # Match as many characters as possible.
\s               # Then one whitespace character (any of "\t\n\x0b\x0c\r ").
(                # Start capturing: remember the next section for the answer.
\S+              # Match a word: one or more non-whitespace characters.
)                # End of the captured section.

So, in Python, you would use it as follows.

import re # Import the regex library.

# Compile the pattern (DOTALL makes . match \n as well).
LAST_WORD_PATTERN = re.compile(r".*\s(\S+)", re.DOTALL)

def last_word(text):
    m = LAST_WORD_PATTERN.match(text)
    if not m: # If there was not a last word to this text.
        return ''
    return m.group(1) # Otherwise return the last word.

print(last_word("NC don’t like working together")) # Outputs "together".
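The annotated breakdown of the pattern can also live inside the code itself. As a sketch (the names LAST_WORD_VERBOSE and last_word_verbose are made up for this illustration), the re.VERBOSE flag makes the regex engine ignore whitespace in the pattern and treat # as the start of a comment, so the same pattern can be written with one piece per line:

```python
import re

# Same pattern as .*\s(\S+), spelled out with comments via re.VERBOSE.
LAST_WORD_VERBOSE = re.compile(r"""
    .*      # Match as many characters as possible...
    \s      # ...followed by one whitespace character.
    (       # Start of the captured group.
    \S+     # One or more non-whitespace characters (the word).
    )       # End of the captured group.
""", re.VERBOSE | re.DOTALL)

def last_word_verbose(text):
    m = LAST_WORD_VERBOSE.match(text)
    return m.group(1) if m else ''

print(last_word_verbose("NC don't like working together"))  # together
print(repr(last_word_verbose("together")))                  # ''
```

The second call shows one difference from the rfind approach: the pattern requires a whitespace character before the word, so a string consisting of a single word with no whitespace yields an empty string.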

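Even though this method is less obvious, it has a couple of advantages. First, it is more customizable. If you want the last word but not a link, the regex r".*\s([^.:\s]+(?!\.\S|://))\b" matches the last word and ignores a link if that is the last thing in the text.

Example: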

import re # Import the regex library.

# Compile the pattern (DOTALL makes . match \n as well).
LAST_WORD_PATTERN = re.compile(r".*\s([^.:\s]+(?!\.\S|://))\b", re.DOTALL)

def last_word(text):
    m = LAST_WORD_PATTERN.match(text)
    if not m: # If there was not a last word to this text.
        return ''
    return m.group(1) # Otherwise return the last word.

print(last_word("NC don’t like working together www.linktowtweet.org")) # Outputs "together".

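The second advantage of this method is speed.

You can try it online: there, the regex approach is almost as fast as the string manipulation, if not faster in some cases. (On my machine, I actually found the regex ran 0.2 usec faster than it did in the demo.)

Either way, the regex executes very quickly even in the simple case, and regex will beat any more complex string algorithm implemented in pure Python. So using regex can also speed up your code.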
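The timing claim is easy to check locally. The sketch below is not the benchmark from the online demo; it simply times both approaches with the standard timeit module. The function names last_word_str and last_word_re and the sample text are chosen for this illustration, and the printed numbers will vary with machine and Python version.

```python
import re
import timeit

LAST_WORD_PATTERN = re.compile(r".*\s(\S+)", re.DOTALL)

def last_word_str(text):
    # String-method approach from Solution 1.
    text = text.replace("\n", " ").replace("\t", " ").rstrip(" ")
    return text[text.rfind(" ") + 1:]

def last_word_re(text):
    # Regex approach from Solution 2.
    m = LAST_WORD_PATTERN.match(text)
    return m.group(1) if m else ""

SAMPLE = "NC don't like working together"
RUNS = 100_000

for func in (last_word_str, last_word_re):
    seconds = timeit.timeit(lambda: func(SAMPLE), number=RUNS)
    print(f"{func.__name__}: {seconds / RUNS * 1e6:.3f} usec per call")
```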

EDIT: Changed the URL-avoiding regex from

re.compile(r".*\s([^.\s]+(?!\.\S))\b", re.DOTALL)

to

re.compile(r".*\s([^.:\s]+(?!\.\S|://))\b", re.DOTALL)

so that calling last_word("NC don't like working together http://www.linktowtweet.org") returns together and not http:// .

To see how this regex works, take a look at https://regex101.com/r/sdwpqB/2 .
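The effect of that edit can be reproduced by running both patterns on the same tweet. In this sketch the names OLD_PATTERN, NEW_PATTERN and last_word_with are made up for the comparison; the patterns themselves are the two quoted above.

```python
import re

OLD_PATTERN = re.compile(r".*\s([^.\s]+(?!\.\S))\b", re.DOTALL)
NEW_PATTERN = re.compile(r".*\s([^.:\s]+(?!\.\S|://))\b", re.DOTALL)

def last_word_with(pattern, text):
    # Return the captured last word, or '' if the pattern does not match.
    m = pattern.match(text)
    return m.group(1) if m else ''

tweet = "NC don't like working together http://www.linktowtweet.org"
print(last_word_with(OLD_PATTERN, tweet))  # http://
print(last_word_with(NEW_PATTERN, tweet))  # together
```

The old pattern allows ":" and "/" inside the word, so it can stop at the word boundary right after "http://". The new pattern excludes ":" from the word and rejects anything followed by "://", so the match falls back to the word before the link.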

Answer 2

It's simple. If your text is:

import re

text = "NC don’t like working together www.linktowtweet.org"
# Remove any URL; "www." is matched too, since the sample link has no http:// prefix.
text = re.sub(r'(?:https?://|www\.)\S+', '', text)
words = text.split()  # Split on any run of whitespace.
last_word = words[-1]

There you go! last_word now holds the last word, "together".
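One thing to watch with this approach: indexing with [-1] raises IndexError when nothing is left after the URL is removed. A minimal sketch of a guarded version follows. The names URL_PATTERN and last_word_split are made up for this illustration, and the URL pattern (anything starting with http://, https:// or www.) is an assumption about what counts as a link, not a complete URL matcher.

```python
import re

# Assumed link shape: starts with http://, https:// or www. and runs to the next whitespace.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")

def last_word_split(text):
    # Strip links, split on any whitespace, and guard against an empty result.
    words = URL_PATTERN.sub("", text).split()
    return words[-1] if words else ""

print(last_word_split("NC don't like working together www.linktowtweet.org"))  # together
print(repr(last_word_split("http://www.linktowtweet.org")))                    # ''
```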


Answer 1: 1, accepted, 2018-07-07 21:53:25

Answer 2: 0, 2018-07-08 03:55:27
