Selecting text with a regular expression in Python
I have a string that looks like this:
"contributors_enabled": false, "geo_enabled": false, "created_at": "Fri Nov 11 15:38:06 +0000 2016"}, "text": "Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency #stockmarket", "timestamp_ms": "1509073455803",.
I want to use a regular expression to select this text:
Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency #stockmarket
That is, everything after "text": " and before "timestamp_ms".
Is it possible to extract this text?
Possible? Yes.
def text_scrap(text, start, end):
    """Return the part of text between the first start and the next end."""
    _, _, rest = text.partition(start)  # drop everything up to and including start
    result, _, _ = rest.partition(end)  # keep only what comes before end
    return result

# Single quotes let the fragment keep its own double quotes.
my_text = '"contributors_enabled": false, "geo_enabled": false, "created_at": "Fri Nov 11 15:38:06 +0000 2016"}, "text": "Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency #stockmarket", "timestamp_ms": "1509073455803",.'
# Putting the closing quote and comma in end keeps them out of the result.
data_scrapped = text_scrap(my_text, start=' "text": "', end='", "timestamp_ms"')
print(data_scrapped)
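A good idea? Probably not.
Your data looks like a dictionary, so it would be easier to access its "text" key directly. It is worth reading up on Python dictionaries.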
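As a minimal sketch of the dictionary approach: the question shows only a fragment, so the example below assumes the complete record is valid JSON. The enclosing braces, the "user" key, and the shortened text value are hypothetical and exist only to make the sample parseable.

```python
import json

# Hypothetical complete record: the outer braces and the "user" key are
# assumptions, since the question only shows a fragment of the data.
raw = (
    '{"user": {"contributors_enabled": false, "geo_enabled": false, '
    '"created_at": "Fri Nov 11 15:38:06 +0000 2016"}, '
    '"text": "Facts On Managed Forex Trading #forex #binaryoptions", '
    '"timestamp_ms": "1509073455803"}'
)

record = json.loads(raw)      # parse the JSON text into a Python dict
tweet_text = record["text"]   # plain dictionary lookup, no pattern matching
print(tweet_text)
```

If the record really is JSON, this also decodes any escape sequences inside the value, which neither partition nor a regular expression does.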
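Judging from the string, the whole thing could probably be parsed, since it appears to be JSON. However, since you are looking for a regex-based solution, I hope the following is useful.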
import re

# Non-greedy group with the closing quote outside it, so the quote is not captured.
pattern = r'"text": "(.*?)", "timestamp_ms"'
# Named raw so that the built-in str is not shadowed.
raw = """
"contributors_enabled": false, "geo_enabled": false, "created_at": "Fri Nov 11 15:38:06 +0000 2016"}, "text": "Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency #stockmarket", "timestamp_ms": "1509073455803",.
"""
print(re.findall(pattern, raw)[0])
Output:
Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency #stockmarket
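One caveat with this kind of pattern is that it relies on "timestamp_ms" immediately following the text value. A possible variant, sketched below, matches the quoted value on its own and tolerates escaped quotes inside it. The names TEXT_RE and extract_text and the sample string are hypothetical, not from the question.

```python
import re

# Match the value of "text": any run of characters that are neither a quote
# nor a backslash, or a backslash followed by any character, so that an
# escaped quote (\") does not end the match early.
TEXT_RE = re.compile(r'"text":\s*"((?:[^"\\]|\\.)*)"')

def extract_text(raw):
    """Return the raw (still escaped) "text" value, or None if absent."""
    match = TEXT_RE.search(raw)
    return match.group(1) if match else None

# Hypothetical fragment containing escaped quotes inside the value.
sample = r'"id": 1, "text": "He said \"buy\" #forex", "timestamp_ms": "1509073455803",'
print(extract_text(sample))
```

The returned value still contains its backslash escapes, and the pattern returns the first "text" key it finds, even one nested in another object, so this remains a fallback for fragments that cannot be parsed as JSON.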