Pandas regex: extract url information from column
import pandas as pd
d = {"Device_Type" : ["AXO145","TRU151","ZOD231","YRT326","LWR245"],
"Stat_Access_Link" : ["<url>https://xcd32112.smart_meter.com</url>",
"<url>http://tXh67.dia_meter.com</url>",
"<url>https://yT5495.smart_meter.com</url>",
"<url>https://ret323_TRu.crown.com</url>",
"<url>https://luwr3243.celcius.com</url>"]}
df = pd.DataFrame(data = d)
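I have a dataframe like this, and I need to use a regular expression to extract the url information from inside the tags. The output must look like this:
Device_Type | Stat_Access_Link |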
---|---|
AXO145 | xcd32112.smart_meter.com |
TRU151 | tXh67.dia_meter.com |
ZOD231 | yT5495.smart_meter.com |
YRT326 | ret323_TRu.crown.com |
LWR245 | luwr3243.celcius.com |
Any help is appreciated.
Do you really need a regex?
If you always have <url>...</url>, use:
df['Stat_Access_Link'].str[5:-6]
Otherwise, you can use:
df['Stat_Access_Link'].str.extract(r'<url>(.*)</url>', expand=False)
# OR
df['Stat_Access_Link'].str.extract(r'<url>([^<>]*)</url>', expand=False)
output:
0 https://xcd32112.smart_meter.com
1 http://tXh67.dia_meter.com
2 https://yT5495.smart_meter.com
3 https://ret323_TRu.crown.com
4 https://luwr3243.celcius.com
Name: Stat_Access_Link, dtype: object
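The two patterns above give the same result on this data, but they can differ when a cell holds more than one tag. The sketch below illustrates that with a made-up second row (the `example.com` hosts are hypothetical and not part of the question's data): the greedy `.*` runs to the last `</url>`, while `[^<>]*` stops at the first one.

```python
import pandas as pd

s = pd.Series([
    "<url>https://xcd32112.smart_meter.com</url>",
    # Hypothetical row with two tags in one cell, for illustration only
    "<url>http://a.example.com</url><url>http://b.example.com</url>",
])

# Greedy: .* extends to the LAST </url> in the string
greedy = s.str.extract(r'<url>(.*)</url>', expand=False)
# Negated class: cannot cross a '<', so it stops at the FIRST </url>
strict = s.str.extract(r'<url>([^<>]*)</url>', expand=False)

print(greedy[1])  # http://a.example.com</url><url>http://b.example.com
print(strict[1])  # http://a.example.com
```

With exactly one tag per cell, as in the question, either pattern should be fine.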
str.extract is what you need:
import re
import pandas as pd
d = {"Device_Type" : ["AXO145","TRU151","ZOD231","YRT326","LWR245"],
"Stat_Access_Link" : ["<url>https://xcd32112.smart_meter.com</url>",
"<url>http://tXh67.dia_meter.com</url>",
"<url>https://yT5495.smart_meter.com</url>",
"<url>https://ret323_TRu.crown.com</url>",
"<url>https://luwr3243.celcius.com</url>"]}
df = pd.DataFrame(d)
pattern = re.compile(r"(?<=://)(.*)(?=</url)")
df['Stat_Access_Link'] = df['Stat_Access_Link'].str.extract(pattern, expand=False)
print(df)
Output:
Device_Type Stat_Access_Link
0 AXO145 xcd32112.smart_meter.com
1 TRU151 tXh67.dia_meter.com
2 ZOD231 yT5495.smart_meter.com
3 YRT326 ret323_TRu.crown.com
4 LWR245 luwr3243.celcius.com
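As a variation on the lookaround pattern above, the scheme can also be consumed with a non-capturing group so that only the host lands in the capture group. This is a minimal sketch on three of the question's rows; the `pattern` string and the `host` column name are my own choices, and it assumes the scheme is either `http://` or `https://`.

```python
import pandas as pd

df = pd.DataFrame({
    "Device_Type": ["AXO145", "TRU151", "YRT326"],
    "Stat_Access_Link": [
        "<url>https://xcd32112.smart_meter.com</url>",
        "<url>http://tXh67.dia_meter.com</url>",
        "<url>https://ret323_TRu.crown.com</url>",
    ],
})

# (?:https?://)? skips an optional scheme without capturing it;
# ([^<>]*) captures everything up to the closing tag.
pattern = r"<url>(?:https?://)?([^<>]*)</url>"
df["host"] = df["Stat_Access_Link"].str.extract(pattern, expand=False)

print(df["host"].tolist())
# ['xcd32112.smart_meter.com', 'tXh67.dia_meter.com', 'ret323_TRu.crown.com']
```

Rows that do not match the pattern come back as NaN, which makes malformed cells easy to spot.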
In my opinion, you should consider the pandas solution pd.Series.str.extract, since it is built into pandas.
reg=r'\/\/([\s\S]*)<'
df['matched'] = df['Stat_Access_Link'].str.extract(reg)
print(df)
The result looks like this:
 | Device_Type | Stat_Access_Link | matched |
---|---|---|---|
0 | AXO145 | https://xcd32112.smart_meter.com | xcd32112.smart_meter.com |
1 | TRU151 | http://tXh67.dia_meter.com | tXh67.dia_meter.com |
2 | ZOD231 | https://yT5495.smart_meter.com | yT5495.smart_meter.com |
3 | YRT326 | https://ret323_TRu.crown.com | ret323_TRu.crown.com |
4 | LWR245 | https://luwr3243.celcius.com | luwr3243.celcius.com |
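One detail worth knowing about the call above: `str.extract` returns a DataFrame by default (`expand=True`), even with a single capture group, and a Series only when `expand=False` is passed. A small sketch on one row of the question's data, reusing the same `reg` pattern (the names `as_frame` and `as_series` are just for illustration):

```python
import pandas as pd

s = pd.Series(["<url>https://luwr3243.celcius.com</url>"],
              name="Stat_Access_Link")
reg = r'\/\/([\s\S]*)<'

as_frame = s.str.extract(reg)                 # expand=True is the default
as_series = s.str.extract(reg, expand=False)  # one group -> Series

print(type(as_frame).__name__)   # DataFrame
print(type(as_series).__name__)  # Series
print(as_series[0])              # luwr3243.celcius.com
```

Passing `expand=False` is probably the clearer choice when the result is assigned to a single column.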