How to use regex in Python to find the pattern "numberxnumber" in a string
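I have a column containing strings like the following:
DSP_campaign_region_market_MO_0_Device_Display_Open Web_0_0_0_PROS_DSP Custom HH Ext_160x600_0_DYN_FLTKG_010121-123121_SP_PID=111112220202043
DSP_campaign_region_market_0_Device_video_Open Web_0_0_0_PROS_DSP Custom HH Ext_160x600_0__PID=11172045203353_DYN_FLTKG_010121-123121_MP
I need to extract the creative sizes, such as 160x600 or 1x1, from strings like the ones shown above.
What I currently do is split every string in the column on "_" and append the parts to empty lists, which I then add as columns: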
campaign=[]
dsp = []
market=[]
region =[]
device_type=[]
channel=[]
creative = []
for i in mapper['string_column']:
    i = str(i)
    i = i.split("_")
    dsp.append(i[0].replace(" ", ''))
    campaign.append(i[1])
    region.append(i[2])
    market.append(i[3])
    device_type.append(i[5])
    channel.append(i[6])
    creative.append(i[13])  # the problematic line: index 13 is not always the creative size
However, because the strings are not named consistently, for some of them (when split on "_") i[13] is 160x600, while for others it is DSP Custom HH.
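The index shift can be reproduced with the two sample strings from above. This is only a small illustration; the variable names `samples` and `parts_at_13` are made up for this sketch:

```python
# The two sample strings from the question.
samples = [
    "DSP_campaign_region_market_MO_0_Device_Display_Open Web_0_0_0_PROS_DSP Custom HH Ext_160x600_0_DYN_FLTKG_010121-123121_SP_PID=111112220202043",
    "DSP_campaign_region_market_0_Device_video_Open Web_0_0_0_PROS_DSP Custom HH Ext_160x600_0__PID=11172045203353_DYN_FLTKG_010121-123121_MP",
]

# The first string has an extra "MO" field, so every later field moves one position to the right.
parts_at_13 = [s.split("_")[13] for s in samples]
print(parts_at_13)
#> ['DSP Custom HH Ext', '160x600']
```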
So, is there a way to use a regex to identify the creative-size part of the string, such as 160X600, 1X1 or 720X90, instead of splitting the string?
This can be solved with a regular expression, without splitting the original string. Something like this:
import re
texts = ["DSP_campaign_region_market_ MO_0_Device_Display_Open Web_0_0_0_PROS_DSP Custom HH Ext_160x600_0_DYN_FLTKG_010121-123121_SP_PID=111112220202043", "DSP_campaign_region_market_0_Device_video_Open Web_0_0_0_PROS_DSP Custom HH Ext_160x600_0__PID=11172045203353_DYN_FLTKG_010121-123121_MP"]
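# Raw string so that \d reaches the regex engine unchanged: one or more digits, "x", one or more digits
pattern = r"\d+x\d+"
for text in texts:
    occurrences = re.findall(pattern, text)
    for item in occurrences:
        print(item)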
#> 160x600
#> 160x600
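The pattern above only matches a lowercase "x", while the question also mentions sizes written like 160X600. Below is a minimal sketch of one way to handle both and to return a single value per string, so it could stand in for `creative.append(i[13])` in the original loop. The helper name `extract_creative_size` and the constant `SIZE_RE` are hypothetical, and the sketch assumes the size is never directly attached to other letters or digits (in the samples it sits between underscores):

```python
import re

# Hypothetical names for this sketch. Matches digits, "x" or "X", digits,
# as long as the match is not glued to another letter or digit on either side.
SIZE_RE = re.compile(r"(?<![0-9A-Za-z])(\d+)x(\d+)(?![0-9A-Za-z])", re.IGNORECASE)

def extract_creative_size(text):
    """Return the first creative size in text as 'WxH' (lowercase x), or None if there is none."""
    match = SIZE_RE.search(str(text))
    if match is None:
        return None
    return f"{match.group(1)}x{match.group(2)}"

print(extract_creative_size("PROS_DSP Custom HH Ext_160x600_0_DYN_FLTKG"))
#> 160x600
print(extract_creative_size("PROS_DSP Custom HH Ext_720X90_0"))
#> 720x90
print(extract_creative_size("no size in this string"))
#> None
```

Inside the loop from the question, this would be used as `creative.append(extract_creative_size(i))` before `i` is split, which keeps the result independent of how many "_"-separated fields precede the size.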