Replace dynamic hyperlinks with null in a column using python pandas
One of the columns in my data frame contains text with hyperlinks, and I want to replace all of the hyperlinks with null (an empty string). This is what I tried:
df_new["column_name"] = df_new["column_name"].replace(to_replace =r'https://example.com/xyz/pqr/*.html$', value = '', regex = True)
E.g. the hyperlinks have the following format:
https://example.com/xyz/pqr/xxxxx.html
https://example.com/xyz/pqr/yyyyy.html
https://example.com/xyz/pqr/zzzzz.html
Use .+ to match one or more (+) of any character, and \. (written \\. in a non-raw string) to escape each literal dot, because an unescaped . is a special regex character that matches any character:
df_new["column_name"]=df_new["column_name"].replace(r'https://example\.com/xyz/pqr/.+\.html$',
value = '', regex = True)
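To see how the pattern behaves, here is a minimal sketch on made-up data. The three sample rows are hypothetical; only the column name and the URL format come from the question. It also shows an assumed variant for the case where the link sits inside longer text: the $ anchor is dropped and .+ is replaced by \S+?, on the assumption that the links contain no whitespace.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df_new = pd.DataFrame({
    "column_name": [
        "https://example.com/xyz/pqr/xxxxx.html",
        "see https://example.com/xyz/pqr/yyyyy.html for details",
        "no link here",
    ]
})

# Anchored pattern from the answer: only removes a link that ends the cell.
anchored = df_new["column_name"].replace(
    to_replace=r"https://example\.com/xyz/pqr/.+\.html$", value="", regex=True
)

# Assumed variant: no '$', and \S+? stops at the first '.html'
# without crossing whitespace, so surrounding text is kept.
anywhere = df_new["column_name"].replace(
    to_replace=r"https://example\.com/xyz/pqr/\S+?\.html", value="", regex=True
)

print(anchored.tolist())
print(anywhere.tolist())
```

With the anchored pattern the second row is left unchanged, because the link is followed by more text; the variant removes the link there as well.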
This should work:
import re
# re.sub takes (pattern, replacement, string); the replacement is positional
df_new["column_name"] = df_new.column_name.apply(lambda x: re.sub(r"https:.+html", "", x))
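Two things to watch with the re.sub approach: .+ is greedy, so a cell with two links loses everything between the first https: and the last html, and re.sub raises a TypeError on non-string cells such as NaN. A possible sketch that handles both, assuming links contain no whitespace; the helper name strip_links, the pattern and the sample rows are illustrative, not from the original answer:

```python
import re
import pandas as pd

# Assumption: a link runs from "https://" to the first ".html" with no whitespace inside.
LINK_RE = re.compile(r"https://\S+?\.html")

def strip_links(value):
    """Remove links from a string cell; return non-string cells unchanged."""
    if not isinstance(value, str):
        return value
    return LINK_RE.sub("", value)

# Hypothetical sample data, including a missing value and a cell with two links.
df_new = pd.DataFrame({
    "column_name": [
        "a https://example.com/xyz/pqr/xxxxx.html b",
        None,
        "x https://example.com/xyz/pqr/yyyyy.html y https://example.com/xyz/pqr/zzzzz.html z",
    ]
})

df_new["column_name"] = df_new["column_name"].apply(strip_links)
result = df_new["column_name"].tolist()
print(result)
```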