
Replace dynamic hyperlinks with null in a column using Python pandas

One of the columns in my data frame contains text with hyperlinks, and I want to replace all of the hyperlinks with null.

df_new["column_name"] = df_new["column_name"].replace(to_replace =r'https://example.com/xyz/pqr/*.html$', value = '', regex = True)

E.g., the hyperlinks have the following format:

https://example.com/xyz/pqr/xxxxx.html 
https://example.com/xyz/pqr/yyyyy.html
https://example.com/xyz/pqr/zzzzz.html

Use .+ to match one or more (+) of any character, and \. to escape the literal dot, because an unescaped . is a special regex character (it matches any character):

df_new["column_name"]=df_new["column_name"].replace(r'https://example\.com/xyz/pqr/.+\.html$',
                                                      value = '', regex = True)
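A minimal sketch of the same idea on made-up data, in case the links sit in the middle of longer text rather than at the end of the string. The sample rows are invented for illustration, and the pattern swaps in \S+? (non-whitespace, non-greedy) for .+ and drops the $ anchor, which is my assumption about what "text with hyperlinks" looks like, not something stated in the question:

```python
import pandas as pd

# Hypothetical sample data; only the URL shape comes from the question.
df_new = pd.DataFrame({
    "column_name": [
        "see https://example.com/xyz/pqr/xxxxx.html for details",
        "https://example.com/xyz/pqr/yyyyy.html",
        "no link here",
    ]
})

# \S+? matches the shortest run of non-whitespace before ".html",
# so text after the link on the same row is left alone.
pattern = r"https://example\.com/xyz/pqr/\S+?\.html"
df_new["column_name"] = df_new["column_name"].str.replace(pattern, "", regex=True)

print(df_new["column_name"].tolist())
```

Note that the surrounding whitespace is kept, so a link removed from the middle of a sentence leaves a double space behind.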

This should do it:

import re
# re.sub takes (pattern, replacement, string) as positional arguments
df_new["column_name"] = df_new["column_name"].apply(lambda x: re.sub(r"https:.+html", "", x))
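One caveat with the apply approach: re.sub raises a TypeError on non-string values such as NaN. A hedged sketch of a guard follows; the helper name strip_links and the sample Series are hypothetical, and the pattern is tightened to \S+?\.html as an assumption so that it stops at the end of the link:

```python
import re
import pandas as pd

# Compile once instead of on every row.
link_re = re.compile(r"https:\S+?\.html")

def strip_links(value):
    """Remove links from strings; pass missing/non-string values through."""
    if not isinstance(value, str):
        return value
    return link_re.sub("", value)

# Hypothetical sample with a missing value in the middle.
s = pd.Series(["a https://example.com/xyz/pqr/zzzzz.html b", None, "plain"])
cleaned = s.apply(strip_links)
```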

