Replace column values if they start with or match a string in a pandas DataFrame
I have a column named thumbnail_url in my dataframe prices_df.
zipcode thumbnail_url
0 11201 https://a0.muscache.com/im/pictures/6d7cbbf7-c...
1 10019 0
2 10027 https://a0.muscache.com/im/pictures/6fae5362-9...
3 94117 https://a0.muscache.com/im/pictures/72208dad-9...
4 20009 0
5 94131 https://a0.muscache.com/im/pictures/82509143-4...
I need to replace every value that contains https:// (or, say, .com) with the numeric value 1.
zipcode thumbnail_url
0 11201 1
1 10019 0
2 10027 1
I tried this:
img_Uploaded = prices_df['thumbnail_url'].str.contains("http") == True
prices_df.replace(to_replace=prices_df[img_Uploaded],value=1,inplace=True)
My dataframe has shape (74111, 2).
This line of code took too long and my system froze. Can someone suggest a better vectorized operation and explain it?
My issue is resolved, but I am curious what was wrong with my code. Apart from the fact that it was not optimized with vectorized operations, it should still run, right? Or is THAT the reason why it froze and did not run, whereas the code suggested below ran in seconds?
You can use the apply() function to do this:
prices_df.thumbnail_url = prices_df.thumbnail_url.apply(lambda url: 1 if 'http' in str(url) else url)
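Since the question asks for a vectorized operation, here is a minimal sketch of one, not necessarily what the answerers had in mind. It builds a boolean mask with str.contains and assigns through .loc. The sample data is hypothetical (shortened placeholder URLs), and it assumes the non-URL entries are the integer 0, which is why the column is cast to str before matching.

```python
import pandas as pd

# Hypothetical sample mirroring the question's layout; the URLs are placeholders.
prices_df = pd.DataFrame({
    "zipcode": [11201, 10019, 10027],
    "thumbnail_url": [
        "https://a0.muscache.com/im/pictures/a",
        0,
        "https://a0.muscache.com/im/pictures/b",
    ],
})

# Cast to str so non-string entries (assumed here to be the integer 0) are handled,
# then test all rows in one call; regex=False does a plain substring match.
has_url = prices_df["thumbnail_url"].astype(str).str.contains("https://", regex=False)

# Overwrite only the matching rows; every other value is left untouched.
prices_df.loc[has_url, "thumbnail_url"] = 1

print(prices_df)
```

The mask is computed once for the whole column, so no Python-level function is called per row.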
You can use a lambda expression:
# Select the column as a Series so the lambda receives one value at a time.
prices_df['thumbnail_url'] = prices_df['thumbnail_url'].apply(lambda x: 1 if 'https://' in str(x) else 0)
Lambda expressions are a shorthand for creating anonymous functions; the expression lambda parameters: expression yields a function object. The unnamed object behaves like a function object defined with a def statement.
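To make the lambda/def equivalence concrete, here is a small sketch. The names has_https_lambda and has_https are hypothetical and chosen only for illustration.

```python
# The same check written twice; both names are hypothetical.
has_https_lambda = lambda url: 1 if "https://" in str(url) else 0

def has_https(url):
    # str() lets non-string values such as the integer 0 pass through safely.
    return 1 if "https://" in str(url) else 0

# Both callables give the same result for the same input.
print(has_https_lambda("https://example.com/a.jpg"), has_https("https://example.com/a.jpg"))
print(has_https_lambda(0), has_https(0))
```

Either form can be passed to Series.apply; the lambda simply avoids naming a one-off function.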