Python pandas: check if the last element of a list in a cell contains a specific string
My DataFrame df:
index url
1 [{'url': 'http://bhandarkarscollegekdp.org/'}]
2 [{'url': 'http://cateringinyourhome.com/'}]
3 NaN
4 [{'url': 'http://muddyjunction.com/'}]
5 [{'url': 'http://ecskouhou.jp/'}]
6 [{'url': 'http://andersrice.com/'}]
7 [{'url': 'http://durager.cz/'}, {'url': 'http:andersrice.com'}]
8 [{'url': 'http://milenijum-osiguranje.rs/'}]
9 [{'url': 'http://form-kind.org/'}, {'url': 'https://osiguranje'},{'url': 'http://beseka.com.tr'}]
I would like to select the rows where the last item in the list in the url column contains 'https', while skipping missing values.
My current script:
df[df['url'].str[-1].str.contains('https',na=False)]
returns False for all rows, even though some of them actually contain https.
Can anybody help with it?
I think you can first replace NaN with an empty url and then use apply:
import numpy as np
import pandas as pd

df = pd.DataFrame({'url': [[{'url': 'http://bhandarkarscollegekdp.org/'}],
                           np.nan,
                           [{'url': 'http://cateringinyourhome.com/'}],
                           [{'url': 'http://durager.cz/'}, {'url': 'https:andersrice.com'}]]},
                  index=[1, 2, 3, 4])
print (df)
url
1 [{'url': 'http://bhandarkarscollegekdp.org/'}]
2 NaN
3 [{'url': 'http://cateringinyourhome.com/'}]
4 [{'url': 'http://durager.cz/'}, {'url': 'https...
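# Replace every missing cell with a placeholder list (a fresh list is built per row)
df['url'] = df['url'].apply(lambda x: x if isinstance(x, list) else [{'url': ''}])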
print (df)
url
1 [{'url': 'http://bhandarkarscollegekdp.org/'}]
2 [{'url': ''}]
3 [{'url': 'http://cateringinyourhome.com/'}]
4 [{'url': 'http://durager.cz/'}, {'url': 'https...
print (df.url.apply(lambda x: 'https' in x[-1]['url']))
1 False
2 False
3 False
4 True
Name: url, dtype: bool
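If you would rather not overwrite the missing values, a small helper can treat them as non-matches, and the resulting mask can then select the rows directly. This is only a sketch: the name last_url_has_https is made up for illustration, and it assumes every non-missing cell is a list of dicts with a 'url' key.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'url': [[{'url': 'http://bhandarkarscollegekdp.org/'}],
                           np.nan,
                           [{'url': 'http://cateringinyourhome.com/'}],
                           [{'url': 'http://durager.cz/'}, {'url': 'https:andersrice.com'}]]},
                  index=[1, 2, 3, 4])


def last_url_has_https(cell):
    """Hypothetical helper: True if the last dict in the list has 'https' in its 'url'."""
    # NaN (a float) and empty lists are treated as "no match"
    if not isinstance(cell, list) or not cell:
        return False
    return 'https' in cell[-1].get('url', '')


mask = df['url'].apply(last_url_has_https)  # boolean Series aligned with df
selected = df[mask]                         # rows whose last url contains 'https'
print(selected)
```

Only the row with index 4 is kept here, and the NaN row is skipped without being modified.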
Another solution (leaving NaN in place and checking only the non-null rows):
mask = df.url.notnull()
df.loc[mask, 'a'] = df.loc[mask, 'url'].apply(lambda x: 'https' in x[-1]['url'])
df['a'] = df['a'].fillna(False).astype(bool)
print (df)
url a
1 [{'url': 'http://bhandarkarscollegekdp.org/'}] False
2 NaN False
3 [{'url': 'http://cateringinyourhome.com/'}] False
4 [{'url': 'http://durager.cz/'}, {'url': 'https... True
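As for why the original attempt returns False everywhere: df['url'].str[-1] yields the last dict of each list, not a string, and .str.contains gives a missing value for non-string elements, which na=False then turns into False. One extra .str['url'] step pulls the string out of the dict first. A sketch, assuming the same list-of-dicts layout as in the sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'url': [[{'url': 'http://bhandarkarscollegekdp.org/'}],
                           np.nan,
                           [{'url': 'http://cateringinyourhome.com/'}],
                           [{'url': 'http://durager.cz/'}, {'url': 'https:andersrice.com'}]]},
                  index=[1, 2, 3, 4])

last_dict = df['url'].str[-1]    # last element of each list (a dict); NaN stays NaN
last_url = last_dict.str['url']  # value stored under the 'url' key; NaN stays NaN
mask = last_url.str.contains('https', na=False)  # missing values count as False

print(df[mask])
```

This stays within the .str accessor, so no apply is needed, but it relies on .str indexing accepting lists and dicts as well as strings.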
I'm not sure whether url holds str values or some other type, but you can check the last row of df like this:
"https" in str(df.url.iloc[len(df)-1])
or
str(df.iloc[len(df)-1].url).__contains__("https")