Python pandas: check whether the last element of a list in a cell contains a specific string

Question

My DataFrame df:

index                        url
1           [{'url': 'http://bhandarkarscollegekdp.org/'}]
2             [{'url': 'http://cateringinyourhome.com/'}]
3                                                     NaN
4                  [{'url': 'http://muddyjunction.com/'}]
5                       [{'url': 'http://ecskouhou.jp/'}]
6                     [{'url': 'http://andersrice.com/'}]
7       [{'url': 'http://durager.cz/'}, {'url': 'http:andersrice.com'}]
8            [{'url': 'http://milenijum-osiguranje.rs/'}]
9       [{'url': 'http://form-kind.org/'}, {'url': 'https://osiguranje'},{'url': 'http://beseka.com.tr'}]

I would like to select the rows where the last item of the list in the url column contains 'https', while skipping missing values.

My current script:

df[df['url'].str[-1].str.contains('https',na=False)]

returns False for every row, even though some of them actually contain https.

Can anybody help with it?

Answer 1

I think you can first replace each NaN with a list holding an empty url and then use apply:

import numpy as np
import pandas as pd

df = pd.DataFrame({'url':[[{'url': 'http://bhandarkarscollegekdp.org/'}],
                          np.nan,
                          [{'url': 'http://cateringinyourhome.com/'}],
                          [{'url': 'http://durager.cz/'}, {'url': 'https:andersrice.com'}]]},
                  index=[1,2,3,4])

print (df)
                                                 url
1     [{'url': 'http://bhandarkarscollegekdp.org/'}]
2                                                NaN
3        [{'url': 'http://cateringinyourhome.com/'}]
4  [{'url': 'http://durager.cz/'}, {'url': 'https...

df.loc[df.url.isnull(), 'url'] = [[{'url':''}]]
print (df)
                                                 url
1     [{'url': 'http://bhandarkarscollegekdp.org/'}]
2                                      [{'url': ''}]
3        [{'url': 'http://cateringinyourhome.com/'}]
4  [{'url': 'http://durager.cz/'}, {'url': 'https...

print (df.url.apply(lambda x: 'https' in x[-1]['url']))
1    False
2    False
3    False
4     True
Name: url, dtype: bool
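
The boolean Series above can also be used directly as a mask to select the matching rows, which is what the question asks for. The sketch below is one possible way to do that without overwriting the NaN cells first. The helper name `last_url_has` is hypothetical, and it assumes every non-missing cell is a list of dicts whose 'url' value is a string.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'url': [[{'url': 'http://bhandarkarscollegekdp.org/'}],
             np.nan,
             [{'url': 'http://cateringinyourhome.com/'}],
             [{'url': 'http://durager.cz/'}, {'url': 'https:andersrice.com'}]]},
    index=[1, 2, 3, 4])


def last_url_has(cell, needle='https'):
    """Hypothetical helper: is `needle` in the 'url' of the last dict in `cell`?"""
    # NaN is a float, so anything that is not a non-empty list counts as "no match".
    if not isinstance(cell, list) or not cell:
        return False
    return needle in cell[-1].get('url', '')


mask = df['url'].apply(last_url_has)   # one boolean per row
selected = df[mask]                    # keep only the rows where the mask is True
print(selected)
```

Because the helper returns False for missing values itself, the original NaN cells stay untouched in df.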

First solution (apply only to the non-null rows, then fill the remaining rows with False):

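# Evaluate only the non-null rows; the remaining rows of 'a' stay NaN for now.
df.loc[df.url.notnull(), 'a'] = (
    df.loc[df.url.notnull(), 'url'].apply(lambda x: 'https' in x[-1]['url'])
)

# Rows that were skipped count as "no match".
df['a'] = df['a'].fillna(False)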
print (df)
                                                 url      a
1     [{'url': 'http://bhandarkarscollegekdp.org/'}]  False
2                                                NaN  False
3        [{'url': 'http://cateringinyourhome.com/'}]  False
4  [{'url': 'http://durager.cz/'}, {'url': 'https...   True
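
A likely reason the script in the question returns False everywhere is that `.str[-1]` yields the last element of each list, which here is a dict rather than a string, so `.str.contains` has nothing to search and falls back to `na=False`. The sketch below stays with the `.str` accessor and adds one more step to pull out the value under the 'url' key. It assumes each non-missing cell is a non-empty list of dicts with a 'url' key. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'url': [[{'url': 'http://bhandarkarscollegekdp.org/'}],
             np.nan,
             [{'url': 'http://cateringinyourhome.com/'}],
             [{'url': 'http://durager.cz/'}, {'url': 'https:andersrice.com'}]]},
    index=[1, 2, 3, 4])

# Step 1: last element of each list. This is a dict (NaN for the missing row).
last = df['url'].str[-1]

# Step 2: .str[...] also does key lookup on dicts, so this extracts the URL string.
urls = last.str['url']

# Step 3: now the values are strings, so contains() can search them.
mask = urls.str.contains('https', na=False)

print(df[mask])
```

Written as a single expression, this is `df[df['url'].str[-1].str['url'].str.contains('https', na=False)]`.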

Answer 2

I'm not sure whether url holds str or some other type.

You can do it like this (this inspects the last row of df):

"https" in str(df.url.iloc[len(df)-1])

or

str(df.iloc[len(df)-1].url).__contains__("https")

Answer 1 (score 1, accepted): 2016-10-03 12:29:14

Answer 2 (score 0): 2016-10-03 12:34:01
