Pandas: replace values in dataframe
I have a dataframe df:
ID active_seconds domain subdomain search_engine search_term
0120bc30e78ba5582617a9f3d6dfd8ca 35 city-link.com msk.city-link.com None None
0120bc30e78ba5582617a9f3d6dfd8ca 54 vk.com vk.com None None
0120bc30e78ba5582617a9f3d6dfd8ca 34 mts.ru shop.mts.ru None None
16c28c057720ab9fbbb5ee53357eadb7 4 facebook.com facebook.com None None
and a list url = ['city-link.com', 'shop.mts.ru']. I need to update the subdomain column according to these rules:
If subdomain equals one of the elements of url, leave it unchanged.
If subdomain is not in url but domain is, overwrite subdomain with the value of domain.
Otherwise, change nothing.
How can I do this with pandas? I tried a loop, but it takes a long time:
domains = df['domain']
subdomains = df['subdomain']
urls = ['yandex.ru', 'vk.com', 'mail.ru']
for (domain, subdomain) in zip(domains, subdomains):
    if subdomain in urls:
        continue
    elif domain in urls and subdomain not in urls:
        df['subdomain'].replace(subdomain, domain, inplace=True)
First, select the records whose domain field is in the urls list:
domains_in_urls = df[df.domain.isin(urls)]
Next, from those records, find the ones whose subdomain field is not in urls:
subdomains_not_in_urls = domains_in_urls[~domains_in_urls.subdomain.isin(urls)]
Finally, replace the subdomain field with the domain field at those indexes in the original dataframe:
df.loc[subdomains_not_in_urls.index, 'subdomain'] = \
df.loc[subdomains_not_in_urls.index, 'domain']
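The three steps above can also be folded into a single boolean mask. The following is only a sketch: the helper name fill_subdomain_from_domain and the reduced two-column sample frame are assumptions made for illustration, built from the rows shown in the question, and it works on a copy rather than modifying df in place.

```python
import pandas as pd


def fill_subdomain_from_domain(df, urls):
    """Hypothetical helper: return a copy of df in which subdomain is
    overwritten with domain for rows whose domain is in urls and whose
    subdomain is not."""
    out = df.copy()
    # True only where domain is listed and subdomain is not
    mask = out["domain"].isin(urls) & ~out["subdomain"].isin(urls)
    out.loc[mask, "subdomain"] = out.loc[mask, "domain"]
    return out


# Sample data reduced to the two columns involved (taken from the question)
sample = pd.DataFrame({
    "domain": ["city-link.com", "vk.com", "mts.ru", "facebook.com"],
    "subdomain": ["msk.city-link.com", "vk.com", "shop.mts.ru", "facebook.com"],
})
urls = ["city-link.com", "shop.mts.ru"]

result = fill_subdomain_from_domain(sample, urls)
print(result)
```

With this sample, only the first row changes: its domain is in urls and its subdomain is not, so subdomain becomes city-link.com. The shop.mts.ru row is left alone because its subdomain is already in the list.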