I have a df column containing various links, some of them containing the string "search".
I want to create a function that, when applied to the column, returns a column containing "search" or "other".
I wrote a function like:
search = 'search'
def page_type(x):
    if x.str.contains(search):
        return 'Search'
    else:
        return 'Other'
df['link'].apply(page_type)
but it gives me an error like:
AttributeError: 'unicode' object has no attribute 'str'
I guess I'm missing something when calling the str.contains().
I think you need numpy.where:
import numpy as np
import pandas as pd
df = pd.DataFrame({'link':['search','homepage d','login dd', 'profile t', 'ff']})
print(df)
         link
0      search
1  homepage d
2    login dd
3   profile t
4          ff
search = 'search'
profile = 'profile'
homepage = 'homepage'
login = "login"
def page_type(x):
    if search in x:
        return 'Search'
    elif profile in x:
        return 'Profile'
    elif homepage in x:
        return 'Homepage'
    elif login in x:
        return 'Login'
    else:
        return 'Other'
df['link_new'] = df['link'].apply(page_type)
df['link_type'] = np.where(df.link.str.contains(search), 'Search',
                  np.where(df.link.str.contains(profile), 'Profile',
                  np.where(df.link.str.contains(homepage), 'Homepage',
                  np.where(df.link.str.contains(login), 'Login', 'Other'))))
print(df)
         link  link_new  link_type
0      search    Search     Search
1  homepage d  Homepage   Homepage
2    login dd     Login      Login
3   profile t   Profile    Profile
4          ff     Other      Other
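As the number of categories grows, the nested np.where calls get hard to read. One possible alternative (not part of the original answer) is numpy.select, which takes a list of conditions and a matching list of labels. The sketch below assumes the same sample frame; the name rules is only illustrative, and regex=False is used so the keywords are matched as plain substrings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'link': ['search', 'homepage d', 'login dd', 'profile t', 'ff']})

# (substring, label) pairs, checked in order; the first matching pair wins,
# which mirrors the order of the nested np.where calls
rules = [('search', 'Search'), ('profile', 'Profile'),
         ('homepage', 'Homepage'), ('login', 'Login')]

# One boolean array per rule, built with the vectorized .str accessor
conditions = [df['link'].str.contains(sub, regex=False).to_numpy() for sub, _ in rules]
labels = [label for _, label in rules]

# Rows that match none of the conditions get the default label
df['link_type'] = np.select(conditions, labels, default='Other')
print(df)
```

Adding a category then means adding one pair to rules rather than another level of nesting.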
Timings (on this sample, apply is faster than the nested np.where):
#[5000 rows x 1 columns]
df = pd.DataFrame({'link':['search','homepage d','login dd', 'profile t', 'ff']})
df = pd.concat([df]*1000).reset_index(drop=True)
In [346]: %timeit df['link'].apply(page_type)
1000 loops, best of 3: 1.72 ms per loop
In [347]: %timeit np.where(df.link.str.contains(search),'Search', np.where(df.link.str.contains(profile),'Profile', np.where(df.link.str.contains(homepage), 'Homepage', np.where(df.link.str.contains(login),'Login','Other'))))
100 loops, best of 3: 11.7 ms per loop
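The %timeit lines are IPython magics. Outside IPython, a rough equivalent can be sketched with the standard timeit module. The helper names with_apply and with_where are made up for this sketch, and the absolute numbers depend on the machine and the pandas version, so no output is shown.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({'link': ['search', 'homepage d', 'login dd', 'profile t', 'ff']})
df = pd.concat([df] * 1000).reset_index(drop=True)  # 5000 rows

def page_type(x):
    # Element-wise classification: x is a plain Python string here
    if 'search' in x:
        return 'Search'
    elif 'profile' in x:
        return 'Profile'
    elif 'homepage' in x:
        return 'Homepage'
    elif 'login' in x:
        return 'Login'
    return 'Other'

def with_apply():
    return df['link'].apply(page_type)

def with_where():
    s = df['link']
    return np.where(s.str.contains('search'), 'Search',
           np.where(s.str.contains('profile'), 'Profile',
           np.where(s.str.contains('homepage'), 'Homepage',
           np.where(s.str.contains('login'), 'Login', 'Other'))))

# Total seconds for 20 runs of each variant
t_apply = timeit.timeit(with_apply, number=20)
t_where = timeit.timeit(with_where, number=20)
print('apply:    %.4f s for 20 runs' % t_apply)
print('np.where: %.4f s for 20 runs' % t_where)
```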
.str applies to the whole Series, but here you are dealing with a single value inside the Series: apply passes each element to the function as a plain string, which has no .str attribute.
You can either do: df['link'].str.contains(search)
Or, as you want: df['link'].apply(lambda x: 'Search' if search in x else 'Other')
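To make the difference concrete, here is a small sketch of both levels side by side. The two links are made-up sample data, and the final map step is just one possible way to turn the boolean result into labels.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({'link': ['http://search-1.html', 'http://news.html']})
search = 'search'

# Series level: .str.contains works on the whole column and returns a boolean Series
mask = df['link'].str.contains(search)

# Element level: apply hands each value to the function as a plain str,
# so the ordinary `in` operator is used instead of .str
labels = df['link'].apply(lambda x: 'Search' if search in x else 'Other')

# The boolean Series can also be turned into labels afterwards
labels_from_mask = mask.map({True: 'Search', False: 'Other'})

print(mask.tolist())
print(labels.tolist())
```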
Edit
More generic way:
def my_filter(x, val, c_1, c_2):
    return c_1 if val in x else c_2
df['link'].apply(lambda x: my_filter(x, 'homepage', 'homepage', 'other'))
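If several keywords need to be checked, the same idea could be extended to a list of (keyword, label) pairs. This is only a sketch: classify and rules are hypothetical names, and the sample links are invented for the example.

```python
import pandas as pd

def classify(x, rules, default='other'):
    # Return the label of the first keyword found in x, or the default
    for keyword, label in rules:
        if keyword in x:
            return label
    return default

# Hypothetical sample data for illustration
df = pd.DataFrame({'link': ['http://site/homepage', 'http://site/search?q=1', 'http://site/about']})

rules = [('search', 'search'), ('homepage', 'homepage')]
df['link_type'] = df['link'].apply(lambda x: classify(x, rules))
print(df)
```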
You can also use a list comprehension if you want to find the word search within a link.
For example:
df['Search'] = [('search' if 'search' in item else 'other') for item in df['link']]
The output:
  ColumnA                        link  Search
0       a         http://word/12/word   other
1       b      https://search-125.php  search
2       c       http://news-8282.html   other
3       d  http://search-hello-1.html  search
Create function:
def page_type(x, y):
    df[x] = [('search' if 'search' in item else 'other') for item in df[y]]
page_type('Search', 'link')
In [6]: df
Out[6]:
  ColumnA                        link  Search
0       a         http://word/12/word   other
1       b      https://search-125.php  search
2       c       http://news-8282.html   other
3       d  http://search-hello-1.html  search
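The function above modifies the global df in place. A variant that takes the frame as an argument and returns a copy may be easier to reuse and test. The sketch below is one way to write it; add_page_type and its parameter names are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

def add_page_type(frame, source_col, target_col, keyword='search'):
    """Return a copy of frame with target_col set to keyword or 'other'."""
    out = frame.copy()
    # Same list comprehension as above, but on the frame passed in
    out[target_col] = [keyword if keyword in item else 'other' for item in out[source_col]]
    return out

df = pd.DataFrame({
    'ColumnA': ['a', 'b', 'c', 'd'],
    'link': ['http://word/12/word', 'https://search-125.php',
             'http://news-8282.html', 'http://search-hello-1.html'],
})

result = add_page_type(df, 'link', 'Search')
print(result)
```

The original df is left unchanged, so the call can be repeated with a different keyword or target column.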