If row value contains items from a list as substrings, save row value to a different dataframe.
INPUT DATAFRAME:
index link
1 https://zeewhois.com/en/
2 https://www.phpfk.de/domain
3 https://www.phpfk.de/de/domain
4 https://laseguridad.online/questions/1040/pued
list=['verizon','zeewhois','idad']
If df['link'] contains any item of the list as a substring, that specific link needs to go into a new, separate dataframe.
So far, I've preprocessed the link column and brought it to this format:
index link
1 httpszeewhoiscomenwww
2 httpswwwphpfkdedomain
3 httpswwwphpfkdededomain
4 httpslaseguridadonlinequestions1040pued
To find which rows contain an item from the list as a substring, I tried:
df["TRUEFALSE"] = df['link'].apply(lambda x: 1 if any(i in x for i in list) else 0)
But I'm getting this error:
TypeError: 'in <string>' requires string as left operand, not float
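A note on the error message: in the expression i in x, the left operand is i, so the message suggests that one of the items in the list (rather than a link) is a float, most commonly a NaN. The sketch below reproduces the message and filters out non-string items. The keywords list with a NaN in it is an assumption made for illustration, not the asker's actual data.

```python
# Hypothetical keyword list containing a NaN (which is a float), as can
# happen when keywords are read from a file column with an empty cell.
keywords = ['verizon', 'zeewhois', float('nan'), 'idad']
link = 'https://zeewhois.com/en/'

error_message = None
try:
    # A list comprehension evaluates every item, so the NaN is always reached.
    flags = [k in link for k in keywords]
except TypeError as exc:
    error_message = str(exc)

# Keep only real strings before matching.
clean_keywords = [k for k in keywords if isinstance(k, str)]
is_match = any(k in link for k in clean_keywords)
```

If the float were in the link column instead, Python would report that the float is not iterable, which is a different message.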
You could use str.contains, joining the list items with | to build a single regex pattern:
list_strings =['verizon','zeewhois','idad']
df.loc[df.link.str.contains('|'.join(list_strings),case=False), 'TRUE_FALSE'] = True
index link TRUE_FALSE
1 https://zeewhois.com/en/ True
2 https://www.phpfk.de/domain NaN
3 https://www.phpfk.de/de/domain NaN
4 https://laseguridad.online/questions/1040/pued True
Then just filter for True to get your new dataframe:
new_df = df.loc[df.TRUE_FALSE == True].copy()
index link TRUE_FALSE
1 https://zeewhois.com/en/ True
4 https://laseguridad.online/questions/1040/pued True
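A variation on the same idea, as a sketch rather than a definitive version: the helper column can be skipped by using the boolean mask directly. Because the joined pattern is treated as a regular expression, the keywords are escaped with re.escape here, and na=False makes missing links count as non-matches. The extra row with a missing link is an assumption added only to show that behaviour.

```python
import re
import pandas as pd

df = pd.DataFrame(
    {"link": [
        "https://zeewhois.com/en/",
        "https://www.phpfk.de/domain",
        "https://www.phpfk.de/de/domain",
        "https://laseguridad.online/questions/1040/pued",
        None,  # hypothetical missing link, not in the original data
    ]},
    index=pd.Index([1, 2, 3, 4, 5], name="index"),
)
list_strings = ["verizon", "zeewhois", "idad"]

# Escape each keyword so characters such as '.' or '+' match literally.
pattern = "|".join(re.escape(s) for s in list_strings)

# na=False turns missing values into False instead of NaN in the mask.
mask = df["link"].str.contains(pattern, case=False, na=False)
new_df = df[mask].copy()
```

With this input, new_df keeps the rows at index 1 and 4.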
You don't need to preprocess the link column. You can simply do something like this:
In [51]: import numpy as np
In [47]: df
Out[47]:
link
index
1 https://zeewhois.com/en/
2 https://www.phpfk.de/domain
3 https://www.phpfk.de/de/domain
4 https://laseguridad.online/questions/1040/pued
l = ['verizon', 'zeewhois', 'idad']  # Avoid naming variables after built-ins such as list or dict.
In [50]: def match(x):
    ...:     for i in l:
    ...:         if i.lower() in x.lower():
    ...:             return i
    ...:     else:
    ...:         return np.nan
    ...:
In [48]: new_df = df[df['link'].apply(match).notna()]
In [49]: new_df
Out[49]:
link
index
1 https://zeewhois.com/en/
4 https://laseguridad.online/questions/1040/pued
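Note that x.lower() raises an AttributeError if a link is missing (NaN), since NaN is a float. A minimal sketch of a more defensive helper follows. The name contains_keyword and the extra NaN row are hypothetical and only for illustration. It returns a plain boolean, so the result of apply can be used as a mask directly.

```python
import pandas as pd

keywords = ['verizon', 'zeewhois', 'idad']

def contains_keyword(value, words=keywords):
    """Return True if value is a string containing any keyword, ignoring case."""
    if not isinstance(value, str):  # NaN, None and numbers never match
        return False
    lowered = value.lower()
    return any(w.lower() in lowered for w in words)

df = pd.DataFrame({'link': [
    'https://zeewhois.com/en/',
    'https://www.phpfk.de/domain',
    'https://www.phpfk.de/de/domain',
    'https://laseguridad.online/questions/1040/pued',
    float('nan'),  # hypothetical missing link
]})

# apply returns a boolean Series that selects the matching rows.
new_df = df[df['link'].apply(contains_keyword)]
```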