I have a dataset called data. There's a column called networkDomain that looks like this (data['networkDomain']):
0 amazonaws.com
1 vodafone-ip.de
2 ask4internet.com
3 actcorp.in
4 (not set)
5 (not set)
6 druknet.bt
7 unknown.unknown
8 alliancebroadband.in
9 vsnl.net.in
10 grandenetworks.net
11 superonline.net
12 (not set)
13 unknown.unknown
14 unknown.unknown
15 fidnet.com
16 (not set)
17 telepacific.net
18 pldt.net
19 networkbackup.com.au
Using regex, I would like to keep all the values ending with '.com' or '.net' and set all other values to 0.
I've tried data['networkDomain'][data['networkDomain'].str.contains(".com$|.net$", regex=True)] which returns:
0 amazonaws.com
2 ask4internet.com
10 grandenetworks.net
11 superonline.net
15 fidnet.com
17 telepacific.net
18 pldt.net
22 tdc.net
24 qwest.net
26 hinet.net
27 ztomy.com
29 netvigator.com
30 level3.net
31 virginm.net
32 rr.com
41 sbcglobal.net
49 pldt.net
51 1asiacom.net
56 yesup.net
59 btireland.net
60 avast.com
How can I set all the other values in data['networkDomain'] that don't end in '.net' or '.com' to 0?
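A side note on the attempted pattern: in `".com$|.net$"` the `.` is a regex metacharacter that matches any character, so a value such as `netcom` (no dot at all) would also be kept. Escaping it as `\.` restricts the match to a literal dot. A minimal check with the standard `re` module; the sample strings are made up for illustration:

```python
import re

loose = re.compile(r".com$|.net$")     # "." matches any character
strict = re.compile(r"\.com$|\.net$")  # "\." matches only a literal dot

samples = ["amazonaws.com", "pldt.net", "netcom", "(not set)"]

loose_hits = [s for s in samples if loose.search(s)]
strict_hits = [s for s in samples if strict.search(s)]

print(loose_hits)   # ['amazonaws.com', 'pldt.net', 'netcom']
print(strict_hits)  # ['amazonaws.com', 'pldt.net']
```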
You can use Series.apply, which calls a function on each value of the Series and returns a new Series with the results:
>>> import re
>>> import pandas as pd
>>> regex = re.compile(r"\.(com|net)$")  # dots escaped so they match a literal "."
>>>
>>> def my_func(value):
...     if regex.search(value):
...         return value
...     return 0  # default for everything else
...
>>> df = pd.DataFrame(
...     [
...         {"Domain": "amazonaws.com"},
...         {"Domain": "amazonaws2.com"},
...         {"Domain": "amazonaws.net"},
...         {"Domain": "(not set)"},
...     ]
... )
>>>
>>> df["Domain"] = df["Domain"].apply(my_func)
>>> print(df)
           Domain
0   amazonaws.com
1  amazonaws2.com
2   amazonaws.net
3               0
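Because the question already builds a boolean mask with `str.contains`, a vectorized alternative to `apply` is `Series.where`, which keeps values where the condition is True and replaces the rest with the value given. A sketch, assuming `data` is a DataFrame with a string column `networkDomain` (the sample values are taken from the question); the non-capturing group `(?:...)` avoids pandas' warning about match groups in `str.contains`:

```python
import pandas as pd

data = pd.DataFrame({"networkDomain": [
    "amazonaws.com", "vodafone-ip.de", "(not set)", "unknown.unknown", "pldt.net",
]})

# True for values ending in a literal ".com" or ".net"; na=False treats missing values as non-matches
mask = data["networkDomain"].str.contains(r"\.(?:com|net)$", regex=True, na=False)

# Keep matching values, replace everything else with 0
data["networkDomain"] = data["networkDomain"].where(mask, 0)
print(data["networkDomain"].tolist())
# ['amazonaws.com', 0, 0, 0, 'pldt.net']
```

This avoids calling a Python function once per value, which tends to matter on large columns.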
Alternatively, loop over the column, find each row that doesn't satisfy the condition, and overwrite that row's value with 0:
import re

# items() yields (index label, value), so .loc works even with a non-default index
for idx, domain in data['networkDomain'].items():
    if re.search(r'\.com$|\.net$', str(domain)) is None:
        data.loc[idx, 'networkDomain'] = 0
print(data)
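The loop above can also be written without explicit iteration: build the same condition as a boolean mask and assign through `.loc`, which selects by index label and therefore also works when the index is not 0..n-1. A minimal sketch on a small, made-up frame with a non-default index:

```python
import pandas as pd

# Hypothetical sample data with a non-default index
data = pd.DataFrame(
    {"networkDomain": ["amazonaws.com", "(not set)", "fidnet.com", "druknet.bt"]},
    index=[10, 20, 30, 40],
)

# Rows whose domain does NOT end in ".com" or ".net"
not_com_net = ~data["networkDomain"].str.contains(r"\.(?:com|net)$", regex=True, na=False)

# Overwrite only those rows in a single vectorized assignment
data.loc[not_com_net, "networkDomain"] = 0
print(data)
```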
Use Series.apply() to apply a function to every value in the Series. Note that extra positional arguments for the function must be passed as a tuple through the args parameter:
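import re
from pandas import DataFrame

d = {'col': [1, 2, 3], 'col2': ['a.net', 2, 3]}
df = DataFrame(d)

def mask0(s, pattern):
    s = str(s)  # non-string values (e.g. ints) are converted before matching
    if pattern.match(s):
        return s
    return 0

# Grouped alternation: a literal ".net" or ".com" at the end of the string.
# A character class such as [\.net|\.com] would match just one of those characters instead.
pat = re.compile(r'.+\.(net|com)$')
df['col2'] = df['col2'].apply(mask0, args=(pat,))
print(df)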
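Why a pattern like `'.+[\.net|\.com]'` misbehaves: square brackets define a character class, so `[\.net|\.com]` matches one single character from the set `. n e t | c o m`, not the strings `.net` or `.com`. The check below, using only the standard `re` module and made-up inputs, compares it with a grouped alternation:

```python
import re

char_class = re.compile(r".+[\.net|\.com]")  # one character from {., n, e, t, |, c, o, m}
alternation = re.compile(r".+\.(net|com)$")  # literal ".net" or ".com" at the end

tests = ["a.net", "vodafone-ip.de", "xm", "2"]

print([bool(char_class.match(s)) for s in tests])   # [True, True, True, False]
print([bool(alternation.match(s)) for s in tests])  # [True, False, False, False]
```

With the character class, `vodafone-ip.de` and `xm` are wrongly kept because they end in `e` and `m`, both of which are in the set.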