
Python - get the string after a specific character, counting from the end

I am trying to capture the domains from this email list. Some of the entries contain subdomains, and I am trying to remove them. I just need the parts immediately before and after the last '.', counting from the end.

import pandas as pd

ids = [1,2,3,4,5,6,7,8]
emails = ['gmail.com','aol.com','','123.abc.edu','123.er.abc.edu','','abc.gov','test.net']
df = pd.DataFrame({'ids':ids,'emails':emails})
df

ids emails
0   1   gmail.com
1   2   aol.com
2   3   
3   4   123.abc.edu
4   5   123.er.abc.edu
5   6   
6   7   abc.gov
7   8   test.net

I tried the following, along with variations of the index (-1, 2:, etc.):

df.emails.str.split(".", n=1).str[-1]

0           com
1           com
2              
3       abc.edu
4    er.abc.edu
5              
6           gov
7           net

I need output like this:

ids emails
0   1   gmail.com
1   2   aol.com
2   3   
3   4   abc.edu
4   5   abc.edu
5   6   
6   7   abc.gov
7   8   test.net

Passing 1 as the n argument of split() limits it to a single split at the first '.' from the left, so the subdomains stay attached to the part you keep.
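To see why the limit matters, here is a minimal plain-Python sketch on one of the sample values (the variable names are illustrative only). For a literal one-character pattern like '.', pandas' str.split behaves the same way on each element.

```python
# Plain-Python illustration of how the split limit changes the result.
email = "123.er.abc.edu"

# maxsplit=1 stops after the FIRST '.', counted from the left,
# so the subdomain 'er' is still attached to the tail.
first_split = email.split(".", 1)
print(first_split)        # ['123', 'er.abc.edu']

# With no limit, every '.' is a split point; the last two pieces
# are the domain name and the top-level domain.
all_parts = email.split(".")
print(all_parts[-2:])     # ['abc', 'edu']

# rsplit counts splits from the RIGHT, so two splits already
# isolate the last two labels.
from_right = email.rsplit(".", 2)
print(from_right)         # ['123.er', 'abc', 'edu']
```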

Use instead:

df.emails.str.split(".").str[-2:]

to obtain the last two segments of the split string:

0    [gmail, com]
1      [aol, com]
2              []
3      [abc, edu]
4      [abc, edu]
5              []
6      [abc, gov]
7     [test, net]

To get the output back as a string that includes the dot, chain .str.join(".") onto the previous result:

In []: df.emails.str.split(".").str[-2:].str.join(".")
Out[]: 
0    gmail.com
1      aol.com
2             
3      abc.edu
4      abc.edu
5             
6      abc.gov
7     test.net
Name: emails, dtype: object
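For reference, a self-contained sketch of this answer, assuming pandas is installed; it writes the result back into the emails column. The alt line is a swapped-in variant using str.rsplit with n=2 (not part of the original answer) that splits from the right and should give the same result on this data.

```python
import pandas as pd

ids = [1, 2, 3, 4, 5, 6, 7, 8]
emails = ['gmail.com', 'aol.com', '', '123.abc.edu', '123.er.abc.edu', '', 'abc.gov', 'test.net']
df = pd.DataFrame({'ids': ids, 'emails': emails})

# Split on every '.', keep the last two pieces, then join them back with '.'.
# An empty string splits to [''], which joins back to ''.
df['emails'] = df['emails'].str.split('.').str[-2:].str.join('.')

# Variant: split at most twice, counting from the right, then keep the last two pieces.
alt = pd.Series(emails).str.rsplit('.', n=2).str[-2:].str.join('.')

print(df)
```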

You can also preprocess the emails list before building the DataFrame:

import pandas as pd

ids = [1,2,3,4,5,6,7,8]
emails = ['gmail.com','aol.com','','123.abc.edu','123.er.abc.edu','','abc.gov','test.net']

emails_filtered = []
for email in emails:
    if '.' in email:
        # keep only the last two dot-separated labels, e.g. 'abc.edu'
        emails_filtered.append('.'.join(email.split('.')[-2:]))
    else:
        # empty (or dot-less) entries become ''
        emails_filtered.append('')

df = pd.DataFrame({'ids':ids,'emails':emails_filtered})

Hope it helps.
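The same preprocessing can be written as a list comprehension. This is a minimal sketch assuming the same emails list; last_two_labels is a hypothetical helper name, not part of the original answer. It drops the explicit '.' check because ''.split('.') gives [''], which joins back to ''. One difference: a non-empty value with no dot (say 'localhost') is kept as-is instead of becoming '', which does not affect the sample data.

```python
def last_two_labels(domain: str) -> str:
    """Return the last two dot-separated labels of domain.

    'a.b.c' -> 'b.c'; a value without a dot is returned unchanged,
    so '' stays ''.
    """
    return ".".join(domain.split(".")[-2:])


emails = ['gmail.com', 'aol.com', '', '123.abc.edu', '123.er.abc.edu', '', 'abc.gov', 'test.net']

# Apply the helper to every entry, preserving order.
emails_filtered = [last_two_labels(e) for e in emails]
print(emails_filtered)
```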

Try this:

df.emails.str.split(".").str[-2:].str.join(sep='.')
