How do I split a dataframe column's values in pandas to get another column?

I have created a dataframe like this from a list.

                                        Name
0     Security Name % to Net Assets* DEBENtURES 0.04
1   Britannia Industries Ltd. EQUity & RELAtED 96.83
2                                     HDFC Bank 6.98
3                                         ICICI 4.82
4                                       Infosys 4.37

I created it with df = pd.DataFrame(list1, columns=['Name']). I want to split Name into a second column so that the dataframe looks like this:

                                        Name        value
0     Security Name % to Net Assets* DEBENtURES      0.04
1   Britannia Industries Ltd. EQUity & RELAtED      96.83
2                                     HDFC Bank      6.98
3                                         ICICI      4.82
4                                       Infosys      4.37

I tried df = pd.DataFrame(df.row.str.split(",", 1).tolist(), columns=["Security Name", "Weights"]), but it raised AttributeError: 'DataFrame' object has no attribute 'row'

How do I split a column so that I get a second column?

The dataframe has no column named row, and the values contain no comma to split on. Use Series.str.rsplit, which splits from the right; with n=1 it splits only once, at the last space:

print (df.columns.tolist())
['Name']


df[['Name','value']] = df['Name'].str.rsplit(n=1, expand=True)
print (df)
                                         Name  value
0   Security Name % to Net Assets* DEBENtURES   0.04
1  Britannia Industries Ltd. EQUity & RELAtED  96.83
2                                   HDFC Bank   6.98
3                                       ICICI   4.82
4                                     Infosys   4.37
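
The split produces strings, so the value column usually needs a numeric conversion before any arithmetic. A minimal, self-contained sketch, assuming list1 holds the five strings shown in the question (the sample list below is a stand-in for the asker's data):

```python
import pandas as pd

# Stand-in for the asker's list1, copied from the rows in the question.
list1 = [
    "Security Name % to Net Assets* DEBENtURES 0.04",
    "Britannia Industries Ltd. EQUity & RELAtED 96.83",
    "HDFC Bank 6.98",
    "ICICI 4.82",
    "Infosys 4.37",
]
df = pd.DataFrame(list1, columns=["Name"])

# Split once on whitespace, starting from the right, into two columns.
df[["Name", "value"]] = df["Name"].str.rsplit(n=1, expand=True)

# rsplit yields strings; convert the new column to floats.
df["value"] = pd.to_numeric(df["value"])

print(df.dtypes)
```

If some values might not be numeric, pd.to_numeric(df["value"], errors="coerce") turns them into NaN instead of raising.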

If some rows may have no number, use Series.str.extract instead, with a regex that matches the number after the last space. Rows that do not match come back as NaN in both columns:

print (df)
                                               Name
0    Security Name % to Net Assets* DEBENtURES 0.04
1  Britannia Industries Ltd. EQUity & RELAtED 96.83
2                                    HDFC Bank 6.98
3                                        ICICI 4.82
4                                      Infosys 4.37
5                                               new
6                                             aa 58

df = df['Name'].str.extract(r'(?P<Name>.*)\s+(?P<value>\d+\.\d+|\d+)$')
print (df)
                                         Name  value
0   Security Name % to Net Assets* DEBENtURES   0.04
1  Britannia Industries Ltd. EQUity & RELAtED  96.83
2                                   HDFC Bank   6.98
3                                       ICICI   4.82
4                                     Infosys   4.37
5                                         NaN    NaN
6                                          aa     58
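
Note that row 5 loses its original text, because a non-matching row gets NaN in every extracted column. If the name should be kept in that case, one possible approach is to fill the gaps from the original column. This is a sketch on a shortened sample; the variable names pattern and parts are only illustrative:

```python
import pandas as pd

# Shortened sample, including one row without a trailing number.
df = pd.DataFrame({"Name": ["HDFC Bank 6.98", "ICICI 4.82", "new", "aa 58"]})

# Same regex as above: name, whitespace, then an integer or decimal at the end.
pattern = r"(?P<Name>.*)\s+(?P<value>\d+\.\d+|\d+)$"
parts = df["Name"].str.extract(pattern)

# Non-matching rows are NaN in both columns; restore the original text as the name.
parts["Name"] = parts["Name"].fillna(df["Name"])

# Extracted values are strings; convert them, leaving NaN where no number was found.
parts["value"] = pd.to_numeric(parts["value"])

print(parts)
```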
