
Indexing a dataframe column in Pandas

How can I index values in a column in pandas and make it into a new column? This is what I'm trying to do:

Original:

       Data    
0  0010-AAAA    
1  0010-BBBB    
2  0010-CCCC    
3  0011-DDDD    
4  0011-EEEE    

Adding two columns:

       Data    col_2   col_3
0  0010-AAAA    0010    AAAA
1  0010-BBBB    0010    BBBB
2  0010-CCCC    0010    CCCC
3  0011-DDDD    0011    DDDD
4  0011-EEEE    0011    EEEE

It looks like you need str.split:

df[['col_2', 'col_3']] = df['Data'].str.split('-', n=1, expand=True)

output:

        Data col_2 col_3
0  0010-AAAA  0010  AAAA
1  0010-BBBB  0010  BBBB
2  0010-CCCC  0010  CCCC
3  0011-DDDD  0011  DDDD
4  0011-EEEE  0011  EEEE
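
For reference, here is a self-contained sketch of the split approach that rebuilds the sample frame from the question. The value in `extra` is made up (it is not in the original data) and is there only to show what `n=1` does: the string is split at the first dash and the remainder stays in one piece.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {"Data": ["0010-AAAA", "0010-BBBB", "0010-CCCC", "0011-DDDD", "0011-EEEE"]}
)

# Split once on the first dash; expand=True returns one column per piece
df[["col_2", "col_3"]] = df["Data"].str.split("-", n=1, expand=True)
print(df)

# Hypothetical value with a second dash (not in the original data):
# n=1 keeps everything after the first dash together
extra = pd.Series(["0012-FF-GG"]).str.split("-", n=1, expand=True)
print(extra.iloc[0].tolist())  # ['0012', 'FF-GG']
```

Both new columns hold strings, so the leading zeros in col_2 are preserved.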

If there is no dash

Then use a regex with str.extract.

In this case, capture the digits (\d+) followed by the non-digits (\D+):

df[['col_2', 'col_3']] = df['Data'].str.extract(r'(\d+)(\D+)')

output:

       Data col_2 col_3
0  0010AAAA  0010  AAAA
1  0010BBBB  0010  BBBB
2  0010CCCC  0010  CCCC
3  0011DDDD  0011  DDDD
4  0011EEEE  0011  EEEE
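
A runnable sketch of the str.extract variant follows. The last value, "1234", is a made-up row (not in the original data) included to show that rows which do not match the pattern come back as NaN rather than raising an error.

```python
import pandas as pd

# First three values follow the dash-less example above;
# "1234" is a hypothetical row with no letters, added for illustration
df = pd.DataFrame({"Data": ["0010AAAA", "0010BBBB", "0011DDDD", "1234"]})

# Each capture group becomes one column of the result
parts = df["Data"].str.extract(r"(\d+)(\D+)")
df[["col_2", "col_3"]] = parts
print(df)
```

The "1234" row gets NaN in both new columns, because the pattern needs at least one non-digit after the digits.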

Or even r'(\d+)\W*(\D+)' (digits, optional non-word characters, non-digits) to handle both cases at once:

df[['col_2', 'col_3']] = df['Data'].str.extract(r'(\d+)\W*(\D+)')

example:

        Data col_2 col_3
0  0010-AAAA  0010  AAAA
1   0010BBBB  0010  BBBB
2  0010-CCCC  0010  CCCC
3   0011DDDD  0011  DDDD
4  0011-EEEE  0011  EEEE
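
As a variation on the same pattern, named capture groups let str.extract label the result columns directly. This is a minimal sketch on a shortened copy of the mixed data; using named groups together with `join` is a choice made here for illustration, not part of the original answer.

```python
import pandas as pd

# Mixed data: some values with a dash, some without
df = pd.DataFrame({"Data": ["0010-AAAA", "0010BBBB", "0011-EEEE"]})

# Named groups (?P<name>...) become the column names of the extracted frame
pattern = r"(?P<col_2>\d+)\W*(?P<col_3>\D+)"
parts = df["Data"].str.extract(pattern)

# Attach the extracted columns to the original frame
df = df.join(parts)
print(df)
```

One caveat: \W does not match an underscore, so a value such as 0010_AAAA would keep the underscore at the start of col_3.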

Alternatively, split on the dash without the n=1 limit:

df[['col_2', 'col_3']] = df['Data'].str.split("-", expand=True)
df

        Data col_2 col_3
0  0010-AAAA  0010  AAAA
1  0010-BBBB  0010  BBBB
2  0010-CCCC  0010  CCCC
3  0011-DDDD  0011  DDDD
4  0011-EEEE  0011  EEEE

Most Python string methods are available through the .str accessor, including slicing:

df["Data"].str[:4]

0    0010
1    0010
2    0010
3    0011
4    0011
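
Slicing can also produce both columns, as in the sketch below. It assumes every value has the fixed layout seen in the question (a 4-character code, one separator character, then the rest); values with a different layout would be cut in the wrong place.

```python
import pandas as pd

df = pd.DataFrame({"Data": ["0010-AAAA", "0010-BBBB", "0011-EEEE"]})

# Assumes a fixed layout: 4-character code, 1 separator, then the remainder
df["col_2"] = df["Data"].str[:4]   # first four characters
df["col_3"] = df["Data"].str[5:]   # everything after the separator
print(df)
```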
