
Python - Regex split data in Dataframe

I have a DataFrame column of string values that I want to split with a regex. If the regex matches, the original value should be replaced with the left side of the split, and a new column should hold the right side.

Below is some sample code. I feel I am close but it isn't quite working.

import pandas as pd
import re

df = pd.DataFrame({ 'A' : ["test123","foo"]})

# Regex example: split the value if it ends in digits
r = r"^(.+?)(\d*)$"

df['A'], df['B'] = zip(*df['A'].apply(lambda x: x.split(r, 1)))
print(df)

In the example above I would expect the following output

         A        B
0     test      123
1     foo

I am fairly new to Python and assumed this would be the way to go. However, it appears that I haven't quite hit the mark. Is anyone able to help me correct this example?
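
A note on why the attempt above fails: Python's built-in str.split treats its argument as a literal separator, not as a regular expression, so nothing is split and each row comes back as a one-element list. Unpacking the zipped result into two columns then raises a ValueError. A minimal sketch using only the standard library (the variable names literal_split and regex_groups are illustrative, not from the question):

```python
import re

r = r"^(.+?)(\d*)$"

# str.split searches for the literal text of r inside the value, finds
# nothing, and returns the whole string as a single element.
literal_split = "test123".split(r, 1)

# re.match applies r as a regular expression and exposes the two groups.
regex_groups = re.match(r, "test123").groups()

print(literal_split)  # ['test123']
print(regex_groups)   # ('test', '123')
```

So the pattern itself is fine; it just has to be handed to something that interprets it as a regex.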

Building on your own regex (with numpy imported as np):

df.A.str.split(r,expand=True).replace('',np.nan).dropna(thresh=1,axis=1).fillna('')
Out[158]: 
      1    2
0  test  123
1   foo     


df[['A','B']]=df.A.str.split(r,expand=True).replace('',np.nan).dropna(thresh=1,axis=1).fillna('')
df
Out[160]: 
      A    B
0  test  123
1   foo     
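
The replace/dropna/fillna chain is needed because of how a regex split behaves when the pattern has capture groups: the captured text is kept in the result, surrounded by whatever lies before and after the match. Since this pattern matches the whole string, those outer pieces are empty. A small sketch with the standard re module, on the assumption that str.split follows the same semantics for a regex pattern:

```python
import re

r = r"^(.+?)(\d*)$"

# The match covers the entire string, so the pieces before and after it are
# empty strings; the two captured groups sit in between.
parts_with_digits = re.split(r, "test123")
parts_without_digits = re.split(r, "foo")

print(parts_with_digits)     # ['', 'test', '123', '']
print(parts_without_digits)  # ['', 'foo', '', '']
```

That is why the expanded frame has four columns, of which the first and last are always empty and get dropped.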

Your regex works just fine; use it with str.extract:

df = pd.DataFrame({ 'A' : ["test123","foo", "12test3"]})
df[['A', 'B']] = df['A'].str.extract(r"^(.+?)(\d*)$", expand=True)


        A    B
0    test  123
1     foo     
2  12test    3
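
A possible variation on the str.extract approach, sketched here as an option rather than as part of the original answer: if the groups in the pattern are named, str.extract uses those names as the column labels of the frame it returns. The group names A and B and the variable parts are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": ["test123", "foo", "12test3"]})

# Named groups (?P<A>...) and (?P<B>...) become the column names of the
# extracted frame, so the result already has columns "A" and "B".
parts = df["A"].str.extract(r"^(?P<A>.+?)(?P<B>\d*)$")
df[["A", "B"]] = parts

print(df)
```

Rows without trailing digits get an empty string in B, because the second group still takes part in the match and simply captures nothing.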
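
Alternatively, apply a helper function built on the re module:

import re
import pandas as pd

def bar(x):
    # Split x into (text, trailing digits); fall back to (x, None) if there is no match
    m = re.match(r'^(.+?)(\d*)$', x)
    if m:
        return m.groups()
    else:
        return x, None


def foo():
    df = pd.DataFrame({'A': ["test123", "foo"]})
    df['A'], df['B'] = zip(*df['A'].apply(bar))
    print(df)


foo()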

result:

      A    B
0  test  123
1   foo   


 