Python - Regex split data in Dataframe

Question

I have a column of string values that I want to split using a regex. Where the regex matches, the original value should be replaced with the left side of the split, and a new column should hold the right side.

Below is some sample code. I feel I am close but it isn't quite working.

import pandas as pd
import re

df = pd.DataFrame({ 'A' : ["test123","foo"]})

# Regex example to split it if it ends in numbers
r = r"^(.+?)(\d*)$"

df['A'], df['B'] = zip(*df['A'].apply(lambda x: x.split(r, 1)))
print(df)

In the example above I would expect the following output

         A        B
0     test      123
1     foo

I am fairly new to Python and assumed this would be the way to go. However, it appears that I haven't quite hit the mark. Is anyone able to help me correct this example?

Answer 1

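Building on your own regex (this assumes numpy has been imported as np), split with expand=True and drop the all-empty columns the split leaves behind: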

df.A.str.split(r,expand=True).replace('',np.nan).dropna(thresh=1,axis=1).fillna('')
Out[158]: 
      1    2
0  test  123
1   foo     


df[['A','B']]=df.A.str.split(r,expand=True).replace('',np.nan).dropna(thresh=1,axis=1).fillna('')
df
Out[160]: 
      A    B
0  test  123
1   foo
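
To see why the empty strings have to be cleaned up (and why the plain x.split(r, 1) in the question never splits anything), here is a minimal sketch using only the standard re module. The variable names are mine, chosen for illustration:

```python
import re

r = r"^(.+?)(\d*)$"

# str.split treats its argument as a literal separator, not as a regex.
# The pattern text never occurs in the value, so nothing is split.
literal_split = "test123".split(r, 1)
print(literal_split)   # ['test123']

# re.split does interpret the pattern. The pattern matches the whole string
# and has two capture groups, so the result is an empty prefix, the two
# captured groups, and an empty suffix.
parts_digits = re.split(r, "test123")
parts_plain = re.split(r, "foo")
print(parts_digits)    # ['', 'test', '123', '']
print(parts_plain)     # ['', 'foo', '', '']
```

The leading and trailing empty strings are what end up as the all-empty outer columns after expand=True, which is why the answer replaces '' with NaN and drops those columns.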

Answer 2

Your regex works fine as it is; use it with str.extract, which returns one column per capture group:

df = pd.DataFrame({ 'A' : ["test123","foo", "12test3"]})
df[['A', 'B']] = df['A'].str.extract(r"^(.+?)(\d*)$", expand=True)


        A    B
0    test  123
1     foo     
2  12test    3
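
As a variation on the same idea, the capture groups can be named so that str.extract labels the resulting columns itself. This is only a sketch: the group names A and B and the variable name extracted are my own choices, not something from the question.

```python
import pandas as pd

df = pd.DataFrame({'A': ["test123", "foo", "12test3"]})

# Named groups (?P<name>...) become the column names of the extracted frame.
# The second group uses \d*, so it still matches (as an empty string) when
# the value has no trailing digits.
extracted = df['A'].str.extract(r"^(?P<A>.+?)(?P<B>\d*)$")

# Overwrite column A with the prefix and add the trailing digits as column B.
df[['A', 'B']] = extracted
print(df)
```

Column B holds strings here, with an empty string for rows like foo that have no trailing digits.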

Answer 3

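import re
import pandas as pd


def bar(x):
    # Split x into (prefix, trailing digits); fall back to (x, None) when
    # the pattern does not match at all (e.g. an empty string).
    m = re.match(r'^(.+?)(\d*)$', x)
    if m:
        return m.groups()
    else:
        return x, None


def foo():
    df = pd.DataFrame({'A': ["test123", "foo"]})
    # apply(bar) yields one tuple per row; zip(*...) transposes them into two columns
    df['A'], df['B'] = zip(*df['A'].apply(bar))
    print(df)


foo()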

result:

      A    B
0  test  123
1   foo

3 answers

solution1
3 2018-01-05 19:23:20

solution2
2 ACCEPTED 2018-01-05 19:34:02

solution3
0 2018-01-05 19:26:29
