How to extract specific text from a pandas column

Question

I have a pandas DataFrame with a column that I need to clean because the data isn't in the required format:

import pandas as pd

df = pd.DataFrame({'item': ["1","2","3","4","5","6"], 'store': ["a [note 3]","b  [note 98]","c ","a [note 222]","b","c"]})
print(df)

  item         store
0    1    a [note 3]
1    2  b  [note 98]
2    3            c 
3    4  a [note 222]
4    5             b
5    6             c

The 'store' column should end up like this:

  item store
0    1     a
1    2     b
2    3     c
3    4     a
4    5     b
5    6     c

Answer 1

Split on the opening square bracket, take the first element of the resulting list, and strip the whitespace that was left in front of the bracket.

df['store'] = df['store'].str.split(r'\[').str[0].str.strip()
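To check the split approach end to end, here is a small self-contained sketch on the question's sample data. The trailing `.str.strip()` is there because splitting on the bracket alone keeps the spaces that preceded it (for example "b  "), and the raw string avoids Python's invalid-escape warning for `\[`.

```python
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    'item': ["1", "2", "3", "4", "5", "6"],
    'store': ["a [note 3]", "b  [note 98]", "c ", "a [note 222]", "b", "c"],
})

# Split on a literal "[", keep the part before it,
# then strip the whitespace left around the store name.
df['store'] = df['store'].str.split(r'\[').str[0].str.strip()

print(df['store'].tolist())
```

Rows without a bracket come back unchanged apart from the whitespace stripping, so the same line covers both kinds of value.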

Answer 2

You don't need a regular expression. Just split on whitespace and take the first token.

df['store'] = df['store'].apply(lambda x: x.split()[0])
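One caveat with the `apply` version: `x.split()[0]` raises on missing values and on empty strings. If your real data might contain those, a vectorized variant along these lines may be safer. This is a sketch; the last two sample rows are hypothetical and not from the question.

```python
import pandas as pd

# The first three values come from the question; the missing value and
# the empty string are hypothetical additions to show the edge cases.
s = pd.Series(["a [note 3]", "b  [note 98]", "c ", None, ""])

# .str.split() with no pattern splits on runs of whitespace;
# .str[0] takes the first token and yields a missing value
# where there is no token to take.
first_token = s.str.split().str[0]

print(first_token.tolist())
```

The rows that have no first token end up as missing values instead of raising, so you can handle them afterwards with `fillna` or `dropna`.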

If you end up needing a regex, you can use extract (expand=False returns a Series rather than a one-column DataFrame):

df['store'] = df['store'].str.extract(r'^([a-z])', expand=False)

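If you have multiple characters before the bracket, capture everything up to the bracket (or the end of the string) and leave the trailing whitespace outside the group:

df['store'] = df['store'].str.extract(r'^(.+?)\s*(?=\[|$)', expand=False)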
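Here is a sketch of the multi-character case. The store names are made up for illustration, and the pattern adds `\s*` before the lookahead so that the spaces in front of the bracket are not captured.

```python
import pandas as pd

# Hypothetical store names with more than one character; not from the question.
s = pd.Series(["north mall [note 3]", "b  [note 98]", "c ", "outlet"])

# ^(.+?)     lazily capture the name from the start of the string
# \s*        swallow trailing whitespace outside the capture group
# (?=\[|$)   stop just before "[" or at the end of the string
pattern = r'^(.+?)\s*(?=\[|$)'

# expand=False returns a Series when the pattern has a single group.
names = s.str.extract(pattern, expand=False)

print(names.tolist())
```

Because the capture is lazy and anchored by the lookahead, spaces inside a name ("north mall") survive while the space before the bracket is dropped.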

2 answers

solution1
3 ACCEPTED 2020-12-15 23:42:53

solution2
2 2020-12-15 23:33:14
