I have a pandas DataFrame with a column that I need to clean, because its values are not in the required format:
import pandas as pd
df = pd.DataFrame({'item': ["1","2","3","4","5","6"], 'store': ["a [note 3]","b [note 98]","c ","a [note 222]","b","c"]})
print(df)
  item         store
0    1    a [note 3]
1    2   b [note 98]
2    3            c 
3    4  a [note 222]
4    5             b
5    6             c
The 'store' column must be changed to look like this:
  item store
0    1     a
1    2     b
2    3     c
3    4     a
4    5     b
5    6     c
Split on the opening square bracket, take the first element of the resulting list, and strip the whitespace left in front of the bracket.
df['store'] = df['store'].str.split(r'\[').str[0].str.strip()
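As a self-contained sketch of the split approach, using the sample frame from the question: splitting on the bracket alone leaves the space in front of it (`"a [note 3]"` becomes `"a "`), so this version assumes a trailing `.str.strip()` is acceptable. The variable name `cleaned` is only for illustration.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'item': ["1", "2", "3", "4", "5", "6"],
    'store': ["a [note 3]", "b [note 98]", "c ", "a [note 222]", "b", "c"],
})

# r'\[' is a two-character pattern, so pandas treats it as a regex
# that matches a literal opening bracket.
cleaned = (
    df['store']
    .str.split(r'\[')  # "a [note 3]" -> ["a ", "note 3]"]
    .str[0]            # keep the part before the bracket
    .str.strip()       # drop the leftover whitespace
)

print(cleaned.tolist())  # ['a', 'b', 'c', 'a', 'b', 'c']
```

Assigning `cleaned` back with `df['store'] = cleaned` gives the frame shown in the question.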
You don't need a regular expression. Just split on whitespace and take the first token. This works as long as the store name itself contains no spaces.
df['store'] = df['store'].apply(lambda x: x.split()[0])
If you end up needing a regex, you can use str.extract (expand=False returns a Series rather than a one-column DataFrame):
df['store'] = df['store'].str.extract(r'^([a-z])', expand=False)
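If there can be multiple characters before the bracket, capture everything up to the bracket (or the end of the string), excluding trailing whitespace:
df['store'] = df['store'].str.extract(r'^(.+?)\s*(?=\[|$)', expand=False)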
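To illustrate the multi-character case, here is a minimal sketch with made-up store names (`"north mall"`, `"east side"` are hypothetical and not from the question). It assumes a lazy capture followed by optional whitespace and a lookahead for `[` or the end of the string, so the captured name carries no trailing space.

```python
import pandas as pd

# Hypothetical store names, some longer than one character
stores = pd.Series(["north mall [note 3]", "b [note 98]", "c ", "east side"])

# ^(.+?)      lazily capture the name from the start of the string
# \s*         swallow whitespace in front of the bracket or the end
# (?=\[|$)    stop right before '[' or at the end of the string
pattern = r'^(.+?)\s*(?=\[|$)'

# expand=False returns a Series because the pattern has one capture group
names = stores.str.extract(pattern, expand=False)

print(names.tolist())  # ['north mall', 'b', 'c', 'east side']
```

Because `.+?` needs at least one character, an empty string would not match and would come back as NaN.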