I have a dataset with a column containing strings in the following format:
Building = Building_A and Floor = Floor_4
Building = Building_D and Floor = Floor_2
I would like to extract only the building and floor names, concatenated into a single string in a new column, e.g.:
Building_A/Floor_4
Building_D/Floor_2
I've spent about an hour looking through previous posts and was not able to find something to match what I am trying to do. Any help would be appreciated.
Assume we have a DataFrame df:
import pandas as pd

df = pd.DataFrame({'txt': ["Building = Building_A and Floor = Floor_4",
                           "Building = Building_Z and Floor = Floor_9",
                           "Building = Martello and Floor = Ground"]})
First, define the pattern to extract. This one matches only names of the form Floor_<digits> and Building_<one word character>:
pat = r"(Floor_\d+)|(Building_\w{1})"
Alternatively, if you want every word that follows "= ":
pat = r"(?<== )(\w+)"
Note the lookbehind (?<== ) in the pattern definition: it requires "= " immediately before the match without including it in the extracted text.
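To see what the lookbehind does in isolation, here is a minimal sketch that uses Python's standard re module on one of the sample strings. The names sample and matches are illustrative only and not part of the answer above.

```python
import re

pat = r"(?<== )(\w+)"
sample = "Building = Building_A and Floor = Floor_4"

# The lookbehind (?<== ) requires "= " immediately before the match
# but does not consume it, so only the word that follows is returned.
matches = re.findall(pat, sample)
print(matches)            # ['Building_A', 'Floor_4']
print("/".join(matches))  # Building_A/Floor_4
```

Words such as "and" or "Floor" are skipped because they are not directly preceded by "= ".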
Then apply a lambda function to the column txt:
# For each row: find every match, flatten the result, and join the pieces with "/"
df['txt_extract'] = df[['txt']].apply(
    lambda r: "/".join(r.str.extractall(pat).stack()), axis=1)
Result (df['txt_extract'], using the second pattern):
0 Building_A/Floor_4
1 Building_Z/Floor_9
2 Martello/Ground
Instead of str.extract, use str.extractall, which finds all occurrences of the pattern. The matches are stacked and joined with the "/" separator. Note that the order in which the matches are found is preserved, which may be important in your case.
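As a possible alternative to the row-wise apply, the same output can likely be produced with vectorized string methods: str.findall returns the list of matches for each row, and str.join concatenates each list. This is only a sketch; the column name txt_extract_alt is made up for illustration, and rows without any match would end up as empty strings.

```python
import pandas as pd

df = pd.DataFrame({'txt': ["Building = Building_A and Floor = Floor_4",
                           "Building = Building_Z and Floor = Floor_9",
                           "Building = Martello and Floor = Ground"]})

pat = r"(?<== )(\w+)"

# str.findall gives, per row, the list of captured words in order of appearance;
# str.join then concatenates each list with "/".
df['txt_extract_alt'] = df['txt'].str.findall(pat).str.join("/")
print(df['txt_extract_alt'].tolist())
# ['Building_A/Floor_4', 'Building_Z/Floor_9', 'Martello/Ground']
```

This avoids calling a Python lambda for every row, which may matter on larger datasets.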