I have a column, market_area, that I want to abbreviate by keeping only the part of the string to the left of the first hyphen.
For example, my data is like this:
import pandas as pd
tmp = pd.DataFrame({'market_area': ['San Francisco-Oakland-San Jose',
None,
'Dallas-Fort Worth',
'Los Angeles-Riverside-Orange County'],
'val': [1,2,3,4]})
My desired output would be:
['San Francisco', None, 'Dallas', 'Los Angeles']
I am able to split based on the hyphen:
tmp['market_area'].str.split('-')
But how do I extract only the part to the left of the hyphen?
You can take the first element of each split list using .str[0]:
tmp.market_area.str.split('-').str[0]
Out[3]:
0 San Francisco
1 None
2 Dallas
3 Los Angeles
Name: market_area, dtype: object
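As a minimal sketch of the same idea, the snippet below stores the result in a new column and passes n=1 so that each string is split at most once. The column name market_short is only an illustrative choice, not something from the question.

```python
import pandas as pd

tmp = pd.DataFrame({
    'market_area': ['San Francisco-Oakland-San Jose',
                    None,
                    'Dallas-Fort Worth',
                    'Los Angeles-Riverside-Orange County'],
    'val': [1, 2, 3, 4],
})

# n=1 stops splitting after the first hyphen, so each row yields at most two pieces.
# .str[0] then picks the piece to the left of that hyphen.
# 'market_short' is a hypothetical column name used for illustration.
tmp['market_short'] = tmp['market_area'].str.split('-', n=1).str[0]

print(tmp[['market_area', 'market_short']])
```

The missing row stays missing, so no special handling is needed for it.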
Or use the str.extract method with the regex ^([^-]*).*, which captures everything up to the first hyphen:
tmp.market_area.str.extract('^([^-]*).*', expand=False)
Out[5]:
0 San Francisco
1 NaN
2 Dallas
3 Los Angeles
Name: market_area, dtype: object
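To check that the two approaches agree, here is a small sketch that runs both on the same Series and compares the results. The value 'Boston' is an added example (not from the question) to show what happens when a string has no hyphen, and the trailing .* is dropped from the pattern because the capture group alone is enough for str.extract.

```python
import pandas as pd

# 'Boston' is an illustrative value with no hyphen; it is not part of the original data.
s = pd.Series(['San Francisco-Oakland-San Jose',
               None,
               'Dallas-Fort Worth',
               'Boston'])

# Approach 1: split on the hyphen and keep the first piece.
by_split = s.str.split('-').str[0]

# Approach 2: capture the leading run of non-hyphen characters.
by_regex = s.str.extract(r'^([^-]*)', expand=False)

# Fill missing values with '' so the comparison does not depend on None vs NaN.
same = by_split.fillna('').tolist() == by_regex.fillna('').tolist()
print(same)
```

Both approaches leave a string without a hyphen unchanged and keep the missing entry missing.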