I am trying to extract two numbers (Depth From and Depth To), separated by a hyphen, from a dataframe column named Depth. The first number is extracted correctly, but the second is not. I have tried several methods; the failed attempts are commented out below.
import re
import pandas as pd

ConvCore = pd.read_csv(r'ConvCore.csv', encoding='cp1252')
ConvCore.columns = ['Depth', 'k', 'phi', 'Well']
ConvCore['DepthFrom'] = ConvCore['Depth'].str.extract('([0-9.]+)')
#ConvCore['DepthTo'] = ConvCore['Depth'].str.extract('-([0-9.]+)')
#for i in ConvCore:
#ConvCore['DepthTo'] = re.search(r'(\d+)-', ConvCore['Depth'][i-1])
#ConvCore['DepthFrom'] = ConvCore['Depth'].str.extract('(\d+)').astype(float)
#DepthTo = ConvCore['Depth'].str.extract('(?P<digit1>[0123456789])').astype(float)
#ConvCore['DepthTo'] = ConvCore['Depth'].str.split("-")
#ConvCore['DepthFrom'] = re.match(r'(\d+)', ConvCore['Depth']).group()
Try this way:
ConvCore['DepthFrom'] = ConvCore['Depth'].str.split("-", expand=True)[0]
ConvCore['DepthTo'] = ConvCore['Depth'].str.split("-", expand=True)[1]
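If you would rather stay with str.extract, a pattern with two capture groups can pull out both numbers in one pass. The following is only a minimal sketch: the sample values are hypothetical stand-ins for ConvCore.csv, and the pattern assumes the separator is a plain ASCII hyphen. If the file actually contains a different dash character, the pattern would need to be adjusted.

```python
import pandas as pd

# Hypothetical sample data standing in for ConvCore.csv
ConvCore = pd.DataFrame({'Depth': ['1000.5-1001.0', '1001.0-1002.25', '1500-1501']})

# Two named groups; str.extract returns one column per group,
# named after the group. Assumes a plain ASCII hyphen separator.
pattern = r'(?P<DepthFrom>\d+(?:\.\d+)?)\s*-\s*(?P<DepthTo>\d+(?:\.\d+)?)'
depths = ConvCore['Depth'].str.extract(pattern).astype(float)

# Attach the two new columns; the index of `depths` matches ConvCore
ConvCore = ConvCore.join(depths)
print(ConvCore)
```

Rows that do not match the pattern come back as NaN in both columns, which makes malformed entries easy to spot.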
You can split the values and then assign the resulting columns to the dataframe. I used a sample dataset to simulate your scenario:
In [4]: df = pd.DataFrame({'num_legs': ['20-30', '40-60', '80-90', '0-10'],
   ...:                    'num_wings': [2, 0, 0, 0],
   ...:                    'num_specimen_seen': [10, 2, 1, 8]},
   ...:                   index=['falcon', 'dog', 'spider', 'fish'])
In [5]: ndf = pd.DataFrame(df.num_legs.str.split('-').tolist(), columns = ['x1', 'x2'])
In [6]: df[ ndf.columns ] = ndf.values
In [7]: df
Out[7]:
       num_legs  num_wings  num_specimen_seen  x1  x2
falcon    20-30          2                 10  20  30
dog       40-60          0                  2  40  60
spider    80-90          0                  1  80  90
fish       0-10          0                  8   0  10
So in your case, the code should be something like:
ndf = pd.DataFrame(ConvCore.Depth.str.split('-').tolist(), columns = ['DepthFrom', 'DepthTo'])
ConvCore[ ndf.columns ] = ndf.values
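Note that splitting leaves the new columns as strings. If numeric depths are needed, one option is to split with expand=True and convert each part with pd.to_numeric. This is a sketch on hypothetical sample data, not on the real ConvCore.csv:

```python
import pandas as pd

# Hypothetical sample data; the real values come from ConvCore.csv
df = pd.DataFrame({'Depth': ['20-30', '40-60', '80-90', '0-10']},
                  index=['falcon', 'dog', 'spider', 'fish'])

# n=1 splits on the first hyphen only; expand=True returns a DataFrame
# that keeps the original index, so assignment aligns row by row
parts = df['Depth'].str.split('-', n=1, expand=True)

# errors='coerce' turns anything non-numeric into NaN instead of raising
df['DepthFrom'] = pd.to_numeric(parts[0], errors='coerce')
df['DepthTo'] = pd.to_numeric(parts[1], errors='coerce')
print(df.dtypes)
```

Because the split result keeps the original index, this also avoids going through .tolist() and .values.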