
How to extract specific information from pandas column?

This is the DataFrame posdf:

      tradingsymbol
0     XYZ2061820500PE
1     XYZ20JUN21000PE
2     ABC20JUN100CE
3    ABC20JUN102.5PE
4     ABC20JUN92.5PE
4     XYZ20JUNFUT

I use this to extract the symbol (ABC or XYZ) into its own column:

posdf['symbol'] = posdf['tradingsymbol'].str.extract(r'^(\D+)', expand=True)

I cannot figure out a general way to extract the following columns:

     strike    type   Expiry
0    20500     PE     20618
1    21000     PE     20JUN
2    100       CE     20JUN
3    102.5     PE     20JUN
4    92.5      PE     20JUN
4    NA        FUT    20JUN

Edit

type is at least 2 and at most 3 characters. Expiry is always 5 characters and can also take a form such as 20O18, 20N18 or 20D18.

2nd Edit

Adding rows where type can be 3 chars based on Sammy's comment.

Use Series.str.extract with a regex pattern that has named groups (df here is the question's posdf):

df1 = df['tradingsymbol'].str.extract(
    r'(?P<expiry>\d{5}|\d{2}\w{3})(?P<strike>\d+(?:\.\d+)?)?(?P<type>\w+)')
df1 = df1[['strike', 'type', 'expiry']]

Result:

# print(df1)
  strike type expiry
0  20500   PE  20618
1  21000   PE  20JUN
2    100   CE  20JUN
3  102.5   PE  20JUN
4   92.5   PE  20JUN
4    NaN  FUT  20JUN

You can test the regex here.
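The pattern can also be checked locally without pandas. The following is only a sketch using the standard re module; parse_symbol is a helper name made up for this illustration, and the last sample symbol (XYZ20O1820500PE) is invented to cover the 20O18 style of expiry mentioned in the question's edit.

```python
import re

# Same pattern as in the str.extract call above, compiled with the stdlib re module.
PATTERN = re.compile(
    r'(?P<expiry>\d{5}|\d{2}\w{3})(?P<strike>\d+(?:\.\d+)?)?(?P<type>\w+)')


def parse_symbol(tradingsymbol):
    """Return (strike, type, expiry) for one symbol, or None if nothing matches.

    Hypothetical helper, not part of pandas or of the original answer.
    """
    m = PATTERN.search(tradingsymbol)
    if m is None:
        return None
    # strike is None when the optional group did not take part in the match
    return m.group('strike'), m.group('type'), m.group('expiry')


samples = [
    'XYZ2061820500PE', 'XYZ20JUN21000PE', 'ABC20JUN100CE',
    'ABC20JUN102.5PE', 'ABC20JUN92.5PE', 'XYZ20JUNFUT',
    'XYZ20O1820500PE',  # invented symbol with a 20O18-style expiry
]
for s in samples:
    print(s, parse_symbol(s))
```

Because the strike group is optional, the futures row comes back with None for the strike, which is what str.extract reports as NaN.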

If Strike is always numeric, you can do:

posdf[['Symbol','Expiry','Strike','Type']] = posdf['tradingsymbol'].str.extract(r'^(\D+)(.{5})([0-9.]*)([a-zA-Z]{2,3})', expand=True)

Result:

     tradingsymbol Symbol Expiry Strike Type
0  XYZ2061820500PE    XYZ  20618  20500   PE
1  XYZ20JUN21000PE    XYZ  20JUN  21000   PE
2    ABC20JUN100CE    ABC  20JUN    100   CE
3  ABC20JUN102.5PE    ABC  20JUN  102.5   PE
4   ABC20JUN92.5PE    ABC  20JUN   92.5   PE
4      XYZ20JUNFUT    XYZ  20JUN         FUT
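Note that with this pattern Strike is extracted as text, and the futures row gets an empty string rather than NaN. If a numeric column is wanted, one possible follow-up (a sketch, assuming the sample data from the question) is to pass the column through pd.to_numeric:

```python
import pandas as pd

# Sample data copied from the question.
posdf = pd.DataFrame({'tradingsymbol': [
    'XYZ2061820500PE', 'XYZ20JUN21000PE', 'ABC20JUN100CE',
    'ABC20JUN102.5PE', 'ABC20JUN92.5PE', 'XYZ20JUNFUT']})

posdf[['Symbol', 'Expiry', 'Strike', 'Type']] = posdf['tradingsymbol'].str.extract(
    r'^(\D+)(.{5})([0-9.]*)([a-zA-Z]{2,3})', expand=True)

# errors='coerce' ensures anything that cannot be parsed as a number
# (including the empty strike of the futures row) ends up as NaN.
posdf['Strike'] = pd.to_numeric(posdf['Strike'], errors='coerce')

print(posdf)
```

After this step Strike is a float column, so it can be compared or sorted numerically.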

A bit of a hack, which relies on the symbol always being 3 characters long:

res = (df.assign(Expiry = df.tradingsymbol.str[3:8],
                 type = df.tradingsymbol.str[8:].str.split("([a-zA-Z]+)").str[1],
                 strike = df.tradingsymbol.str[8:].str.split("[a-zA-Z]+").str[0],
                )
      )

res


   tradingsymbol    Expiry  type    strike
0   XYZ2061820500PE 20618   PE      20500
1   XYZ20JUN21000PE 20JUN   PE      21000
2   ABC20JUN100CE   20JUN   CE      100
3   ABC20JUN102.5PE 20JUN   PE      102.5
4   ABC20JUN92.5PE  20JUN   PE      92.5
4   XYZ20JUNFUT     20JUN   FUT 
