
How can I fix the logic error in my regex?

Code:

df['Expiry'], df['Symbol'] = None, None
index_Ticker = df.columns.get_loc('Ticker')
index_Expiry = df.columns.get_loc('Expiry')
index_Symbol = df.columns.get_loc('Symbol')
            
Expiry_Pattern = r'-([A-Z]{1,3})'
Symbol_Pattern = r'(.*?)-[A-Z]{1,3}'
            
for row in range(0, len(df)):
    Expiry = re.search(Expiry_Pattern, df.iat[row, index_Ticker]).group()
    df.iat[row, index_Expiry] = Expiry
    Symbol = re.search(Symbol_Pattern, df.iat[row, index_Ticker]).group()
    df.iat[row, index_Symbol] = Symbol

Here I'm using these regex patterns:

Expiry_Pattern = r'-([A-Z]{1,3})'
Symbol_Pattern = r'(.*?)-[A-Z]{1,3}'

My output is: [Output Image]

My actual data is in this format:

ZEEL-III.NFO
RELIANCE-III.NFO
ADANIPORTS-I.NFO
ZEEL-II.
AARTIIND-III.NFO

But I want this output:

ZEEL         III
RELIANCE     III
ADANIPORTS   I
ZEEL         II
AARTIIND     III

I don't understand how to solve this issue.

You can use the regex '-?(\\w+)(?=-|\\.)' to get the expected output for the sample data you have:

>>> df['col'].str.findall(r'-?(\w+)(?=-|\.)').apply(pd.Series)

            0    1
0        ZEEL  III
1    RELIANCE  III
2  ADANIPORTS    I
3        ZEEL   II
4    AARTIIND  III

Pattern explanation:

'-?(\\w+)(?=-|\\.)'

  • -? matches zero or one hyphen - at the start
  • (\\w+) captures a run of word characters (the symbol or the expiry)
  • (?=-|\\.) is a positive lookahead that requires the captured text to be followed by - or .
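To see the three parts of the pattern working together outside pandas, here is a minimal sketch using only the standard `re` module on the sample tickers from the question. The names `tickers`, `PATTERN`, and `parts` are illustrative and not part of the original answer.

```python
import re

# Sample tickers copied from the question.
tickers = [
    "ZEEL-III.NFO",
    "RELIANCE-III.NFO",
    "ADANIPORTS-I.NFO",
    "ZEEL-II.",
    "AARTIIND-III.NFO",
]

# Optional leading hyphen, a captured run of word characters,
# then a lookahead requiring a hyphen or a dot right after it.
PATTERN = re.compile(r"-?(\w+)(?=-|\.)")

# findall returns only the captured group for each match.
parts = [PATTERN.findall(t) for t in tickers]
for ticker, found in zip(tickers, parts):
    print(ticker, "->", found)
```

The trailing `NFO` is not returned because nothing follows it, so the lookahead for `-` or `.` fails there.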

A non-regex solution:

Right-split the string on . with n=1 (at most one split), take the first element, then split that on -:

df['col'].str.rsplit('.', n=1).str[:-1].str[0].str.split('-').apply(pd.Series)
            0    1
0        ZEEL  III
1    RELIANCE  III
2  ADANIPORTS    I
3        ZEEL   II
4    AARTIIND  III
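The same two steps can be sketched on plain Python strings, which may make the chained pandas call easier to follow. The helper `split_ticker` is a hypothetical name introduced here for illustration, and the sketch assumes every ticker contains a dot, as in the sample data.

```python
# Sample tickers copied from the question.
tickers = [
    "ZEEL-III.NFO",
    "RELIANCE-III.NFO",
    "ADANIPORTS-I.NFO",
    "ZEEL-II.",
    "AARTIIND-III.NFO",
]

def split_ticker(ticker):
    # Step 1: split once from the right on "." and keep the part before it.
    head = ticker.rsplit(".", 1)[0]
    # Step 2: split what is left on "-" to get [symbol, expiry].
    return head.split("-")

rows = [split_ticker(t) for t in tickers]
for row in rows:
    print(row)
```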

I extracted the values like this:

df["Symbol"] = df["Ticker"].str.extract(r'(.*?)-', expand=False)
df["Expiry"] = df["Ticker"].str.extract(r'-([A-Z]{1,3})', expand=False)

This creates the two columns, and my output now matches what I wanted. [Output Image]
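A likely reason the loop in the question misbehaved (the output image is not available here, so this is an assumption) is that `Match.group()` with no argument returns the whole match, including the hyphen and the suffix, while `str.extract` returns only the captured group. The sketch below compares `group()` with `group(1)` for the two patterns from the question on a single sample ticker.

```python
import re

ticker = "ZEEL-III.NFO"

# The two patterns from the question, unchanged.
expiry_match = re.search(r"-([A-Z]{1,3})", ticker)
symbol_match = re.search(r"(.*?)-[A-Z]{1,3}", ticker)

# group() returns the entire matched text.
whole = (symbol_match.group(), expiry_match.group())
# group(1) returns only what the first parenthesised group captured.
captured = (symbol_match.group(1), expiry_match.group(1))

print(whole)     # ('ZEEL-III', '-III')
print(captured)  # ('ZEEL', 'III')
```

Under that assumption, changing `.group()` to `.group(1)` in the original loop would also give the desired values.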


 