I have a pandas series that contains rows of share names amongst other details:
Netflix DIVIDEND
Apple Inc (All Sessions) COMM
Intel Corporation CONS
Correction Netflix Section 31 Fee
I'm trying to use a regex to retrieve the stock name, which I did with this look ahead:
transactions_df["Share Name"] = transactions_df["MarketName"].str.extract(r"(^.*?(?=DIVIDEND|\(All|CONS|COMM|Section))")
The only thing I'm having trouble with is the row Correction Netflix Section 31 Fee, where my regex is getting the share name as Correction Netflix. I don't want the word "Correction".
I need my regular expression to check for either the start of the string, OR the word "Correction ".
I tried a few things, such as an OR | with the start of string character ^. I also tried a lookbehind to check for ^ or Correction, but the error says lookbehinds need to be constant length.
r"((^|Correction ).*?(?=DIVIDEND|\(All|CONS|COMM|Section))"
gives an error: ValueError: Wrong number of items passed 2, placement implies 1. I'm new to regex so I don't really know what this means.
You could use an optional part, and instead of lookarounds use a capture group with a match:
^(?:Correction\s*)?(\S.*?)\s*(?:\([^()]*\)|DIVIDEND|All|CONS|COMM|Section)
^    Start of string
(?:Correction\s*)?    Optionally match Correction followed by 0+ whitespace chars (non-capture group)
(\S.*?)\s*    Capture in group 1, matching a non-whitespace char followed by as few chars as possible, then match (not capture) 0+ whitespace chars
(?:    Non-capture group for the alternation |
\([^()]*\)    Match from ( till )
|    Or
DIVIDEND|All|CONS|COMM|Section    Match any of the words
)    Close group

import pandas as pd

data = ["Netflix DIVIDEND", "Apple Inc (All Sessions) COMM", "Intel Corporation CONS", "Correction Netflix Section 31 Fee"]
pattern = r"^(?:Correction\s*)?(\S.*?)\s*(?:\([^()]*\)|DIVIDEND|All|CONS|COMM|Section)"
transactions_df = pd.DataFrame(data, columns = ['MarketName'])
transactions_df["Share Name"] = transactions_df["MarketName"].str.extract(pattern)
print(transactions_df)
Output
                          MarketName         Share Name
0                   Netflix DIVIDEND            Netflix
1      Apple Inc (All Sessions) COMM          Apple Inc
2             Intel Corporation CONS  Intel Corporation
3  Correction Netflix Section 31 Fee            Netflix
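As for the ValueError in the question: str.extract returns one column per capture group, so a pattern with two capture groups yields two columns, which cannot be assigned to the single "Share Name" column. The sketch below is not part of the original answer. It uses only the standard re module to count the capture groups in both patterns, and the variable names (attempt, answer, row) are illustrative.

```python
import re

# Attempt from the question: the outer (...) and the inner (^|Correction )
# are both capture groups; the lookahead (?=...) is not.
attempt = r"((^|Correction ).*?(?=DIVIDEND|\(All|CONS|COMM|Section))"

# Pattern from the answer: the prefix and the alternation use (?:...),
# so only (\S.*?) is a capture group.
answer = r"^(?:Correction\s*)?(\S.*?)\s*(?:\([^()]*\)|DIVIDEND|All|CONS|COMM|Section)"

attempt_groups = re.compile(attempt).groups
answer_groups = re.compile(answer).groups
print(attempt_groups, answer_groups)  # 2 1

# Group 1 of the answer's pattern on the problematic row.
row = "Correction Netflix Section 31 Fee"
share_name = re.search(answer, row).group(1)
print(share_name)  # Netflix
```

With a single capture group, str.extract produces a single column, which is why the assignment in the answer works.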