
How to extract stock code NUMBER from news summary?

I have a Pandas table and need to extract the stock codes (e.g. '00981', '00823') from text stored in a column. Each code appears in the (00000) format, but its position within the text summary varies. Please advise.

News
1 example(00981)example example example。 
2 example example example (00823)text text text 

desired output:

Code column
981
823

s = TABLE['News'].str.find('(')
e = s + 5
c = TABLE['News'].str[s:e]
TABLE["Code"] = c

This will find all occurrences of 5 digits surrounded by parentheses:

import re

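# my_string is the text to search; the raw string keeps the backslashes literal
x = re.findall(r'\(\d{5}\)', my_string)  # each match includes the parentheses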
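
If only the digits are wanted, a capturing group inside the escaped parentheses makes `re.findall` return just the captured part. The following is a minimal sketch; the sample value of `my_string` is an assumption modelled on the rows in the question.

```python
import re

# Hypothetical sample text, modelled on the question's example rows.
my_string = "example(00981)example example (00823)text text"

# Without a capturing group, each match includes the parentheses.
with_parens = re.findall(r'\(\d{5}\)', my_string)

# With a capturing group, findall returns only the five digits.
digits_only = re.findall(r'\((\d{5})\)', my_string)

print(with_parens)   # ['(00981)', '(00823)']
print(digits_only)   # ['00981', '00823']
```

The results are still strings, so the leading zeros are preserved until they are converted to integers.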

This works for me:

print(df)
           News
0          1 example(00981)example example example。 
1      2 example example example (00823)text text...

df['stock_num'] = df['News'].str.extract(r'(\d{5})').astype(int)
print(df)
                                            News  stock_num
0      1 example(00981)example example example。        981
1  2 example example example (00823)text text...        823

To change the extracted string into a number, you can use either the .astype() method or pd.to_numeric(df['stock_num']).
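
Note that the pattern `(\d{5})` matches any run of five digits, not only those inside parentheses. A slightly stricter variant might anchor on the parentheses and tolerate rows without a code. The sketch below assumes a hypothetical third row with no code and uses the nullable `Int64` dtype so that the missing value does not raise an error; the `Code` column name comes from the question.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row without a code.
df = pd.DataFrame({"News": [
    "1 example(00981)example example example。",
    "2 example example example (00823)text text text",
    "3 example with no code",
]})

# Match five digits only when they are wrapped in parentheses.
# expand=False returns a Series rather than a one-column DataFrame.
code = df["News"].str.extract(r"\((\d{5})\)", expand=False)

# pd.to_numeric leaves unmatched rows as NaN; the nullable Int64 dtype
# keeps them as <NA> instead of failing the integer conversion.
df["Code"] = pd.to_numeric(code).astype("Int64")

print(df["Code"].tolist())
```

If the leading zeros matter (for example, to look the code up later), it may be preferable to keep the extracted `code` Series as strings and skip the numeric conversion.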
