
Python: extract text from JavaScript using regex

I have some JavaScript text:

NCIA.username = 'filler@school.edu'; 
NCIA.user_id = '5bad4c16260c175e8660ae19'; 
NCIA.user_rights = '1'*1; 
if (empty(NCIA.lti_info) || NCIA.lti_info.valid_connection == false) NCIA.catalog_cookie=true; 
NCIA.alias_activity_id='';
NCIA.activity_id='560a8cc65e4ef62276c1a2f0';

I'd like to use regex to extract the values for NCIA.username and NCIA.activity_id. Is there a good way to extract both?

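This should do what you want. The regex looks for NCIA at the start of a line (optionally preceded by whitespace), then a literal ".", then either username or activity_id, then "=" (optionally surrounded by whitespace), and finally a non-empty value inside single quotes followed by ";". The re.MULTILINE flag makes ^ match at the start of every line rather than only at the start of the string: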

import re
js = """NCIA.username = 'filler@school.edu'; 
NCIA.user_id = '5bad4c16260c175e8660ae19'; 
NCIA.user_rights = '1'*1; 
if (empty(NCIA.lti_info) || NCIA.lti_info.valid_connection == false) NCIA.catalog_cookie=true; 
NCIA.alias_activity_id='';
NCIA.activity_id='560a8cc65e4ef62276c1a2f0';
"""
# Anchor on NCIA.<name> at the start of a line and capture the single-quoted value.
regex = re.compile(r"^\s*NCIA\.(username|activity_id)\s*=\s*'([^']+)';", re.MULTILINE)
print(regex.findall(js))

Output

[('username', 'filler@school.edu'), ('activity_id', '560a8cc65e4ef62276c1a2f0')]
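Since findall returns (name, value) pairs, the result can be turned into a dictionary for lookup by field name. The following is a minimal sketch of that idea, not part of the original answer: the helper name extract_fields and its prefix parameter are hypothetical, and it assumes every value of interest is a non-empty single-quoted string on its own line.

```python
import re


def extract_fields(js_text, names, prefix="NCIA"):
    """Return {name: value} for assignments like NCIA.name = 'value';

    Hypothetical helper for illustration; assumes single-quoted,
    non-empty values, one assignment per line.
    """
    # Escape each name so any regex metacharacters are matched literally.
    alternation = "|".join(re.escape(name) for name in names)
    pattern = re.compile(
        r"^\s*" + re.escape(prefix) + r"\.(" + alternation + r")\s*=\s*'([^']+)';",
        re.MULTILINE,
    )
    # findall yields (name, value) tuples, which dict() accepts directly.
    return dict(pattern.findall(js_text))


js = """NCIA.username = 'filler@school.edu';
NCIA.user_id = '5bad4c16260c175e8660ae19';
NCIA.alias_activity_id='';
NCIA.activity_id='560a8cc65e4ef62276c1a2f0';
"""

fields = extract_fields(js, ["username", "activity_id"])
print(fields["username"])
print(fields["activity_id"])
```

Because the pattern requires the field name to follow "NCIA." immediately, NCIA.alias_activity_id is not picked up as activity_id. If a name appears more than once in the text, the dictionary keeps the last value.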

