I have a function which parses PHP array declarations from files. It returns a dictionary whose keys are the keys of the PHP array and whose values are the corresponding values from the PHP array. The files look like this:
$lang['identifier_a'] = 'Welcome message';
$lang['identifier_b'] = 'Welcome message.
You can do things a,b, and c here.
Please be patient.';
$lang['identifier_c'] = 'Welcome message2.
You can do things a,b, and c here.
Please be patient.';
$lang['identifier_d'] = 'Long General Terms and Conditions with more text';
$lang['identifier_e'] = 'General Terms and Conditions';
$lang['identifier_f'] = 'Text e';
def fetch_lang_keys(filename):
    ''' fetches all the language keys for filename '''
    from re import search

    with open(filename) as fi:
        lines = fi.readlines()
    data = {}
    for line in lines:
        obj = search(r"\$lang\[[\'|\"](.{1,})[\'|\"]\] = [\'|\"](.{1,})[\'|\"];", line)
        # re.match(r'''\$lang\[[\'|\"](.{1,})[\'|\"]\] = [\'|\"](.{1,})[\'|\"];''', re.MULTILINE | re.VERBOSE);
        if obj:
            data[obj.group(1)] = obj.group(2)
    return data
This function should return a dictionary which should look like this:
data['identifier_a'] = 'Welcome message'
data['identifier_b'] = 'Welcome message.
You can do things a,b, and c here.
Please be patient.';
// and so on
The regexp used in the function works for everything except identifier_b and identifier_c, because the regular expression does not match blank lines and/or lines which do not end with ;. The wildcard operator with ; at the end did not work either, because it matched too much.
Do you have any idea of how to solve this? I looked into lookahead assertions, but failed to use them properly. Thanks.
Well, while my answer is not a solution to your regexp problem: why don't you use a "real PHP parser" instead of home-brewed regexps? It could be much more reliable, might even be faster, and would certainly be more maintainable.
A quick search gave me https://github.com/ramen/phply. I also found this: Parse PHP file variables from Python script. Hope this helps.
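It doesn't work because the dot doesn't match newlines. You must use the single-line modifier (re.DOTALL) instead of the multiline modifier, and apply the pattern to the whole file contents rather than to one line at a time, since a single line never contains a complete multi-line value. Example (text is assumed here to hold the full contents of the file):
obj = re.search(r'\$lang\[[\'"](.+?)[\'"]\] = [\'"](.+?)[\'"];', text, re.DOTALL)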
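Put together, the DOTALL approach could look roughly like the sketch below. It assumes the file is small enough to read in one go and that values never contain a quote immediately followed by a semicolon; parse_lang_entries and LANG_ENTRY are names made up for this illustration, not part of the original code.

```python
import re

# Pattern from the answer above; re.DOTALL lets "." match newlines as well.
LANG_ENTRY = re.compile(r'\$lang\[[\'"](.+?)[\'"]\] = [\'"](.+?)[\'"];', re.DOTALL)


def parse_lang_entries(source):
    """Return a dict of all $lang['key'] = 'value'; pairs found in source."""
    # Hypothetical helper: works on a string so it can be tried without a file.
    return {m.group(1): m.group(2) for m in LANG_ENTRY.finditer(source)}


def fetch_lang_keys(filename):
    """Read the whole file at once so that multi-line values can be matched."""
    with open(filename) as fi:
        return parse_lang_entries(fi.read())
```

The lazy quantifier (.+?) matters here: with DOTALL enabled, a greedy (.+) would run on to the last quote-plus-semicolon in the file and swallow every entry in between.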
This regex seems to work:
\$lang\[[\'|\"](.{1,})[\'|\"]\] = [\'|\"]((?:.|\n)+?)[\'|\"];
                                          ^^^^^^^^^^
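To see why the marked group helps, here is a small sketch comparing an unmodified pattern with the two fixes suggested in this thread on a two-line value. The sample string and the variable names are made up for illustration, and the quote classes are simplified to ['"].

```python
import re

sample = "$lang['identifier_b'] = 'Welcome message.\nPlease be patient.';"

# "." stops at a newline by default, so the value group cannot span two lines.
plain = re.search(r'''\$lang\[['"](.+?)['"]\] = ['"](.+?)['"];''', sample)

# Fix 1: re.DOTALL makes "." match newlines too.
dotall = re.search(r'''\$lang\[['"](.+?)['"]\] = ['"](.+?)['"];''', sample, re.DOTALL)

# Fix 2: spell out "any character or a newline" inside the value group.
alternation = re.search(r'''\$lang\[['"](.+?)['"]\] = ['"]((?:.|\n)+?)['"];''', sample)

print(plain)                                    # None
print(dotall.group(2) == alternation.group(2))  # True
```

Both fixes capture the same value; the difference is only whether the newline handling is expressed as a flag or inside the pattern itself.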