
How do I extract a dictionary in a Python string?

example_string = 'Bla bla {"value": "1"} bla bla'

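Is there a way to search the string and extract only the dictionary, without using regular expressions?

I am trying to extract a dictionary from a script tag on a website.

Edit: I have added the exact string below. I would rather avoid regex, solely because I am not familiar with it and was looking for an easier way (performance is not an issue). I tried the suggested regex solution, but without success.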

"""        var spConfig = new Product.Config({"attributes":{"6993":{"id":"6993","code":"irs_02010201_asafstand","label":"Asafstand","multiselect":false,"options":[{"id":"289022","label":"72 mm","price":0,"oldPrice":-45.76,"products":["1212907","1280900"]},{"id":"289025","label":"92 mm","price":0,"oldPrice":-45.76,"products":["1280407","1280899","1289700","1289750"]}]},"6994":{"id":"6994","code":"irs_02010201_doornmaat","label":"Doornmaat","multiselect":false,"options":[{"id":"306161","label":"35 mm","price":0,"oldPrice":-45.76,"products":["1280899"]},{"id":"306205","label":"40 mm","price":0,"oldPrice":-45.76,"products":["1289750"]},{"id":"306192","label":"45 mm","price":0,"oldPrice":-45.76,"products":["1289700"]},{"id":"306194","label":"50 mm","price":0,"oldPrice":-45.76,"products":["1280900"]},{"id":"306034","label":"55 mm","price":0,"oldPrice":-45.76,"products":["1212907","1280407"]}]},"6995":{"id":"6995","code":"irs_02010201_krukhoogte","label":"Krukhoogte","multiselect":false,"options":[{"id":"306193","label":"1050 mm","price":0,"oldPrice":-45.76,"products":["1212907","1280407","1280899","1280900","1289700","1289750"]}]},"6996":{"id":"6996","code":"irs_02010201_kruknoot","label":"Kruknoot","multiselect":false,"options":[{"id":"289046","label":"8 mm","price":0,"oldPrice":-45.76,"products":["1212907","1280407","1280899","1280900","1289700","1289750"]}]},"6997":{"id":"6997","code":"irs_02010201_materiaa_eb7faf79","label":"Materiaal voorplaat","multiselect":false,"options":[{"id":"289083","label":"Verzinkt","price":0,"oldPrice":-45.76,"products":["1212907","1280407","1280899","1280900","1289700","1289750"]}]},"6998":{"id":"6998","code":"irs_02010201_uitvoering","label":"Type 
sluitpunt","multiselect":false,"options":[{"id":"289092","label":"Rolnok","price":0,"oldPrice":-45.76,"products":["1212907","1280407","1280899","1280900","1289700","1289750"]}]},"6999":{"id":"6999","code":"irs_02010201_uitvoering2","label":"Uitvoering","multiselect":false,"options":[{"id":"289111","label":"Standaard","price":0,"oldPrice":-45.76,"products":["1212907","1280407","1280899","1280900","1289700","1289750"]}]},"7000":{"id":"7000","code":"irs_02010201_voorplaat","label":"Voorplaat","multiselect":false,"options":[{"id":"291379","label":"F16","price":0,"oldPrice":0.93,"products":["1212907","1280407","1280899","1280900","1289700","1289750"]}]}},"template":"#{price}\u00a0\u20ac","basePrice":0,"oldPrice":0,"productId":"1347328","chooseText":"Kies een optie","taxConfig":{"includeTax":true,"showIncludeTax":false,"showBothPrices":false,"defaultTax":0,"currentTax":0,"inclTaxTitle":"Incl. BTW"},"extra_attributes":{"additional_cost":{"1212907":0,"1280407":0,"1280899":0,"1280900":0,"1289700":0,"1289750":0},"additional_cost_description":{"1212907":null,"1280407":null,"1280899":null,"1280900":null,"1289700":null,"1289750":null},"inriver_item_art_number":{"1212907":"036553024","1280407":"036553014","1280899":"036353002","1280900":"036503006","1289700":"036453006","1289750":"A50704007"},"inriver_item_package_qty":{"1212907":"0","1280407":"0","1280899":"0","1280900":"0","1289700":"0","1289750":"0"},"inriver_item_sales_unit":{"1212907":"STUK","1280407":"STUK","1280899":"STUK","1280900":"STUK","1289700":"STUK","1289750":"STUK"},"inriver_item_supplier_art_nr":{"1212907":"6-30710-PU-0-1","1280407":"6-32158-05-0-1","1280899":"6-30710-AQ-0-1","1280900":"6-30710-PT-0-1","1289700":"6-32159-01-0-1","1289750":"6-32234-03-0-1"},"level_price":{"1212907":{"id":"54257","product_id":"1212907","a_price":"30.4700","b_price":"29.3400","c_price":"25.7900","d_price":"24.4800","e_price":"46.6900","preh":null,"staffle_a":"1.0000","staffle_b":"10.0000","staffle_c":"50.0000","staffle_d":"100.0000",
"updated_at":"2018-08-14 21:32:55","_base_price_group":"B","_base_price":"29.3400","_tier_price_options":{"group":{"B":"10.0000","C":"50.0000","D":"100.0000"},"price":{"B":"29.3400","C":"25.7900","D":"24.4800"}},"_final_price_group":"B","_final_price":"29.3400","promo_price":null},"1280407":{"id":"179868","product_id":"1280407","a_price":"34.5300","b_price":"32.2800","c_price":"29.1500","d_price":"26.9000","e_price":"56.9700","preh":null,"staffle_a":"1.0000","staffle_b":"10.0000","staffle_c":"50.0000","staffle_d":"100.0000","updated_at":"2018-08-14 22:18:16","_base_price_group":"B","_base_price":"32.2800","_tier_price_options":{"group":{"B":"10.0000","C":"50.0000","D":"100.0000"},"price":{"B":"32.2800","C":"29.1500","D":"26.9000"}},"_final_price_group":"B","_final_price":"32.2800","promo_price":null},"1280899":{"id":"179601","product_id":"1280899","a_price":"28.3000","b_price":"27.2600","c_price":"23.9600","d_price":"22.9600","e_price":"52.5100","preh":null,"staffle_a":"1.0000","staffle_b":"10.0000","staffle_c":"50.0000","staffle_d":"100.0000","updated_at":"2018-08-14 22:18:16","_base_price_group":"B","_base_price":"27.2600","_tier_price_options":{"group":{"B":"10.0000","C":"50.0000","D":"100.0000"},"price":{"B":"27.2600","C":"23.9600","D":"22.9600"}},"_final_price_group":"B","_final_price":"27.2600","promo_price":null},"1280900":{"id":"179602","product_id":"1280900","a_price":"31.7500","b_price":"30.5800","c_price":"26.8800","d_price":"25.5100","e_price":"48.6700","preh":null,"staffle_a":"1.0000","staffle_b":"10.0000","staffle_c":"50.0000","staffle_d":"100.0000","updated_at":"2018-08-14 
22:18:16","_base_price_group":"B","_base_price":"30.5800","_tier_price_options":{"group":{"B":"10.0000","C":"50.0000","D":"100.0000"},"price":{"B":"30.5800","C":"26.8800","D":"25.5100"}},"_final_price_group":"B","_final_price":"30.5800","promo_price":null},"1289700":{"id":"194219","product_id":"1289700","a_price":"32.0700","b_price":"29.8600","c_price":"26.3300","d_price":"24.2200","e_price":"45.7600","preh":null,"staffle_a":"1.0000","staffle_b":"10.0000","staffle_c":"50.0000","staffle_d":"100.0000","updated_at":"2018-08-14 22:25:57","_base_price_group":"B","_base_price":"29.8600","_tier_price_options":{"group":{"B":"10.0000","C":"50.0000","D":"100.0000"},"price":{"B":"29.8600","C":"26.3300","D":"24.2200"}},"_final_price_group":"B","_final_price":"29.8600","promo_price":null},"1289750":{"id":"194253","product_id":"1289750","a_price":"44.9200","b_price":"42.0000","c_price":"37.9200","d_price":"35.0000","e_price":"74.1100","preh":null,"staffle_a":"1.0000","staffle_b":"10.0000","staffle_c":"50.0000","staffle_d":"100.0000","updated_at":"2018-08-14 22:25:57","_base_price_group":"B","_base_price":"42.0000","_tier_price_options":{"group":{"B":"10.0000","C":"50.0000","D":"100.0000"},"price":{"B":"42.0000","C":"37.9200","D":"35.0000"}},"_final_price_group":"B","_final_price":"42.0000","promo_price":null}},"name":{"1212907":"6-30710-PU-0-1 SECURY EUROPA R4 55\/72\/8\/16 1050  ( KRUK )  4 ROLTAPPEN","1280407":"6-32158-05-0-1 EUROPA R4 55-92-8 F16 (1050) (KRUK BED.) 4 ROLNOKKEN","1280899":"6-30710-AQ-0-1 SECURY EUROPA 35-92-8-F16 R4 1050 (KRUK) 4 ROL (EX 036 35 30 01)","1280900":"6-30710-PT-0-1 SECURY EUROPA R4 50\/72\/8\/16 1050 (KRUK) 4 ROL. 
\/TAND","1289700":"6-32159-01-0-1 GU SECURY EUROPA S 45-92-8 F16 R4 - 1050 - 4 ROLNOKKEN (KRUKBED.)","1289750":"6-32234-03-0-1 GU EUROPA V 40-92-8 F16  R4"},"sku":{"1212907":"DVRDOAAVBC","1280407":"RGUAAAA8Z7","1280899":"RGUAAAALOP","1280900":"RGUAAAALOT","1289700":"SMPCRAAAD4","1289750":"SMPCRAAAGO"},"stock_type":{"1212907":"Stockartikel","1280407":"Stockartikel","1280899":"Stockartikel","1280900":"Stockartikel","1289700":"Stockartikel","1289750":"Bestelartikel"},"lecot_product_badge":{"1212907":null,"1280407":null,"1280899":null,"1280900":null,"1289700":null,"1289750":null},"resources":{"1212907":{"inriver_resource_technischetek":[{"value_id":"4768791","file":"\/1\/5\/157569_036553024_technischetekening_01.jpg","product_id":"1212907","label":"036553024_TechnischeTekening_01.eps","position":"0","disabled":"0","label_default":"036553024_TechnischeTekening_01.eps","position_default":"0","disabled_default":"0"}]},"1280899":{"inriver_resource_technischetek":[{"value_id":"4773546","file":"\/1\/5\/157670_036353002_foto_01.jpg","product_id":"1280899","label":"036353002_foto_01.jpg","position":"0","disabled":"0","label_default":"036353002_foto_01.jpg","position_default":"0","disabled_default":"0"}]},"1280900":{"inriver_resource_technischetek":[{"value_id":"4768957","file":"\/1\/5\/159563_036503006_technischetekening_01.jpg","product_id":"1280900","label":"036503006_TechnischeTekening_01.eps","position":"0","disabled":"0","label_default":"036503006_TechnischeTekening_01.eps","position_default":"0","disabled_default":"0"}]}},"is_special_price":{"1212907":false,"1280407":false,"1280899":false,"1280900":false,"1289700":false,"1289750":false}},"searchable_attributes":["sku","name","inriver_item_art_number","inriver_item_supplier_art_nr"],"stock_data":{"1212907":{"product_id":"1212907","recommended_sales_qty":"1.0000","required_sales_qty":null},"1280407":{"product_id":"1280407","recommended_sales_qty":"5.0000","required_sales_qty":null},"1280899":{"product_id":"1280899","recomme
nded_sales_qty":"1.0000","required_sales_qty":null},"1280900":{"product_id":"1280900","recommended_sales_qty":"1.0000","required_sales_qty":null},"1289700":{"product_id":"1289700","recommended_sales_qty":"1.0000","required_sales_qty":null},"1289750":{"product_id":"1289750","recommended_sales_qty":"1.0000","required_sales_qty":null}}});
"""

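Assuming there is no more than one dictionary in the string, there are no other braces, and the dictionary is in a JSON-compatible format (all of which are true for your example, though I don't know whether they hold for all of your real data), you can pull it out with simple string operations: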

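import json

# s is the text to search, e.g. the script contents shown in the question
prefix, openbrace, rest = s.partition('{')      # rest: everything after the first '{'
j, closebrace, suffix = rest.rpartition('}')    # j: everything before the last '}'
if openbrace and closebrace:
    j = openbrace + j + closebrace              # put the outer braces back
    d = json.loads(j)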
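As a quick check, the same steps can be wrapped in a function and run against a shortened stand-in for the script text from the question. The function name `extract_single_dict` and the abbreviated sample string are assumptions made for illustration, and the sketch only works under the assumptions stated above (one dictionary, no other braces, JSON-compatible syntax).

```python
import json

def extract_single_dict(s):
    """Return the one JSON object embedded in s, or None if no brace pair is found.

    Hypothetical helper: assumes exactly one JSON-compatible dict and
    no braces outside of it.
    """
    prefix, openbrace, rest = s.partition('{')      # split at the first '{'
    j, closebrace, suffix = rest.rpartition('}')    # split at the last '}'
    if not (openbrace and closebrace):
        return None
    return json.loads(openbrace + j + closebrace)

# Shortened stand-in for the real script text (not the full original string).
sample = ('var spConfig = new Product.Config('
          '{"basePrice":0,"taxConfig":{"includeTax":true},"promo_price":null});')
config = extract_single_dict(sample)
print(config["taxConfig"]["includeTax"])
```

Because `rpartition` splits at the last closing brace, nested objects stay intact, and `json.loads` maps `true` and `null` to `True` and `None`.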

If those assumptions do not hold, it gets a lot harder.

First, you need to extract the matched pairs of braces. Something like this:

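def matched_braces(s):
    # Yield each top-level {...} substring of s.
    braces = 0      # current nesting depth
    start = None    # index of the '{' that opened the current top-level pair
    for i, c in enumerate(s):
        if c == '{':
            if not braces:
                start = i
            braces += 1
        elif c == '}':
            braces -= 1
            if not braces:
                yield s[start:i+1]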

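Except that if there can be braces inside string literals, you need to skip over them, and you need to do that in a way that works for JavaScript strings (or whatever the source language actually is), including the proper backslash-escape and quoting rules. And if your strings can contain HTML entity escapes (including escaped quotes), you need to handle those as well.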
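A minimal sketch of that refinement follows. It is not part of the original answer: the name `iter_brace_spans` is hypothetical, and it only understands single- and double-quoted literals with backslash escapes. It does not handle JS template literals, comments, regex literals, or HTML entities. Quotes are tracked only inside braces, on the assumption that the surrounding text may contain stray apostrophes.

```python
import json

def iter_brace_spans(s):
    """Yield each top-level {...} substring, ignoring braces inside quoted strings."""
    depth = 0
    start = None
    quote = None      # the quote character we are currently inside, if any
    escaped = False   # True when the previous character was a backslash
    for i, c in enumerate(s):
        if quote:
            if escaped:
                escaped = False          # this character is escaped; skip it
            elif c == '\\':
                escaped = True
            elif c == quote:
                quote = None             # closing quote
        elif c in '"\'' and depth:
            quote = c                    # opening quote inside a brace pair
        elif c == '{':
            if depth == 0:
                start = i
            depth += 1
        elif c == '}' and depth:
            depth -= 1
            if depth == 0:
                yield s[start:i + 1]

text = r'''It's here {"a": "x}y", "b": {"c": "q\"{"}} and {"d": 1}'''
spans = list(iter_brace_spans(text))
print([json.loads(span) for span in spans])
```

The braces inside `"x}y"` and `"q\"{"` are skipped because the scanner is inside a string literal when it sees them.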

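Then you have to parse each dictionary. If, for example, they are strings of arbitrary JavaScript code, you cannot parse them as JSON or as Python dicts, because neither can handle perfectly valid JS objects such as {abc: 1} (note the unquoted key), so you would need to write your own parser.

And if the dictionaries are not even necessarily literals, e.g. {abc: spam}, where spam is the name of a variable, a non-trivial expression, and so on, then you will need a JS interpreter, fed the appropriate environment, to evaluate the code.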

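Without regex, you have to iterate over the string until you see the first { and then save everything after it up to the closing } (or use str.partition).

If the dictionary itself contains a dictionary, you also have to count the opening braces and stop saving only when you find the corresponding closing brace.

Unmatched braces are still a problem, and not an easy one to solve. Better to stick with the regex solution: https://stackoverflow.com/a/51861961/2648551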
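If the embedded dictionary is valid JSON, one alternative that avoids both regex and manual brace counting is to let the standard library's JSON decoder find the end of the object: `json.JSONDecoder.raw_decode` parses one JSON value starting at a given index and returns the value together with the index where it stopped. This technique does not come from the answers here; the function name `iter_json_objects` is hypothetical, and the sketch assumes JSON-compatible syntax.

```python
import json

def iter_json_objects(s):
    """Yield every JSON object found in s, scanning left to right."""
    decoder = json.JSONDecoder()
    pos = s.find('{')
    while pos != -1:
        try:
            obj, end = decoder.raw_decode(s, pos)
        except json.JSONDecodeError:
            # Not valid JSON at this brace; move on to the next '{'.
            pos = s.find('{', pos + 1)
            continue
        yield obj
        pos = s.find('{', end)           # resume after the decoded object

text = 'Bla bla {"value": "1", "nested": {"ok": true}} bla {oops {"n": null}'
print(list(iter_json_objects(text)))
```

Since the decoder understands JSON strings, braces inside string values do not confuse it, and an unmatched or malformed opening brace is simply skipped.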

You can use a stack to find the matching braces without using regex:

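import ast

def find_dicts(s):
    stack = []    # one entry per currently open '{'
    buffer = ""
    for ch in s:
        if ch == "{":
            buffer += "{"
            stack.append(ch)
        elif ch == "}":
            if not stack:
                continue  # stray '}' with no opening brace: skip it
            stack.pop()
            buffer += "}"
            if not stack:
                # literal_eval accepts Python literals only, so JSON's
                # true/false/null raise ValueError here.
                yield ast.literal_eval(buffer)
                buffer = ""
        elif stack:
            buffer += ch

print(list(find_dicts("""Bla bla {"value": "1"} bla bla""")))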

I think this should also handle nested dictionaries (https://repl.it/repls/JuicyWoodenTranslation), as well as multiple dicts in the string.

