為數據集生成隨機JSON結構排列

Question

我想生成許多不同的JSON結構排列作為同一數據集的表示，最好不必硬編碼實現。 例如，給定以下JSON：

{"name": "smith", "occupation": "agent", "enemy": "humanity", "nemesis": "neo"}`

應該產生許多不同的排列，例如：

更改名稱： {"name":"smith"}- > {"last_name":"smith"}
按順序更改： {"name":"...","occupation":"..."} -> {"occupation":"...", "name":"..."}
安排改變： {"name":"...","occupation":"..."} -> "smith":{"occupation":"..."}
更改模板： {"name":"...","occupation":"..."} -> "status": 200, "data":{"name":"...","occupation":"..."}
等等

目前，實施情況如下：

我使用itertools.permutations和OrderedDict（）來調整可能的鍵和相應的值組合以及它們返回的順序。

key_permutations = SchemaLike(...).permutate()

all_simulacrums = []
for key_permutation in key_permutations:
   simulacrums = OrderedDict(key_permutation)
   all_simulacrums.append(simulacrums)
for x in itertools.permutations(all_simulacrums.items()):
    test_data = json.dumps(OrderedDict(p))
    print(test_data)
    assert json.loads(test_data) == data, 'Oops! {} != {}'.format(test_data, data)

當我嘗試實現排列和模板的排列時，我的問題就出現了。 我不知道如何最好地實現這個功能，任何建議？

Answer 1

如需訂購，只需使用有序的dicts：

>>> data = OrderedDict(foo='bar', bacon='eggs', bar='foo', eggs='bacon')
>>> for p in itertools.permutations(data.items()):
...     test_data = json.dumps(OrderedDict(p))
...     print(test_data)
...     assert json.loads(test_data) == data, 'Oops! {} != {}'.format(test_data, data)

{"foo": "bar", "bacon": "eggs", "bar": "foo", "eggs": "bacon"}
{"foo": "bar", "bacon": "eggs", "eggs": "bacon", "bar": "foo"}
{"foo": "bar", "bar": "foo", "bacon": "eggs", "eggs": "bacon"}
{"foo": "bar", "bar": "foo", "eggs": "bacon", "bacon": "eggs"}
{"foo": "bar", "eggs": "bacon", "bacon": "eggs", "bar": "foo"}
{"foo": "bar", "eggs": "bacon", "bar": "foo", "bacon": "eggs"}
{"bacon": "eggs", "foo": "bar", "bar": "foo", "eggs": "bacon"}
{"bacon": "eggs", "foo": "bar", "eggs": "bacon", "bar": "foo"}
{"bacon": "eggs", "bar": "foo", "foo": "bar", "eggs": "bacon"}
{"bacon": "eggs", "bar": "foo", "eggs": "bacon", "foo": "bar"}
{"bacon": "eggs", "eggs": "bacon", "foo": "bar", "bar": "foo"}
{"bacon": "eggs", "eggs": "bacon", "bar": "foo", "foo": "bar"}
{"bar": "foo", "foo": "bar", "bacon": "eggs", "eggs": "bacon"}
{"bar": "foo", "foo": "bar", "eggs": "bacon", "bacon": "eggs"}
{"bar": "foo", "bacon": "eggs", "foo": "bar", "eggs": "bacon"}
{"bar": "foo", "bacon": "eggs", "eggs": "bacon", "foo": "bar"}
{"bar": "foo", "eggs": "bacon", "foo": "bar", "bacon": "eggs"}
{"bar": "foo", "eggs": "bacon", "bacon": "eggs", "foo": "bar"}
{"eggs": "bacon", "foo": "bar", "bacon": "eggs", "bar": "foo"}
{"eggs": "bacon", "foo": "bar", "bar": "foo", "bacon": "eggs"}
{"eggs": "bacon", "bacon": "eggs", "foo": "bar", "bar": "foo"}
{"eggs": "bacon", "bacon": "eggs", "bar": "foo", "foo": "bar"}
{"eggs": "bacon", "bar": "foo", "foo": "bar", "bacon": "eggs"}
{"eggs": "bacon", "bar": "foo", "bacon": "eggs", "foo": "bar"}

相同的原則可以應用於鍵/值排列：

>>> for p in itertools.permutations(data.keys()):
...:     test_data = json.dumps(OrderedDict(zip(p, data.values())))
...:     print(test_data)
...:     
{"foo": "bar", "bacon": "eggs", "bar": "foo", "eggs": "bacon"}
{"foo": "bar", "bacon": "eggs", "eggs": "foo", "bar": "bacon"}
{"foo": "bar", "bar": "eggs", "bacon": "foo", "eggs": "bacon"}
{"foo": "bar", "bar": "eggs", "eggs": "foo", "bacon": "bacon"}
{"foo": "bar", "eggs": "eggs", "bacon": "foo", "bar": "bacon"}
{"foo": "bar", "eggs": "eggs", "bar": "foo", "bacon": "bacon"}
{"bacon": "bar", "foo": "eggs", "bar": "foo", "eggs": "bacon"}
{"bacon": "bar", "foo": "eggs", "eggs": "foo", "bar": "bacon"}
{"bacon": "bar", "bar": "eggs", "foo": "foo", "eggs": "bacon"}
{"bacon": "bar", "bar": "eggs", "eggs": "foo", "foo": "bacon"}
{"bacon": "bar", "eggs": "eggs", "foo": "foo", "bar": "bacon"}
{"bacon": "bar", "eggs": "eggs", "bar": "foo", "foo": "bacon"}
{"bar": "bar", "foo": "eggs", "bacon": "foo", "eggs": "bacon"}
{"bar": "bar", "foo": "eggs", "eggs": "foo", "bacon": "bacon"}
{"bar": "bar", "bacon": "eggs", "foo": "foo", "eggs": "bacon"}
{"bar": "bar", "bacon": "eggs", "eggs": "foo", "foo": "bacon"}
{"bar": "bar", "eggs": "eggs", "foo": "foo", "bacon": "bacon"}
{"bar": "bar", "eggs": "eggs", "bacon": "foo", "foo": "bacon"}
{"eggs": "bar", "foo": "eggs", "bacon": "foo", "bar": "bacon"}
{"eggs": "bar", "foo": "eggs", "bar": "foo", "bacon": "bacon"}
{"eggs": "bar", "bacon": "eggs", "foo": "foo", "bar": "bacon"}
{"eggs": "bar", "bacon": "eggs", "bar": "foo", "foo": "bacon"}
{"eggs": "bar", "bar": "eggs", "foo": "foo", "bacon": "bacon"}
{"eggs": "bar", "bar": "eggs", "bacon": "foo", "foo": "bacon"}

等等......如果您不需要所有組合，則可以使用一組預定義的鍵/值。 您還可以使用帶有random.choice的for循環來翻轉硬幣以跳過某些組合或使用random.shuffle冒着重復組合的風險。

對於模板，我猜你必須創建一個不同模板的列表（或列表列表，如果你想要嵌套結構），然后迭代它以創建你的數據。 為了給出更好的建議，我們需要對您想要的更加有限的規范。

請注意，有幾個庫在Python中生成測試數據：

>>> from faker import Faker
>>> faker = Faker()
>>> faker.credit_card_full().strip().split('\n')
['VISA 13 digit', 'Jerry Gutierrez', '4885274641760 04/24', 'CVC: 583']

Faker有幾個模式，很容易創建自己的自定義虛假數據提供程序。

Answer 2

由於dict命令的shuffle已經被回答，我將跳過它。

當新的事物浮現在腦海中時，我會補充這個答案。

from random import randint
from collections import OrderedDict

#Randomly shuffles the key-value pairs of a dictionary
def random_dict_items(input_dict):
    items = input_dict.items()
    new_dict = OrderedDict()
    for i in items:
        rand = randint(0, 1)
        if rand == 0:
            new_dict[i[0]] = i[1]
        else:
            new_dict[i[1]] = i[0]
    return new_dict

為數據集生成隨機JSON結構排列

問題描述

2 個解決方案

解決方案1
5 2017-10-03 14:58:20

解決方案2
4 2017-10-03 15:22:49

為數據集生成隨機JSON結構排列

問題描述

2 個解決方案

解決方案1 5 2017-10-03 14:58:20

解決方案2 4 2017-10-03 15:22:49

解決方案1
5 2017-10-03 14:58:20

解決方案2
4 2017-10-03 15:22:49