Generate random JSON structure permutations for a data set

I want to generate many different permutations of JSON structures that all represent the same data set, preferably without having to hard-code the implementation. For example, given the following JSON:

{"name": "smith", "occupation": "agent", "enemy": "humanity", "nemesis": "neo"}

Many different permutations should be produced, such as:

  • change in name: {"name": "smith"} -> {"last_name": "smith"}
  • change in order: {"name": "...", "occupation": "..."} -> {"occupation": "...", "name": "..."}
  • change in arrangement: {"name": "...", "occupation": "..."} -> {"smith": {"occupation": "..."}}
  • change in template: {"name": "...", "occupation": "..."} -> {"status": 200, "data": {"name": "...", "occupation": "..."}}
  • etc.

Currently, the implementation is as follows:

I am using itertools.permutations and OrderedDict() to iterate over the possible key and value combinations, as well as the order in which they are returned.

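import itertools
import json
from collections import OrderedDict

# SchemaLike is my own helper; its arguments are omitted here.
key_permutations = SchemaLike(...).permutate()

all_simulacrums = []
for key_permutation in key_permutations:
    simulacrum = OrderedDict(key_permutation)
    all_simulacrums.append(simulacrum)

# Emit every ordering of every simulacrum and check that it round-trips.
for simulacrum in all_simulacrums:
    for p in itertools.permutations(simulacrum.items()):
        test_data = json.dumps(OrderedDict(p))
        print(test_data)
        assert json.loads(test_data) == simulacrum, 'Oops! {} != {}'.format(test_data, simulacrum)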

My problem occurs when I try to implement the permutations of arrangement and template. I don't know how best to implement this functionality. Any suggestions?

For ordering, just use ordered dicts:

>>> import itertools, json
>>> from collections import OrderedDict
>>> data = OrderedDict(foo='bar', bacon='eggs', bar='foo', eggs='bacon')
>>> for p in itertools.permutations(data.items()):
...     test_data = json.dumps(OrderedDict(p))
...     print(test_data)
...     assert json.loads(test_data) == data, 'Oops! {} != {}'.format(test_data, data)

{"foo": "bar", "bacon": "eggs", "bar": "foo", "eggs": "bacon"}
{"foo": "bar", "bacon": "eggs", "eggs": "bacon", "bar": "foo"}
{"foo": "bar", "bar": "foo", "bacon": "eggs", "eggs": "bacon"}
{"foo": "bar", "bar": "foo", "eggs": "bacon", "bacon": "eggs"}
{"foo": "bar", "eggs": "bacon", "bacon": "eggs", "bar": "foo"}
{"foo": "bar", "eggs": "bacon", "bar": "foo", "bacon": "eggs"}
{"bacon": "eggs", "foo": "bar", "bar": "foo", "eggs": "bacon"}
{"bacon": "eggs", "foo": "bar", "eggs": "bacon", "bar": "foo"}
{"bacon": "eggs", "bar": "foo", "foo": "bar", "eggs": "bacon"}
{"bacon": "eggs", "bar": "foo", "eggs": "bacon", "foo": "bar"}
{"bacon": "eggs", "eggs": "bacon", "foo": "bar", "bar": "foo"}
{"bacon": "eggs", "eggs": "bacon", "bar": "foo", "foo": "bar"}
{"bar": "foo", "foo": "bar", "bacon": "eggs", "eggs": "bacon"}
{"bar": "foo", "foo": "bar", "eggs": "bacon", "bacon": "eggs"}
{"bar": "foo", "bacon": "eggs", "foo": "bar", "eggs": "bacon"}
{"bar": "foo", "bacon": "eggs", "eggs": "bacon", "foo": "bar"}
{"bar": "foo", "eggs": "bacon", "foo": "bar", "bacon": "eggs"}
{"bar": "foo", "eggs": "bacon", "bacon": "eggs", "foo": "bar"}
{"eggs": "bacon", "foo": "bar", "bacon": "eggs", "bar": "foo"}
{"eggs": "bacon", "foo": "bar", "bar": "foo", "bacon": "eggs"}
{"eggs": "bacon", "bacon": "eggs", "foo": "bar", "bar": "foo"}
{"eggs": "bacon", "bacon": "eggs", "bar": "foo", "foo": "bar"}
{"eggs": "bacon", "bar": "foo", "foo": "bar", "bacon": "eggs"}
{"eggs": "bacon", "bar": "foo", "bacon": "eggs", "foo": "bar"}

The same principle can be applied for key/value permutations:

>>> for p in itertools.permutations(data.keys()):
...     test_data = json.dumps(OrderedDict(zip(p, data.values())))
...     print(test_data)
...
{"foo": "bar", "bacon": "eggs", "bar": "foo", "eggs": "bacon"}
{"foo": "bar", "bacon": "eggs", "eggs": "foo", "bar": "bacon"}
{"foo": "bar", "bar": "eggs", "bacon": "foo", "eggs": "bacon"}
{"foo": "bar", "bar": "eggs", "eggs": "foo", "bacon": "bacon"}
{"foo": "bar", "eggs": "eggs", "bacon": "foo", "bar": "bacon"}
{"foo": "bar", "eggs": "eggs", "bar": "foo", "bacon": "bacon"}
{"bacon": "bar", "foo": "eggs", "bar": "foo", "eggs": "bacon"}
{"bacon": "bar", "foo": "eggs", "eggs": "foo", "bar": "bacon"}
{"bacon": "bar", "bar": "eggs", "foo": "foo", "eggs": "bacon"}
{"bacon": "bar", "bar": "eggs", "eggs": "foo", "foo": "bacon"}
{"bacon": "bar", "eggs": "eggs", "foo": "foo", "bar": "bacon"}
{"bacon": "bar", "eggs": "eggs", "bar": "foo", "foo": "bacon"}
{"bar": "bar", "foo": "eggs", "bacon": "foo", "eggs": "bacon"}
{"bar": "bar", "foo": "eggs", "eggs": "foo", "bacon": "bacon"}
{"bar": "bar", "bacon": "eggs", "foo": "foo", "eggs": "bacon"}
{"bar": "bar", "bacon": "eggs", "eggs": "foo", "foo": "bacon"}
{"bar": "bar", "eggs": "eggs", "foo": "foo", "bacon": "bacon"}
{"bar": "bar", "eggs": "eggs", "bacon": "foo", "foo": "bacon"}
{"eggs": "bar", "foo": "eggs", "bacon": "foo", "bar": "bacon"}
{"eggs": "bar", "foo": "eggs", "bar": "foo", "bacon": "bacon"}
{"eggs": "bar", "bacon": "eggs", "foo": "foo", "bar": "bacon"}
{"eggs": "bar", "bacon": "eggs", "bar": "foo", "foo": "bacon"}
{"eggs": "bar", "bar": "eggs", "foo": "foo", "bacon": "bacon"}
{"eggs": "bar", "bar": "eggs", "bacon": "foo", "foo": "bacon"}

And so on. If you don't need all combinations, you can use a predefined set of keys/values. You can also use a for loop with random.choice to flip a coin and skip some combinations, or use random.shuffle at the risk of repeating combinations.
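As a minimal sketch of these two ideas, the snippet below builds renamed variants from a predefined alias table and then keeps a random subset by flipping a coin for each variant. The alias table `ALIASES`, the helper names `renamed_variants` and `coin_flip_subset`, and the sample record are assumptions made for illustration; they are not part of the question.

```python
import itertools
import random
from collections import OrderedDict

# Hypothetical alias table: each original key maps to the names it may take.
ALIASES = {
    "name": ["name", "last_name"],
    "occupation": ["occupation", "job"],
}


def renamed_variants(data, aliases):
    """Yield one OrderedDict per combination of key names, values unchanged."""
    keys = list(data)
    # Keys without an entry in the alias table keep their original name.
    choices = [aliases.get(key, [key]) for key in keys]
    for names in itertools.product(*choices):
        yield OrderedDict(zip(names, (data[key] for key in keys)))


def coin_flip_subset(variants, rng):
    """Keep each variant with probability 1/2, so none is ever repeated."""
    return [variant for variant in variants if rng.choice([True, False])]


data = OrderedDict([("name", "smith"), ("occupation", "agent"), ("nemesis", "neo")])
variants = list(renamed_variants(data, ALIASES))
subset = coin_flip_subset(variants, random.Random(0))
```

Because the coin flip filters an already enumerated list, no variant can appear twice, which is the trade-off against random.shuffle mentioned above. Passing a seeded `random.Random` keeps a test run reproducible.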

For templates, I guess you need to create a list of different templates (or a list of lists if you want nested structures) and then iterate over it to create your data. To give a better suggestion, we would need a more constrained specification of what you want.
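One way to read the "list of lists" idea is sketched below: each inner list is a chain of small functions, and applying the chain in order nests the record step by step. The function names (`identity`, `envelope`, `keyed_by`, `apply_chain`) and the `TEMPLATES` list are hypothetical, chosen only to mirror the arrangement and template examples from the question.

```python
import json


def identity(data):
    """Template: return a plain copy of the record."""
    return dict(data)


def envelope(data):
    """Template: wrap the record in a status/data envelope."""
    return {"status": 200, "data": data}


def keyed_by(field):
    """Arrangement: build a step that hoists data[field] to the outer key."""
    def arrange(data):
        rest = {key: value for key, value in data.items() if key != field}
        return {data[field]: rest}
    return arrange


# Hypothetical list of lists: each inner list is a chain of steps applied
# left to right, so later steps nest the result of earlier ones.
TEMPLATES = [
    [identity],
    [envelope],
    [keyed_by("name"), envelope],
]


def apply_chain(data, chain):
    """Apply every step of one chain in order and return the final structure."""
    result = data
    for step in chain:
        result = step(result)
    return result


data = {"name": "smith", "occupation": "agent"}
results = [apply_chain(data, chain) for chain in TEMPLATES]
for structure in results:
    print(json.dumps(structure))
```

Keeping each step as a plain function means new templates can be added to the list without touching the loop, and the chains can be combined with the ordering permutations shown earlier.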

Note that there are several libraries that generate test data in Python, for example Faker:

>>> from faker import Faker
>>> faker = Faker()
>>> faker.credit_card_full().strip().split('\n')
['VISA 13 digit', 'Jerry Gutierrez', '4885274641760 04/24', 'CVC: 583']

Faker has several schemas, and it is easy to create your own custom fake data providers.

Since shuffling the dict order has already been answered, I'll skip that.

I'll add to this answer as new things come to mind.

from random import randint
from collections import OrderedDict

# Randomly swaps key and value in each pair of a dictionary
# (each pair is flipped with probability 1/2; values must be hashable).
def random_dict_items(input_dict):
    new_dict = OrderedDict()
    for key, value in input_dict.items():
        if randint(0, 1) == 0:
            new_dict[key] = value
        else:
            new_dict[value] = key
    return new_dict
