Insert and categorize a Numpy array into a Django modelled database EAV schema

I have a Pandas pivot table of the format:

income_category     age_category      income         age
High                Middle aged       123,564.235    23.456
Medium              Old               18,324.356     65.432

I have a category hierarchy with matching labels in a self-referencing table called dimension, i.e.:

dimension_id       label             parent_dimension_id
1                  Age categories
2                  Young             1
3                  Middle aged       1
4                  Old               1

...and similarly for income
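The self-referencing dimension table above can be sketched with Python's standard sqlite3 module, together with the per-row lookup the question later describes (SELECT dimension_id ... WHERE parent_dimension_id = ? AND label = ?). This is only an illustration under assumptions: the column names follow the table above, while the "Income categories" parent row and the ids 5 to 7 are invented for the example.

```python
import sqlite3

# In-memory stand-in for the question's `dimension` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dimension ("
    " dimension_id INTEGER PRIMARY KEY,"
    " label TEXT NOT NULL,"
    " parent_dimension_id INTEGER REFERENCES dimension(dimension_id))"
)
conn.executemany(
    "INSERT INTO dimension VALUES (?, ?, ?)",
    [
        (1, "Age categories", None),
        (2, "Young", 1),
        (3, "Middle aged", 1),
        (4, "Old", 1),
        (5, "Income categories", None),  # assumed parent row for income
        (6, "High", 5),                  # assumed ids for the income children
        (7, "Medium", 5),
    ],
)


def child_id(conn, parent_id, label):
    """Return the dimension_id of `label` under `parent_id`, or None if absent."""
    row = conn.execute(
        "SELECT dimension_id FROM dimension"
        " WHERE parent_dimension_id = ? AND label = ?",
        (parent_id, label),
    ).fetchone()
    return row[0] if row else None


print(child_id(conn, 1, "Middle aged"))  # 3
print(child_id(conn, 5, "Old"))          # None: "Old" is an age, not an income category
```

Running one such query per cell is what makes a precomputed dictionary attractive when there are only a few categories.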

I'm really struggling to pick one row at a time and access arbitrary cells in that row.

I have the parent category's dimension_id (in the code below it is already in cat_id_age). So I want to iterate through the Numpy array, get the matching category dimension_id for each row, and insert it into a value table together with its corresponding value. However, I have no idea how to do this Pythonically or Djangonically. (There are only a few categories, so I think the dictionary approach below for looking up dimension_id is best.) To my iterative mind the process is:

# populate a Dictionary to find dimension_ids
age_dims = Dimension.objects.filter(parent_id=cat_id_age).values('label', 'id')

for row in Numpy_array:

    dim_id = Dimension.get(row.age_category)

    # Or is the dict approach incorrect? I'm trying to do: SELECT dimension_id FROM dimension WHERE parent_dimension_id=cat_id_age AND label=row.age_category
    # Djangonically? dim = Dimension.objects.get(parent_id=cat_id_age, label=row.age_category)

    # Then insert categorized value, ie, INSERT INTO float_value (value, dimension_id) VALUES (row.age, dimension_id)
    float_val = FloatValue(value=row.age, dimension_id=dim_id)
    float_val.save()

...then repeat for income_category and income.
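Because there are only a few categories, the lookup can be built once per parent category. In Django, .values('label', 'id') yields one dict per row, so its result cannot be indexed by label directly; dict(Dimension.objects.filter(parent_id=cat_id_age).values_list('label', 'id')) would give a label-to-id mapping instead, assuming the model fields are named as in the question. The plain-Python sketch below reproduces that step on hard-coded rows; AGE_ROWS and pivot_rows are hypothetical stand-ins for the queryset and the pivot table.

```python
# Hypothetical (label, id) pairs, as .values_list('label', 'id') would return them
# for the children of the "Age categories" dimension.
AGE_ROWS = [("Young", 2), ("Middle aged", 3), ("Old", 4)]

# dict() over (key, value) pairs builds the label -> dimension_id lookup.
age_dims = dict(AGE_ROWS)

# Pivot-table rows reduced to the two cells this step needs: (age_category, age).
pivot_rows = [("Middle aged", 23.456), ("Old", 65.432)]

# Pair each value with the dimension_id of its category; .get() returns None
# for an unknown label instead of raising KeyError.
values_to_insert = [(age, age_dims.get(category)) for category, age in pivot_rows]
print(values_to_insert)  # [(23.456, 3), (65.432, 4)]
```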

However, I'm struggling with iterating like this. That may be my only problem, but I've included the rest to communicate what I'm trying to do, as I often seem to be a paradigm away from Python (e.g., should it be something like cursor.executemany("""insert values(?, ?, ?)""", map(tuple, numpy_arr[x:].tolist())) ?).
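The executemany idea does work at the DB-API level. Below is a minimal sketch using sqlite3: the float_value table and its two columns are taken from the INSERT in the question's code comment, and the lookup and rows are the same illustrative values as the example data. In Django's ORM, the rough equivalent would be building unsaved FloatValue objects and passing the list to FloatValue.objects.bulk_create(...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE float_value (value REAL NOT NULL, dimension_id INTEGER)")

# Assumed label -> dimension_id lookup and (age_category, age) rows.
age_dims = {"Young": 2, "Middle aged": 3, "Old": 4}
pivot_rows = [("Middle aged", 23.456), ("Old", 65.432)]

# One parameter tuple per row; executemany runs the INSERT once per tuple.
params = [(age, age_dims[category]) for category, age in pivot_rows]
with conn:  # commits the transaction if no exception is raised
    conn.executemany(
        "INSERT INTO float_value (value, dimension_id) VALUES (?, ?)", params
    )

print(conn.execute(
    "SELECT value, dimension_id FROM float_value ORDER BY dimension_id"
).fetchall())  # [(23.456, 3), (65.432, 4)]
```

Note that ? is sqlite3's placeholder style; PostgreSQL drivers such as psycopg2 use %s instead.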

Any pointers are really appreciated. (I'm using Django 1.7 and Python 3.4.)

Anzel answered the iterating problem: use the Pandas to_csv() function. My dictionary syntax was also wrong. My final solution was therefore:

# Dimension, Entity and FloatValue are the project's Django models;
# dataset, attrib_age and attrib_income are assumed to have been fetched earlier.

# Populate dictionaries mapping category labels to dimension_ids
parent_dimension_age = Dimension.objects.get(name='Age')
parent_dimension_income = Dimension.objects.get(name='Income')
dims_age = {d.name: d.id for d in Dimension.objects.filter(parent_id=parent_dimension_age.id)}
dims_income = {d.name: d.id for d in Dimension.objects.filter(parent_id=parent_dimension_income.id)}

# Retrieve one row at a time as a tab-delimited string
for line in pandas_pivottable.to_csv(header=False, index=True, sep='\t').split('\n'):
    if line:
        # row[0] = income category, row[1] = age category, row[2] = age, row[3] = income
        row = line.split('\t')
        entity = Entity.objects.create(name='data pivot row', dataset_id=dataset.id)
        # dims_age.get(row[1]) gets the ID of the category whose name matches row[1]
        age_val = FloatValue(value=row[2], entity_id=entity.id, attribute_id=attrib_age.id, dimension_id=dims_age.get(row[1]))
        age_val.save()
        income_val = FloatValue(value=row[3], entity_id=entity.id, attribute_id=attrib_income.id, dimension_id=dims_income.get(row[0]))
        income_val.save()
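Splitting the to_csv() string on '\n' and '\t' works for simple labels, but it breaks if a label ever contains the delimiter, because to_csv() then quotes that field. The standard csv module undoes the quoting. The sketch below runs on an invented string shaped like the tab-separated output above, in the column order the solution's comment assumes. Another option that avoids the text round-trip altogether is iterating pandas' DataFrame.itertuples(), e.g. over pandas_pivottable.reset_index().

```python
import csv
import io

# Invented sample shaped like to_csv(header=False, index=True, sep='\t') output:
# income category, age category, age, income. The second age label contains a
# tab, so to_csv would have quoted it.
text = 'High\tMiddle aged\t23.456\t123564.235\nMedium\t"Old\tretired"\t65.432\t18324.356\n'

# Naive approach: split into lines, then split each line on tabs.
naive = [line.split("\t") for line in text.split("\n") if line]

# csv.reader honours the quoting, so the embedded tab stays inside one field.
rows = list(csv.reader(io.StringIO(text), delimiter="\t"))

# Convert the numeric cells explicitly.
records = [
    (income_cat, age_cat, float(age), float(income))
    for income_cat, age_cat, age, income in rows
]

print([len(r) for r in naive])  # [4, 5]
print([len(r) for r in rows])   # [4, 4]
```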

For more on the Entity-Attribute-Value (EAV) schema, see the Wikipedia page (and, if you are considering it, the Django-EAV extension). In the next iteration of this project, however, I will be replacing it with PostgreSQL's new JSONB type. This promises to make the data more legible and to perform equally well or better.
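To make the legibility point concrete: in the EAV layout, one pivot row becomes an entity plus one value row per attribute, and reading the row back means regrouping those value rows by entity. A JSONB column would instead hold one document per row. The plain-Python sketch below builds both shapes; the tuple layout, attribute names and document structure are illustrative assumptions rather than the project's actual schema, and no PostgreSQL is involved.

```python
import json
from collections import defaultdict

# EAV value rows as (entity_id, attribute, category_label, value); illustrative only.
eav_rows = [
    (1, "age", "Middle aged", 23.456),
    (1, "income", "High", 123564.235),
    (2, "age", "Old", 65.432),
    (2, "income", "Medium", 18324.356),
]

# Reassembling rows from EAV means grouping the value rows by entity.
entities = defaultdict(dict)
for entity_id, attribute, label, value in eav_rows:
    entities[entity_id][attribute] = {"category": label, "value": value}

# In a JSONB design, each reassembled dict would be stored as a single document.
documents = [json.dumps(doc, sort_keys=True) for _, doc in sorted(entities.items())]
```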
