How to insert data from a table into a Postgres column

Question

I have to create dummy data. I already have more than 30000 features in the 'buildings' table, and I created one new column called 'roof_material'. I also have another table called 'materials', which contains 8 rows, like this:

|id|  material
+--+-----------
|1 |  tiles
|2 |  metal
|3 |  concrete
|4 |  slate
|5 |  steel
|6 |  clay
|7 |  wood shake
|8 |  asphalt

I want to populate buildings.roof_material with values from the 'materials' table, picked at random.

So in the end, every one of those 30000 rows will have a roof_material value.

Can anyone help me?

Answer 1

Assuming that the column roof_material is a foreign key to the materials table, you can simply do this:

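-- random() is in [0, 1), so floor(random() * 8) + 1 is spread evenly over 1..8
-- (a bare ::int cast rounds, which would pick ids 1 and 8 only half as often)
update buildings
  set roof_material = floor(random() * 8)::int + 1;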

This essentially hard-codes the possible primary-key values of the materials table, which is good enough for a one-off update.
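Whether the cast rounds or truncates changes how often each id is picked. The Python sketch below (an illustration, not part of the original answer) computes the exact share of each id when r = random() is uniform on [0, 1). It assumes, as Postgres does for double precision, that `::int` rounds to the nearest integer; `round_cast_probs` and `floor_cast_probs` are hypothetical helper names.

```python
from fractions import Fraction


def round_cast_probs(scale: int, offset: int) -> dict[int, Fraction]:
    """P(value == k) for round(r * scale + offset), with r uniform on [0, 1).

    Mirrors `(random() * scale + offset)::int`, assuming the cast rounds to
    the nearest integer. Value k is produced when r * scale + offset lands in
    [k - 1/2, k + 1/2), clipped to the range r can actually take.
    """
    probs = {}
    for k in range(offset, offset + scale + 1):
        lo = max(Fraction(0), (Fraction(k - offset) - Fraction(1, 2)) / scale)
        hi = min(Fraction(1), (Fraction(k - offset) + Fraction(1, 2)) / scale)
        probs[k] = hi - lo
    return probs


def floor_cast_probs(n_ids: int) -> dict[int, Fraction]:
    """P(value == k) for floor(r * n_ids) + 1: id k owns r in [(k-1)/n, k/n)."""
    return {k: Fraction(k, n_ids) - Fraction(k - 1, n_ids) for k in range(1, n_ids + 1)}


if __name__ == "__main__":
    for k, p in round_cast_probs(7, 1).items():
        print(k, p)  # ids 1 and 8: 1/14 each; ids 2..7: 1/7 each
    for k, p in floor_cast_probs(8).items():
        print(k, p)  # every id: 1/8
```

With rounding, ids 1 and 8 only own half-width intervals at the ends of the range, so a form like `floor(random() * 8)::int + 1` is the one that gives all eight materials the same share.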

If you want to make that dynamic, based on the actual values in the materials table, you can use something like this:

-- collect the existing ids once, then pick a random array element per row
with idlist as (
  select array_agg(id) as mat_ids
  from materials
)
update buildings
  set roof_material = mat_ids[floor(random() * cardinality(mat_ids))::int + 1]
from idlist;

First, the common table expression idlist collects all existing ids from the materials table into an array; the update statement then picks a random element of that array for each row of the buildings table.
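The array-indexing idea can be simulated in plain Python. This is a sketch under stated assumptions: `random.Random` stands in for Postgres `random()`, the index uses the `floor(random() * cardinality(mat_ids))::int + 1` form, and the ids `[1, 2, 3, 5, 8]` are hypothetical and deliberately non-contiguous. Because the pick is an index into the collected array, only ids that actually exist can come out, which is what the hard-coded range cannot guarantee.

```python
import math
import random


def pick_material(mat_ids: list[int], r: float) -> int:
    """Mirror of mat_ids[floor(r * cardinality(mat_ids))::int + 1].

    Arrays built by array_agg are 1-based; Python lists are 0-based,
    so the 1-based SQL index i maps to mat_ids[i - 1].
    """
    sql_index = math.floor(r * len(mat_ids)) + 1  # 1 .. len(mat_ids)
    return mat_ids[sql_index - 1]


def fill_roof_material(n_buildings: int, mat_ids: list[int], seed: int = 42) -> list[int]:
    """Simulate the UPDATE: one fresh random draw per building row."""
    rng = random.Random(seed)
    return [pick_material(mat_ids, rng.random()) for _ in range(n_buildings)]


if __name__ == "__main__":
    ids = [1, 2, 3, 5, 8]  # hypothetical ids with gaps
    roofs = fill_roof_material(30000, ids)
    print(sorted(set(roofs)))
```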

Answer 2

This can be tricky, because Postgres optimizations sometimes get in the way. One method uses a lateral join (or a correlated subquery):

select gs.x, m.*
from generate_series(1, 100) gs(x) cross join lateral
     (select m.*
      from materials m
      where gs.x is not null   -- correlates the subquery with the outer row
      order by random()
      limit 1                  -- keep one random material per row
     ) m;

The correlation clause (the where) is important, because otherwise Postgres may decide that it can run the subquery only once, and every row would then get the same material.
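The effect of that clause can be illustrated outside the database. The Python sketch below is a simulation, not Postgres itself: it contrasts running an "order by random(), keep one row" pick once and reusing the result, versus running it again for every outer row. `order_by_random_limit_1`, `evaluated_once` and `evaluated_per_row` are hypothetical names.

```python
import random

MATERIALS = ["tiles", "metal", "concrete", "slate",
             "steel", "clay", "wood shake", "asphalt"]


def order_by_random_limit_1(rows: list[str], rng: random.Random) -> str:
    """Like `order by random() limit 1`: give each row a random key,
    keep the row with the smallest key."""
    return min(rows, key=lambda _row: rng.random())


def evaluated_once(n: int, rng: random.Random) -> list[str]:
    # Uncorrelated subquery: computed once, reused for every outer row.
    choice = order_by_random_limit_1(MATERIALS, rng)
    return [choice] * n


def evaluated_per_row(n: int, rng: random.Random) -> list[str]:
    # Correlated subquery: re-run for every outer row.
    return [order_by_random_limit_1(MATERIALS, rng) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(len(set(evaluated_once(100, rng))))  # always 1
    print(len(set(evaluated_per_row(100, rng))))
```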

If you want an even distribution of the values, you can randomly enumerate each list and join them using modulo arithmetic:

with t as (
      select gs.x, row_number() over (order by random()) - 1 as seqnum
      from generate_series(1, 100) gs(x)
     ),
     m as (
      select m.*, row_number() over (order by random()) - 1 as seqnum,
             count(*) over () as num_materials
      from materials m
     )
select t.x, m.id, m.material
from t join
     m
     on t.seqnum % m.num_materials = m.seqnum
order by t.x;

Here is a db<>fiddle.
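The modulo join can be checked without a database. This Python sketch is a simulation under assumptions: `random.Random.shuffle` stands in for `row_number() over (order by random())`, and `round_robin_assign` is a hypothetical name. It rebuilds the two numbered CTEs and the `t.seqnum % m.num_materials = m.seqnum` join, and shows that with 100 rows and 8 materials each material is used 12 or 13 times.

```python
import random
from collections import Counter


def round_robin_assign(n_rows: int, materials: list[str], seed: int = 0) -> dict[int, str]:
    rng = random.Random(seed)
    # CTE t: give rows 1..n_rows the numbers 0..n_rows-1 in random order.
    row_order = list(range(1, n_rows + 1))
    rng.shuffle(row_order)
    t = {x: seqnum for seqnum, x in enumerate(row_order)}
    # CTE m: give the materials the numbers 0..num_materials-1 in random order.
    mat_order = materials[:]
    rng.shuffle(mat_order)
    m = dict(enumerate(mat_order))
    # Join on t.seqnum % num_materials = m.seqnum.
    return {x: m[seqnum % len(materials)] for x, seqnum in t.items()}


if __name__ == "__main__":
    mats = ["tiles", "metal", "concrete", "slate",
            "steel", "clay", "wood shake", "asphalt"]
    counts = Counter(round_robin_assign(100, mats).values())
    print(sorted(counts.values()))  # [12, 12, 12, 12, 13, 13, 13, 13]
```

Because the row numbering is a permutation of 0..n_rows-1, the counts depend only on the number of rows and materials, not on the random order, so they never differ by more than one.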

2 answers

Answer 1
1 2019-09-20 09:29:07

Answer 2
0 2019-09-20 11:09:37