How to insert data from a table into a Postgres column
I have to create dummy data. I already have more than 30,000 features in the 'buildings' table, and I created one new column called 'roof_material'. I also have another table called 'materials', which contains 8 rows, like this:
|id| material
+--+-----------
|1 | tiles
|2 | metal
|3 | concrete
|4 | slate
|5 | steel
|6 | clay
|7 | wood shake
|8 | asphalt
I want to populate buildings.roof_material randomly with values from the 'materials' table, so that in the end every one of those 30,000 features has roof_material data.
Can anyone help me?
Assuming that the column roof_material is a foreign key to the materials table, you can simply do this:
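update buildings
   -- floor() keeps all 8 ids equally likely; a bare ::int cast rounds,
   -- which would give ids 1 and 8 only half the chance of the others
   set roof_material = floor(random() * 8)::int + 1;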
That essentially hard-codes the possible primary key values of the materials table, which is good enough for a one-off update.
If you want to make that dynamic, depending on the actual values in the materials table, you can use something like this:
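with idlist as (
  -- collect every existing material id into a single array
  select array_agg(id) as mat_ids
  from materials
)
update buildings
   -- Postgres arrays are 1-based; floor() makes every position equally likely
   set roof_material = mat_ids[floor(random() * cardinality(mat_ids))::int + 1]
from idlist;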
First, the common table expression idlist collects all existing IDs from the materials table into an array. The update statement then picks a random element from that array for each row it updates in the buildings table.
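To check that an update of this kind really touched every row and spread the materials reasonably, a count per material can help. This is only a sketch: it assumes that roof_material stores the id of a row in materials, and it uses the table names from the question (buildings, materials).

```sql
-- How many buildings ended up with each roof material?
-- Assumes buildings.roof_material holds materials.id (an assumption).
select m.id, m.material, count(b.roof_material) as num_buildings
from materials m
left join buildings b on b.roof_material = m.id
group by m.id, m.material
order by m.id;

-- Buildings that were not populated at all (expected to be 0 after the update).
select count(*) as missing
from buildings
where roof_material is null;
```

The left join keeps materials that were never picked in the result, with a count of 0, so gaps are easy to spot.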
This can be tricky, because sometimes Postgres optimizations get in the way. One method uses a lateral join (or correlated subquery):
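select gs.x, m.*
from generate_series(1, 100) gs(x) cross join lateral
     (select m.*
      from materials m
      where gs.x is not null  -- correlation clause: ties the subquery to the outer row
      order by random()
      limit 1                 -- keep a single random material per row
     ) m;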
The correlation clause (the where) is important, because otherwise Postgres decides that it needs to run the subquery only once, and every row then gets the same material.
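The query above only selects from generate_series. Applied to the actual table, the same idea could look roughly like the following correlated subquery in an update. This is a sketch under assumptions: buildings is assumed to have a non-null key column named id (hypothetical, it is not shown in the question), and roof_material is assumed to store materials.id.

```sql
update buildings b
set roof_material = (
    select m.id
    from materials m
    where b.id is not null   -- correlation: makes the subquery depend on the outer row
    order by random()
    limit 1                  -- one random material per building
);
```

Sorting 8 rows once per building is cheap for about 30,000 rows, but it would scale poorly if materials were a large table.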
If you want an equal distribution of the values, then you can randomly enumerate each list and join them using modulo arithmetic:
with t as (
      select gs.x, row_number() over (order by random()) - 1 as seqnum
      from generate_series(1, 100) gs(x)
     ),
     m as (
      select m.*, row_number() over (order by random()) - 1 as seqnum,
             count(*) over () as num_materials
      from materials m
     )
select t.x, m.id, m.material
from t join
     m
     on t.seqnum % m.num_materials = m.seqnum
order by t.x;
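To write that equal distribution back into the table, the generate_series part can be swapped for the buildings themselves. The statement below is a minimal sketch, not a tested migration: it assumes buildings has a unique key column named id (hypothetical) and that roof_material stores materials.id.

```sql
with numbered as (
    -- give every building a random position 0, 1, 2, ...
    select id, row_number() over (order by random()) - 1 as seqnum
    from buildings
),
m as (
    -- give every material a random position 0 .. num_materials - 1
    select id, row_number() over (order by random()) - 1 as seqnum,
           count(*) over () as num_materials
    from materials
)
update buildings b
set roof_material = m.id
from numbered n
join m on n.seqnum % m.num_materials = m.seqnum
where b.id = n.id;
```

Because the building positions cycle through the materials, the number of buildings per material differs by at most one.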