
Using bash, awk or sed to templatize a CSV file into a SQL file

I will have a CSV file (say, ids.csv) that I need to ETL into a SQL script (say, update_products.sql). The CSV will be headerless and will consist of comma-delimited numbers (product IDs in a database), for instance:

29294848,29294849,29294850,29294851,29294853,29294857,29294858,29294860,29294861,29294863,29294887,29294888,
29294889,29294890,29294891,29294892,29294895,29294897,29294898,29294899,29294901,29294903,29294912,29294916

Starting with a SQL "template" file (template.sql) that looks something like this:

UPDATE products SET quantity = 0 WHERE id = %ID%;

I'm looking for a way, via bash, awk, sed (or any other shell scripting tool), to templatize %ID% with the values in the CSV, turning the generated SQL into something like:

UPDATE products SET quantity = 0 WHERE id = 29294848;
UPDATE products SET quantity = 0 WHERE id = 29294849;
UPDATE products SET quantity = 0 WHERE id = 29294850;
... etc, for all the IDs in the CSV...

Super flexible here:

  • Don't care which tool gets the job done (awk, sed, bash, whatever... as long as I can run it from the command line)
  • Don't necessarily NEED a template file (template.sql) to start with; perhaps the solution can just "inject" this template into the script as an argument
  • Ideally it would read the input CSV, but this is not a hard requirement; if the solution requires me to paste the contents of the CSV file into the script as an argument, I'm OK with that, but not thrilled...
  • Ideally it would generate an actual SQL file (update_products.sql) for me, but if we're limited to console output that's OK too (just not preferred)

Any ideas how I might be able to accomplish this?

I'd probably start with

$: sed "s/ *= %ID%/ IN ( $(echo $(<ids.csv) ) )/" template.sql > update_products.sql

but if it's a lot of IDs I'm not sure what your limits are, and I honestly don't remember whether that's an ANSI standard structure...

So...

$: while IFS=, read -r -a ids
> do for id in "${ids[@]}"
>    do echo "UPDATE products SET quantity = 0 WHERE id = $id;"
>    done
> done < ids.csv > update_products.sql
$: cat update_products.sql
UPDATE products SET quantity = 0 WHERE id = 29294848;
UPDATE products SET quantity = 0 WHERE id = 29294849;
UPDATE products SET quantity = 0 WHERE id = 29294850;
UPDATE products SET quantity = 0 WHERE id = 29294851;
UPDATE products SET quantity = 0 WHERE id = 29294853;
UPDATE products SET quantity = 0 WHERE id = 29294857;
UPDATE products SET quantity = 0 WHERE id = 29294858;
UPDATE products SET quantity = 0 WHERE id = 29294860;
UPDATE products SET quantity = 0 WHERE id = 29294861;
UPDATE products SET quantity = 0 WHERE id = 29294863;
UPDATE products SET quantity = 0 WHERE id = 29294887;
UPDATE products SET quantity = 0 WHERE id = 29294888;
UPDATE products SET quantity = 0 WHERE id = 29294889;
UPDATE products SET quantity = 0 WHERE id = 29294890;
UPDATE products SET quantity = 0 WHERE id = 29294891;
UPDATE products SET quantity = 0 WHERE id = 29294892;
UPDATE products SET quantity = 0 WHERE id = 29294895;
UPDATE products SET quantity = 0 WHERE id = 29294897;
UPDATE products SET quantity = 0 WHERE id = 29294898;
UPDATE products SET quantity = 0 WHERE id = 29294899;
UPDATE products SET quantity = 0 WHERE id = 29294901;
UPDATE products SET quantity = 0 WHERE id = 29294903;
UPDATE products SET quantity = 0 WHERE id = 29294912;
UPDATE products SET quantity = 0 WHERE id = 29294916;
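
The loop above hard-codes the statement. If the template.sql file from the question should drive the output instead, a minimal sketch along these lines might work. The function name render_template is hypothetical, and the sketch assumes the placeholder is spelled %ID% and that the IDs are plain digits with no surrounding spaces.

```shell
# Hypothetical helper: print the template once per numeric CSV field,
# replacing every %ID% placeholder with that field.
# Usage: render_template template.sql ids.csv > update_products.sql
render_template() {
  awk -F, '
    # First file (the template): collect all of its lines into one string.
    FNR == NR { template = template sep $0; sep = "\n"; next }
    # Second file (the CSV): expand the template for each field.
    {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^[0-9]+$/) {   # assumption: IDs are digits only; skip anything else
          stmt = template
          gsub(/%ID%/, $i, stmt)
          print stmt
        }
      }
    }
  ' "$1" "$2"
}
```

Fields that are empty (for example after a trailing comma at the end of a line) or not purely numeric are silently skipped, which may or may not be what you want.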

There's no need for %ID%: just prefix the contents of ids.txt with the SQL, like this, writing the output to product_updates.sql:

awk '{sub(/,[ \t]*$/, ""); printf "UPDATE products SET quantity = 0 WHERE id IN (%s);\n", $0}' ids.txt > product_updates.sql
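
If the list of IDs is long, it may be worth capping how many go into each IN (...) list. The following is only a sketch under assumptions: the function name batch_in_lists and the default batch size of 1000 are made up for illustration, and the right limit depends on your database.

```shell
# Hypothetical helper: group the IDs into IN (...) lists of at most N entries.
# Usage: batch_in_lists ids.txt 500 > product_updates.sql
batch_in_lists() {
  tr ',' '\n' < "$1" |          # one candidate ID per line
    grep -E '^[0-9]+$' |        # assumption: keep digit-only entries, drop the rest
    awk -v n="${2:-1000}" '
      function emit() {
        printf "UPDATE products SET quantity = 0 WHERE id IN (%s);\n", list
        list = ""; count = 0
      }
      { list = (count ? list "," : "") $0; count++ }
      count == n { emit() }       # batch is full: write it out
      END { if (count) emit() }   # write any final partial batch
    '
}
```

Because the input is first split into one ID per line, it does not matter how the CSV wraps or whether a line ends in a comma.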

I propose being safe rather than sorry.

This may be deemed pedantic, but working with a business database is a serious matter.

So here is a version based on @Paul Hodges's answer:

#!/usr/bin/env bash

{
  # Use the prepared statements `zeroproduct` 
  # to protect against SQL injections
  printf 'PREPARE zeroproduct FROM '\''%s'\'';\n' \
    'UPDATE products SET quantity = 0 WHERE id = ?'

  # Work inside a transaction, so if something goes wrong,
  # like the sql file is incomplete, it can be rolled-back.
  printf 'START TRANSACTION;\n'

  while IFS=, read -r -a ids; do
    for id in "${ids[@]}"; do
      # Set the value of the @id argument in SQL
      # And execute the SQL statement with the @id argument
      # that will replace the '?'
      printf 'SET @id='\''%8d'\''; EXECUTE zeroproduct USING @id;\n' \
        "$((id))" # Ensure id is an integer
    done
  done <ids.csv

  # Now commit all these changes since we are finally here
  printf 'COMMIT;\n'

  # Deallocate the prepared statement once we are done
  printf 'DEALLOCATE PREPARE zeroproduct;\n'

} >update_products.sql

# Good to have if this is transmitted remotely
sha512sum update_products.sql >update_products.sql.sha512sum

# can later check with:
sha512sum -c update_products.sql.sha512sum

From the provided sample CSV, here is the content of update_products.sql:

PREPARE zeroproduct FROM 'UPDATE products SET quantity = 0 WHERE id = ?';
START TRANSACTION;
SET @id='29294848'; EXECUTE zeroproduct USING @id;
SET @id='29294849'; EXECUTE zeroproduct USING @id;
SET @id='29294850'; EXECUTE zeroproduct USING @id;
SET @id='29294851'; EXECUTE zeroproduct USING @id;
SET @id='29294853'; EXECUTE zeroproduct USING @id;
SET @id='29294857'; EXECUTE zeroproduct USING @id;
SET @id='29294858'; EXECUTE zeroproduct USING @id;
SET @id='29294860'; EXECUTE zeroproduct USING @id;
SET @id='29294861'; EXECUTE zeroproduct USING @id;
SET @id='29294863'; EXECUTE zeroproduct USING @id;
SET @id='29294887'; EXECUTE zeroproduct USING @id;
SET @id='29294888'; EXECUTE zeroproduct USING @id;
SET @id='29294889'; EXECUTE zeroproduct USING @id;
SET @id='29294890'; EXECUTE zeroproduct USING @id;
SET @id='29294891'; EXECUTE zeroproduct USING @id;
SET @id='29294892'; EXECUTE zeroproduct USING @id;
SET @id='29294895'; EXECUTE zeroproduct USING @id;
SET @id='29294897'; EXECUTE zeroproduct USING @id;
SET @id='29294898'; EXECUTE zeroproduct USING @id;
SET @id='29294899'; EXECUTE zeroproduct USING @id;
SET @id='29294901'; EXECUTE zeroproduct USING @id;
SET @id='29294903'; EXECUTE zeroproduct USING @id;
SET @id='29294912'; EXECUTE zeroproduct USING @id;
SET @id='29294916'; EXECUTE zeroproduct USING @id;
COMMIT;
DEALLOCATE PREPARE zeroproduct;
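
One caveat on the "$((id))" check in the script: shell arithmetic evaluates its input as an expression, so 010 is read as octal and becomes 8, and 1+1 becomes 2, rather than being rejected. A stricter variant might validate each field as a plain run of digits before printing it. The sketch below assumes that is the desired rule; the function names is_valid_id and emit_execute are hypothetical.

```shell
# Hypothetical helper: succeed only if the argument is a non-empty run of digits.
is_valid_id() {
  case $1 in
    '' | *[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
    *) return 0 ;;
  esac
}

# Hypothetical helper: print the SET/EXECUTE pair for one ID, or complain on stderr.
emit_execute() {
  if is_valid_id "$1"; then
    printf "SET @id='%s'; EXECUTE zeroproduct USING @id;\n" "$1"
  else
    printf 'skipping invalid id: %s\n' "$1" >&2
    return 1
  fi
}
```

Inside the inner loop, emit_execute "$id" would take the place of the printf call. Whether an invalid field should be skipped or should abort the whole run is a design choice; aborting is arguably the safer default for a production database.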

In addition to the answer by @suspectus, which makes nice use of printf to output each wanted line, a slightly more procedural use of awk with a for loop over the fields would be:

awk -F, '{
    for (i=1;i<=NF;i++)
        print "UPDATE products SET quantity = 0 WHERE id = " $i ";"
}' file.csv

The single rule simply loops over each of the comma-separated fields, using string concatenation to form the desired output. In detail, the awk command:

  • awk -F, sets the field separator (FS) to a comma to split the input,
  • for (i=1;i<=NF;i++) simply loops over each field, and
  • print "UPDATE products SET quantity = 0 WHERE id = " $i ";" outputs the wanted text, incorporating the field by string concatenation.

Example Use/Output

With your data in file.csv (presumed to be a single line; multiple lines work the same way, provided no line ends in a trailing comma, which would produce an empty field) your output would be:

$ awk -F, '{
>     for (i=1;i<=NF;i++)
>         print "UPDATE products SET quantity = 0 WHERE id = " $i ";"
> }' file.csv
UPDATE products SET quantity = 0 WHERE id = 29294848;
UPDATE products SET quantity = 0 WHERE id = 29294849;
UPDATE products SET quantity = 0 WHERE id = 29294850;
UPDATE products SET quantity = 0 WHERE id = 29294851;
UPDATE products SET quantity = 0 WHERE id = 29294853;
UPDATE products SET quantity = 0 WHERE id = 29294857;
UPDATE products SET quantity = 0 WHERE id = 29294858;
UPDATE products SET quantity = 0 WHERE id = 29294860;
UPDATE products SET quantity = 0 WHERE id = 29294861;
UPDATE products SET quantity = 0 WHERE id = 29294863;
UPDATE products SET quantity = 0 WHERE id = 29294887;
UPDATE products SET quantity = 0 WHERE id = 29294888;
UPDATE products SET quantity = 0 WHERE id = 29294889;
UPDATE products SET quantity = 0 WHERE id = 29294890;
UPDATE products SET quantity = 0 WHERE id = 29294891;
UPDATE products SET quantity = 0 WHERE id = 29294892;
UPDATE products SET quantity = 0 WHERE id = 29294895;
UPDATE products SET quantity = 0 WHERE id = 29294897;
UPDATE products SET quantity = 0 WHERE id = 29294898;
UPDATE products SET quantity = 0 WHERE id = 29294899;
UPDATE products SET quantity = 0 WHERE id = 29294901;
UPDATE products SET quantity = 0 WHERE id = 29294903;
UPDATE products SET quantity = 0 WHERE id = 29294912;
UPDATE products SET quantity = 0 WHERE id = 29294916;
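
If the CSV wraps across lines with a trailing comma, as the sample in the question does, the last field on that line is empty and the loop would print a statement with no ID. A slightly more defensive sketch might trim each field and skip empty ones. The function name csv_to_updates is hypothetical, and trimming spaces, tabs, and carriage returns is an assumption about what stray characters the file may contain.

```shell
# Hypothetical helper: one UPDATE per non-empty field, with surrounding blanks removed.
# Usage: csv_to_updates file.csv > update_products.sql
csv_to_updates() {
  awk -F, '{
    for (i = 1; i <= NF; i++) {
      id = $i
      gsub(/^[ \t\r]+|[ \t\r]+$/, "", id)   # trim blanks and a stray carriage return
      if (id != "")                          # skip the empty field after a trailing comma
        print "UPDATE products SET quantity = 0 WHERE id = " id ";"
    }
  }' "$1"
}
```

Redirecting the output, as in the usage comment, also produces the actual SQL file the question asked for.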

Look things over and let me know if you have further questions.
