How to assign awk result variables to an array, and can awk be used inside another awk in a loop?

Question

I have started learning bash and am completely stuck on this task. I have a comma-separated csv file with the following records:

id,location_id,organization_id,service_id,name,title,email,department
1,1,,,Name surname,department1 department2 department3,,
2,1,,,name Surname,department1,,
3,2,,,Name Surname,"department1 department2, department3",,
etc.

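I need to format it like this:

the first name and surname must start with an uppercase letter
add an email record made of the first letter of the name and the full surname, all in lowercase
create a new csv with the records from the old csv and the corrected fields.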

I used awk to split the csv records into fields (because some fields contain a comma inside quotes, e.g. "department1 department2, department3").

#!/bin/bash
input="$HOME/test.csv"

exec 0<$input

while read line; do

awk -v FPAT='"[^"]*"|[^,]*' '{ 
  ...
}' $input)

done

Inside awk {...} (NF=8 for each record), I tried to work with some of the field values ($1 $2 $3 $4 $5 $6 $7 $8):

#it doesn't work 

IFS=' ' read -a name_surname<<<$5 # Field 5 match to *name* in heading of csv

# Could I use inner awk with field values of outer awk ($5) to separate the field value of outer awk $5 ? 
# as an example:                                  
# $5="${awk '{${1^}${2^}}' $5}"
# where ${1^} and ${2^} fields of inner awk
  
name_surname[0]=${name_surname[0]^}
name_surname[1]=${name_surname[1]^}
  
$5="${name_surname[0]}' '${name_surname[1]}"

email_name=${name_surname[0]:0:1}
email_surname=${name_surname[1]}
domain='@domain'

$7="${email_name,}${email_surname,,}$domain" # match to field 7 *email* in heading of csv

How can I add the field values ($1 $2 $3 $4 $5 $6 $7 $8) to an array and call the function join on each loop iteration to add the record to a new csv file?

function join { local IFS="$1"; shift; echo "$*"; }
result=$(join , ${arr[@]})
echo $result >> new.csv
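
For reference, bash expansions such as ${var^} are not available inside an awk program, and awk's $5 is not a shell variable, so the two cannot be mixed in one block. A minimal sketch of one way to get a record's fields into a bash array and back out again, assuming gawk (for FPAT) and bash 4+ (for mapfile and ${var^}); the hard-coded sample record, the fields array name, and the example.com domain are illustrative only:

```shell
#!/usr/bin/env bash
# Sketch only: one hard-coded sample record stands in for a line of the csv.
record='3,2,,,name surname,"department1 department2, department3",,'

# Let gawk split the record (FPAT keeps the quoted comma intact) and print
# one field per line; mapfile then reads those lines into a bash array.
mapfile -t fields < <(
    gawk -v FPAT='[^,]*|"[^"]*"' '{ for (i = 1; i <= NF; i++) print $i }' <<<"$record"
)

# Bash-side edits: capitalize each word of the name, then build the email.
read -r -a name_surname <<<"${fields[4]}"
fields[4]="${name_surname[0]^} ${name_surname[1]^}"
initial=${name_surname[0]:0:1}
fields[6]="${initial,,}${name_surname[1],,}@example.com"

# Same join helper as in the question, with the array quoted so that
# fields containing spaces stay in one piece.
join() { local IFS="$1"; shift; echo "$*"; }
result=$(join , "${fields[@]}")
echo "$result"
```

Starting one gawk process per record is slow on large files, which is one reason to do the whole transformation inside a single awk program instead.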

Answer 1

This may be what you're trying to do (using gawk for FPAT, as you already are), but without more representative sample input and expected output it's a guess:

$ cat tst.sh
#!/usr/bin/env bash

awk '
BEGIN {
    OFS = ","
    FPAT = "[^"OFS"]*|\"[^\"]*\""
}
NR > 1 {
    n = split($5,name,/\s+/)
    $7 = tolower(substr(name[1],1,1) name[n]) "@example.com"
    print
}
' "${@:--}"

$ ./tst.sh test.csv
1,1,,,Name surname,department1 department2 department3,nsurname@example.com,
2,1,,,name Surname,department1,nsurname@example.com,
3,2,,,Name Surname,"department1 department2, department3",nsurname@example.com,

I put the awk script inside a shell script since that looks like what you want. You don't need to do that, though: you could just save the awk script in a file and invoke it with awk -f.
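
A minimal sketch of that awk -f variant, which also capitalizes the name parts as the question asks (the script above only fills in the email). It assumes gawk and whitespace-separated name parts; the cap() helper, the temporary files, and the two sample rows are hypothetical and only there to make the example self-contained:

```shell
#!/usr/bin/env bash
# Sketch only: temporary file names come from mktemp; cap() is a hypothetical helper.
prog=$(mktemp)
sample=$(mktemp)

# The awk program lives in its own file instead of inside the shell script.
cat > "$prog" <<'EOF'
# Upper-case the first character of a word, leave the rest untouched.
function cap(word) { return toupper(substr(word, 1, 1)) substr(word, 2) }
BEGIN {
    OFS = ","
    FPAT = "[^,]*|\"[^\"]*\""    # a field is either unquoted or double-quoted
}
NR == 1 { print; next }           # pass the header through unchanged
{
    n = split($5, name, /[[:space:]]+/)
    full = cap(name[1])
    for (i = 2; i <= n; i++) full = full " " cap(name[i])
    $5 = full
    $7 = tolower(substr(name[1], 1, 1) name[n]) "@example.com"
    print
}
EOF

cat > "$sample" <<'EOF'
id,location_id,organization_id,service_id,name,title,email,department
1,1,,,Name surname,department1 department2 department3,,
3,2,,,name Surname,"department1 department2, department3",,
EOF

result=$(gawk -f "$prog" "$sample")
printf '%s\n' "$result"
rm -f "$prog" "$sample"
```

Unlike the script above, this sketch prints the header line too, so the output can be used directly as the new csv.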

Answer 2

Ed Morton's answer works perfectly.

In case it helps someone, I added one more check: if the CSV file would produce more than one email address with the same name, an index number is appended to the email local part. The output is also written to a file.

#!/usr/bin/env bash
input="$HOME/test.csv"
exec 0<$input

awk '
BEGIN {
  OFS = ","
  FPAT = "[^"OFS"]*|\"[^\"]*\""
}

(NR == 1) {print} #header of csv
(NR > 1) {

  if (length($0) > 1) { #exclude empty lines
    count = 0
    n = split($5,name,/\s+/)
    email_local_part = tolower(substr(name[1],1,1) name[n])
   
    #array stores emails from csv file
    a[i++] = email_local_part
    
    #count occurrences of exactly the same local part
    #(match() would also count addresses that merely start with it)
    for (el in a) {
      if (a[el] == email_local_part) { count++ }
    } 

    #add number of occurrence to email address
    if (count == 1) { $7 = email_local_part "@abc.com" }
    else { --count; $7 = email_local_part count "@abc.com" }

    print 
  }
} 
' "${@:--}" > new.csv
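
The script above walks the whole array of stored addresses for every record. A common awk idiom keeps a per-key counter in an associative array instead, so each record needs a single lookup keyed on the exact local part. A minimal sketch of just that step, run on a made-up list of local parts rather than the full csv (the seen array, the dup variable, and the sample values are illustrative, not part of the answer):

```shell
#!/usr/bin/env bash
# Sketch only: the input is a made-up list of email local parts, one per line.
result=$(printf '%s\n' nsurname jdoe nsurname nsurnameson nsurname |
awk '
{
    local_part = $1
    # seen[] holds how many times this exact local part appeared before.
    dup = seen[local_part]++
    # First occurrence keeps the plain name; repeats get 1, 2, ... appended.
    print local_part (dup ? dup : "") "@abc.com"
}')

printf '%s\n' "$result"
```

The numbering follows the answer's scheme: the first address is left alone and later duplicates get 1, 2, and so on. In the full script, the counter lookup would take the place of the for loop and the count variable.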

Answer 1
2, accepted, 2021-01-08 23:46:59

Answer 2
0, 2021-01-10 15:44:21
