
Join two CSV files on a key value

I have two CSV files and I want to join them on a key value: the city column.

One file, d01.csv, has this form:

Barcelona, 19.5, 29.5
Tarragona, 20.4, 31.5 
Girona, 17.2, 32.5
Lleida, 16.5, 33.5 
Vic, 17.5, 31.4

The other, d02.csv, has this structure:

City, Data, TMax, TMin
Barcelona, 20140916, 19.9, 28.5
Tarragona, 20140916, 21.4, 30.5  
Lleida, 20140916, 17.5, 32.5 
Tortosa, 20140916, 20.5, 30.4

I need a new CSV file containing only the cities that appear in both files:

City, Tmin, Tmax, Date, Tmin1, Tmax1
Barcelona, 19.5, 29.5, 20140916, 19.9, 28.5
Tarragona, 20.4, 31.5, 20140916, 21.4, 30.5
Lleida, 16.5, 33.5, 20140916, 17.5, 32.5

I tried this:

join -j 2 -t ',' d01.csv d02.csv | awk -F "," '{print $1, $2, $3, $4, $5}' > d03.csv

but the result is incomplete. How can I sort on the key value?

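Here's how to use join in bash. join needs both inputs sorted on the join field (the first field by default), and the header line of d02.csv has to be dropped with sed 1d before sorting: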

{
  echo "City, Tmin, Tmax, Date, Tmin1, Tmax1"
  join -t, <(sort d01.csv) <(sed 1d d02.csv | sort)
} > d03.csv
cat d03.csv
City, Tmin, Tmax, Date, Tmin1, Tmax1
Barcelona, 19.5, 29.5, 20140916, 19.9, 28.5
Lleida, 16.5, 33.5 , 20140916, 17.5, 32.5 
Tarragona, 20.4, 31.5 , 20140916, 21.4, 30.5  
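The question also asks how to order the key. join expects both inputs to be sorted on the join field with the same collation join itself compares with; otherwise matches can be missed. A minimal sketch under those assumptions: it recreates the sample files (trailing spaces dropped), sorts on the first comma-separated field only with the locale pinned to C (a choice made for this sketch, not part of the answer above), and uses temporary files, hypothetically named d01.sorted and d02.sorted, instead of bash's <(...) so it also runs under a plain POSIX sh.

```shell
# Recreate the sample inputs (d01.csv has no header, d02.csv does).
printf '%s\n' 'Barcelona, 19.5, 29.5' 'Tarragona, 20.4, 31.5' 'Girona, 17.2, 32.5' \
  'Lleida, 16.5, 33.5' 'Vic, 17.5, 31.4' > d01.csv
printf '%s\n' 'City, Data, TMax, TMin' 'Barcelona, 20140916, 19.9, 28.5' \
  'Tarragona, 20140916, 21.4, 30.5' 'Lleida, 20140916, 17.5, 32.5' \
  'Tortosa, 20140916, 20.5, 30.4' > d02.csv

# sort and join must agree on collation, so pin the locale for both.
export LC_ALL=C

# Sort each file on the key field only (field 1, comma-separated);
# drop the header of d02.csv first so it is not treated as a city.
sort -t, -k1,1 d01.csv > d01.sorted
tail -n +2 d02.csv | sort -t, -k1,1 > d02.sorted

{
  echo "City, Tmin, Tmax, Date, Tmin1, Tmax1"
  join -t, d01.sorted d02.sorted
} > d03.csv
rm -f d01.sorted d02.sorted
cat d03.csv
```

With these inputs d03.csv holds the header plus Barcelona, Lleida and Tarragona, in sorted key order rather than in the order of d01.csv.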

Note that join only outputs records whose key exists in both files. To get all of them, ask for the unpairable lines from both files (-a1 -a2), list the output fields explicitly (-o, where 0 is the join field and 1.2 means field 2 of file 1), and give a default value for the missing fields (-e):

join -t, -a1 -a2 -o 0,1.2,1.3,2.2,2.3,2.4 -e '?' <(sort d01.csv) <(sed 1d d02.csv | sort)
Barcelona, 19.5, 29.5, 20140916, 19.9, 28.5
Girona, 17.2, 32.5,?,?,?
Lleida, 16.5, 33.5 , 20140916, 17.5, 32.5 
Tarragona, 20.4, 31.5 , 20140916, 21.4, 30.5  
Tortosa,?,?, 20140916, 20.5, 30.4
Vic, 17.5, 31.4,?,?,?
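The -a/-e variant fills the gaps with ?. If you only want to know which cities lack a partner, join's -v option prints just the unpairable lines of the file you name. A minimal sketch, assuming the same sample data (trailing spaces dropped) and C-locale sorting; the output file names only_in_d01.txt and only_in_d02.txt are hypothetical, and cut extracts the city so the sketch does not depend on -o.

```shell
# Sample inputs (trailing spaces dropped).
printf '%s\n' 'Barcelona, 19.5, 29.5' 'Tarragona, 20.4, 31.5' 'Girona, 17.2, 32.5' \
  'Lleida, 16.5, 33.5' 'Vic, 17.5, 31.4' > d01.csv
printf '%s\n' 'City, Data, TMax, TMin' 'Barcelona, 20140916, 19.9, 28.5' \
  'Tarragona, 20140916, 21.4, 30.5' 'Lleida, 20140916, 17.5, 32.5' \
  'Tortosa, 20140916, 20.5, 30.4' > d02.csv

export LC_ALL=C
sort -t, -k1,1 d01.csv > d01.sorted
tail -n +2 d02.csv | sort -t, -k1,1 > d02.sorted

# -v 1: only lines of file 1 with no matching key in file 2 (-v 2 is the reverse).
# cut keeps just the first comma-separated field, i.e. the city name.
join -t, -v 1 d01.sorted d02.sorted | cut -d, -f1 > only_in_d01.txt
join -t, -v 2 d01.sorted d02.sorted | cut -d, -f1 > only_in_d02.txt
rm -f d01.sorted d02.sorted

echo "Only in d01.csv:"; cat only_in_d01.txt
echo "Only in d02.csv:"; cat only_in_d02.txt
```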

I suggest CSV Cruncher, which treats CSV files as SQL tables and lets you query them with SQL, writing the result to another CSV file.

Example:

crunch input.csv output.csv \
   "SELECT AVG(duration) AS durAvg FROM (SELECT * FROM indata ORDER BY duration LIMIT 2 OFFSET 6)"

The tool needs Java 5 or later.

Some of the advantages:

  • You get real CSV parsing, not just "let's assume the data is well-formed".
  • You can join on multiple keys.
  • It is easier to use and understand than join-based solutions.
  • You can combine more than two CSV files.
  • You can join on SQL expressions; the values don't have to be identical.

Disclaimer: I wrote that tool. The project's state is unknown: Google Code was shut down and I didn't migrate it in time. I might look into it if someone is interested.

This awk one-liner may do it:

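# d02.csv is read first (FNR==NR): store fields 2-4 keyed on field 1 (the city plus its trailing comma).
# d01.csv is read second: print each line whose city was stored, followed by the stored fields.
awk 'FNR==NR {a[$1]=$2FS$3FS$4;next} $1 in a {print $0,a[$1]}' OFS=", " d02.csv d01.csv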
Barcelona, 19.5, 29.5, 20140916, 19.9, 28.5
Tarragona, 20.4, 31.5 , 20140916, 21.4, 30.5
Lleida, 16.5, 33.5 , 20140916, 17.5, 32.5
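The awk answer relies on the two-file idiom FNR==NR: FNR (line number within the current file) equals NR (overall line number) only while the first file is being read, so d02.csv is loaded into an array and d01.csv is then filtered against it. A sketch of the same idea with an explicit separator, assuming fields are separated by a comma followed by optional spaces; it also skips the d02.csv header and prints the header requested in the question. The array name seen is illustrative.

```shell
# Sample inputs (trailing spaces dropped).
printf '%s\n' 'Barcelona, 19.5, 29.5' 'Tarragona, 20.4, 31.5' 'Girona, 17.2, 32.5' \
  'Lleida, 16.5, 33.5' 'Vic, 17.5, 31.4' > d01.csv
printf '%s\n' 'City, Data, TMax, TMin' 'Barcelona, 20140916, 19.9, 28.5' \
  'Tarragona, 20140916, 21.4, 30.5' 'Lleida, 20140916, 17.5, 32.5' \
  'Tortosa, 20140916, 20.5, 30.4' > d02.csv

awk -F ', *' -v OFS=', ' '
  # First file (d02.csv): skip its header, remember date and temperatures per city.
  FNR == NR { if (FNR > 1) seen[$1] = $2 OFS $3 OFS $4; next }
  # First line of the second file: print the output header once.
  FNR == 1 { print "City", "Tmin", "Tmax", "Date", "Tmin1", "Tmax1" }
  # Second file (d01.csv): emit only cities that also appear in d02.csv.
  $1 in seen { print $1, $2, $3, seen[$1] }
' d02.csv d01.csv > d03.csv
cat d03.csv
```

Unlike the join approach, this needs no sorting and the rows come out in d01.csv order; the trade-off is that all of d02.csv is held in memory.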

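Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM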