Join two csv files with key value
I have two csv files and I want to join them on a key value, the city column.
One csv file, d01.csv, has this form:
Barcelona, 19.5, 29.5
Tarragona, 20.4, 31.5
Girona, 17.2, 32.5
Lleida, 16.5, 33.5
Vic, 17.5, 31.4
The other one, d02.csv, has the following structure:
City, Data, TMax, TMin
Barcelona, 20140916, 19.9, 28.5
Tarragona, 20140916, 21.4, 30.5
Lleida, 20140916, 17.5, 32.5
Tortosa, 20140916, 20.5, 30.4
I need a new csv file whose city column contains the cities that appear in both csv files:
City, Tmin, Tmax, Date, Tmin1, Tmax1
Barcelona, 19.5, 29.5, 20140916, 19.9, 28.5
Tarragona, 20.4, 31.5, 20140916, 21.4, 30.5
Girona, 17.2, 32.5, 20140916, 17.5, 32.5
Lleida, 16.5, 33.5, 20140916, 20.5, 30.4
I tried to do that with
join -j 2 -t ',' d01.csv d02.csv | awk -F "," '{print $1, $2, $3, $4, $5}' > d03.csv
but it is not complete... How can I order the key value?
Here's how to use join in bash:
{
echo "City, Tmin, Tmax, Date, Tmin1, Tmax1"
join -t, <(sort d01.csv) <(sed 1d d02.csv | sort)
} > d03.csv
cat d03.csv
City, Tmin, Tmax, Date, Tmin1, Tmax1
Barcelona, 19.5, 29.5, 20140916, 19.9, 28.5
Lleida, 16.5, 33.5 , 20140916, 17.5, 32.5
Tarragona, 20.4, 31.5 , 20140916, 21.4, 30.5
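To try the join without touching real data, here is a minimal sketch that rebuilds the two sample files from the question in a throwaway directory. The temporary directory and the *.sorted file names are my own choices, not part of the original answer, and it uses plain temporary files instead of process substitution so it also runs under sh. Setting LC_ALL=C is an assumption that helps sort and join agree on ordering.

```shell
#!/bin/sh
# Sketch: inner join of two CSV files on the first column (city).
# Sample data copied from the question; the work directory is a throwaway.
export LC_ALL=C            # make sort and join use the same collation
work=$(mktemp -d)
cd "$work" || exit 1

cat > d01.csv <<'EOF'
Barcelona, 19.5, 29.5
Tarragona, 20.4, 31.5
Girona, 17.2, 32.5
Lleida, 16.5, 33.5
Vic, 17.5, 31.4
EOF

cat > d02.csv <<'EOF'
City, Data, TMax, TMin
Barcelona, 20140916, 19.9, 28.5
Tarragona, 20140916, 21.4, 30.5
Lleida, 20140916, 17.5, 32.5
Tortosa, 20140916, 20.5, 30.4
EOF

# join expects both inputs sorted on the join field (field 1 here).
sort -t, -k1,1 d01.csv > d01.sorted
sed 1d d02.csv | sort -t, -k1,1 > d02.sorted   # drop the header before sorting

{
  echo "City, Tmin, Tmax, Date, Tmin1, Tmax1"
  join -t, d01.sorted d02.sorted
} > d03.csv

cat d03.csv
```

Sorting with -t, -k1,1 orders the lines by the city field only, which is the order join checks for.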
Note that join only outputs records where the key exists in both files. To get all of them, tell join to also print the unpaired records from both files (-a1 -a2), list the fields you want (-o), and give a default value for the missing fields (-e):
join -t, -a1 -a2 -o 0,1.2,1.3,2.2,2.3,2.4 -e '?' <(sort d01.csv) <(sed 1d d02.csv | sort)
Barcelona, 19.5, 29.5, 20140916, 19.9, 28.5
Girona, 17.2, 32.5,?,?,?
Lleida, 16.5, 33.5 , 20140916, 17.5, 32.5
Tarragona, 20.4, 31.5 , 20140916, 21.4, 30.5
Tortosa,?,?, 20140916, 20.5, 30.4
Vic, 17.5, 31.4,?,?,?
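If only the cities from d01.csv matter, the same options can be cut down to a left outer join by dropping -a2. The following is a sketch under that assumption; the file names left.csv and *.sorted are hypothetical, and the sample data is again the data from the question.

```shell
#!/bin/sh
# Sketch: left outer join, keeping every d01.csv city and padding gaps with "?".
export LC_ALL=C
work=$(mktemp -d)
cd "$work" || exit 1

printf '%s\n' 'Barcelona, 19.5, 29.5' 'Tarragona, 20.4, 31.5' \
  'Girona, 17.2, 32.5' 'Lleida, 16.5, 33.5' 'Vic, 17.5, 31.4' > d01.csv
printf '%s\n' 'City, Data, TMax, TMin' 'Barcelona, 20140916, 19.9, 28.5' \
  'Tarragona, 20140916, 21.4, 30.5' 'Lleida, 20140916, 17.5, 32.5' \
  'Tortosa, 20140916, 20.5, 30.4' > d02.csv

sort -t, -k1,1 d01.csv > d01.sorted
sed 1d d02.csv | sort -t, -k1,1 > d02.sorted

# -a 1 : also print d01 lines that have no partner in d02
# -o   : explicit output layout (0 = join key, 1.2 = file 1 field 2, ...)
# -e   : text substituted for fields that are missing
join -t, -a 1 -e '?' -o 0,1.2,1.3,2.2,2.3,2.4 d01.sorted d02.sorted > left.csv

cat left.csv
```

Tortosa no longer appears because it exists only in d02.csv, while Girona and Vic are kept with "?" placeholders.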
I suggest CSV Cruncher, which treats CSV files as SQL tables and lets you run SQL queries on them, producing another CSV file.
Example:
crunch input.csv output.csv \
"SELECT AVG(duration) AS durAvg FROM (SELECT * FROM indata ORDER BY duration LIMIT 2 OFFSET 6)"
The tool needs Java 5 or later.
Some of the advantages: it is easier to use and understand than join-based solutions.
Disclaimer: I wrote that tool. The project state is unknown: Google Code was closed and I didn't transfer it soon enough. I might have a look at it if someone is interested.
This awk may do:
awk 'FNR==NR {a[$1]=$2FS$3FS$4;next} $1 in a {print $0,a[$1]}' OFS=", " d02.csv d01.csv
Barcelona, 19.5, 29.5, 20140916, 19.9, 28.5
Tarragona, 20.4, 31.5 , 20140916, 21.4, 30.5
Lleida, 16.5, 33.5 , 20140916, 17.5, 32.5
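For readers who want to see what the one-liner does step by step, here is a commented sketch of the same idea. It differs from the command above in two ways that are my own assumptions: it splits on a comma followed by optional spaces instead of on whitespace, and it skips the header line of d02.csv explicitly. The array name seen and the output file d03_awk.csv are hypothetical.

```shell
#!/bin/sh
# Sketch: awk lookup join; no sorting needed, output keeps the order of d01.csv.
work=$(mktemp -d)
cd "$work" || exit 1

printf '%s\n' 'Barcelona, 19.5, 29.5' 'Tarragona, 20.4, 31.5' \
  'Girona, 17.2, 32.5' 'Lleida, 16.5, 33.5' 'Vic, 17.5, 31.4' > d01.csv
printf '%s\n' 'City, Data, TMax, TMin' 'Barcelona, 20140916, 19.9, 28.5' \
  'Tarragona, 20140916, 21.4, 30.5' 'Lleida, 20140916, 17.5, 32.5' \
  'Tortosa, 20140916, 20.5, 30.4' > d02.csv

awk -F', *' -v OFS=', ' '
  # First file (d02.csv): remember date and both temperatures per city,
  # skipping the header on line 1.
  FNR == NR { if (FNR > 1) seen[$1] = $2 OFS $3 OFS $4; next }
  # Second file (d01.csv): print only cities that were seen in d02.csv.
  $1 in seen { print $1, $2, $3, seen[$1] }
' d02.csv d01.csv > d03_awk.csv

cat d03_awk.csv
```

Because the lookup table is held in memory, this approach suits files where d02.csv fits comfortably in RAM; for very large inputs the sort and join route may scale better.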