
Join two csv files

csvfile1

status,longitude,latitude,timestamp    
ok,10.12,17.45,14569003    
ok,11.34,16.78,14569000

csvfile2

weather,timestamp,latitude1,longitude1,latitude2,longitude2
rainy,14569003,17.45,10.12,17.50,11.25    
sunny,14569000,13.76,12.44,16.78,11.34

expected output

status,weather,longitude,latitude,timestamp    
ok,rainy,10.12,17.45,14569003    
ok,sunny,11.34,16.78,14569000    

I would like to join the two files on the longitude, latitude and timestamp columns.

csvfile2 contains two longitude-latitude pairs, so a row should match when the timestamps are equal and either of those pairs equals the longitude and latitude in csvfile1.

Note that the column order also differs between the two files.

Any help would be appreciated.

Thank you.

You can use pandas for this:

import pandas as pd

first = pd.read_csv('csvfile1.csv')
second = pd.read_csv('csvfile2.csv')

# `on` takes a column label or a list of labels present in both frames;
# 'timestamp' is the only column name the two files share.
merged = pd.merge(first, second, how='left', on='timestamp')
merged.to_csv('merged.csv', index=False)

For more details, see link1 and link2; both are helpful.
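A merge on the timestamp alone does not check the coordinates. As a minimal sketch of one way to also match either longitude-latitude pair, csvfile2 can be reshaped to one row per candidate pair and then merged on all three columns. The helper name join_on_either_pair and the inline sample strings are illustrative assumptions, not part of the original answer; the data is copied from the question. Coordinates are compared as parsed floats, so this relies on both files writing matching values the same way.

```python
import io
import pandas as pd

# Sample data copied from the question; with real files, use
# pd.read_csv('csvfile1.csv') and pd.read_csv('csvfile2.csv') instead.
CSV1 = """status,longitude,latitude,timestamp
ok,10.12,17.45,14569003
ok,11.34,16.78,14569000
"""
CSV2 = """weather,timestamp,latitude1,longitude1,latitude2,longitude2
rainy,14569003,17.45,10.12,17.50,11.25
sunny,14569000,13.76,12.44,16.78,11.34
"""


def join_on_either_pair(first, second):
    """Hypothetical helper: join on timestamp plus either coordinate pair."""
    cols = ["weather", "timestamp", "longitude", "latitude"]
    # Turn each csvfile2 row into two candidate rows, one per coordinate pair.
    pair1 = second.rename(columns={"longitude1": "longitude", "latitude1": "latitude"})
    pair2 = second.rename(columns={"longitude2": "longitude", "latitude2": "latitude"})
    candidates = pd.concat([pair1[cols], pair2[cols]], ignore_index=True).drop_duplicates()
    # Keep only rows where timestamp, longitude and latitude all agree.
    merged = first.merge(candidates, how="inner", on=["timestamp", "longitude", "latitude"])
    # Reorder the columns to the layout requested in the question.
    return merged[["status", "weather", "longitude", "latitude", "timestamp"]]


first = pd.read_csv(io.StringIO(CSV1))
second = pd.read_csv(io.StringIO(CSV2))
result = join_on_either_pair(first, second)
print(result.to_csv(index=False))
```

Because the merge is by column name, the differing column order in the two files does not matter. Use how="left" instead of "inner" if unmatched csvfile1 rows should be kept with an empty weather value.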

awk solution:

join_csv.awk script:

#!/bin/awk -f
BEGIN {
    FS=OFS=",";   # field separator
    print "status,weather,longitude,latitude,timestamp"  # header line
}
NR==FNR {                  # processing the first file
    if (FNR>1)             # skip its header line
        a[$4]=$1 FS $2 FS $3   # store status, longitude, latitude keyed by timestamp
    next                   # do not fall through to the second-file block
}
FNR>1 {                    # processing the second file
    if ($2 in a) {         # if `timestamp` matches
        split(a[$2],data,FS);  # extracting items for further comparison
        # longitude and latitude must both match the same pair
        if ((data[2]==$4 && data[3]==$3) || (data[2]==$6 && data[3]==$5)) {
            print data[1],$1,data[2],data[3],$2
        }
    }
}

Usage :

awk -f join_csv.awk file1 file2

The output:

status,weather,longitude,latitude,timestamp
ok,rainy,10.12,17.45,14569003
ok,sunny,11.34,16.78,14569000

Hope this answer will help you:

import csv

# Index csvfile1 by timestamp so the row order in the two files does not matter.
# This assumes each timestamp occurs only once in csvfile1.
with open("csvfile1.csv", newline="") as file1:
    rows_by_timestamp = {row["timestamp"]: row for row in csv.DictReader(file1)}

with open("csvfile2.csv", newline="") as file2, \
        open("new_file.csv", "w", newline="") as new_file:
    csv_writer = csv.writer(new_file)
    csv_writer.writerow(["status", "weather", "longitude", "latitude", "timestamp"])
    for f2_row in csv.DictReader(file2):
        f1_row = rows_by_timestamp.get(f2_row["timestamp"])
        if f1_row is None:
            continue
        # Values are compared as strings, so "17.5" and "17.50" would not match.
        position = (f1_row["longitude"], f1_row["latitude"])
        if position in ((f2_row["longitude1"], f2_row["latitude1"]),
                        (f2_row["longitude2"], f2_row["latitude2"])):
            csv_writer.writerow([f1_row["status"], f2_row["weather"], f1_row["longitude"],
                                 f1_row["latitude"], f1_row["timestamp"]])

Got output:

status,weather,longitude,latitude,timestamp
ok,rainy,10.12,17.45,14569003
ok,sunny,11.34,16.78,14569000
