Handling and printing data in CSV files

I have two CSV files. The first column in both files is a timestamp but all the other columns contain different data. Some of these timestamps overlap but occur in different rows.
I want to create a new file which contains all the overlapping timestamps, along with the relevant data from both files.

For example:

File 1:

['1', 'John', 'Doe'] 
['2', 'Jane', 'Deer']
['3', 'Horror', 'Movie']

File 2:

['2', 'Mac']
['3', 'bro']
['4', 'come']
['1', '@mebro']

Output File:

['1', 'John', 'Doe', '@mebro']
['2', 'Jane', 'Deer', 'Mac']
['3', 'Horror', 'Movie', 'bro']

This is the code I have so far:

Outfile = []

for row in file2:
    Outfile.append(tuple(row))

if len(file1) >= len(file2):
    for n in xrange(1,len(file2)):
        if file1[0][n] == file2[0][:]:
            Outfile.append(file1[1:8][n])

if len(file2) >= len(file1):
    for n in xrange(1,len(file1)):
        if file1[0][n] == file2[0][:]:
            Outfile.append(file1[1:8][n])

resultFile = open("resultFile.csv","wb")
wr = csv.writer(Outfile, dialect= "excel")
wr.writerows(Outfile)

Use the pandas library.

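import pandas as pd

df1 = pd.read_csv("path to file 1")
df2 = pd.read_csv("path to file 2")

# 'First column' stands for the name of the timestamp column in both files
result = pd.merge(df1, df2, on='First column', sort=True)
# index=False leaves the DataFrame's row index out of the output file
result.to_csv("path to result file", index=False)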

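merge joins the two DataFrames on the specified column. By default it performs an inner join, so only rows whose key appears in both files end up in the result.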
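The sample files in the question have no header row, so there is no column name to pass to `on`. The sketch below, written for Python 3, shows one way to handle that case. It assumes header-less input; the column names (`timestamp`, `first`, `last`, `note`) are invented for illustration, and `io.StringIO` stands in for the real files.

```python
import io
import pandas as pd

# In-memory stand-ins for the two header-less CSV files from the question.
file1 = io.StringIO("1,John,Doe\n2,Jane,Deer\n3,Horror,Movie\n")
file2 = io.StringIO("2,Mac\n3,bro\n4,come\n1,@mebro\n")

# header=None stops pandas from treating the first data row as column names.
# The names are hypothetical; dtype=str keeps the timestamps as text.
df1 = pd.read_csv(file1, header=None, names=["timestamp", "first", "last"], dtype=str)
df2 = pd.read_csv(file2, header=None, names=["timestamp", "note"], dtype=str)

# The default how="inner" keeps only timestamps found in both frames.
result = pd.merge(df1, df2, on="timestamp", sort=True)
rows = result.values.tolist()
print(rows)
```

Timestamp '4' occurs only in the second file, so the inner join drops it. To write the result without a header row, `result.to_csv(path, header=False, index=False)` should do it.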

More Information

The answer given by mds is much more efficient; I offer this only as supplementary information, because there are a number of fundamental issues with the way you are trying to use list indices. The code below produces the output list you are looking for and may better illustrate how indices work. ('example' has been added to the first row of list2 to show how additional elements are carried over.)

list1 = [['1', 'John', 'Doe'], 
        ['2', 'Jane', 'Deer'],
        ['3', 'Horror', 'Movie']]

list2 = [['2', 'Mac', 'example'],
        ['3', 'bro'],
        ['4', 'come'],
        ['1', '@mebro']]

for x in range(len(list1)):
    # list1[x] is a whole row; list1[x][0] is that row's timestamp
    print("List1 timestamp for consideration: " + str(list1[x][0]))
    for y in range(len(list2)):
        print("Compared to list2 timestamp: " + str(list2[y][0]))
        if list1[x][0] == list2[y][0]:
            print("Match")
            # append every element of the matching list2 row except its timestamp
            for z in range(1, len(list2[y])):
                list1[x].append(list2[y][z])

Your printed output from this is:

List1 timestamp for consideration: 1
Compared to list2 timestamp: 2
Compared to list2 timestamp: 3
Compared to list2 timestamp: 4
Compared to list2 timestamp: 1
Match
List1 timestamp for consideration: 2
Compared to list2 timestamp: 2
Match
Compared to list2 timestamp: 3
Compared to list2 timestamp: 4
Compared to list2 timestamp: 1
List1 timestamp for consideration: 3
Compared to list2 timestamp: 2
Compared to list2 timestamp: 3
Match
Compared to list2 timestamp: 4
Compared to list2 timestamp: 1

With list1 then looking like:

list1 = [['1', 'John', 'Doe', '@mebro'],
         ['2', 'Jane', 'Deer', 'Mac', 'example'],
         ['3', 'Horror', 'Movie', 'bro']]
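The nested loops compare every row of list1 with every row of list2. As a possible alternative that avoids pandas, here is a sketch that builds a dictionary keyed on the timestamp and then looks each list1 row up in it. This is a different technique from the loop above, not a rewrite of it. The helper name `join_on_first_column` is made up for this example, the code targets Python 3, and `io.StringIO` stands in for the output file.

```python
import csv
import io

def join_on_first_column(rows1, rows2):
    """Return copies of the rows1 rows whose first column also occurs in rows2,
    each extended with the remaining columns of the matching rows2 rows."""
    lookup = {}
    for row in rows2:
        # Collect everything except the timestamp under that timestamp.
        lookup.setdefault(row[0], []).extend(row[1:])
    # Rows without a partner in rows2 are skipped, as in the desired output.
    return [row + lookup[row[0]] for row in rows1 if row[0] in lookup]

list1 = [['1', 'John', 'Doe'],
         ['2', 'Jane', 'Deer'],
         ['3', 'Horror', 'Movie']]

list2 = [['2', 'Mac', 'example'],
         ['3', 'bro'],
         ['4', 'come'],
         ['1', '@mebro']]

merged = join_on_first_column(list1, list2)

# Write the merged rows as CSV text; a real file opened with newline="" works the same way.
out = io.StringIO()
csv.writer(out, dialect="excel").writerows(merged)
print(out.getvalue())
```

Unlike the loop version, this leaves list1 unmodified and returns a new list. Note also that `csv.writer` takes the open file object as its first argument, not the list of rows.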
