Processing and printing data in CSV files

Question

I have two CSV files. The first column in both files is a timestamp, but all the other columns contain different data. Some of these timestamps overlap but occur in different rows.
I want to create a new file that contains all the overlapping timestamps, along with the relevant data from both files.

For example:

File 1:

['1', 'John', 'Doe'] 
['2', 'Jane', 'Deer']
['3', 'Horror', 'Movie']

File 2:

['2', 'Mac']
['3', 'bro']
['4', 'come']
['1', '@mebro']

Output File:

['1', 'John', 'Doe', '@mebro']
['2', 'Jane', 'Deer', 'Mac']
['3', 'Horror', 'Movie', 'bro']

This is the code I have so far:

Outfile = []

for row in file2:
    Outfile.append(tuple(row))

if len(file1) >= len(file2):
    for n in xrange(1,len(file2)):
        if file1[0][n] == file2[0][:]:
            Outfile.append(file1[1:8][n])

if len(file2) >= len(file1):
    for n in xrange(1,len(file1)):
        if file1[0][n] == file2[0][:]:
            Outfile.append(file1[1:8][n])

resultFile = open("resultFile.csv","wb")
wr = csv.writer(Outfile, dialect= "excel")
wr.writerows(Outfile)

Answer 1

Use the pandas library.

import pandas as pd

df1 = pd.read_csv("path to file 1")
df2 = pd.read_csv("path to file 2")

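# merge is a pandas function, so call it through the pd namespace
result = pd.merge(df1, df2, on='First column', sort=True)
# index=False keeps the DataFrame's row index out of the output file
result.to_csv("path to result file", index=False)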

merge joins the two DataFrames on the specified column. By default it performs an inner join, so only timestamps present in both files are kept.
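To make the join semantics concrete, here is a minimal plain-Python sketch of what an inner join on the first column does with the sample data from the question. It is not how pandas implements merge; the function name inner_join is hypothetical, and the sketch assumes each timestamp appears at most once in the second file.

```python
def inner_join(rows1, rows2):
    """Keep only rows whose first field appears in both inputs."""
    # Index the second table by its key (first field) for fast lookups.
    # Assumption: keys are unique in rows2; a later duplicate would
    # overwrite an earlier one.
    lookup = {row[0]: row[1:] for row in rows2}
    joined = [row + lookup[row[0]] for row in rows1 if row[0] in lookup]
    # Roughly mirrors sort=True: order the result by key.
    # Keys are strings here, so the order is lexicographic.
    return sorted(joined, key=lambda row: row[0])


file1 = [['1', 'John', 'Doe'], ['2', 'Jane', 'Deer'], ['3', 'Horror', 'Movie']]
file2 = [['2', 'Mac'], ['3', 'bro'], ['4', 'come'], ['1', '@mebro']]

for row in inner_join(file1, file2):
    print(row)
```

Timestamp '4' exists only in file 2, so it does not appear in the result.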


Answer 2

The answer given by mds is much more efficient, and I offer this only as supplementary information, because there are a number of fundamental issues with the way you are trying to use list indices. This code produces the output list you are looking for and may better illustrate how list indices work (I added 'example' to file2 to show how additional elements would be appended).

list1 = [['1', 'John', 'Doe'], 
        ['2', 'Jane', 'Deer'],
        ['3', 'Horror', 'Movie']]

list2 = [['2', 'Mac', 'example'],
        ['3', 'bro'],
        ['4', 'come'],
        ['1', '@mebro']]

for x in range(len(list1)):
    # list1[x] is a whole row; list1[x][0] is that row's timestamp
    print("List1 timestamp for consideration: " + str(list1[x][0]))
    for y in range(len(list2)):
        print("Compared to list2 timestamp: " + str(list2[y][0]))
        if list1[x][0] == list2[y][0]:
            print("Match")
            # start at 1 to skip the timestamp and copy the remaining fields
            for z in range(1,len(list2[y])):
                list1[x].append(list2[y][z])

The printed output from this is:

List1 timestamp for consideration: 1
Compared to list2 timestamp: 2
Compared to list2 timestamp: 3
Compared to list2 timestamp: 4
Compared to list2 timestamp: 1
Match
List1 timestamp for consideration: 2
Compared to list2 timestamp: 2
Match
Compared to list2 timestamp: 3
Compared to list2 timestamp: 4
Compared to list2 timestamp: 1
List1 timestamp for consideration: 3
Compared to list2 timestamp: 2
Compared to list2 timestamp: 3
Match
Compared to list2 timestamp: 4
Compared to list2 timestamp: 1

list1 then looks like this:

list1 = [['1', 'John', 'Doe', '@mebro'],
 ['2', 'Jane', 'Deer', 'Mac', 'example'],
 ['3', 'Horror', 'Movie', 'bro']]
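The same matching can be applied directly to the files. The following is a rough sketch, assuming Python 3, no header rows, and timestamps that are unique within the second file. The function name merge_csv_files and its parameters are hypothetical. Note that csv.writer is given the open file object rather than the list of rows, which is one of the problems in the code from the question.

```python
import csv


def merge_csv_files(path1, path2, out_path):
    """Write the rows whose first column appears in both input files."""
    # Python 3: open CSV files in text mode with newline='' so the csv
    # module handles line endings itself.
    with open(path2, newline='') as f2:
        # Map timestamp -> remaining columns of file 2.
        # Assumption: each timestamp occurs once in file 2.
        extra = {row[0]: row[1:] for row in csv.reader(f2) if row}

    with open(path1, newline='') as f1, open(out_path, 'w', newline='') as out:
        writer = csv.writer(out, dialect='excel')  # wraps the file object
        for row in csv.reader(f1):
            # Keep only rows whose timestamp also exists in file 2.
            if row and row[0] in extra:
                writer.writerow(row + extra[row[0]])
```

A call such as merge_csv_files("file1.csv", "file2.csv", "resultFile.csv") (file names are placeholders) writes the matching rows in the order they appear in the first file. The dictionary lookup replaces the inner loop, so each file is read only once.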

Answer 1
0 Accepted 2016-01-18 16:34:53

Answer 2
0 2016-01-18 17:15:25
