Combine data from two csv files and output into categories

I have two csv files: one with files/paths that have been altered and one with files/paths that have been deleted. I am trying to combine the two to make it easier to see what has been added, altered, or deleted. For example, in the csv with altered lines:

 Hello_World.py,/Users/name/DropBox/Other Test
 Essay.docx,/Users/name/DropBox/Other Test/XXX
 NEW Project.docx,/Users/name/DropBox/Other Test/XXX
 Picture.jpg,/Users/name/DropBox/Other Test/XXX

and in the csv with deleted lines:

 Test.txt,/Users/name/DropBox/Other Test/XXX
 Picture.jpg,/Users/name/DropBox/Other Test
 Project.docx,/Users/name/DropBox/Other Test/XXX/Path 3 test
 Essay.docx,/Users/name/DropBox/Project/Other Test/XXX/Path 3 test

As you can see, if something was moved to a new folder (or altered), it appears in both the deleted and the altered file.

I want the output to look like this:

Altered,Picture.jpg,new_path,old_path
Altered,Essay.docx,new_path,old_path
Deleted,Test.txt,path,n/a
Deleted,Project.docx,path,n/a
Added,Hello_World.py,path,n/a
Added,NEW Project.docx,path,n/a

I've tried a bunch of things, but here is what has gotten me the closest. The conditional for determining whether an entry is 'altered' works, but for 'added' and 'deleted' it outputs every entry as both added and deleted (if that makes sense).

mc = open('Major Changes {}.csv'.format(directory),'w')
print('Major Changes',file=mc)
fieldnames = ['Alt/Del', 'File Name','Path', 'Original Path']
writer = csv.DictWriter(mc, fieldnames=fieldnames)
writer.writeheader()

with open('Altered {}.csv'.format(directory),'r') as a1, open('Deleted {}.csv'.format(directory),'r') as d1:
    reader1 = csv.reader(a1, delimiter=',')
    reader2 = csv.reader(d1, delimiter=',')
    next(reader1)
    next(reader1)
    next(reader2)
    next(reader2)
    file1 = set(a1.read().splitlines())
    file2 = set(d1.read().splitlines())
    
for line in file1:
    x1,y1 = line.split(',')
    for line in file2:
        x2,y2 = line.split(',')
        if x1 in x2:
            writer.writerow({'Alt/Del': 'Altered','File Name': x1,'Path':y1,'Original Path':y2})
        else:
            writer.writerow({'Alt/Del': 'Added','File Name': x1,'Path':y1,'Original Path':'N/A'})
            writer.writerow({'Alt/Del':'Deleted', 'File Name': x2, 'Path':y2, 'Original Path':'N/A'})

It outputs to a new csv. Please let me know if I need to clarify anything, and thanks. Also, sorry if the formatting is off; I wasn't sure how to separate my data/outputs.

You can't use else for this, because you have to compare x1 with every x2 in all rows of file2 before you can confirm that the file is only in the first file.
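To illustrate why the else branch misfires, here is a minimal sketch. The sample rows and the names altered, deleted and statuses are made up for illustration. The decision about a name from the first file is made only after every row of the second file has been checked, which any() expresses directly. It also compares names with == rather than in, since a substring test such as 'Project.docx' in 'NEW Project.docx' would be True.

```python
# Hypothetical sample rows: (filename, path) pairs
altered = [("a.txt", "/new/a.txt"), ("b.txt", "/new/b.txt")]
deleted = [("b.txt", "/old/b.txt"), ("c.txt", "/old/c.txt")]

statuses = []
for x1, y1 in altered:
    # Check x1 against ALL rows of the second file before deciding
    in_deleted = any(x1 == x2 for x2, _ in deleted)
    statuses.append(("Altered" if in_deleted else "Added", x1))

print(statuses)
```

This still scans the second list once per name; the set-based approach described next avoids that.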

You should extract only the filenames and use

  • set(f1) - set(f2) to get filenames which are only in the first file,
  • set(f2) - set(f1) to get filenames which are only in the second file,
  • set(f1) & set(f2) to get filenames which are in both files.

BTW: I use io.StringIO() only to simulate a file; you should use a normal open() instead.

import csv
import io

data1 = '''

Hello_World.py,/Users/name/DropBox/Other Test
Essay.docx,/Users/name/DropBox/Other Test/XXX NEW
Project.docx,/Users/name/DropBox/Other Test/XXX
Picture.jpg,/Users/name/DropBox/Other Test/XXX'''

data2 = '''

Test.txt,/Users/name/DropBox/Other Test/XXX 
Picture.jpg,/Users/name/DropBox/Other Test
Project.docx,/Users/name/DropBox/Other Test/XXX/Path 3 test
Essay.docx,/Users/name/DropBox/Project/Other Test/XXX/Path 3 test'''

#with open('Altered {}.csv'.format(directory)) as fh:
with io.StringIO(data1) as fh:
    reader = csv.reader(fh, delimiter=',')
    next(reader)
    next(reader)
    file1 = list(reader)

#with open('Deleted {}.csv'.format(directory)) as fh:
with io.StringIO(data2) as fh:
    reader = csv.reader(fh, delimiter=',')
    next(reader)
    next(reader)
    file2 = list(reader)

#print(file1)
set_filenames1 = set([row[0] for row in file1])
print('set1:', set_filenames1)

#print(file2)
set_filenames2 = set([row[0] for row in file2])
print('set2:', set_filenames2)

only_first  = set_filenames1 - set_filenames2
only_second = set_filenames2 - set_filenames1
both = set_filenames2 & set_filenames1

print(' first:', only_first)
print('second:', only_second)
print('  both:', both)

Result:

set1: {'Picture.jpg', 'Hello_World.py', 'Essay.docx', 'Project.docx'}
set2: {'Project.docx', 'Picture.jpg', 'Essay.docx', 'Test.txt'}
 first: {'Hello_World.py'}
second: {'Test.txt'}
  both: {'Picture.jpg', 'Essay.docx', 'Project.docx'}

Once you have these sets, you can save the results to new files. It may be easier to look up the other information (the paths) if you keep the data in dictionaries with the filename as the key.

dict1 = {row[0]:row for row in file1}
dict2 = {row[0]:row for row in file2}

for name in both:
    print('Altered |', name, '|', dict1[name][1], '|', dict2[name][1])

for name in only_first:
    print('Added   |', name, '|', dict1[name][1], '| n/a')
    
for name in only_second:
    print('Deleted |', name, '|', dict2[name][1], '| n/a')
    

Result:

Altered | Essay.docx | /Users/name/DropBox/Other Test/XXX NEW | /Users/name/DropBox/Project/Other Test/XXX/Path 3 test
Altered | Project.docx | /Users/name/DropBox/Other Test/XXX | /Users/name/DropBox/Other Test/XXX/Path 3 test
Altered | Picture.jpg | /Users/name/DropBox/Other Test/XXX | /Users/name/DropBox/Other Test
Added   | Hello_World.py | /Users/name/DropBox/Other Test | n/a
Deleted | Test.txt | /Users/name/DropBox/Other Test/XXX  | n/a
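
To produce the CSV the question asks for, the same set logic can feed csv.DictWriter. The following is only a sketch: the function name write_changes and the sample rows are assumptions made for illustration, while the column names are taken from the question's code. It writes to an in-memory buffer; with a real file you would pass a file opened with open(..., 'w', newline='') instead.

```python
import csv
import io

# Column names as used in the question's code
FIELDNAMES = ['Alt/Del', 'File Name', 'Path', 'Original Path']

def write_changes(rows_altered, rows_deleted, out):
    """Write Altered/Added/Deleted rows to the file-like object `out`."""
    # filename -> path
    dict1 = {row[0]: row[1] for row in rows_altered}
    dict2 = {row[0]: row[1] for row in rows_deleted}

    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()

    # dict key views support set operations; sorted() only makes
    # the output order reproducible
    for name in sorted(dict1.keys() & dict2.keys()):
        writer.writerow({'Alt/Del': 'Altered', 'File Name': name,
                         'Path': dict1[name], 'Original Path': dict2[name]})
    for name in sorted(dict1.keys() - dict2.keys()):
        writer.writerow({'Alt/Del': 'Added', 'File Name': name,
                         'Path': dict1[name], 'Original Path': 'n/a'})
    for name in sorted(dict2.keys() - dict1.keys()):
        writer.writerow({'Alt/Del': 'Deleted', 'File Name': name,
                         'Path': dict2[name], 'Original Path': 'n/a'})

# Hypothetical sample rows, in the shape returned by list(csv.reader(...))
rows_altered = [['Hello_World.py', '/new'], ['Picture.jpg', '/new/pics']]
rows_deleted = [['Test.txt', '/old'], ['Picture.jpg', '/old/pics']]

buffer = io.StringIO()
write_changes(rows_altered, rows_deleted, buffer)
print(buffer.getvalue())
```

Using the csv module for the output means paths containing commas are quoted correctly, which plain string joining would not handle.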
