Python: compare XML.Etree and a directory listing for differences in file and folder names

Question

I'm using Python to compare my flickr photos to the photo directories on my local hard drive.

In order to do this, I'm using OAuth in Python and getting an etree listing of each folder/album I have in flickr. The folder/album contents on flickr 'should' match my local copy directory.

I'd like my script to tell me when there are items on flickr that are not in the photo listing on my local drive (and vice versa).

The 'title' field of the flickr photos should be the same as the filename on Linux, and directory names on Linux will/should match the album names on flickr. That is currently how I have it set up.

I'm wondering what would be the best and most efficient way to compare these item lists in Python (etree node items vs os.listdir() items)?

I'd rather not get into using sort in bash to sort any piped output of filenames unless necessary. I'd like to keep everything in Python if possible as I'm just learning it.

I could use os.listdir() and compare that to the XML.Etree nodes returned from flickr, but what would be the best approach to do this comparison?

Keep in mind that the lists may not be the same and may not be sorted when comparing items from flickr and Linux.

I have the following snippet of code written to get results from flickr:

...oauth code above...
sets = flickr.photosets.getList(user_id=user_id)
print ("Total sets: " + sets.find('photosets').attrib['total'])
all_sets = sets.find('photosets').findall('photoset')

for each_set in all_sets:
   for node in each_set.findall('title'):
      print ("photoset: " + each_set.get('id') + ", " + node.text + ", photos: ", each_set.get('photos'))
      all_photos = flickr.photosets.getPhotos(user_id=user_id, photoset_id=each_set.get('id'))
      photos = all_photos.find('photoset')
      for photo in photos:
         print (photo.get('title'))

An example of the output from the above code would be:

photoset: 72157659163323894, Birthday Party - Nov 21, 2015, photos:  131
...
2015:11:21-16:11:14-IMG_20151121_161114372
2015:11:21-16:11:10-IMG_20151121_161109739
2015:11:21-16:10:36-IMG_20151121_161035497
2015:11:21-15:47:14-IMG_20151121_154713671
2015:11:21-15:43:17-IMG_20151121_154317180
2015:11:21-15:43:15-IMG_20151121_154315539
2015:11:21-15:23:42-IMG_20151121_152342348
2015:11:21-15:23:11-IMG_20151121_152311411
...
2015:11:21-16:21:19-DSC_0603
2015:11:21-16:21:13-DSC_0602
2015:11:21-16:21:11-DSC_0601
2015:11:21-16:21:09-DSC_0600
2015:11:21-16:21:07-DSC_0599
2015:11:21-16:21:05-DSC_0598
2015:11:21-16:20:13-DSC_0597
2015:11:21-16:20:09-DSC_0596
2015:11:21-16:19:59-DSC_0595
2015:11:21-16:19:56-DSC_0594
2015:11:21-16:19:55-DSC_0593
...

The API for getPhotos is here: https://www.flickr.com/services/api/flickr.photosets.getPhotos.htm which shows some of the example etree/XML output.

Etree API: https://docs.python.org/2/library/xml.etree.elementtree.html

Answer 1

To check whether the files from your flickr exist on your hard drive:

import os

not_on_hd = []
for file in flickr_photos:
    # "path/to" is a placeholder for the local album directory
    if not os.path.exists(os.path.join("path/to", file)):
        not_on_hd.append(file)
print(not_on_hd)

To do it the other way around I'd use a simple if file_on_drive in flickr_photos test, and append the ones that return False to a list, just like above.

not_on_flickr = []
for file_on_drive in files_on_drive:
    if file_on_drive not in flickr_photos:
        not_on_flickr.append(file_on_drive)
print(not_on_flickr)
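
A possible refinement of the two loops above, offered as a sketch rather than as part of the original answer: testing `name in some_list` scans the list each time, while membership in a `set` is a hash lookup on average. The sample names below are made up, and `flickr_photos` and `files_on_drive` simply stand for the two lists of names used above.

```python
# Hypothetical sample data standing in for the real lists.
flickr_photos = ["DSC_0593", "DSC_0594", "DSC_0595"]
files_on_drive = ["DSC_0594", "DSC_0595", "DSC_0596"]

# Build each set once; every later "in" test is then a hash lookup.
flickr_set = set(flickr_photos)
drive_set = set(files_on_drive)

# List comprehensions keep the original order of each input list.
not_on_hd = [name for name in flickr_photos if name not in drive_set]
not_on_flickr = [name for name in files_on_drive if name not in flickr_set]

print(not_on_hd)
print(not_on_flickr)
```

Note that this variant compares against a directory listing instead of calling os.path.exists() once per photo, so it assumes the listing was taken from the matching album directory.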

Since you asked about efficiency: in the first run, pop() every file that turned out to be missing from the drive off the flickr list. Those names cannot match anything on the drive, so the membership test in the second run has a shorter list to search.

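not_on_hd = []
# Walk the list backwards: popping index i then never shifts an item that
# still has to be checked (popping while iterating forwards skips items).
for i, file in reversed(list(enumerate(flickr_photos))):
    if not os.path.exists(os.path.join("/path/to", file)):
        not_on_hd.append(file)
        flickr_photos.pop(i)
print(not_on_hd)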

Here are some docs on what I did up there:
enumerate() - Python 3 docs
in - Python 3 docs (Section 6.10.2) (and the difference between is and == here)

Answer 2

Bird's-eye view:

Create a set (datatype set!) of full path names from the XML.
Create another set of full path names from your local file system.
Use set operations to get the paths missing on either side.
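
The three steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the exact flickr response handling: the XML sample is made up and simplified (the album title is stored as an attribute here), the helper names `paths_from_xml`, `paths_from_disk` and `compare` are hypothetical, and paths are built relative to a root directory as album/title rather than as absolute paths.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Made-up XML, loosely shaped like one album with photo titles.
SAMPLE_XML = """
<photoset id="1" title="Birthday Party">
  <photo id="10" title="DSC_0593"/>
  <photo id="11" title="DSC_0594"/>
</photoset>
"""


def paths_from_xml(xml_text):
    """Return a set of 'album/title' paths found in the XML."""
    album = ET.fromstring(xml_text)
    return {
        os.path.join(album.get("title"), photo.get("title"))
        for photo in album.findall("photo")
    }


def paths_from_disk(root):
    """Return a set of paths, relative to root, of all files below root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def compare(xml_text, root):
    """Return (missing_on_disk, missing_on_flickr) as sorted lists."""
    remote = paths_from_xml(xml_text)
    local = paths_from_disk(root)
    return sorted(remote - local), sorted(local - remote)


# Demo in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "Birthday Party"))
    for name in ("DSC_0594", "DSC_0595"):
        open(os.path.join(root, "Birthday Party", name), "w").close()
    missing_on_disk, missing_on_flickr = compare(SAMPLE_XML, root)

print(missing_on_disk)
print(missing_on_flickr)
```

Set difference ignores ordering, so neither side needs to be sorted first; the results are sorted only to make the output stable.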

Answer 1: 1, accepted, 2015-11-24 13:44:00

Answer 2: 0, 2015-11-24 15:38:12
