
Search URLs with a standard format

I am trying to figure out how to search through AWS .xml metadata files to check whether or not a particular imagery tile from Landsat or Sentinel meets my requirements.

The files for these data products follow a standard URL format:

http://sentinel-s2-l1c.s3.amazonaws.com/tiles/10/S/DG/2015/12/7/0/metadata.xml

The format includes references to the Military Grid Reference System and the date that the image was captured. What I'd like to do is search through the available URLs for a given tile, i.e. any available .xml URL:

http://sentinel-s2-l1c.s3.amazonaws.com/tiles/10/S/DG/2015/../../0/metadata.xml

In the above example, 10 is the UTM zone, S is the latitude band, and DG is the specific tile, so I would like a way to find and read all the metadata.xml files for a given tile in a given year.
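For illustration, those path components could be parameterized to enumerate candidate URLs (a sketch only; the zone/band/tile values are just the ones from the example above, and iterating every month/day combination is an assumption, since the actual capture dates are unknown):

# Sketch: build candidate metadata.xml URLs for one tile and year,
# following the .../tiles/{zone}/{band}/{tile}/{year}/{month}/{day}/0/
# layout seen in the example URL. Dates that never occurred will 404.
BASE = 'http://sentinel-s2-l1c.s3.amazonaws.com/tiles/{}/{}/{}/{}/{}/{}/0/metadata.xml'

def candidate_urls(zone=10, band='S', tile='DG', year=2015):
    for month in range(1, 13):
        for day in range(1, 32):
            yield BASE.format(zone, band, tile, year, month, day)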

I really have no idea how to go about this, but I have some experience with Python and Java. Any help or resources to look at would be greatly appreciated.

Consider a two-fold procedure: first check each URL and, if valid, download the XML; then delete any XMLs that were request errors. Use Python's built-in os module for the filesystem work.

Note: the script below saves files into an existing subfolder called AWS, relative to the running .py script. The deletion loop only removes files in this subfolder:

import os
import requests as rq

baseurl = 'http://sentinel-s2-l1c.s3.amazonaws.com/tiles/10/S/DG/2015/{}/{}/0/metadata.xml'

# ITERATE THROUGH ALL MONTH / DAY COMBINATIONS
for m, d in [(m, d) for m in range(1, 13) for d in range(1, 32)]:
    rqpage = rq.get(baseurl.format(m, d))

    # ONLY KEEP RESPONSES THAT RETURNED HTTP 200
    if rqpage.status_code == 200:
        with open('AWS/{}-{}-{}_metadata.xml'.format('2015', m, d), 'wb') as f:
            f.write(rqpage.content)

# REMOVE ERROR-RESPONSE XMLs BY FILE SIZE (THEIR URLS WERE STILL VALID)
for dirpath, subdirs, files in os.walk('AWS'):
    for f in files:
        if os.stat(os.path.join(dirpath, f)).st_size < 400:
            os.remove(os.path.join(dirpath, f))
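Once the files are downloaded, the "meets my requirements" check can be a plain XML parse with the standard library. A minimal sketch, assuming the tile metadata exposes a CLOUDY_PIXEL_PERCENTAGE element (that tag name and the 20% threshold are illustrative assumptions; adjust them to your actual criteria):

import os
import xml.etree.ElementTree as ET

# Sketch: scan the downloaded metadata files for a quality criterion.
# ASSUMPTION: each file contains a CLOUDY_PIXEL_PERCENTAGE element.
for fname in sorted(os.listdir('AWS')):
    root = ET.parse(os.path.join('AWS', fname)).getroot()
    elem = next(root.iter('CLOUDY_PIXEL_PERCENTAGE'), None)
    if elem is not None and float(elem.text) < 20.0:
        print('{} passes: {}% cloudy'.format(fname, elem.text))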

Output

[Screenshot: list of downloaded XML files]
