
Read metadata of files inside Azure Data Lake Store

I need to read the metadata of files stored in Azure Data Lake Store.

The files may be in JPEG, Excel, or TIFF format.

Please advise; I am really looking for suggestions. I am using Microsoft Azure Data Lake Store with U-SQL.

At the moment that is not supported. According to the feedback site, it seems to be on the backlog.

You might be able to write a custom extractor, as suggested in the link:

If the metadata is available inside the file itself, such as EXIF in a JPEG, extract some of the properties from the content using a custom extractor.

According to this blog post, this has already been done for image property extraction; see the repo. It can serve as a guide for implementing this in your own scenario. Here is an example query:

@image_features =
    EXTRACT copyright string, 
            equipment_make string,
            equipment_model string,
            description string,
            thumbnail byte[], 
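            name string,    // virtual column filled from {name} in the path
            format string   // virtual column filled from {format} in the path
    FROM @"/Samples/Data/Images/{name}.{format}"
    USING new Images.ImageFeatureExtractor(scaleWidth: 500, scaleHeight: 300);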

@image_features = SELECT * FROM @image_features
                  WHERE format IN("JPEG", "jpeg", "jpg", "JPG");

OUTPUT @image_features
TO @"/output/images/image_features.csv"
USING Outputters.Csv();

Alternatively, have another process extract those properties and write them to a metadata file in Azure Data Lake, so that you can join against that file.
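As a rough illustration of that second option, here is a minimal Python sketch of such an external process. It is only an assumption about how one might do it, not something taken from the blog post or repo: the function names `collect_metadata` and `write_metadata_csv` and the column set are hypothetical, it reads only basic file-system properties (not EXIF), and it runs against a local copy of the files. Uploading the resulting CSV to Azure Data Lake Store is left out.

```python
import csv
import os
from datetime import datetime, timezone

# Hypothetical column layout; "name" and "format" are chosen to line up with
# the {name}.{format} virtual columns used in the U-SQL query above.
FIELDNAMES = ["name", "format", "size_bytes", "modified_utc"]


def collect_metadata(root):
    """Return one dict per file under `root` with basic file-system metadata."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            stem, ext = os.path.splitext(filename)
            info = os.stat(path)
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            rows.append({
                "name": stem,
                "format": ext.lstrip("."),
                "size_bytes": info.st_size,
                "modified_utc": modified.isoformat(),
            })
    return rows


def write_metadata_csv(rows, out_path):
    """Write the collected rows to a CSV file with a header line."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_metadata_csv(collect_metadata("images"), "file_metadata.csv")` would produce a file that, once uploaded, could be read with `Extractors.Csv(skipFirstNRows: 1)` and joined to `@image_features` on `name` and `format`. Format-specific properties such as EXIF tags would need an image library on top of this.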
