
Extract metadata from Word and Excel files?

I have a series of nested folders on a Windows 7 machine, all holding files. The files are Word and Excel documents. I would like to extract the Authors, Owner, Date Modified, and Date Created metadata fields from each file and write them to a text file.

My first attempt involved PowerShell. The code presented by the OP here works great, but does not recurse through the folders. I experimented with various ways of combining Get-ChildItem with the funMetadata function, but could not get it to work. Since it didn't work for the OP, and there were no solutions offered by the SO community, it seemed unwise to keep trying to fix this code. Instead, I focused on modifying the OP's solution (below) by substituting my own metadata fields, but the output text file simply says "Authors" followed by a blank page. Here's what I tried:

(navigate to root folder): Get-ChildItem -Recurse | Select-Object Authors | Out-file "C:\\text5.txt"

(By the way, the metadata definitely exists within the files. I know this from running the OP's original, non-recursive script.)

Trying a different tack, I downloaded two Python modules, hachoir and oletools, but once I had them installed, I did not know where to start. There doesn't seem to be any documentation.

Does anyone have advice for me?

EDIT: I just found some new info here, and this is probably a duplicate question. I hate to delete it now though, in case someone's working on an answer. Apologies for any confusion.

You can still use PowerShell; you just need to tie everything together and loop through all of your files.

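$RootFolder = "C:\example"
# Create the Shell COM object once instead of once per file.
$Shell = New-Object -ComObject Shell.Application
# Skip directories so that only files are inspected.
$Files = Get-ChildItem $RootFolder -Recurse | Where-Object { -not $_.PSIsContainer }
foreach ($File in $Files) {
    $Folder = Split-Path $File.FullName
    $FileName = Split-Path $File.FullName -Leaf
    $FolderObject = $Shell.Namespace($Folder)
    $FileObject = $FolderObject.ParseName($FileName)
    # Column index -1 returns the item's info tip (the summary Explorer shows on hover).
    $FolderObject.GetDetailsOf($FileObject, -1)
}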

Note: The objects returned by Get-ChildItem do not have an Authors property, so Select-Object Authors creates a new, empty property with that name. That is why the output file contains only the "Authors" header.
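
If you only want the four fields from the question rather than the whole info tip, one possible approach is to look up each column by its display name and write the results to a file. The following is only a sketch under some assumptions: $OutFile, $Wanted, and the scan limit of 320 are illustrative values that are not part of the original answer; the column names are the ones an English-language Windows shows in Explorer (they are localized on other systems); and it assumes the column indices found for the root folder also apply to its subfolders.

```powershell
# Sketch: export selected Explorer detail columns for every file under a root folder.
# $OutFile, $Wanted and the 0..320 scan range are illustrative assumptions.
$RootFolder = "C:\example"
$OutFile    = "C:\metadata.csv"
$Wanted     = "Authors", "Owner", "Date modified", "Date created"

$Shell = New-Object -ComObject Shell.Application

# Map column display names to their indices. Passing $null as the item
# makes GetDetailsOf return the column header instead of a file's value.
$RootObject = $Shell.Namespace($RootFolder)
$Index = @{}
foreach ($i in 0..320) {
    $Header = $RootObject.GetDetailsOf($null, $i)
    if ($Header -and -not $Index.ContainsKey($Header)) {
        $Index[$Header] = $i
    }
}

Get-ChildItem $RootFolder -Recurse |
    Where-Object { -not $_.PSIsContainer } |
    ForEach-Object {
        $FolderObject = $Shell.Namespace($_.DirectoryName)
        $FileObject   = $FolderObject.ParseName($_.Name)

        # Build one output row per file, starting with its full path.
        $Row = New-Object PSObject
        $Row | Add-Member -MemberType NoteProperty -Name "Path" -Value $_.FullName

        foreach ($Name in $Wanted) {
            $Value = ""
            # Only query columns that were actually found on this system.
            if ($Index.ContainsKey($Name)) {
                $Value = $FolderObject.GetDetailsOf($FileObject, $Index[$Name])
            }
            $Row | Add-Member -MemberType NoteProperty -Name $Name -Value $Value
        }
        $Row
    } |
    Export-Csv $OutFile -NoTypeInformation
```

Looking the columns up by name avoids hard-coding index numbers, which can differ between Windows versions. If a column comes back empty for every file, check whether its name is present in $Index on your machine.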
