Extracting Text from a PowerPoint Slide
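I have a PowerPoint slide deck built with Office 2010 (and 2007), and I need to extract the text from it programmatically. I'm guessing Office creates an XML file somewhere that might contain all the text I need.
Is there a way to do this, and how would I go about it?
I have VS2010 and SharePoint Designer 2007 available as tools.
Thanks,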
里升
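Yes, and it is not hard to do with LINQ to XML, which is one of the simpler ways to go about it. Note that I'm not using the Open XML SDK; I'm just using VB.NET with XML Literals and System.IO.Packaging. You can of course use the SDK, C#, and so on to do this in a more elaborate way, depending on your environment and preferences.
Here is how to do #2 (the simple way):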
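As background for the guess in the question: a .pptx file saved by Office 2007 or 2010 is a package of parts, most of them XML, and System.IO.Packaging can enumerate them. The following is only a minimal sketch of that idea; `ListPartUris` and the module name are hypothetical, and the file path is expected as a command-line argument.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.IO.Packaging ' requires a reference to WindowsBase

Module ListPartsSketch
    ' Hypothetical helper: returns the URI of every part in the package,
    ' for example "/ppt/presentation.xml" or "/ppt/slides/slide1.xml".
    Public Function ListPartUris(path As String) As List(Of String)
        Dim uris As New List(Of String)
        ' Read access is enough because nothing is modified.
        Using pkg As Package = Package.Open(path, FileMode.Open, FileAccess.Read)
            For Each part As PackagePart In pkg.GetParts()
                uris.Add(part.Uri.ToString())
            Next
        End Using
        Return uris
    End Function

    Sub Main(args As String())
        If args.Length = 0 Then
            Console.WriteLine("Usage: ListPartsSketch <file.pptx>")
            Return
        End If
        For Each uri In ListPartUris(args(0))
            Console.WriteLine(uri)
        Next
    End Sub
End Module
```

Running this against a presentation should show one part per slide alongside the main presentation part, which is where the slide text lives.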
Imports System.IO
Imports System.IO.Packaging ' Add a reference to WindowsBase for this
Imports System.Linq
Imports System.Xml.Linq
Imports <xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main">
Imports <xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
Imports <xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">

Module Module1
    Public Const documentRelationshipType As String = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"

    Sub Main()
        Dim filePath As String = "C:\Users\Todd\Desktop\yourpresentation.pptx"

        ' Read access is enough for extracting text.
        Using pptPackage As Package = Package.Open(filePath, FileMode.Open, FileAccess.Read)
            ' Locate the main presentation part through the package-level relationship.
            Dim documentRelationship As PackageRelationship = pptPackage.GetRelationshipsByType(documentRelationshipType).FirstOrDefault()
            If documentRelationship Is Nothing Then
                Console.WriteLine("No main presentation part found in " & filePath)
                Return
            End If
            Dim documentUri As Uri = PackUriHelper.ResolvePartUri(New Uri("/", UriKind.Relative), documentRelationship.TargetUri)
            Dim documentPart As PackagePart = pptPackage.GetPart(documentUri)
            Dim document As XElement
            Using reader As New StreamReader(documentPart.GetStream(FileMode.Open, FileAccess.Read))
                document = XElement.Load(reader)
            End Using

            ' Walk the slides in the order listed in the presentation part.
            For Each slideId In document.<p:sldIdLst>.<p:sldId>
                Dim slideReference As String = slideId.@r:id
                Dim slideUri As Uri = PackUriHelper.ResolvePartUri(documentPart.Uri, documentPart.GetRelationship(slideReference).TargetUri)
                Dim slidePart As PackagePart = pptPackage.GetPart(slideUri)
                Dim slide As XElement
                Using reader As New StreamReader(slidePart.GetStream(FileMode.Open, FileAccess.Read))
                    slide = XElement.Load(reader)
                End Using

                ' Every <a:t> element holds one run of text.
                For Each t In slide...<a:t>
                    Console.WriteLine(t.Value)
                Next
            Next
        End Using

        Console.ReadLine()
    End Sub
End Module
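The listing above prints each `<a:t>` run on its own line, so a paragraph whose formatting changes partway through comes out split across several lines. A possible refinement, sketched here with a hypothetical helper named `GetParagraphTexts` and a hand-written slide fragment rather than a real file, is to join the runs inside each `<a:p>` paragraph:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq
Imports <xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main">
Imports <xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">

Module ParagraphTextSketch
    ' Hypothetical helper: returns one string per <a:p> paragraph,
    ' concatenating the text of all <a:t> runs inside it.
    ' Paragraphs without any text are skipped.
    Public Function GetParagraphTexts(slide As XElement) As List(Of String)
        Dim result As New List(Of String)
        For Each paragraph In slide...<a:p>
            Dim text As String = String.Concat(paragraph...<a:t>.Select(Function(t) t.Value))
            If text.Length > 0 Then result.Add(text)
        Next
        Return result
    End Function

    Sub Main()
        ' A tiny hand-written fragment, for illustration only.
        Dim slide As XElement =
            <p:sld>
                <p:txBody>
                    <a:p><a:r><a:t>Extract</a:t></a:r><a:r><a:t>ing text</a:t></a:r></a:p>
                    <a:p><a:r><a:t>Second paragraph</a:t></a:r></a:p>
                </p:txBody>
            </p:sld>
        For Each line In GetParagraphTexts(slide)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

In the full program, the same function could be called on each loaded `slide` element in place of the inner `For Each t In slide...<a:t>` loop.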