
Reading PDF files as a string in an iPhone application

I am having a problem with "Reading PDF" in iPhone application development. I have tried the following code. I know I have used the wrong methods for parsing; these parsing methods are only meant for searching. What I want is to convert the entire text of a PDF into a string, for example Apple's MobileHIG.pdf, which is the file I use in this code.

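#import <UIKit/UIKit.h>
#import <QuartzCore/QuartzCore.h>      // CATransition, CAMediaTimingFunction
#import "NetPDFViewController.h"       // assumed header declaring imgV, initialPage, startpoint, endpoint

@implementation NetPDFViewController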

size_t totalPages;  // a variable to store total pages

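// a function that opens the PDF at `filename` and returns a document ref
// (the caller owns it and must call CGPDFDocumentRelease)
CGPDFDocumentRef MyGetPDFDocumentRef (const char *filename) {
    CFStringRef path;
    CFURLRef url;
    CGPDFDocumentRef document;
    path = CFStringCreateWithCString (NULL, filename, kCFStringEncodingUTF8);
    url = CFURLCreateWithFileSystemPath (NULL, path, kCFURLPOSIXPathStyle, 0);
    CFRelease (path);
    document = CGPDFDocumentCreateWithURL (url);
    CFRelease (url);
    if (document == NULL) {             // file missing or not a valid PDF
        printf("could not open `%s'\n", filename);
        return NULL;
    }
    size_t count = CGPDFDocumentGetNumberOfPages (document);
    if (count == 0) {
        printf("`%s' needs at least one page!\n", filename);
        CGPDFDocumentRelease (document); // don't leak the document on failure
        return NULL;
    }
    return document;
}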

// operator-table callbacks invoked by the scanner while parsing the page
static void op_MP (CGPDFScannerRef s, void *info) {
    const char *name;
    if (!CGPDFScannerPopName(s, &name))
        return;
    printf("MP /%s\n", name);   
}

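static void op_DP (CGPDFScannerRef s, void *info) {
    // DP has two operands, a tag and a property list (dictionary or name);
    // the property list is on top of the stack, so pop it first
    CGPDFObjectRef properties;
    const char *name;
    if (!CGPDFScannerPopObject(s, &properties) || !CGPDFScannerPopName(s, &name))
        return;
    printf("DP /%s\n", name);
}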

static void op_BMC (CGPDFScannerRef s, void *info) {
    const char *name;
    if (!CGPDFScannerPopName(s, &name))
        return;
    printf("BMC /%s\n", name);  
}

static void op_BDC (CGPDFScannerRef s, void *info) {
    // same operand layout as DP: tag first, property list on top
    CGPDFObjectRef properties;
    const char *name;
    if (!CGPDFScannerPopObject(s, &properties) || !CGPDFScannerPopName(s, &name))
        return;
    printf("BDC /%s\n", name);
}

static void op_EMC (CGPDFScannerRef s, void *info) {
    // EMC takes no operands, so there is nothing to pop
    printf("EMC\n");
}

// a function that scans a PDF page and then draws it into a context.

void MyDisplayPDFPage (CGContextRef myContext, size_t pageNumber, const char *filename) {
    CGPDFDocumentRef document;
    CGPDFPageRef page;
    document = MyGetPDFDocumentRef (filename);
    if (document == NULL)
        return;
    totalPages = CGPDFDocumentGetNumberOfPages(document);
    page = CGPDFDocumentGetPage (document, pageNumber); // page numbers start at 1
    if (page == NULL) {
        CGPDFDocumentRelease (document);
        return;
    }

    CGPDFDictionaryRef d;

// ----- edit: problem here - the CGPDFDictionary is completely unknown,
// ----- as we don't know its keys and values.
    CGPDFScannerRef myScanner;
    CGPDFOperatorTableRef myTable;
    myTable = CGPDFOperatorTableCreate();
    CGPDFOperatorTableSetCallback (myTable, "MP", &op_MP);
    CGPDFOperatorTableSetCallback (myTable, "DP", &op_DP);
    CGPDFOperatorTableSetCallback (myTable, "BMC", &op_BMC);
    CGPDFOperatorTableSetCallback (myTable, "BDC", &op_BDC);
    CGPDFOperatorTableSetCallback (myTable, "EMC", &op_EMC);

    CGPDFContentStreamRef myContentStream = CGPDFContentStreamCreateWithPage (page);
    myScanner = CGPDFScannerCreate (myContentStream, myTable, NULL);

    CGPDFScannerScan (myScanner);

    // release the scanning objects now that the page has been scanned
    CGPDFScannerRelease (myScanner);
    CGPDFContentStreamRelease (myContentStream);
    CGPDFOperatorTableRelease (myTable);

    CGPDFStringRef str; // represents a sequence of bytes

    d = CGPDFPageGetDictionary(page);

    // note: /Thumb (if present) is a stream, not a string, so this lookup fails
    // TODO: handle the case where nothing is found
    if (CGPDFDictionaryGetString(d, "Thumb", &str)) {
        CFStringRef s = CGPDFStringCopyTextString(str);
        if (s != NULL) {
            NSLog(@"%@ testing it", s);
            CFRelease(s); // release only a non-NULL string; CFRelease(NULL) crashes
        }
//      CFDataRef data = CGPDFStreamCopyData (stream, CGPDFDataFormatRaw);
    }

// -----------------------------------

    CGContextDrawPDFPage (myContext, page);
    // these two calls only affect drawing done after this point
    CGContextTranslateCTM(myContext, 0, 20);
    CGContextScaleCTM(myContext, 1.0, -1.0);
    CGPDFDocumentRelease (document);
}

- (void)viewDidLoad {
    [super viewDidLoad];

// --------------------------------------------------------
// code for a simple image rendered directly from the PDF document.
    UIGraphicsBeginImageContext(CGSizeMake(320, 460));
    initialPage = 28;
    MyDisplayPDFPage(UIGraphicsGetCurrentContext(), initialPage, [[[NSBundle mainBundle] pathForResource:@"MobileHIG" ofType:@"pdf"] UTF8String]);
    imgV.image = UIGraphicsGetImageFromCurrentImageContext();
    UIGraphicsEndImageContext(); // pop the image context pushed above
    // rotate: comes from a UIImage category defined elsewhere (UIKit has no such method)
    imgV.image = [imgV.image rotate:UIImageOrientationDownMirrored];
}

- (void)touchesBegan:(NSSet *)touches withEvent:(UIEvent *)event{
    UITouch *touch = [touches anyObject];
    CGPoint LasttouchPoint =  [touch locationInView:self.view];
    int LasttouchX = LasttouchPoint.x;
    startpoint=LasttouchX;
}


- (void)touchesMoved:(NSSet *)touches withEvent:(UIEvent *)event{

}

- (void)touchesEnded:(NSSet *)touches withEvent:(UIEvent *)event{
    UITouch *touch = [touches anyObject];
    CGPoint LasttouchPoint =  [touch locationInView:self.view];
    int LasttouchX = LasttouchPoint.x;
    endpoint = LasttouchX;
    // change initialPage only if the target page exists, so it cannot drift out of range
    if (startpoint > (endpoint + 75) && initialPage < totalPages) {
        initialPage++;
        [self loadPage:initialPage nextOne:YES];
    } else if ((startpoint + 75) < endpoint && initialPage > 1) {
        initialPage--;
        [self loadPage:initialPage nextOne:NO];
    }
}


-(void)loadPage:(NSUInteger)page nextOne:(BOOL)yesOrNo{
    if(page<=totalPages && page>0){
        UIGraphicsBeginImageContext(CGSizeMake(720, 720));  
        MyDisplayPDFPage(UIGraphicsGetCurrentContext(), page, [[[NSBundle mainBundle] pathForResource:@"MobileHIG" ofType:@"pdf"] UTF8String]);

        CATransition *transition = [CATransition animation];
        transition.duration = 0.75;
        transition.timingFunction = [CAMediaTimingFunction functionWithName:kCAMediaTimingFunctionEaseInEaseOut];
        transition.type=kCATransitionPush;
        if(yesOrNo){
            transition.subtype=kCATransitionFromRight;
        } else {
            transition.subtype=kCATransitionFromLeft;
        }

        transition.delegate = self;
        [imgV.layer addAnimation:transition forKey:nil];
        imgV.image = UIGraphicsGetImageFromCurrentImageContext();
        UIGraphicsEndImageContext(); // balance UIGraphicsBeginImageContext above
        imgV.image = [imgV.image rotate:UIImageOrientationDownMirrored];
    }
}

@end

But I haven't managed to read even a single line from the PDF document. What is still missing?

If you want to extract content from a PDF file, you may want to read the following:

Parsing PDF Content

from the Quartz 2D Programming Guide.

Basically, you use a CGPDFScanner object to parse the contents, and it works as follows. You register a few callbacks that Quartz 2D invokes automatically whenever it encounters the corresponding PDF operators in the page's content stream. After this initial step, you actually start parsing the PDF stream.

Taking a brief look at your code, it appears that you are not following the steps required to parse the PDF content of the page you get through CGPDFDocumentGetPage(). You first need to set up the callbacks using CGPDFOperatorTableCreate() and CGPDFOperatorTableSetCallback(). Then, once you have the page, you need to create a content stream from it (using CGPDFContentStreamCreateWithPage()), instantiate a CGPDFScanner through CGPDFScannerCreate(), and actually start scanning with CGPDFScannerScan(). Also check which operators you register: MP, DP, BMC, BDC and EMC are marked-content operators that only tag parts of the content, whereas the text itself is painted by the text-showing operators Tj, TJ, ' and ", so those are the ones your callbacks need to handle.

The "Parsing PDF Content" section of the guide linked above gives you all the information you need to implement PDF parsing.

Hope this helps.

I have a library that does this; the link is here: https://bitbucket.org/zachron/pdfiphone/overview

Look at how the QuartzDemo sample application does this, specifically the QuartzPDFView class in the QuartzImages.h and QuartzImages.m files. It shows an example of loading a PDF via Quartz.
