
TypeError: expected a character buffer object

I have been trying to print the output to a new text file, but I get this error:

TypeError: expected a character buffer object

What I'm trying to do is convert a PDF to text and copy the extracted text to a new file.

import pyPdf

def getPDFContent():
  content = ""
  # Load PDF into pyPDF
  pdf = pyPdf.PdfFileReader(file("D:\output.pdf", "rb"))
  # Iterate pages
  for i in range(0, pdf.getNumPages()):
    # Extract text from page and add to content
    #content += pdf.getPage(i).extractText() + "\n"
    print pdf.getPage(i).extractText().encode("ascii", "ignore")

  # Collapse whitespace
  #content = " ".join(content.replace(u"\xa0", " ").strip().split())
  #return content

#getPDFContent().encode("ascii", "ignore")
getPDFContent()

s = getPDFContent()
with open('D:\pdftxt.txt', 'w') as pdftxt:
    pdftxt.write(s)

I did try to initialize s as str, but then I get the error "can't assign to function call".

You are not returning anything from getPDFContent(), so you are effectively writing None to the file. Collect the page text in a list and return it:

    result = []
    for i in range(0, pdf.getNumPages()):
        result.append(pdf.getPage(i).extractText().encode("ascii", "ignore"))  # store all in a list
    return result


s = getPDFContent()
with open('D:\pdftxt.txt', 'w') as pdftxt:
    pdftxt.writelines(s)  # use writelines to write list content
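
To see why the missing return matters, here is a minimal sketch of the same mistake without pyPdf. It is written for Python 3 and uses an in-memory io.StringIO in place of the real output file, so the exact error wording differs from the Python 2 message in the question. The names print_only, return_text and write_result are hypothetical and exist only for this illustration.

```python
import io


def print_only():
    # Mirrors the question: the text is printed, nothing is returned,
    # so the caller receives None.
    print("page text")


def return_text():
    # Mirrors the fix: the text is handed back to the caller.
    return "page text"


def write_result(func):
    """Write func()'s return value to an in-memory text file.

    Returns the written text, or None if write() rejected the value.
    """
    buffer = io.StringIO()
    try:
        buffer.write(func())
    except TypeError:
        # write() only accepts text; None triggers a TypeError.
        return None
    return buffer.getvalue()
```

Calling write_result(print_only) hits the TypeError branch because the function implicitly returns None, while write_result(return_text) succeeds.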

How your code should look:

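import pyPdf

def getPDFContent():
    # Load PDF into pyPDF (raw string keeps the backslash in the Windows path literal)
    pdf = pyPdf.PdfFileReader(file(r"D:\output.pdf", "rb"))
    # Iterate pages and collect the text of each one
    result = []
    for i in range(0, pdf.getNumPages()):
        result.append(pdf.getPage(i).extractText().encode("ascii", "ignore"))
    # Return the list so the caller has something to write
    return result

s = getPDFContent()
with open(r'D:\pdftxt.txt', 'w') as pdftxt:
    # writelines accepts a list of strings
    pdftxt.writelines(s)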
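
One thing to keep in mind with writelines: it does not add line separators, so the text of consecutive pages is written back to back. If you want each page on its own line, join the list with newlines first. A small sketch, using io.StringIO and made-up page strings as stand-ins for the real file and the extracted text (Python 3):

```python
import io

# Hypothetical page texts standing in for the list returned by getPDFContent()
pages = ["first page", "second page"]

# writelines writes the items exactly as given, with nothing between them
run_together = io.StringIO()
run_together.writelines(pages)

# Joining with "\n" puts each page on its own line
separated = io.StringIO()
separated.write("\n".join(pages) + "\n")
```

Here run_together holds the two strings concatenated directly, while separated holds one page per line.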
