Is there any way to read a text file or PDF file in Python?

I have a module that reads a text file or PDF file. I am using pandas.read_fwf(), but it gives me an ambiguous result: when half of a statement is on one line and the other half is on the next line, it treats them as two statements and stores them as separate rows. I want each whole statement, up to the full stop, in a single row. Is there an extra parameter I have to pass to this function, or is there another way to do this? Please help me resolve this issue.

See this:

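import pandas as pd

# read_fwf parses a fixed-width table: every physical line of the file becomes a row,
# and the first line is used as the header (here assumed to be "sentence")
unstructured_text = pd.read_fwf("C:\\Users\\Hp\\Desktop\\NLPtoKG\\Abhay2.txt")
# lower-case every value in the "sentence" column
unstructured_text['sentence'] = unstructured_text['sentence'].apply(lambda x: str(x).lower())
# display the DataFrame (in a notebook)
unstructured_text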

The output of this is: [screenshot of the output]

But I want the text split from full stop to full stop.

If anyone knows how to do this, please let me know.

Thanks in advance!
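Because pandas.read_fwf() parses the file as a fixed-width table, every physical line becomes its own row, so a statement wrapped across two lines is split in two. A minimal sketch of one alternative, assuming a UTF-8 text file and sentences that end in ".", "!" or "?" followed by whitespace: read the whole file as one string, collapse the line breaks into spaces, then split after the sentence-ending punctuation. The helper names split_sentences and read_sentences are hypothetical, not part of pandas. This naive regex also splits after abbreviations such as "e.g." or "Dr."; a proper sentence tokenizer such as NLTK's sent_tokenize handles those cases better.

```python
import re

import pandas as pd


def split_sentences(text: str) -> list[str]:
    """Split text into sentences on '.', '!' or '?' followed by whitespace.

    Hypothetical helper. Line breaks inside a sentence are first collapsed
    into single spaces, so a statement wrapped across lines stays whole.
    """
    # Collapse every run of whitespace (including newlines) into one space.
    flat = re.sub(r"\s+", " ", text).strip()
    if not flat:
        return []
    # Split only where sentence-ending punctuation is followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", flat)
    return [p for p in parts if p]


def read_sentences(path: str) -> pd.DataFrame:
    """Hypothetical helper: read a whole text file, one lower-cased sentence per row.

    Assumes the file is UTF-8 encoded.
    """
    with open(path, encoding="utf-8") as f:
        text = f.read()
    df = pd.DataFrame({"sentence": split_sentences(text)})
    df["sentence"] = df["sentence"].str.lower()
    return df
```

For example, read_sentences("C:\\Users\\Hp\\Desktop\\NLPtoKG\\Abhay2.txt") should return a DataFrame whose sentence column holds one full statement per row, using the same column name as the question's code.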

From https://www.geeksforgeeks.org/working-with-pdf-files-in-python/, an improved example:

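# importing required modules
# PdfFileReader, numPages, getPage() and extractText() are deprecated and no longer
# work in PyPDF2 3.0; PdfReader, .pages and extract_text() are the current names.
# The maintained successor package, pypdf, provides the same PdfReader interface.
from PyPDF2 import PdfReader

with open('example.pdf', 'rb') as pdfFileObj:
    # creating a pdf reader object
    pdfReader = PdfReader(pdfFileObj)
    # printing number of pages in pdf file
    print(len(pdfReader.pages))

    # creating a page object for the first page (index 0)
    pageObj = pdfReader.pages[0]

    # extracting text from page
    print(pageObj.extract_text())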
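The example above only reads the first page. A minimal sketch for collecting the text of every page into one string, with line breaks collapsed so that a sentence wrapped across lines or pages stays continuous and can then be split on full stops. It only assumes that each page object has an extract_text() method, as PdfReader pages do in PyPDF2 2.x/3.x and in pypdf; the helper name pages_to_text is hypothetical.

```python
import re
from typing import Iterable


def pages_to_text(pages: Iterable) -> str:
    """Hypothetical helper: join the text of all pages into one normalised string.

    Each element of `pages` is assumed to expose extract_text(), e.g. the
    items of PdfReader(...).pages.
    """
    chunks = []
    for page in pages:
        # Defensive guard: treat a page that yields no text as an empty string.
        chunks.append(page.extract_text() or "")
    # Join pages with a space, then collapse newlines and repeated spaces so
    # sentences broken across lines (or pages) become continuous text.
    return re.sub(r"\s+", " ", " ".join(chunks)).strip()
```

Usage, assuming PyPDF2 is installed: after from PyPDF2 import PdfReader (or from pypdf import PdfReader), call pages_to_text(PdfReader("example.pdf").pages).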

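Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.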

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM