I have a PDF file and I'm trying to extract specific data from it using a regex, but I'm getting the error shown below.
This is the content of my PDF file: Question.pdf
import re
import tabula
df = tabula.read_pdf('Question.pdf', pages=1,lattice=True)[1]
df.columns = df.columns.str.replace('\r', ' ')
data = df.dropna()
data.str.extract(r'.*\?\s*(.*)')
data.to_excel('data.xlsx', index=False, header=None)
Traceback (most recent call last):
File "C:\Users\User1\Desktop\test.py", line 7, in <module>
data.str.extract(r'.*\?\s*(.*)')
File "C:\Users\User1\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\generic.py", line 5575, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'str'. Did you mean: 'std'?
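The traceback points at the cause: `.str` is an accessor on a pandas Series (a single column), not on a DataFrame, and `df.dropna()` returns a DataFrame. Below is a minimal sketch of two ways around that, assuming the goal is to keep the text after the last `?` in each cell. The sample DataFrame, its column names, and the helper `extract_answers` are hypothetical stand-ins, since the actual layout of Question.pdf is not shown here. Note also that `str.extract` returns a new object, so its result has to be assigned to be used.

```python
import pandas as pd

# Hypothetical stand-in for the table tabula returns; the real column
# names and cell text in Question.pdf are not known.
df = pd.DataFrame({
    "Question": ["What is your name? John", "Where do you live? Paris", None],
    "Notes": ["Favorite color? Blue", None, None],
})

pattern = r'.*\?\s*(.*)'

# Option 1: select one column first, because .str only exists on a Series.
# expand=False returns a Series instead of a one-column DataFrame.
one_column = df["Question"].dropna().str.extract(pattern, expand=False)

# Option 2: run the same pattern over every column, one column at a time.
def extract_answers(frame):
    # astype("string") lets .str work even on columns that hold non-text values.
    return frame.apply(
        lambda col: col.astype("string").str.extract(pattern, expand=False)
    )

all_columns = extract_answers(df)
```

In the original script, the extracted result (for example `one_column` or `all_columns` here) is what would be passed to `to_excel`, rather than the unchanged `data`.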