
How to extract a few lines from a PDF file?

I have a PDF file that goes like this:

Q1. How many planets are there? A. 1 B. 3 C. 8 D. 9

Answer: 8

Explanation: bla bla bla

Q2. How many moons are there? A. 1 B. 3 C. 8 D. 9

Answer: 1

Explanation: bla bla bla

Q3. Who is Alex's friend? A. Adam B. Donald C. Joe D. Jack

Answer: Joe

Explanation: bla bla bla

And so on, up to Q100.

How can I remove the Answer and Explanation lines and keep only the questions, i.e. in the following format?

Q1. How many planets are there? A. 1 B. 3 C. 8 D. 9

Q2. How many moons are there? A. 1 B. 3 C. 8 D. 9

Q3. Who is Alex's friend? A. Adam B. Donald C. Joe D. Jack

... and so on, up to Q100.

OK, after a lot of trial and error I was able to figure it out. First I converted the PDF to a txt file, then used the code (pic attached).

Basically, I converted the file to lists and used regex to append only the needed outputs (the input file was UTF). A very dumb way of doing it, but it works!

[Image: code that converts the file to lists and uses regex to append only the needed outputs]
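Since the code above only survives as an image, here is a minimal sketch of the same idea in Python. It is not the original code. It assumes the PDF has already been converted to a text file, that the file is UTF-8 encoded, and that each question sits on a single line starting with "Q", a number, and a period, as in the sample. The names `extract_questions`, `extract_questions_from_file`, and `QUESTION_RE` are made up for illustration.

```python
import re

# Assumption: a question line starts with "Q", one or more digits, and a period,
# e.g. "Q1. ..." or "Q100. ...". Adjust the pattern if the real file differs.
QUESTION_RE = re.compile(r"^Q\d+\.")


def extract_questions(lines):
    """Return only the lines that look like questions, stripped of whitespace."""
    questions = []
    for line in lines:
        stripped = line.strip()
        if QUESTION_RE.match(stripped):
            questions.append(stripped)
    return questions


def extract_questions_from_file(in_path, out_path):
    """Read a text file, keep the question lines, and write them to out_path."""
    # Assumption: the text file is UTF-8 encoded.
    with open(in_path, encoding="utf-8") as src:
        questions = extract_questions(src)
    # Separate questions with a blank line, matching the desired output format.
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write("\n\n".join(questions) + "\n")
    return questions
```

Usage would be something like `extract_questions_from_file("quiz.txt", "questions_only.txt")`, where both file names are placeholders. If the PDF-to-text conversion wraps a long question across several lines, this line-by-line filter will only keep the first line, so the pattern or the loop would need adjusting.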

