How to extract bold text from a PDF using Python?

Question

The list below provides examples of items and services that should not be billed separately. Please note that the list is not all inclusive.

1. Surgical rooms and services – To include surgical suites, major and minor, treatment rooms, endoscopy labs, cardiac cath labs, X-ray.

2. Facility Basic Charges - pulmonary and cardiology procedural rooms. The hospital's charge for surgical suites and services shall include the entire above listed nursing personnel services, supplies, and equipment

I want output like:

Surgical rooms and services
Facility Basic Charges

The first sentence is also bold, but it should be omitted. I only need the bold text that belongs to the numbered items.

Answer 1

You can do it using this code:

import pdfplumber

with pdfplumber.open('test.pdf') as pdf:
    page = pdf.pages[0]  # first page only
    # Keep only characters whose font name contains "Bold"
    bold_page = page.filter(lambda obj: obj["object_type"] == "char" and "Bold" in obj["fontname"])
    print(bold_page.extract_text())

It uses the pdfplumber library, so for more information you can check its documentation.
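The filter above returns every bold character on the page, so the bold introductory sentence comes back too. Here is a minimal sketch of the remaining step, assuming the bold text comes out with one item per line in the form `1. Heading`. The helper name `numbered_headings` and the sample string are hypothetical, and whether the trailing dash is part of the bold run depends on the PDF.

```python
import re

# A line that starts with "<number>. "; the rest of the line is captured.
NUMBERED_LINE = re.compile(r"^[ \t]*\d+\.[ \t]+(.+?)[ \t]*$", re.MULTILINE)

def numbered_headings(bold_text):
    """Return the text of each numbered line, without the leading number."""
    headings = []
    for match in NUMBERED_LINE.finditer(bold_text):
        # Drop a trailing dash separator if the bold run happened to include it
        heading = re.sub(r"\s*[–-]\s*$", "", match.group(1))
        headings.append(heading)
    return headings

# Hypothetical stand-in for the bold text extracted from the page
sample = (
    "The list below provides examples of items and services\n"
    "1. Surgical rooms and services –\n"
    "2. Facility Basic Charges -\n"
)
print(numbered_headings(sample))
```

With pdfplumber, the argument would be the string returned by `extract_text()` on the filtered page. Lines that do not start with a number, such as the bold first sentence, are simply skipped.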

Answer 2

Use this code:

import re

import pdfplumber

demo = []
with pdfplumber.open('HCSC IL Inpatient_Outpatient Unbundling Policy- Facility.pdf') as pdf:
    # Iterate over the pages directly instead of guessing a page count
    for page in pdf.pages:
        # Keep only characters whose font name contains "Bold"
        bold_page = page.filter(lambda obj: obj["object_type"] == "char" and "Bold" in obj["fontname"])
        bold_text = bold_page.extract_text() or ""
        # Collect the text of every line that starts with "<number>. "
        demo.extend(re.findall(r'^[ \t]*\d+\.[ \t]+(.*)$', bold_text, flags=re.MULTILINE))

print(demo)
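Matching on the extracted text depends on how `extract_text()` lays out line breaks. An alternative sketch works on the character dictionaries directly (the kind pdfplumber exposes as `page.chars`, with `text`, `fontname`, `top`, and `x0` keys), groups them into lines by rounded `top`, and keeps the leading bold run of each line that begins with a number. The helper names `bold_numbered_prefixes` and `make_chars`, the font names, and the sample characters are all hypothetical. It also assumes that space characters are present in the character list, which may not hold for every PDF.

```python
import re
from itertools import groupby

def bold_numbered_prefixes(chars):
    """Return the leading bold run of every line that starts with 'N. '."""
    # Sort into reading order: top to bottom, then left to right
    ordered = sorted(chars, key=lambda c: (round(c["top"]), c["x0"]))
    results = []
    for _, line in groupby(ordered, key=lambda c: round(c["top"])):
        prefix = []
        for c in line:
            if "Bold" not in c["fontname"]:
                break  # the bold run ends at the first non-bold character
            prefix.append(c["text"])
        match = re.match(r"\s*\d+\.\s+(.*\S)", "".join(prefix))
        if match:
            results.append(match.group(1))
    return results

def make_chars(text, fontname, top, x0=0.0):
    # Build fake character dicts; real ones would come from page.chars
    return [{"text": ch, "fontname": fontname, "top": top, "x0": x0 + i}
            for i, ch in enumerate(text)]

sample = (
    make_chars("Intro sentence in bold.", "Arial-BoldMT", top=10.0)
    + make_chars("1. Surgical rooms", "Arial-BoldMT", top=30.0)
    + make_chars(" to include suites", "ArialMT", top=30.0, x0=100.0)
    + make_chars("2. Facility Basic Charges", "Arial-BoldMT", top=50.0)
)
print(bold_numbered_prefixes(sample))
```

Grouping by rounded `top` is a simplification: characters on the same visual line with slightly different baselines could end up in separate groups, so a tolerance-based grouping may be needed for real documents.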

2 answers

solution1
0 2022-01-31 20:59:34

solution2
0 ACCEPTED 2022-02-01 17:36:08
