
Conditional selecting of list elements

I want to extract both survey questions and survey items (formatted as Heading 2 and Heading 3, respectively) from a Word document.

After reading all paragraphs of a Word document into Python with the docx module, I have them in one list and run into the following problem:

Whenever there is a run of consecutive list elements formatted as "Heading 3", I want to collect them into a separate list, ending the run at the first paragraph that is not formatted as "Heading 3".

If another run of "Heading 3" elements occurs later, I want to add those elements to a different list.

I have already created a dictionary whose keys are the survey questions and whose values are empty lists, to be replaced by the individual lists of items.

import docx

doc = docx.Document('test2.docx')

all_paras = doc.paragraphs


questions = []
items = []
questions_and_items = {}
items_group = []

# questions#

for paragraph in all_paras:
    if paragraph.style.name.startswith('Heading 2'): 
        questions.append(paragraph.text)

# answer items#

for paragraph in all_paras:
    if paragraph.style.name.startswith('Heading 3'):
        items.append(paragraph.text)

# prepare keys of list

for question in questions:
    questions_and_items[question] = []

My question is: what is the best way to extract the sublist of items that belongs to each question and store it under the matching key in the dictionary?

Try a single loop through the paragraphs, building the question/answer combinations as you go.

import docx

def get_q_a(paragraphs, is_question, is_answer):
    """Map each question's text to the texts of the answer items that follow it."""
    question = None
    answers = []
    q_and_a = {}
    for paragraph in paragraphs:
        if is_question(paragraph):
            # A new question starts: store the previous one with its items.
            if question is not None:
                q_and_a[question] = answers
            question = paragraph.text
            answers = []
        elif is_answer(paragraph):
            answers.append(paragraph.text)
    # Store the last question, which no later heading closes.
    if question is not None:
        q_and_a[question] = answers
    return q_and_a

if __name__ == '__main__':
    doc = docx.Document('test2.docx')
    print(get_q_a(doc.paragraphs,
                  lambda p: p.style.name.startswith('Heading 2'),
                  lambda p: p.style.name.startswith('Heading 3')))
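
To try the single-pass idea without a Word file at hand, here is a minimal sketch that runs on stand-in objects. The names Style, Para, group_items_by_question and the sample data are hypothetical and only mimic the two attributes used above (style.name and text); they are not part of the docx module. The sketch stores texts rather than paragraph objects, and it assumes that question texts are unique, since a repeated question text would have its items merged under one key.

```python
from collections import namedtuple

# Hypothetical stand-ins for docx paragraphs: only style.name and text are modeled.
Style = namedtuple("Style", "name")
Para = namedtuple("Para", "style text")

def group_items_by_question(paragraphs):
    """Map each 'Heading 2' text to the 'Heading 3' texts that follow it."""
    grouped = {}
    current = None  # list of items for the most recent question, if any
    for p in paragraphs:
        name = p.style.name
        if name.startswith("Heading 2"):
            # Start (or reuse) the item list for this question.
            current = grouped.setdefault(p.text, [])
        elif name.startswith("Heading 3") and current is not None:
            # Items that appear before any question are skipped.
            current.append(p.text)
    return grouped

sample = [
    Para(Style("Heading 2"), "Q1"),
    Para(Style("Heading 3"), "item a"),
    Para(Style("Heading 3"), "item b"),
    Para(Style("Normal"), "some text"),
    Para(Style("Heading 2"), "Q2"),
    Para(Style("Heading 3"), "item c"),
]
result = group_items_by_question(sample)
print(result)  # {'Q1': ['item a', 'item b'], 'Q2': ['item c']}
```

Because the dictionary is filled during the loop, the separate questions and items lists from the question are no longer needed.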

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 