Docx4j - how to get a docx checkbox status

I am trying to read a stack of identically formatted Word .docx files and extract the data to a database. I don't have any issues with the text, but I am struggling with the checkboxes. I am new to docx4j and have been stuck on this for four days, so I would really value some help or advice.

I have attached a document (test.docx) that I am trying to read. The first checkbox, which I inserted myself using Word, is detected by my code and appears on the initial pass as a CTSdtCell, but the other checkboxes are not. They seem to be represented differently in the file, by a CTObject, CTShape, CTImageData and a CTControl, and I cannot find a way of getting the checkbox state from any of these.

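import java.util.ArrayList;
import java.util.List;
import javax.xml.bind.JAXBElement;
import org.docx4j.TraversalUtil;
import org.docx4j.TraversalUtil.CallbackImpl;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.w14.CTSdtCheckbox;
import org.docx4j.wml.FldChar;

public static void main(String[] args) throws Exception {
    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new java.io.File("test.docx"));
    MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
    Finder finder = new Finder(FldChar.class);
    // Walk the whole document tree; Finder.apply() is called for every node
    new TraversalUtil(documentPart.getContent(), finder);
}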

public static class Finder extends CallbackImpl {
    protected Class<?> typeToFind;
    protected Finder(Class<?> typeToFind) {
        this.typeToFind = typeToFind;
    }

    public List<Object> results = new ArrayList<Object>(); 

    @Override
    public List<Object> apply(Object o) {
        // Log the class of every node visited during the traversal
        System.out.println(o.getClass().getName());

        if (o instanceof org.docx4j.wml.CTSdtCell) {
            List<Object> objs = ((org.docx4j.wml.CTSdtCell)o).getSdtPr().getRPrOrAliasOrLock();
            findCheckbox(objs);
        }

        if (o instanceof org.docx4j.wml.SdtRun) {
            List<Object> objs = ((org.docx4j.wml.SdtRun)o).getSdtPr().getRPrOrAliasOrLock();
            findCheckbox(objs);
        }

        if (o instanceof org.docx4j.wml.SdtBlock) {
            List<Object> objs = ((org.docx4j.wml.SdtBlock)o).getSdtPr().getRPrOrAliasOrLock();
            findCheckbox(objs);
        }

        if (o instanceof org.docx4j.wml.Text) {
            System.out.println("      Text Value : "+((org.docx4j.wml.Text)o).getValue());
        }

        // Adapt as required
        if (o.getClass().equals(typeToFind)) {
            results.add(o);
        }
        return null;
    }

    private static void findCheckbox(List<Object> objs) {
        for (Object obj : objs) {
            // Children of sdtPr may arrive wrapped in a JAXBElement; unwrap before testing the type
            Object value = (obj instanceof javax.xml.bind.JAXBElement)
                    ? ((javax.xml.bind.JAXBElement<?>) obj).getValue()
                    : obj;
            if (value instanceof org.docx4j.w14.CTSdtCheckbox) {
                org.docx4j.w14.CTOnOff onOff = ((org.docx4j.w14.CTSdtCheckbox) value).getChecked();
                // w14:checked is optional, so guard against a missing element
                System.out.println("      CheckBox found with value=" + (onOff == null ? "(none)" : onOff.getVal()));
            }
        }
    }
}

The results are:

org.docx4j.wml.Tbl
org.docx4j.wml.Tr
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : WORK INSTRUCTION #
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Drawing
org.docx4j.dml.wordprocessingDrawing.Inline
org.docx4j.dml.CTBlip
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value :  
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : A
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value :  
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value :  
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : STEP BY STEP
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value :  
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : - 
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : WORK INSTRUCTION
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Drawing
org.docx4j.dml.wordprocessingDrawing.Inline
org.docx4j.dml.CTBlip
org.docx4j.wml.Tr
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : 1234567
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.Tr
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : TASK
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : Chlorine drum change
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : DATE
org.docx4j.wml.CTSdtCell
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : 12/07/2015
org.docx4j.wml.Tr
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : MACHINE
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : ORIGINATOR
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : D.GROVE
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : CLOCK NUMBER
org.docx4j.wml.CTSdtCell
      CheckBox found with value=1
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : ?
org.docx4j.wml.Tr
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : AREA
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : CHLORINE HOUSE
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : CHECKED
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value :  
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : (EXPERT)
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : J Clarke
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : CLOCK NUMBER
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : 4985
org.docx4j.wml.Tr
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : PPE 
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Drawing
org.docx4j.dml.wordprocessingDrawing.Anchor
org.docx4j.dml.CTBlip
org.docx4j.dml.CTColorChangeEffect
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : EYE
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Drawing
org.docx4j.dml.wordprocessingDrawing.Anchor
org.docx4j.dml.CTBlip
org.docx4j.dml.CTColorChangeEffect
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : EAR
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Drawing
org.docx4j.dml.wordprocessingDrawing.Anchor
org.docx4j.dml.CTBlip
org.docx4j.dml.CTColorChangeEffect
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : FOOT
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Drawing
org.docx4j.dml.wordprocessingDrawing.Anchor
org.docx4j.dml.CTBlip
org.docx4j.dml.CTColorChangeEffect
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : HEAD
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Drawing
org.docx4j.dml.wordprocessingDrawing.Anchor
org.docx4j.dml.CTBlip
org.docx4j.dml.CTColorChangeEffect
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : HAND
org.docx4j.wml.Tr
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.CTObject
org.docx4j.vml.CTShapetype
org.docx4j.vml.CTStroke
org.docx4j.vml.CTFormulas
org.docx4j.vml.CTF
org.docx4j.vml.CTF
org.docx4j.vml.CTF
org.docx4j.vml.CTF
org.docx4j.vml.CTF
org.docx4j.vml.CTF
org.docx4j.vml.CTF
org.docx4j.vml.CTF
org.docx4j.vml.CTF
org.docx4j.vml.CTF
org.docx4j.vml.CTF
org.docx4j.vml.CTF
org.docx4j.vml.CTPath
org.docx4j.vml.officedrawing.CTLock
org.docx4j.vml.CTShape
org.docx4j.vml.CTImageData
org.docx4j.wml.CTControl
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.CTObject
org.docx4j.vml.CTShape
org.docx4j.vml.CTImageData
org.docx4j.wml.CTControl
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.CTObject
org.docx4j.vml.CTShape
org.docx4j.vml.CTImageData
org.docx4j.wml.CTControl
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.CTObject
org.docx4j.vml.CTShape
org.docx4j.vml.CTImageData
org.docx4j.wml.CTControl
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.CTObject
org.docx4j.vml.CTShape
org.docx4j.vml.CTImageData
org.docx4j.wml.CTControl
org.docx4j.wml.Tr
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : COSHH
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : SPECIAL PPE REQUIREMENTS
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : *SITE 
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : R/A NUMBER
org.docx4j.wml.Tr
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : CONSIDERATION
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : PRODUCTS
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : B.A. EQUIPMENT
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : 12668
org.docx4j.wml.Tr
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.CTObject
org.docx4j.vml.CTShape
org.docx4j.vml.CTImageData
org.docx4j.wml.CTControl
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value : CHLORINE
org.docx4j.wml.R
org.docx4j.wml.Text
      Text Value :  GAS
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.Tr
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.Tc
org.docx4j.wml.P
org.docx4j.wml.P
org.docx4j.wml.CTBookmark
org.docx4j.wml.CTMarkupRange

I have now added the output from a MainDocumentPart.getXML() for the cell containing one of the elusive checkboxes. I can see nothing there to tell me the value. Can anyone tell me what I am missing please?

<w:tc>
        <w:tcPr>
            <w:tcW w:w="1015" w:type="dxa"/>
            <w:tcBorders>
                <w:left w:val="single" w:color="auto" w:sz="24" w:space="0"/>
                <w:bottom w:val="single" w:color="auto" w:sz="24" w:space="0"/>
                <w:right w:val="single" w:color="auto" w:sz="24" w:space="0"/>
            </w:tcBorders>
            <w:vAlign w:val="center"/>
        </w:tcPr>
        <w:p w:rsidRPr="00A7008C" w:rsidR="00F909A4" w:rsidP="00017AE9" w:rsidRDefault="000F5760">
            <w:pPr>
                <w:jc w:val="center"/>
                <w:rPr>
                    <w:b/>
                    <w:color w:val="FFFFFF" w:themeColor="background1"/>
                </w:rPr>
            </w:pPr>
            <w:r>
                <w:rPr>
                    <w:b/>
                    <w:color w:val="FFFFFF" w:themeColor="background1"/>
                    <w:sz w:val="36"/>
                </w:rPr>
                <w:object w:dxaOrig="225" w:dyaOrig="225">
                    <v:shape type="#_x0000_t75" style="width:12pt;height:29.25pt" id="_x0000_i1063" o:ole="">
                        <v:imagedata o:title="" r:id="rId17"/>
                    </v:shape>
                    <w:control w:name="CheckBox11" w:shapeid="_x0000_i1063" r:id="rId18"/>
                </w:object>
            </w:r>
            <w:bookmarkEnd w:id="0"/>
        </w:p>
    </w:tc>

I have cracked it! The CTImageData elements point to images that can be reached through the document's relationships. Each image shows either a ticked or an unticked box, so by checking the size of the image I can tell which state the checkbox is in.
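For anyone wanting to try the same lookup, here is a minimal sketch of the idea. It deliberately uses only the JDK (zip plus DOM) rather than docx4j, so it is not the code I actually ran. The class name ControlImageSizes and its helper methods are made up for illustration, "size" is assumed to mean the byte length of the image part, and which size corresponds to "ticked" depends on your documents, so the sketch only reports the sizes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ControlImageSizes {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String V = "urn:schemas-microsoft-com:vml";
    static final String R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    static final String PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships";

    /** Reads every entry of a .docx (a zip archive) into memory, keyed by entry name. */
    static Map<String, byte[]> readParts(String path) {
        Map<String, byte[]> parts = new HashMap<>();
        try (ZipFile zip = new ZipFile(path)) {
            for (ZipEntry entry : Collections.list(zip.entries())) {
                try (InputStream in = zip.getInputStream(entry)) {
                    parts.put(entry.getName(), in.readAllBytes());
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return parts;
    }

    static Document parse(byte[] xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse XML part", ex);
        }
    }

    /** Maps each control name (w:control/@w:name) to the byte length of the image its v:imagedata points at. */
    static Map<String, Integer> imageSizes(Map<String, byte[]> parts) {
        // Relationship id -> target path, taken from the main document's relationships part
        Map<String, String> targets = new HashMap<>();
        NodeList rels = parse(parts.get("word/_rels/document.xml.rels"))
                .getElementsByTagNameNS(PKG_REL, "Relationship");
        for (int i = 0; i < rels.getLength(); i++) {
            Element rel = (Element) rels.item(i);
            targets.put(rel.getAttribute("Id"), rel.getAttribute("Target"));
        }

        Map<String, Integer> sizes = new LinkedHashMap<>();
        NodeList objects = parse(parts.get("word/document.xml")).getElementsByTagNameNS(W, "object");
        for (int i = 0; i < objects.getLength(); i++) {
            Element object = (Element) objects.item(i);
            NodeList controls = object.getElementsByTagNameNS(W, "control");
            NodeList images = object.getElementsByTagNameNS(V, "imagedata");
            if (controls.getLength() == 0 || images.getLength() == 0) {
                continue; // an embedded object that is not a control with a preview image
            }
            String name = ((Element) controls.item(0)).getAttributeNS(W, "name");
            String target = targets.get(((Element) images.item(0)).getAttributeNS(R, "id"));
            if (target == null) {
                continue;
            }
            // Targets are normally relative to the word/ folder; a leading slash means package-absolute
            String partName = target.startsWith("/") ? target.substring(1) : "word/" + target;
            byte[] image = parts.get(partName);
            if (image != null) {
                sizes.put(name, image.length);
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: java ControlImageSizes <file.docx>");
            return;
        }
        imageSizes(readParts(args[0]))
                .forEach((name, size) -> System.out.println(name + " -> " + size + " bytes"));
    }
}
```

Running it once against a document whose checkbox states you already know should show which byte length goes with the ticked image and which with the unticked one.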

I only know Word superficially and do not know how these 'checkboxes' were created, but it seems they were not created the same way as my test ones. I therefore do not know whether these images might change if the organisation upgrades its MS Office software and then edits and saves the docx files again. However, my software is mainly needed for the initial load, so this risk matters little to me.

The existing checkboxes are legacy ActiveX controls:

          <w:object w:dxaOrig="225" w:dyaOrig="225">
            <v:shapetype id="_x0000_t75" coordsize="21600,21600" o:spt="75" o:preferrelative="t" path="m@4@5l@4@11@9@11@9@5xe" filled="f" stroked="f">
              <v:stroke joinstyle="miter"/>
              <v:formulas>
                :
              </v:formulas>
              <v:path o:extrusionok="f" gradientshapeok="t" o:connecttype="rect"/>
              <o:lock v:ext="edit" aspectratio="t"/>
            </v:shapetype>
            <v:shape id="_x0000_i1025" type="#_x0000_t75" style="width:12pt;height:29.25pt" o:ole="">
              <v:imagedata r:id="rId15" o:title=""/>
            </v:shape>
            <w:control r:id="rId16" w:name="CheckBox" w:shapeid="_x0000_i1025"/>
          </w:object>

The ones you are creating are modern XML-friendly checkbox content controls.

There are also checkbox characters, and checkbox form fields...
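To make the distinction concrete, here is a rough sketch that scans the text of word/document.xml and reports which kind of checkbox it finds. It is JDK-only DOM code, not docx4j, and the class name CheckboxKinds and its methods are hypothetical. It assumes a flag element without a val attribute counts as "on", which is an assumption for the w14 case. Checkbox characters are not covered, and any w:control is reported as a possible checkbox because the control type is not visible in this XML.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class CheckboxKinds {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String W14 = "http://schemas.microsoft.com/office/word/2010/wordml";

    /** Returns one line per checkbox-like element found in the given document.xml text. */
    static List<String> describe(String documentXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(documentXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse document.xml", ex);
        }
        List<String> found = new ArrayList<>();

        // 1. Checkbox content controls: w:sdtPr/w14:checkbox/w14:checked
        NodeList contentControls = doc.getElementsByTagNameNS(W14, "checkbox");
        for (int i = 0; i < contentControls.getLength(); i++) {
            Element box = (Element) contentControls.item(i);
            found.add("content control: " + label(isOn(first(box, W14, "checked"), W14)));
        }

        // 2. Legacy form fields: w:ffData/w:checkBox with w:checked, falling back to w:default
        NodeList formFields = doc.getElementsByTagNameNS(W, "checkBox");
        for (int i = 0; i < formFields.getLength(); i++) {
            Element box = (Element) formFields.item(i);
            Element checked = first(box, W, "checked");
            boolean on = checked != null ? isOn(checked, W) : isOn(first(box, W, "default"), W);
            found.add("form field: " + label(on));
        }

        // 3. ActiveX controls: w:object/w:control; the state is not stored in document.xml
        NodeList controls = doc.getElementsByTagNameNS(W, "control");
        for (int i = 0; i < controls.getLength(); i++) {
            String name = ((Element) controls.item(i)).getAttributeNS(W, "name");
            found.add("ActiveX control " + name + ": state not stored in document.xml");
        }
        return found;
    }

    /** First descendant with the given namespace and local name, or null if there is none. */
    static Element first(Element parent, String ns, String localName) {
        NodeList matches = parent.getElementsByTagNameNS(ns, localName);
        return matches.getLength() == 0 ? null : (Element) matches.item(0);
    }

    /** Assumption: a flag element that is present but has no val attribute means "on". */
    static boolean isOn(Element flag, String ns) {
        if (flag == null) {
            return false;
        }
        String val = flag.getAttributeNS(ns, "val");
        return val.isEmpty() || val.equals("1") || val.equals("true") || val.equals("on");
    }

    static String label(boolean on) {
        return on ? "checked" : "unchecked";
    }
}
```

Only the first two kinds expose their state directly in the document XML, which is why the content-control checkbox was easy to read and the ActiveX ones were not.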
