Regex to match content between curly braces

I have the following text file and need to extract the text of the ELEMENT and CLUSTER matches. I'm using this regular expression:

CLUSTER\[at....\][\d\D]+?{[\d\D]+?items[\d\D]+?}|(ELEMENT\[at....\][\d\D]+?}[\d\D]+?})

This works fine in general. But for some text files, like the one below, where an element has more than one DV_* entry, it doesn't extract the entire "value matches" section, only its first entry.

For example, ELEMENT[at0030] leaves out the DV_TEXT and DV_PROPORTION matches, while ELEMENT[at0028] matches everything I need.
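To see why the original pattern stops early, here is a minimal sketch (assuming Node.js or a browser console) that runs the ELEMENT alternative of the question's regex on a trimmed copy of the ELEMENT[at0030] block. The lazy "[\d\D]+?}[\d\D]+?}" stops at the second closing brace it meets. The first is the one in "{0..1}" and the second is the one that closes defining_code, so DV_TEXT and DV_PROPORTION are never reached. For ELEMENT[at0028] the second brace happens to be the one in "DV_COUNT matches {*}", which is why that element looks correct.

```javascript
// Trimmed copy of ELEMENT[at0030] from the sample file (indentation shortened).
const element = [
  "ELEMENT[at0030] occurrences matches {0..1} matches {    -- Fragmentation",
  "    value matches {",
  "        DV_CODED_TEXT matches {",
  "            defining_code matches {",
  "                [local::",
  "                at0031,    -- None",
  "                at0034]    -- Severe fragmentation",
  "            }",
  "        }",
  "        DV_TEXT matches {*}",
  "        DV_PROPORTION matches {*}",
  "    }",
  "}",
].join("\n");

// The ELEMENT alternative of the question's pattern, unchanged.
const elementRx = /(ELEMENT\[at....\][\d\D]+?}[\d\D]+?})/;
const matched = element.match(elementRx)[0];

// 1st "}" is the one in "{0..1}"; the 2nd "}" closes defining_code,
// so everything after that brace is cut off.
console.log(matched.includes("DV_CODED_TEXT"));   // true
console.log(matched.includes("DV_TEXT matches")); // false
```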

I need my regular expression to capture everything inside the "value matches" curly braces of each ELEMENT, not just the first value. Any help?

Here is an example of a text file I'm working on:

definition
    CLUSTER[at0000] matches {    -- Examination of a cleavage-stage embryo
        items cardinality matches {1..*; unordered} matches {
            ELEMENT[at0028] occurrences matches {0..1} matches {    -- Number of cells
                value matches {
                    DV_COUNT matches {*}
                }
            }
            ELEMENT[at0030] occurrences matches {0..1} matches {    -- Fragmentation
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0031,    -- None
                            at0032,    -- Mild fragmentation
                            at0033,    -- Moderate fragmentation
                            at0034]    -- Severe fragmentation
                        }
                    }
                    DV_TEXT matches {*}
                    DV_PROPORTION matches {*}
                }
            }
            ELEMENT[at0035] occurrences matches {0..1} matches {    -- Blastomere size
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0036,    -- Equal, stage specific
                            at0037,    -- Unequal, stage specific
                            at0053,    -- Equal, non-stage specific
                            at0054]    -- Unequal, non-stage specific
                        }
                    }
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0038] occurrences matches {0..1} matches {    -- Nucleation
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0039,    -- No visible nuclei
                            at0040,    -- Mononucleation
                            at0041,    -- Binucleation
                            at0051,    -- Multinucleation
                            at0052]    -- Broad multinucleation
                        }
                    }
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0042] occurrences matches {0..1} matches {    -- Cytoplasmic morphology
                value matches {
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0043] occurrences matches {0..1} matches {    -- Spatial distribution of cells
                value matches {
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0044] occurrences matches {0..1} matches {    -- Compaction
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0045,    -- None
                            at0046,    -- Minimal
                            at0047,    -- Moderate
                            at0048]    -- Complete
                        }
                    }
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0049] occurrences matches {0..*} matches {    -- Other morphological features
                value matches {
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0055] occurrences matches {0..1} matches {    -- Morphology grade
                value matches {
                    DV_TEXT matches {*}
                }
            }
        }
    }


ontology
    term_definitions = <
        ["en"] = <
            items = <
                ["at0000"] = <
                    text = <"Examination of a cleavage-stage embryo">
                    description = <"Morphological findings obtained by microscopy of the human cleavage-stage embryo.">
                >
                ["at0028"] = <
                    text = <"Number of cells">
                    description = <"Number of cells in a cleavage-stage embryo.">
                >
                ["at0030"] = <
                    text = <"Fragmentation">
                    description = <"Cytoplasmic fragmentation in a cleavage-stage embryo.">
                    comment = <"The proportion data type can be used to record a more precise assessment.">
                >
                ["at0031"] = <
                    text = <"None">
                    description = <"Absence of cytoplasmic fragments.">
                >
                ["at0032"] = <
                    text = <"Mild fragmentation">
                    description = <"Cytoplasmic fragments cover < 10% of the total cytoplasmic volume.">
                >
                ["at0033"] = <
                    text = <"Moderate fragmentation">
                    description = <"Cytoplasmic fragments cover 10 - 25% of the total cytoplasmic volume.">
                >

For example:

// Requires lookbehind support (ES2018); text holds the contents of the file.
const rx = /(?<=ELEMENT\[at\d{4}\] occurrences[^\n]+\n( +)value matches \{)[\d\D]+?\n\1(?=\})/g;

console.log(text.match(rx));
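A self-contained version of the snippet above, run on a trimmed excerpt of the sample file with its original indentation kept (the excerpt is the only thing added; the regex is unchanged). It assumes a JavaScript engine with lookbehind support, such as a current Node.js.

```javascript
// Trimmed excerpt of the sample file; the indentation is what the regex relies on.
const text = [
  "definition",
  "    CLUSTER[at0000] matches {    -- Examination of a cleavage-stage embryo",
  "        items cardinality matches {1..*; unordered} matches {",
  "            ELEMENT[at0028] occurrences matches {0..1} matches {    -- Number of cells",
  "                value matches {",
  "                    DV_COUNT matches {*}",
  "                }",
  "            }",
  "            ELEMENT[at0030] occurrences matches {0..1} matches {    -- Fragmentation",
  "                value matches {",
  "                    DV_CODED_TEXT matches {",
  "                        defining_code matches {",
  "                            [local::",
  "                            at0031,    -- None",
  "                            at0034]    -- Severe fragmentation",
  "                        }",
  "                    }",
  "                    DV_TEXT matches {*}",
  "                    DV_PROPORTION matches {*}",
  "                }",
  "            }",
  "        }",
  "    }",
].join("\n");

const rx = /(?<=ELEMENT\[at\d{4}\] occurrences[^\n]+\n( +)value matches \{)[\d\D]+?\n\1(?=\})/g;

// With the g flag, match() returns every full match (one per ELEMENT).
const bodies = text.match(rx);
console.log(bodies.length); // 2

// Each body starts with a newline and ends with the closing line's indentation;
// trim() removes both.
for (const body of bodies) {
  console.log(body.trim());
}
```

The second body now contains DV_CODED_TEXT, DV_TEXT and DV_PROPORTION, because the match only ends at a "}" indented exactly like the "value matches" line.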

The basic idea is to avoid counting opening and closing curly braces. Instead, \n( +)value matches captures the run of spaces between the newline and "value matches", and the pattern then matches everything up to the first newline followed by exactly that many spaces and a closing curly brace: \n\1(?=\}). The brace is checked with a lookahead, so it is not part of the match. This relies on the file being consistently indented, as it is in the sample above.
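If the indentation can't be relied on, one alternative (not the method above) is to do the brace counting in code rather than in the regex. This sketch uses a hypothetical helper, extractValues, that finds each "ELEMENT[atNNNN] ... value matches {" header with a regex, then walks forward counting "{" and "}" until the matching close, and returns the element ID together with the body. It assumes that braces never appear inside the "--" comments and that "value matches" is on the line right after the ELEMENT line; both hold for the sample file but are not guaranteed in general.

```javascript
// Hypothetical helper: returns [{ id, value }] for every ELEMENT, where value is
// the trimmed text between "value matches {" and its matching "}".
function extractValues(text) {
  const results = [];
  const head = /ELEMENT\[(at\d{4})\][^\n]*\n\s*value matches \{/g;
  let m;
  while ((m = head.exec(text)) !== null) {
    const start = head.lastIndex; // index just after the opening "{"
    let depth = 1;
    let i = start;
    while (i < text.length && depth > 0) {
      if (text[i] === "{") depth++;
      else if (text[i] === "}") depth--;
      i++;
    }
    if (depth !== 0) break; // unbalanced braces: stop instead of guessing
    // text[i - 1] is the matching "}", so exclude it from the body.
    results.push({ id: m[1], value: text.slice(start, i - 1).trim() });
    head.lastIndex = i; // continue searching after this value block
  }
  return results;
}

const sample = [
  "ELEMENT[at0028] occurrences matches {0..1} matches {",
  "    value matches {",
  "        DV_COUNT matches {*}",
  "    }",
  "}",
  "ELEMENT[at0030] occurrences matches {0..1} matches {",
  "    value matches {",
  "        DV_CODED_TEXT matches {",
  "            defining_code matches {",
  "                [local::",
  "                at0031,    -- None",
  "                at0034]    -- Severe fragmentation",
  "            }",
  "        }",
  "        DV_TEXT matches {*}",
  "        DV_PROPORTION matches {*}",
  "    }",
  "}",
].join("\n");

const values = extractValues(sample);
for (const { id, value } of values) {
  console.log(id + ":\n" + value);
}
```

Pairing each body with its ID also makes it easy to look the element up in the ontology section afterwards.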

Is that clear?

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
