I have the following text file and need to extract the text of the ELEMENT and CLUSTER matches. I'm using the following regular expression:
CLUSTER\[at....\][\d\D]+?{[\d\D]+?items[\d\D]+?}|(ELEMENT\[at....\][\d\D]+?}[\d\D]+?})
This works fine for most files. However, in some text files, like the one below, an element has more than one DV match, and the expression then extracts only the first one rather than the entire "value matches" section.
For example, for ELEMENT[at0030] it leaves out the DV_TEXT and DV_PROPORTION matches, while for ELEMENT[at0028] it matches everything I need.
I need my regular expression to capture everything inside the "value matches" curly braces of each ELEMENT, not just the first value. Any help?
Here is an example of a text file I'm working on:
definition
CLUSTER[at0000] matches { -- Examination of a cleavage-stage embryo
items cardinality matches {1..*; unordered} matches {
ELEMENT[at0028] occurrences matches {0..1} matches { -- Number of cells
value matches {
DV_COUNT matches {*}
}
}
ELEMENT[at0030] occurrences matches {0..1} matches { -- Fragmentation
value matches {
DV_CODED_TEXT matches {
defining_code matches {
[local::
at0031, -- None
at0032, -- Mild fragmentation
at0033, -- Moderate fragmentation
at0034] -- Severe fragmentation
}
}
DV_TEXT matches {*}
DV_PROPORTION matches {*}
}
}
ELEMENT[at0035] occurrences matches {0..1} matches { -- Blastomere size
value matches {
DV_CODED_TEXT matches {
defining_code matches {
[local::
at0036, -- Equal, stage specific
at0037, -- Unequal, stage specific
at0053, -- Equal, non-stage specific
at0054] -- Unequal, non-stage specific
}
}
DV_TEXT matches {*}
}
}
ELEMENT[at0038] occurrences matches {0..1} matches { -- Nucleation
value matches {
DV_CODED_TEXT matches {
defining_code matches {
[local::
at0039, -- No visible nuclei
at0040, -- Mononucleation
at0041, -- Binucleation
at0051, -- Multinucleation
at0052] -- Broad multinucleation
}
}
DV_TEXT matches {*}
}
}
ELEMENT[at0042] occurrences matches {0..1} matches { -- Cytoplasmic morphology
value matches {
DV_TEXT matches {*}
}
}
ELEMENT[at0043] occurrences matches {0..1} matches { -- Spatial distribution of cells
value matches {
DV_TEXT matches {*}
}
}
ELEMENT[at0044] occurrences matches {0..1} matches { -- Compaction
value matches {
DV_CODED_TEXT matches {
defining_code matches {
[local::
at0045, -- None
at0046, -- Minimal
at0047, -- Moderate
at0048] -- Complete
}
}
DV_TEXT matches {*}
}
}
ELEMENT[at0049] occurrences matches {0..*} matches { -- Other morphological features
value matches {
DV_TEXT matches {*}
}
}
ELEMENT[at0055] occurrences matches {0..1} matches { -- Morphology grade
value matches {
DV_TEXT matches {*}
}
}
}
}
ontology
term_definitions = <
["en"] = <
items = <
["at0000"] = <
text = <"Examination of a cleavage-stage embryo">
description = <"Morphological findings obtained by microscopy of the human cleavage-stage embryo.">
>
["at0028"] = <
text = <"Number of cells">
description = <"Number of cells in a cleavage-stage embryo.">
>
["at0030"] = <
text = <"Fragmentation">
description = <"Cytoplasmic fragmentation in a cleavage-stage embryo.">
comment = <"The proportion data type can be used to record a more precise assessment.">
>
["at0031"] = <
text = <"None">
description = <"Absence of cytoplasmic fragments.">
>
["at0032"] = <
text = <"Mild fragmentation">
description = <"Cytoplasmic fragments cover < 10% of the total cytoplasmic volume.">
>
["at0033"] = <
text = <"Moderate fragmentation">
description = <"Cytoplasmic fragments cover 10 - 25% of the total cytoplasmic volume.">
>
For example:
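// `text` is assumed to hold the file contents as a string.
// Group 1 (inside the lookbehind) records the indentation of "value matches".
const rx = /(?<=ELEMENT\[at\d{4}\] occurrences[^\n]+\n( +)value matches \{)[\d\D]+?\n\1(?=\})/g;
console.log(text.match(rx));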
The basic idea is to avoid having to count opening and closing curly braces. Instead, the pattern captures the run of spaces between a newline and "value matches" using \n( +)value matches, and then matches everything up to a newline followed by the same number of spaces and a closing curly brace, \n\1\}. This relies on the file being indented consistently with spaces.
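To make the idea easier to try out, here is a minimal sketch that applies the same indentation trick with ordinary capture groups instead of a lookbehind, so each body can be keyed by its element code. The sample string is a shortened, hypothetical excerpt of the file from the question, and the names `sample`, `valueRx` and `extractValueBlocks` are only illustrative. It assumes that every "value matches" line is indented with spaces or tabs and that its closing brace sits at exactly the same indentation.

```javascript
// Hypothetical, shortened excerpt of the archetype, with its indentation intact.
const sample = [
  "ELEMENT[at0028] occurrences matches {0..1} matches {  -- Number of cells",
  "    value matches {",
  "        DV_COUNT matches {*}",
  "    }",
  "}",
  "ELEMENT[at0030] occurrences matches {0..1} matches {  -- Fragmentation",
  "    value matches {",
  "        DV_CODED_TEXT matches {",
  "            defining_code matches {",
  "                [local::at0031, at0032]",
  "            }",
  "        }",
  "        DV_TEXT matches {*}",
  "        DV_PROPORTION matches {*}",
  "    }",
  "}",
].join("\n");

// Group 1: element code, group 2: indentation of "value matches",
// group 3: everything up to the closing brace at that same indentation.
const valueRx =
  /ELEMENT\[(at\d{4})\] occurrences[^\n]*\n([ \t]+)value matches \{\n([\d\D]+?)\n\2\}/g;

// Illustrative helper: maps each element code to the body of its value block.
function extractValueBlocks(source) {
  const result = {};
  for (const m of source.matchAll(valueRx)) {
    result[m[1]] = m[3];
  }
  return result;
}

const blocks = extractValueBlocks(sample);
console.log(blocks.at0030);
```

Because the closing brace must follow exactly the captured indentation, the deeper closing braces of DV_CODED_TEXT and defining_code do not end the match early, so the DV_TEXT and DV_PROPORTION lines stay inside the body of at0030.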
Is that clear?
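If the indentation of a file cannot be trusted (for example, if it was lost when the text was copied), the brace counting that the pattern avoids can be done explicitly instead. The following is only a rough sketch of that alternative, not part of the regex approach; `readBraceBlock` and `extractValues` are hypothetical helper names. It assumes that braces are balanced, that the "--" comments contain no braces, and that every ELEMENT has its own "value matches" section.

```javascript
// Hypothetical helper: returns the text between the first "{" found at or
// after `from` and its matching "}", or null if the braces are unbalanced.
function readBraceBlock(source, from) {
  const open = source.indexOf("{", from);
  if (open === -1) return null;
  let depth = 0;
  for (let i = open; i < source.length; i++) {
    if (source[i] === "{") {
      depth++;
    } else if (source[i] === "}") {
      depth--;
      if (depth === 0) return source.slice(open + 1, i);
    }
  }
  return null;
}

// Hypothetical helper: maps each element code to the trimmed body of the
// first "value matches" block that follows its ELEMENT header.
function extractValues(source) {
  const out = {};
  for (const m of source.matchAll(/ELEMENT\[(at\d{4})\]/g)) {
    const valueAt = source.indexOf("value matches", m.index);
    if (valueAt === -1) continue;
    const body = readBraceBlock(source, valueAt);
    if (body !== null) out[m[1]] = body.trim();
  }
  return out;
}
```

Calling `extractValues(text)` on the file contents would then return an object keyed by element code. Unlike the regular expression, this does not depend on whitespace at all, at the cost of a little more code.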