I want to replace the highlighted sections in the first figure (each starts with :g, contains anything in between, including a newline and multiple tabs, and ends with :) with the highlighted sections in the second figure (:o).
I tried
replaceAll(":g.*?:", ":o")
but it didn't work because of the newline and tabs. Then I tried something like replaceAll(":g.*?]\\n\\t\\t\\t:", ":o"),
but the number of tabs can vary, so I need something that matches any number of tabs in the highlighted section.
try {
    File fXmlFile = new File("/Users/eddy/1.xml");
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(fXmlFile);
    doc.getDocumentElement().normalize();
    System.out.println(doc.getElementsByTagName("feature_tree").item(0).getTextContent());
    String fm = doc.getElementsByTagName("feature_tree").item(0).getTextContent();
    fm = fm.replaceAll(":g[^:]*:", ":o");
    System.out.println(fm);
} catch (Exception e) {
    e.printStackTrace();
}

:r DataMining(_r)
:m InputData(_r_1)
:m Attribute types(_r_1_35)
:o Mixed attribute types(_r_1_141_151_157_158)
:g _r_1_35_36(_r_1_35_36) [1,*]
: Discrete(_r_1_35_36_39)
:g _r_1_35_36_39_41(_r_1_35_36_39_41) [1,*]
: Nominal(_r_1_35_36_39_41_43)
: Ordinal(_r_1_35_36_39_41_44)
: Numerical(_r_1_35_36_40)
:g _r_1_35_36_40_45(_r_1_35_36_40_45) [1,*]
: Bounded(_r_1_35_36_40_45_46)
: Unbounded(_r_1_35_36_40_45_47)
: Text(_r_1_35_36_55)
: Time(_r_1_35_36_58)
:m Data properties(_r_1_141)
:o Labeled data(_r_1_141_144)
:o More than two classes(_r_1_141_144_154)
:o Missing values present(_r_1_141_145)
:o Independant attributes(_r_1_141_150)
:o Standardized values(_r_1_141_155)
:m Data Sets(_r_1_151)
:g _r_11_35_36_40_45(_r_11_35_36_40_45) [1,*]
: Training set(_r_1_151_152)
:m Number of instances(_r_1_151_152_168)
:g _r_1_151_152_168_170(_r_1_151_152_168_170) [1,1]
: 1-50(_r_1_151_152_168_170_171)
: 51-250(_r_11_151_152_168_170_172)
: 251-1000(_r_12_151_152_168_170_172)
: 1001-10000(_r_1_151_152_168_170_173)
: 10001-100000(_r_1_151_152_168_170_174)
: 100001-(_r_1_151_152_168_170_175)
: Test set(_r_1_151_153)
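I don't think a regex is the right tool for this, but if your format never changes, replaceAll(":g[^:]*:", ":o")
should do the trick. Your first attempt failed because . does not match line terminators by default, so :g.*?: can never reach the : on the next line. The negated class [^:] matches every character except :, including newlines and any number of tabs.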
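To make the difference concrete, here is a minimal sketch comparing the original lazy-dot pattern, the negated-class pattern, and the lazy-dot pattern with the DOTALL flag switched on via (?s). The class name GroupMarkerRegexDemo and the two-line SAMPLE string are made up for illustration; only the patterns come from the question and this answer.

```java
class GroupMarkerRegexDemo {
    // Hypothetical two-line sample shaped like the question's data:
    // a ":g" line, a newline, three tabs, then the first child line.
    static final String SAMPLE =
        ":g _r_1_35_36(_r_1_35_36) [1,*]\n\t\t\t: Discrete(_r_1_35_36_39)";

    // Original attempt: '.' does not match line terminators by default,
    // so the match cannot cross the newline and nothing is replaced here.
    static String lazyDot(String s) {
        return s.replaceAll(":g.*?:", ":o");
    }

    // Negated class: [^:] matches any character except ':',
    // which includes '\n' and '\t'.
    static String negatedClass(String s) {
        return s.replaceAll(":g[^:]*:", ":o");
    }

    // Same lazy pattern, but (?s) enables DOTALL so '.' also matches '\n'.
    static String lazyDotAll(String s) {
        return s.replaceAll("(?s):g.*?:", ":o");
    }

    public static void main(String[] args) {
        System.out.println(lazyDot(SAMPLE).equals(SAMPLE)); // true: input unchanged
        System.out.println(negatedClass(SAMPLE));           // :o Discrete(_r_1_35_36_39)
        System.out.println(lazyDotAll(SAMPLE));             // :o Discrete(_r_1_35_36_39)
    }
}
```

On this sample the last two variants give the same result; which one reads better is mostly a matter of taste.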
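Test:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public static void main(String[] args) throws IOException {
    // PATH_TO_YOUR_EXAMPLE_AS_TEXT is a placeholder for the path to your sample text
    Path exampleFile = Paths.get(PATH_TO_YOUR_EXAMPLE_AS_TEXT);
    String dataAsString = new String(Files.readAllBytes(exampleFile));
    System.out.println(dataAsString);
    // [^:]* also matches newlines and tabs, so the match spans the line break
    System.out.println(dataAsString.replaceAll(":g[^:]*:", ":o"));
}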
Prints (the input first, then the result of the replacement):
:r DataMining(_r)
:m InputData(_r_1)
:m Attribute types(_r_1_35)
:o Mixed attribute types(_r_1_141_151_157_158)
:g _r_1_35_36(_r_1_35_36) [1,*]
: Discrete(_r_1_35_36_39)
:g _r_1_35_36_39_41(_r_1_35_36_39_41) [1,*]
: Nominal(_r_1_35_36_39_41_43)
: Ordinal(_r_1_35_36_39_41_44)
: Numerical(_r_1_35_36_40)
:g _r_1_35_36_40_45(_r_1_35_36_40_45) [1,*]
: Bounded(_r_1_35_36_40_45_46)
: Unbounded(_r_1_35_36_40_45_47)
: Text(_r_1_35_36_55)
: Time(_r_1_35_36_58)
:m Data properties(_r_1_141)
:o Labeled data(_r_1_141_144)
:o More than two classes(_r_1_141_144_154)
:o Missing values present(_r_1_141_145)
:o Independant attributes(_r_1_141_150)
:o Standardized values(_r_1_141_155)
:m Data Sets(_r_1_151)
:g _r_11_35_36_40_45(_r_11_35_36_40_45) [1,*]
: Training set(_r_1_151_152)
:m Number of instances(_r_1_151_152_168)
:g _r_1_151_152_168_170(_r_1_151_152_168_170) [1,1]
: 1-50(_r_1_151_152_168_170_171)
: 51-250(_r_11_151_152_168_170_172)
: 251-1000(_r_12_151_152_168_170_172)
: 1001-10000(_r_1_151_152_168_170_173)
: 10001-100000(_r_1_151_152_168_170_174)
: 100001-(_r_1_151_152_168_170_175)
: Test set(_r_1_151_153)
:r DataMining(_r)
:m InputData(_r_1)
:m Attribute types(_r_1_35)
:o Mixed attribute types(_r_1_141_151_157_158)
:o Discrete(_r_1_35_36_39)
:o Nominal(_r_1_35_36_39_41_43)
: Ordinal(_r_1_35_36_39_41_44)
: Numerical(_r_1_35_36_40)
:o Bounded(_r_1_35_36_40_45_46)
: Unbounded(_r_1_35_36_40_45_47)
: Text(_r_1_35_36_55)
: Time(_r_1_35_36_58)
:m Data properties(_r_1_141)
:o Labeled data(_r_1_141_144)
:o More than two classes(_r_1_141_144_154)
:o Missing values present(_r_1_141_145)
:o Independant attributes(_r_1_141_150)
:o Standardized values(_r_1_141_155)
:m Data Sets(_r_1_151)
:o Training set(_r_1_151_152)
:m Number of instances(_r_1_151_152_168)
:o 1-50(_r_1_151_152_168_170_171)
: 51-250(_r_11_151_152_168_170_172)
: 251-1000(_r_12_151_152_168_170_172)
: 1001-10000(_r_1_151_152_168_170_173)
: 10001-100000(_r_1_151_152_168_170_174)
: 100001-(_r_1_151_152_168_170_175)
: Test set(_r_1_151_153)
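If the format might change, for example if a :g line could itself contain a colon, the regex would stop at that colon instead of at the next line. One possible alternative is a line-based pass. The following is only a sketch under the assumption that every :g line is followed by a child line starting with : after its indentation; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class GroupLineRewriter {
    // Drops each ":g" line and rewrites the next line that starts with ':'
    // so that it starts with ":o" at the dropped line's indentation.
    static String rewrite(String text) {
        String[] lines = text.split("\n", -1); // -1 keeps trailing empty strings
        List<String> out = new ArrayList<>();
        String pendingIndent = null; // indentation of a ":g" line awaiting its child

        for (String line : lines) {
            // Split the line into leading whitespace and the rest.
            int i = 0;
            while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
                i++;
            }
            String indent = line.substring(0, i);
            String body = line.substring(i);

            if (body.startsWith(":g")) {
                pendingIndent = indent; // skip the group line, remember where it sat
            } else if (pendingIndent != null && body.startsWith(":")) {
                // Replace the leading ':' of the child with ":o".
                out.add(pendingIndent + ":o" + body.substring(1));
                pendingIndent = null;
            } else {
                out.add(line);
            }
        }
        return String.join("\n", out);
    }

    public static void main(String[] args) {
        // Made-up input in the same shape as the data above.
        String input = ":m A(_a)\n\t:g _g(_g) [1,*]\n\t\t: B(_b)\n\t\t: C(_c)";
        System.out.println(rewrite(input));
    }
}
```

For the made-up input in main, this gives the same text as replaceAll(":g[^:]*:", ":o"). The two approaches can differ on edge cases such as two consecutive :g lines, so check it against your real data before relying on it.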