
java - replaceAll anything in between known boundaries including newlines and multiple tabs

I want to replace the highlighted sections in figure 1 (each starts with :g, spans everything that follows, including a newline and several tabs, and ends with the next : ) with the highlighted sections in figure 2 ( :o ).

Before: (first figure)

After: (second figure)

I tried replaceAll(":g.*?:", ":o"), but it didn't work because of the newline and tabs. Then I tried something like replaceAll(":g.*?]\\n\\t\\t\\t:", ":o"), but the number of tabs can vary, so I need a pattern that matches any number of tabs in the highlighted section.

try {
    File fXmlFile = new File("/Users/eddy/1.xml");
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(fXmlFile);
    doc.getDocumentElement().normalize();
    System.out.println(doc.getElementsByTagName("feature_tree").item(0).getTextContent());
    String fm = doc.getElementsByTagName("feature_tree").item(0).getTextContent();
    fm = fm.replaceAll(":g[^:]*:", ":o");
    System.out.println(fm);
} catch (Exception e) {
    e.printStackTrace();
}

:r DataMining(_r)
:m InputData(_r_1)
    :m Attribute types(_r_1_35)
        :o Mixed attribute types(_r_1_141_151_157_158)
        :g _r_1_35_36(_r_1_35_36) [1,*]
            : Discrete(_r_1_35_36_39)
                :g _r_1_35_36_39_41(_r_1_35_36_39_41) [1,*]
                    : Nominal(_r_1_35_36_39_41_43)
                    : Ordinal(_r_1_35_36_39_41_44)
            : Numerical(_r_1_35_36_40)
                :g _r_1_35_36_40_45(_r_1_35_36_40_45) [1,*]
                    : Bounded(_r_1_35_36_40_45_46)
                    : Unbounded(_r_1_35_36_40_45_47)
            : Text(_r_1_35_36_55)
            : Time(_r_1_35_36_58)
    :m Data properties(_r_1_141)
        :o Labeled data(_r_1_141_144)
            :o More than two classes(_r_1_141_144_154)
        :o Missing values present(_r_1_141_145)
        :o Independant attributes(_r_1_141_150)
        :o Standardized values(_r_1_141_155)
    :m Data Sets(_r_1_151)
        :g _r_11_35_36_40_45(_r_11_35_36_40_45) [1,*]
            : Training set(_r_1_151_152)
                :m Number of instances(_r_1_151_152_168)
                    :g _r_1_151_152_168_170(_r_1_151_152_168_170) [1,1]
                        : 1-50(_r_1_151_152_168_170_171)
                        : 51-250(_r_11_151_152_168_170_172)
                        : 251-1000(_r_12_151_152_168_170_172)
                        : 1001-10000(_r_1_151_152_168_170_173)
                        : 10001-100000(_r_1_151_152_168_170_174)
                        : 100001-(_r_1_151_152_168_170_175)
            : Test set(_r_1_151_153)

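I don't think a regex is the proper solution, but if your requirements never change, replaceAll(":g[^:]*:", ":o") should do the trick. Unlike the dot, which by default does not match line terminators, the negated class [^:] matches newlines and tabs as well, so the match runs from :g to the next : no matter how many tabs sit in between.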
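To make the difference between the attempts concrete, here is a minimal sketch that compares the lazy-dot pattern from the question, the same pattern with the DOTALL flag (?s), and the negated-class pattern. The class name BoundaryReplaceDemo, its method names, and the two-line sample string are made up for illustration; only the patterns come from this thread, and (?s) is the standard java.util.regex embedded flag that lets the dot match line terminators.

```java
public class BoundaryReplaceDemo {

    // Negated character class: [^:] matches any character except ':',
    // including '\n' and '\t', so no flag is needed.
    static String withNegatedClass(String text) {
        return text.replaceAll(":g[^:]*:", ":o");
    }

    // Lazy dot with the embedded DOTALL flag (?s), so '.' also matches
    // line terminators and the match can cross the line break.
    static String withDotAll(String text) {
        return text.replaceAll("(?s):g.*?:", ":o");
    }

    // Lazy dot without flags: '.' stops at a line terminator, so the match
    // cannot reach the ':' that sits on the following line.
    static String withPlainDot(String text) {
        return text.replaceAll(":g.*?:", ":o");
    }

    public static void main(String[] args) {
        // Hypothetical two-line sample shaped like the data in the question.
        String sample = "\t:g _r_1_35_36(_r_1_35_36) [1,*] \n\t\t: Discrete(_r_1_35_36_39)\n";

        System.out.print(withPlainDot(sample));     // text is left unchanged
        System.out.print(withNegatedClass(sample)); // ":g ... :" span becomes ":o"
        System.out.print(withDotAll(sample));       // same result as the negated class
    }
}
```

Both working variants assume that no ':' ever appears inside a :g header itself; if it did, the match would end early, which is one reason to treat the regex as a stopgap rather than a parser.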

Test:

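import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public static void main(String[] args) throws IOException {
    // PATH_TO_YOUR_EXAMPLE_AS_TEXT is a placeholder for the path to the sample text file
    Path exampleFile = Paths.get(PATH_TO_YOUR_EXAMPLE_AS_TEXT);
    String dataAsString = new String(Files.readAllBytes(exampleFile));
    // Print the original text, then the text with each ":g ... :" span replaced by ":o"
    System.out.println(dataAsString);
    System.out.println(dataAsString.replaceAll(":g[^:]*:", ":o"));
}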

Prints (the original text first, then the text after the replacement):

:r DataMining(_r)
:m InputData(_r_1)
    :m Attribute types(_r_1_35)
        :o Mixed attribute types(_r_1_141_151_157_158)
        :g _r_1_35_36(_r_1_35_36) [1,*] 
            : Discrete(_r_1_35_36_39)
                :g _r_1_35_36_39_41(_r_1_35_36_39_41) [1,*] 
                    : Nominal(_r_1_35_36_39_41_43)
                    : Ordinal(_r_1_35_36_39_41_44)
            : Numerical(_r_1_35_36_40)
                :g _r_1_35_36_40_45(_r_1_35_36_40_45) [1,*] 
                    : Bounded(_r_1_35_36_40_45_46)
                    : Unbounded(_r_1_35_36_40_45_47)
            : Text(_r_1_35_36_55)
            : Time(_r_1_35_36_58)
    :m Data properties(_r_1_141)
        :o Labeled data(_r_1_141_144)
            :o More than two classes(_r_1_141_144_154)
        :o Missing values present(_r_1_141_145)
        :o Independant attributes(_r_1_141_150)
        :o Standardized values(_r_1_141_155)
    :m Data Sets(_r_1_151)
        :g _r_11_35_36_40_45(_r_11_35_36_40_45) [1,*] 
            : Training set(_r_1_151_152)
                :m Number of instances(_r_1_151_152_168)
                    :g _r_1_151_152_168_170(_r_1_151_152_168_170) [1,1] 
                        : 1-50(_r_1_151_152_168_170_171)
                        : 51-250(_r_11_151_152_168_170_172)
                        : 251-1000(_r_12_151_152_168_170_172)
                        : 1001-10000(_r_1_151_152_168_170_173)
                        : 10001-100000(_r_1_151_152_168_170_174)
                        : 100001-(_r_1_151_152_168_170_175)
            : Test set(_r_1_151_153)
:r DataMining(_r)
:m InputData(_r_1)
    :m Attribute types(_r_1_35)
        :o Mixed attribute types(_r_1_141_151_157_158)
        :o Discrete(_r_1_35_36_39)
                :o Nominal(_r_1_35_36_39_41_43)
                    : Ordinal(_r_1_35_36_39_41_44)
            : Numerical(_r_1_35_36_40)
                :o Bounded(_r_1_35_36_40_45_46)
                    : Unbounded(_r_1_35_36_40_45_47)
            : Text(_r_1_35_36_55)
            : Time(_r_1_35_36_58)
    :m Data properties(_r_1_141)
        :o Labeled data(_r_1_141_144)
            :o More than two classes(_r_1_141_144_154)
        :o Missing values present(_r_1_141_145)
        :o Independant attributes(_r_1_141_150)
        :o Standardized values(_r_1_141_155)
    :m Data Sets(_r_1_151)
        :o Training set(_r_1_151_152)
                :m Number of instances(_r_1_151_152_168)
                    :o 1-50(_r_1_151_152_168_170_171)
                        : 51-250(_r_11_151_152_168_170_172)
                        : 251-1000(_r_12_151_152_168_170_172)
                        : 1001-10000(_r_1_151_152_168_170_173)
                        : 10001-100000(_r_1_151_152_168_170_174)
                        : 100001-(_r_1_151_152_168_170_175)
            : Test set(_r_1_151_153)

