
Using Java to parse text based on keywords

Basically, I am given a file containing details about people, with each person separated by a new line, e.g. "

name Marioka address 97 Gardeners Road birthday 12-11-1982 \n
name Ada Lovelace gender woman\n
name James address 65 Watcher Avenue

" and so on.

I want to parse them into arrays of [Keyword:Value] pairs, for example

{[Name, Marioka], [Address, 97 Gardeners Road], [Birthday, 12-11-1982]},
{[Name, Ada Lovelace], [Gender, Woman]}, and so on....

The keywords will be a defined set of words; in the example above: name, address, birthday, gender, etc.

What is the best way to do this?

This is how I did it. It works, but I would like to know whether there is a better solution.

    private Map<String, String> readRecord(String record) {
        Map<String, String> attributeValuePairs = new HashMap<String, String>();
        Scanner scanner = new Scanner(record);
        String attribute = "", value = ""; 

        /* 
         * 1. Scan each word. 
         * 2. Find an attribute keyword and store it at "attribute".
         * 3. Following words will be stored as "value" until the next keyword is found.
         * 4. Return value-attribute pairs as HashMap
         */

        while(scanner.hasNext()) {
            String word = scanner.next();
            if (this.isAttribute(word)) {
                if (!value.trim().isEmpty()) {
                    attributeValuePairs.put(attribute.trim(), value.trim());
                    value = "";
                }
                attribute = word;
            } else {
                value += word + " ";
            }
        }
        if (!value.trim().isEmpty()) attributeValuePairs.put(attribute.trim(), value.trim());

        scanner.close();
        return attributeValuePairs;
    }

    private boolean isAttribute(String word) {
        String[] attributes = {"name", "patientId", 
            "birthday", "phone", "email", "medicalHistory", "address"};
        for (String attribute: attributes) {
            if (word.equalsIgnoreCase(attribute)) return true;
        }
        return false;
    }

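To extract the values from the string, use a regular expression. I hope you know how to read each line from a file and how to build an array from the results.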

This is still not a good solution, because it does not work if the name or address contains any of the keywords... but that is what you asked for...

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {

    public static void main(String[] args) {

        Pattern p = Pattern.compile("name (.+) address (.+) birthday (.+)");

        String text = "name Marioka address 97 Garderners Road birthday 12-11-1982";

        Matcher m = p.matcher(text);

        if (m.matches()) {
            System.out.println(m.group(1) + "\n" + m.group(2) + "\n"
                    + m.group(3));
        } else {
            System.out.println("String does not match");
        }
    }
}
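The pattern above only matches a line that has all three fields in exactly that order. As a rough sketch of how the same regex idea could cope with optional fields in any order, the version below matches one keyword at a time and uses a lookahead to stop at the next keyword. The class name `KeywordRegexParser` and the keyword list are assumptions for illustration, and it still breaks when a value contains a keyword.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordRegexParser {

    // Assumed keyword set; adjust it to the attributes in your file.
    private static final String KEYWORDS = "name|address|birthday|gender";

    // A keyword, whitespace, then as little text as possible up to the
    // next keyword or the end of the line.
    private static final Pattern PAIR = Pattern.compile(
            "\\b(" + KEYWORDS + ")\\s+(.*?)(?=\\s+(?:" + KEYWORDS + ")\\b|\\s*$)",
            Pattern.CASE_INSENSITIVE);

    static Map<String, String> parse(String line) {
        // LinkedHashMap keeps the pairs in the order they appear in the line.
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            pairs.put(m.group(1).toLowerCase(), m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(parse("name Marioka address 97 Garderners Road birthday 12-11-1982"));
        System.out.println(parse("name Ada Lovelace gender woman"));
    }
}
```

Each call to `parse` returns one map per line, so a list of these maps would correspond to the array of records asked for in the question.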

Try this:

ArrayList<String> keywords = new ArrayList<String>();
keywords.add("name");
keywords.add("address");
keywords.add("birthday");
keywords.add("gender");
String[] s = "name James address 65 Watcher Avenue".trim().split(" ");
Map<String, String> m = new HashMap<String, String>();
for (int i = 0; i < s.length; i++) {
    if (keywords.contains(s[i])) {
        String key = s[i];
        StringBuilder b = new StringBuilder();
        // Collect every following word until the next keyword or the end of the input
        while (i + 1 < s.length && !keywords.contains(s[i + 1])) {
            i++;
            if (b.length() > 0) {
                b.append(" ");
            }
            b.append(s[i]);
        }
        m.put(key, b.toString());
    }
}
System.out.println(m);

Just add the keywords you want to recognize to the ArrayList named keywords and it will work.

Edit: note that if someone's name or address contains one of the keywords, it will not produce the expected output.
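One way to soften this limitation, offered only as a heuristic sketch, is to treat a word as a keyword only the first time it appears in a record. The class name `FirstOccurrenceParser` is hypothetical, and the approach assumes each keyword occurs at most once per line; a value containing a keyword that has not been seen yet is still split incorrectly.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FirstOccurrenceParser {

    // Assumed keyword list, mirroring the one in the answer above.
    static final List<String> KEYWORDS = Arrays.asList("name", "address", "birthday", "gender");

    static Map<String, String> parse(String line) {
        Map<String, String> pairs = new LinkedHashMap<>();
        String key = null;
        for (String word : line.trim().split("\\s+")) {
            // A word starts a new pair only if it is a keyword not seen before in this record.
            if (KEYWORDS.contains(word) && !pairs.containsKey(word)) {
                key = word;
                pairs.put(key, "");
            } else if (key != null) {
                // Anything else is appended to the value of the current keyword.
                String soFar = pairs.get(key);
                pairs.put(key, soFar.isEmpty() ? word : soFar + " " + word);
            }
        }
        return pairs;
    }
}
```

With this sketch, a line such as `name James address 65 name Street` keeps `65 name Street` together as the address, because `name` has already been used as a key.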

The best approach is to put the data into a Map, so that you can set key-value pairs ("name": "Marioka").

Map<String,String> mp=new HashMap<String, String>();
    // adding or set elements in Map by put method key and value pair
    mp.put("name", "nameData");
    mp.put("address", "addressData"); // ...and so on for the other keywords

This requires you to do the following (pseudocode):

1.  >Read a line
2.  >Split it by a delimiter(' ' in your case)
2.5 >Map<String,String> mp = new HashMap<String,String>();
3.  >for(int i = 0; i < splitArray.length; i += 2){
      try{
        mp.put(splitArray[i],splitArray[i+1]);
      }catch(Exception e){ System.err.println("Syntax Error"); }
    }
4.  >Bob's your uncle, Fanny's your aunt. 

You will have to modify the data file, though, so that ';' stands for a space inside a value:

name Ada;Lovelace
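The pseudocode above could be turned into Java roughly as follows. This is a minimal sketch that assumes the data file has already been rewritten so that every value is a single token with ';' in place of spaces. The class name `PairSplitParser` is made up for illustration, and an odd number of tokens is reported as an exception rather than printed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PairSplitParser {

    static Map<String, String> parse(String line) {
        String[] parts = line.trim().split(" ");
        // Tokens must come in keyword/value pairs, so an odd count is a syntax error.
        if (parts.length % 2 != 0) {
            throw new IllegalArgumentException("Syntax error, expected keyword/value pairs: " + line);
        }
        Map<String, String> mp = new LinkedHashMap<>();
        for (int i = 0; i < parts.length; i += 2) {
            // ';' stands in for a space inside a value, as in "Ada;Lovelace".
            mp.put(parts[i], parts[i + 1].replace(';', ' '));
        }
        return mp;
    }
}
```

Because every value is one token, this version never has to guess where a value ends, at the cost of requiring the modified file format.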

Read the file line by line and call the getKeywordValuePairs() method on each line.

import java.util.ArrayList;

public class S{

    public static void main(String[] args) {
        System.out.println(getKeywordValuePairs("name Marioka address 97 Garderners Road birthday 12-11-1982",
                new String[]{
                    "name", "address", "birthday", "gghghhjgghjhj"
                }));
    }

    public static String getKeywordValuePairs(String text, String keywords[]) {

        ArrayList<String> keyWordsPresent = new ArrayList<>();
        ArrayList<Integer> indicesOfKeywordsPresent = new ArrayList<>();

        // finding the indices of all the keywords and adding them to the array
        // lists only if the keyword is present
        for (int i = 0; i < keywords.length; i++) {
            int index = text.indexOf(keywords[i]);
            if (index >= 0) {
                keyWordsPresent.add(keywords[i]);
                indicesOfKeywordsPresent.add(index);
            }
        }

        // Creating arrays from Array Lists
        String[] keywordsArray = new String[keyWordsPresent.size()];
        int[] indicesArray = new int[indicesOfKeywordsPresent.size()];
        for (int i = 0; i < keywordsArray.length; i++) {
            keywordsArray[i] = keyWordsPresent.get(i);
            indicesArray[i] = indicesOfKeywordsPresent.get(i);
        }


        // Sorting the keywords and indices arrays based on the position where the keyword appears
        for (int i = 0; i < indicesArray.length; i++) {
            for (int j = 0; j < indicesArray.length - 1 - i; j++) {
                // Bubble sort pass: swap neighbours at j and j + 1 when they are out of order
                if (indicesArray[j] > indicesArray[j + 1]) {
                    int temp = indicesArray[j];
                    indicesArray[j] = indicesArray[j + 1];
                    indicesArray[j + 1] = temp;
                    String tempString = keywordsArray[j];
                    keywordsArray[j] = keywordsArray[j + 1];
                    keywordsArray[j + 1] = tempString;
                }
            }
        }

        // Creating the result String
        String result = "{";
        for (int i = 0; i < keywordsArray.length; i++) {
            result = result + "[" + keywordsArray[i] + ",";
            if (i == keywordsArray.length - 1) {
                result = result + text.substring(indicesArray[i] + keywordsArray[i].length()) + "]";
            } else {
                result = result + text.substring(indicesArray[i] + keywordsArray[i].length(), indicesArray[i + 1]) + "],";
            }
        }
        result = result + "}";
        return result;
    }
}
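The "read the file line by line" step is not shown in this answer. A minimal sketch of it might look like the following, where `RecordFileReader` and `parseLines` are hypothetical names and the per-line parser is passed in as a function, so any of the parsing methods on this page can be plugged in.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class RecordFileReader {

    // Applies lineParser to every non-blank line of the source and collects the results.
    static <T> List<T> parseLines(Reader source, Function<String, T> lineParser) {
        List<T> records = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    records.add(lineParser.apply(line));
                }
            }
        } catch (IOException e) {
            // Rethrown unchecked to keep the calling code short in this sketch.
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```

Assuming a file called `people.txt` (a made-up name), a call could look like `RecordFileReader.parseLines(new java.io.FileReader("people.txt"), line -> S.getKeywordValuePairs(line, keywords))`, where `FileReader`'s checked `FileNotFoundException` still has to be handled by the caller.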

I have a completely different solution that explores the power of Java regular expressions and enums: the input is read and parsed into POJOs, which makes it a future-proof solution.

Step 1: Define your enum (you can extend the enum to add all the required keys)

public enum PersonEnum {
  name { public void set(Person d,String name) {  d.setName(name) ;} },
  address { public void set(Person d,String address) {  d.setAddress(address); } },
  gender { public void set(Person d,String address) {  d.setOthers(address); } };
  public void set(Person d,String others) { d.setOthers(others);  }
}

Step 2: Define your POJO class (if you don't need a POJO, you can change the enum to use a HashMap)

public class Person {

    private String name;
    private String address;
    private String others;

    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public String getAddress() {
        return address;
    }
    public void setAddress(String address) {
        this.address = address;
    }
    public String getOthers() {
        return others;
    }
    public void setOthers(String others) {
        this.others = others;
    }
    @Override
    public String toString() {
        return name+"==>"+address+"==>"+others;
    }
}
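As mentioned in step 2, the enum can target a HashMap instead of a POJO. A possible shape for that variant is sketched below; `PersonField` and `MapRecordDemo` are hypothetical names, and every constant shares one `set` method that stores the value under the constant's own name.

```java
import java.util.HashMap;
import java.util.Map;

class MapRecordDemo {

    // Hypothetical Map-based counterpart of PersonEnum: each constant stores
    // its value in a Map under its own name instead of calling a POJO setter.
    enum PersonField {
        name, address, gender;

        void set(Map<String, String> record, String value) {
            record.put(this.name(), value.trim());
        }
    }

    public static void main(String[] args) {
        Map<String, String> record = new HashMap<>();
        PersonField.valueOf("name").set(record, " Ada Lovelace ");
        PersonField.valueOf("gender").set(record, " woman");
        System.out.println(record);
    }
}
```

The trade-off is that new keys only need a new constant, but the compile-time checking that the POJO setters provide is lost.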

Step 3: Here is the parser

public static void main(String[] args) {

    try {
        String inputs ="name Marioka address 97 Garderners Road birthday 12-11-1982\n name Ada Lovelace gender" +
                " woman address London\n name James address 65 Watcher Avenue";
        Scanner scanner = new Scanner(inputs);
        List<Person> personList = new ArrayList<Person>();
        while(scanner.hasNextLine()){
            String line = scanner.nextLine();
            List<String> filtereList=splitLines(line, "name|address|gender");
            Iterator< String> lineIterator  = filtereList.iterator();
            Person p = new Person();
            while(lineIterator.hasNext()){
                PersonEnum pEnum = PersonEnum.valueOf(lineIterator.next());
                pEnum.set(p, lineIterator.next());
            }
            personList.add(p);
            System.out.println(p);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
public static List<String> splitLines(String inputText, String pString) {
    Pattern pattern =Pattern.compile(pString);
    Matcher m = pattern.matcher(inputText);
    List<String> filteredList = new ArrayList<String>();
    int start = 0;
    while (m.find()) {
        add(inputText.substring(start, m.start()),filteredList);
        add(m.group(),filteredList);
        start = m.end();
    }
    add(inputText.substring(start),filteredList);
    return filteredList;
}
public static void add(String text, List<String> list){
    if(text!=null && !text.trim().isEmpty()){
        list.add(text.trim());
    }
}

Note: you need to define every possible enum constant in PersonEnum; otherwise you need to take measures to guard against IllegalArgumentException

eg: java.lang.IllegalArgumentException: No enum const class com.sa.PersonEnum.address

Apart from that, this is probably one of the best Java (OOP) solutions I can suggest. Cheers!
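One possible guard against the IllegalArgumentException mentioned in the note is to look the constant up by hand instead of calling `valueOf` directly. The sketch below uses a hypothetical stand-in enum `Keyword` with the same constant names as PersonEnum, and returns an empty `Optional` for any word that is not a known constant.

```java
import java.util.Optional;

class SafeEnumLookup {

    // Hypothetical stand-in for PersonEnum with the same constant names.
    enum Keyword { name, address, gender }

    // Returns the matching constant, or an empty Optional for any other word,
    // so callers never see IllegalArgumentException from valueOf.
    static Optional<Keyword> lookup(String word) {
        String trimmed = word.trim();
        for (Keyword k : Keyword.values()) {
            if (k.name().equals(trimmed)) {
                return Optional.of(k);
            }
        }
        return Optional.empty();
    }
}
```

In the parser loop, the caller could then skip or log a token when `lookup` returns an empty result rather than letting the whole line fail.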

