
Splitting up a data String into multiple variables (Java)

I'm working with a large data set of about 10,500 lines, and each line needs to be split into its parts: title, date, rating, and length. The data is formatted like this:

Ghost Blues: The Story of Rory Gallagher (2010) | 3.8 stars, 1hr 21m

I've figured out how to use .split to cut each line into two halves, but I'm not sure how to split the first half into title and date when the title itself contains parentheses, for example:

Dhobi Ghat (Mumbai Diaries) (2010) | 3.6 stars, 1hr 42m

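In some cases some of these fields may be empty, so there is no rating, date, or length, which is also causing me problems. Can anyone point me in the right direction? Any help would be much appreciated!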

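Edit: I forgot to mention (sorry) that I need the dates and ratings as integers, because later I will need to apply filters, such as finding all entries with a rating > 3.5 or films released after 1998. This is causing trouble with the approaches I'm still working with. Thanks for all the help so far!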

Try this. It has been tested against a few edge cases, shown in the comments:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitDemo {
    public static void main(String[] args) {
        String s = "Ghost Blues: The Story of Rory Gallagher (2010) |   3.8 stars, 1hr 21m";
        //String s = "Ghost Blues: The Story of Rory Gallagher |   3.8 stars, 1hr 21m"; //no year
        //String s = "Ghost Blues: The Story of Rory Gallagher (2010) |   3.8 stars"; //no length
        // Lazy title, optional " (yyyy)", then " |", the rating, and an optional ", <length>".
        Pattern p = Pattern.compile("(.*?)( (\\((\\d{4})\\)))? \\|\\s+(\\d(\\.\\d)?) stars(, (\\dhr( \\d{1,2}m)?))?");
        Matcher m = p.matcher(s);
        if (m.find()) {
            // Groups for optional parts are null when that part is missing.
            System.out.println(m.group(1)); //title
            System.out.println(m.group(4)); //year
            System.out.println(m.group(5)); //rating
            System.out.println(m.group(8)); //length
        }
    }
}

Output:

Ghost Blues: The Story of Rory Gallagher
2010
3.8
1hr 21m

If you can provide examples of further edge cases, this can be improved.
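Since the edit to the question asks for numeric values, here is a minimal sketch of how the captured groups could be converted. It reuses the same pattern; the class name MovieEntry and the method parse are hypothetical names chosen for illustration. The rating is stored as a Double rather than an integer, on the assumption that values like 3.8 must be kept for a filter such as rating > 3.5. The year is null when the line has none.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder class; not part of the original answer.
class MovieEntry {
    // Same pattern as in the answer above.
    private static final Pattern P = Pattern.compile(
        "(.*?)( (\\((\\d{4})\\)))? \\|\\s+(\\d(\\.\\d)?) stars(, (\\dhr( \\d{1,2}m)?))?");

    final String title;
    final Integer year;   // null when the line has no "(yyyy)"
    final Double rating;  // the pattern requires a rating, so non-null after a match
    final String length;  // null when the line has no length

    MovieEntry(String title, Integer year, Double rating, String length) {
        this.title = title;
        this.year = year;
        this.rating = rating;
        this.length = length;
    }

    // Returns null when the line does not match the pattern at all.
    static MovieEntry parse(String line) {
        Matcher m = P.matcher(line);
        if (!m.find()) {
            return null;
        }
        Integer year = m.group(4) == null ? null : Integer.valueOf(m.group(4));
        Double rating = Double.valueOf(m.group(5));
        return new MovieEntry(m.group(1), year, rating, m.group(8));
    }

    public static void main(String[] args) {
        MovieEntry e = parse("Dhobi Ghat (Mumbai Diaries) (2010) |   3.6 stars, 1hr 42m");
        System.out.println(e.title + " / " + e.year + " / " + e.rating);
        // Example filter from the question: rating > 3.5 and released after 1998.
        System.out.println(e.year != null && e.year > 1998 && e.rating > 3.5);
    }
}
```

Lines without a rating (for example a bare title) do not match this pattern, so parse returns null for them; callers would need to decide whether to skip or handle those lines separately.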

Here is a solution:

public class Title {
    private String title;
    private String year;
    private String rating;
    private String length;

    public Title(String input) {
        // Left of the '|' is "title (year)"; right of it is "rating, length".
        String[] leftRight = input.split("\\|");
        title = leftRight[0].trim();
        int lastParen = title.lastIndexOf('(');
        if (lastParen > 0 && title.endsWith(")")) {
            // Text between the last '(' and the closing ')'.
            String candidate = title.substring(lastParen + 1, title.length() - 1);
            // Only treat it as a year if it is four digits, so a title that
            // merely ends in parentheses keeps them.
            if (candidate.matches("\\d{4}")) {
                year = candidate;
                title = title.substring(0, lastParen).trim();
            }
        }
        if (leftRight.length > 1) {
            String[] fields = leftRight[1].split(",");
            for (String field : fields) {
                if (field.contains("stars")) {
                    rating = field.trim();
                } else {
                    length = field.trim();
                }
            }
        }
    }

    @Override
    public String toString() {
        return "Title{" + "title=" + title + ", year=" + year + ", rating=" + rating + ", length=" + length + '}';
    }

    public static void main(String[] args) {
        String[] data = {
            "Ghost Blues: The Story of Rory Gallagher (2010) |   3.8 stars, 1hr 21m",
            "Dhobi Ghat (Mumbai Diaries) (2010) |   3.6 stars, 1hr 42m",
            "just a title",
            "title and rating only | 3.2 stars",
            "title and length only | 1hr 30m"
        };
        for (String titleString : data) {
            Title t = new Title(titleString);
            System.out.println(t);
        }
    }
}

Here is the output for the test data:

Title{title=Ghost Blues: The Story of Rory Gallagher, year=2010, rating=3.8 stars, length=1hr 21m}
Title{title=Dhobi Ghat (Mumbai Diaries), year=2010, rating=3.6 stars, length=1hr 42m}
Title{title=just a title, year=null, rating=null, length=null}
Title{title=title and rating only, year=null, rating=3.2 stars, length=null}
Title{title=title and length only, year=null, rating=null, length=1hr 30m}
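This solution keeps the rating and length as strings such as "3.8 stars" and "1hr 21m". For the filtering mentioned in the question's edit, those strings would still need converting to numbers. The following is only a sketch of one way to do that; the class FieldConverters and its methods parseRating and parseMinutes are hypothetical helpers, not part of the answer above, and the length format is assumed to be an optional "<n>hr" followed by an optional "<n>m".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helpers for turning the string fields into numbers.
class FieldConverters {
    // Assumed length format: optional "<n>hr", optional "<n>m", e.g. "1hr 21m", "1hr", "42m".
    private static final Pattern LENGTH = Pattern.compile("(?:(\\d+)hr)?\\s*(?:(\\d+)m)?");

    // "3.8 stars" -> 3.8; returns null for a missing or unparsable rating.
    static Double parseRating(String rating) {
        if (rating == null) {
            return null;
        }
        String number = rating.replace("stars", "").trim();
        try {
            return Double.valueOf(number);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // "1hr 21m" -> 81 (total minutes); returns null for a missing or unparsable length.
    static Integer parseMinutes(String length) {
        if (length == null) {
            return null;
        }
        Matcher m = LENGTH.matcher(length.trim());
        if (!m.matches() || (m.group(1) == null && m.group(2) == null)) {
            return null;
        }
        int hours = m.group(1) == null ? 0 : Integer.parseInt(m.group(1));
        int minutes = m.group(2) == null ? 0 : Integer.parseInt(m.group(2));
        return hours * 60 + minutes;
    }

    public static void main(String[] args) {
        System.out.println(parseRating("3.8 stars")); // prints 3.8
        System.out.println(parseMinutes("1hr 21m"));  // prints 81
    }
}
```

Returning null instead of throwing keeps entries with missing fields usable; a filter such as rating > 3.5 then has to check for null before comparing.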

