[英]How to format the TSV file in Druid
我正在嘗試使用此攝取斑點加載德魯伊的TSV:
最新版本如下:
{
"type" : "index",
"spec" : {
"ioConfig" : {
"type" : "index",
"inputSpec" : {
"type": "local",
"baseDir": "quickstart",
"filter": "test_data.json"
}
},
"dataSchema" : {
"dataSource" : "local",
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "hour",
"queryGranularity" : "none",
"intervals" : ["2016-07-18/2016-07-22"]
},
"parser" : {
"type" : "string",
"parseSpec" : {
"format" : "json",
"dimensionsSpec" : {
"dimensions" : ["name", "email", "age"]
},
"timestampSpec" : {
"format" : "yyyy-MM-dd HH:mm:ss",
"column" : "date"
}
}
},
"metricsSpec" : [
{
"name" : "count",
"type" : "count"
},
{
"type" : "doubleSum",
"name" : "age",
"fieldName" : "age"
}
]
}
}
}
如果我的架構如下所示:
Schema: name email age
實際數據集如下所示:
name email age Bob Jones 23 Billy Jones 45
這是如何在TSV的上述數據集中格式化列^^? 與name email age
一樣, name email age
應該是第一個(列),然后是實際數據。 我很困惑德魯伊將如何知道如何將列映射到TSV格式的實際數據集。
TSV代表制表符分隔格式,因此它看起來與csv相同,但您將使用制表符而不是逗號,例如
Name<TAB>Age<TAB>Address
Paul<TAB>23<TAB>1115 W Franklin
Bessy the Cow<TAB>5<TAB>Big Farm Way
Zeke<TAB>45<TAB>W Main St
您將使用frist line作為標題來定義列名稱 - 因此您可以在spec文件的維度中使用“name”,“age”或“email”
至於gmt和utc,它們基本相同
格林威治標准時間和協調世界時沒有時間差
第一個是時區,另一個是時間標准
順便說一句,忘記在你的tsv文件中包含一些具有時間價值的列!!
所以,例如,如果你有tsv文件看起來像:
"name" "position" "office" "age" "start_date" "salary"
"Airi Satou" "Accountant" "Tokyo" "33" "2016-07-16T19:20:30+01:00" "162700"
"Angelica Ramos" "Chief Executive Officer (CEO)" "London" "47" "2016-07-16T19:20:30+01:00" "1200000"
您的spec文件應如下所示:
{
"spec" : {
"ioConfig" : {
"inputSpec" : {
"type": "local",
"baseDir": "path_to_folder",
"filter": "name_of_the_file(s)"
}
},
"dataSchema" : {
"dataSource" : "local",
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "hour",
"queryGranularity" : "none",
"intervals" : ["2016-07-01/2016-07-28"]
},
"parser" : {
"type" : "string",
"parseSpec" : {
"format" : "tsv",
"dimensionsSpec" : {
"dimensions" : [
"position",
"age",
"office"
]
},
"timestampSpec" : {
"format" : "auto",
"column" : "start_date"
}
}
},
"metricsSpec" : [
{
"name" : "count",
"type" : "count"
},
{
"name" : "sum_sallary",
"type" : "longSum",
"fieldName" : "salary"
}
]
}
}
}
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.