Pentaho Spoon transformation from an Excel file

Question

I have yearly data in an Excel file in the following format:

Country \ Years   1980   1981   ...   2010
Abkhazia           234    334   ...    456
Afghanistan        466    789   ...    732
...

I want to transform the data into 3 different tables and load them into a Postgres database.

The tables should look like this.

First table, countries:

id | name
1  | Abkhazia
2  | Afghanistan

Second table, dates:

id | date
1  | 1980
2  | 1981

The third table stores all the data by country and date:

country_id    date_id   data
         1          1    234
         1          2    334
         2          1    466
         2          2    789
       ...        ...    ...

Any ideas on how I can achieve this?

Answer 1

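I am assuming the source Excel structure shown in the question (I built a custom sample file for this).

Your question basically has three parts. I will break the transformation down so it is easier to follow: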

1. Load the Country table

Given the data in the Excel file, this one is very simple. Just use:

Excel Input >> Add a sequence step. Give the Sequence name as Country ID >> Select only the Country Name and Country ID >> Load into the Country Table using Table Output.
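To make the logic of this chain concrete, here is a minimal sketch in plain Python rather than PDI. It only mimics what the Add sequence and Select values steps do to the stream. The function name `build_country_table` and the sample rows (trimmed to two year columns from the question) are my own assumptions, not anything PDI provides.

```python
from itertools import count

# Sample rows mirroring the question's sheet (only two year columns kept).
rows = [
    {"Country": "Abkhazia", "1980": 234, "1981": 334},
    {"Country": "Afghanistan", "1980": 466, "1981": 789},
]


def build_country_table(rows, name_field="Country"):
    """Hypothetical helper: number each country row, starting at 1."""
    seq = count(start=1)  # plays the role of the Add sequence step
    # Keep only the id and the name, as the Select values step would.
    return [{"countryid": next(seq), "name": row[name_field]} for row in rows]


country_table = build_country_table(rows)
```

The resulting list of dictionaries has the same shape as the rows that Table Output would write to the Country table.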

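2. Load the Year table

The idea here is to get the years as rows, rather than as columns the way they appear in the Excel source data. PDI version 5 and above gives you a very useful step called Metadata structure. This step lets you fetch the structure of a table, so each column name arrives as a row. In this case we need to take the year columns and ignore the country column.

Follow these steps: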

Read the Excel Data >> Get the Metadata structure of your source >> Filter Out the Country Column (which is available in row at position=1) >> Add a Sequence Number. Name it YearID >> Finally Load the Year Table.
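As a rough illustration of what this chain does to the data, here is a small plain Python sketch. It is not PDI code: the column names are turned into rows, the field at position 1 (the country column) is dropped, and the remaining ones are numbered. The helper name `build_year_table` and the two-year header are assumptions for the example.

```python
# Header of the sheet from the question, trimmed to two year columns.
header = ["Country", "1980", "1981"]


def build_year_table(header):
    """Hypothetical helper: column names become rows, minus the country column."""
    # One (position, name) record per column, positions starting at 1,
    # which is roughly what a metadata-structure step hands you.
    fields = list(enumerate(header, start=1))
    # Filter out the country column, which sits at position 1.
    years = [name for position, name in fields if position != 1]
    # Number the remaining rows; this id is the YearID.
    return [{"yearid": i, "year": int(name)} for i, name in enumerate(years, start=1)]


year_table = build_year_table(header)
```

Each dictionary corresponds to one row of the Year table.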

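3. Load the final table (country, year, and data)

The way to bring all the column values down to row level in PDI is the Row Normalizer step. Use this step to produce the normalized output. Now follow these steps: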

Read the Excel source data >> use Row Normalizer Step to normalize the rows based on the Years >> Do a Stream Lookup with the Above Country and Year tables to fetch the CountryID and YearID respectively >> Finally Load the necessary column data into Table Output
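The normalize-then-lookup idea can be sketched in plain Python as follows. This is only an illustration of the data flow, not how PDI implements the steps. The helper names `normalise` and `build_fact_table` are hypothetical, and the sample data is the trimmed two-year version of the sheet in the question.

```python
# Trimmed sample of the source sheet and the two lookup tables.
rows = [
    {"Country": "Abkhazia", "1980": 234, "1981": 334},
    {"Country": "Afghanistan", "1980": 466, "1981": 789},
]
country_table = [
    {"countryid": 1, "name": "Abkhazia"},
    {"countryid": 2, "name": "Afghanistan"},
]
year_table = [
    {"yearid": 1, "year": 1980},
    {"yearid": 2, "year": 1981},
]


def normalise(rows, key_field="Country"):
    """Yield one (country, year, value) record per year column."""
    for row in rows:
        for field, value in row.items():
            if field != key_field:
                yield row[key_field], int(field), value


def build_fact_table(rows, country_table, year_table):
    """Replace country names and years with their ids via dictionary lookups."""
    country_ids = {c["name"]: c["countryid"] for c in country_table}
    year_ids = {y["year"]: y["yearid"] for y in year_table}
    return [
        {"countryid": country_ids[country], "yearid": year_ids[year], "data": value}
        for country, year, value in normalise(rows)
    ]


fact_table = build_fact_table(rows, country_table, year_table)
```

The two dictionaries play the role of the Stream Lookup steps, and `fact_table` holds the rows that would go to Table Output.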

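Hope this helps :)

I have put the code, together with the data file I used, in a GitHub repo. It is here.

Also, I just realized that my naming does not match the convention in your question. Treat YearID as your date_id; instead of id, I have used countryid and yearid.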

Solution 1
1 accepted 2015-10-15 15:41:05
