简体   繁体   中英

Apache POI - Read and store Rich Text content in DB

We have a new requirement in our java application where user's would upload an excel file. One of the column in the excel file will be formatted with bold, italics, bullet points, colored text etc.

We need to read this excel file and store these values in Oracle DB table. Also subsequently we need to extract these data and download into excel sheet with the formatting preserved.

We planned to use the Apache-poi for the same, but now stuck at the point where we have the HSSFRichTextString object that needs to be converted into a format to store into Oracle table.

The tostring() method of HSSFRichTextString gives the string but the formatting is lost. Can someone please suggest me how to convert this HSSFRichTextString object into Oracle data type (preferably clob).

You are right in that the toString() method will just return the unformatted String contents of the HSSFRichTextString .

Here is a method of extracting out all the other important data from the HSSFRichTextString to be stored with the string value.

Very similar to my answer to this question , extract the rich text formatting information from the HSSFRichTextString , and store that data in a class you'll create, FormattingRun .

public class FormattingRun {
    private int beginIdx;
    private int length;
    private short fontIdx;
    public FormattingRun(int beginIdx, int length, short fontIdx) {
        this.beginIdx = beginIdx;
        this.length = length;
        this.fontIdx = fontIdx;
    }
    public int getBegin() { return beginIdx; }
    public int getLength() { return length; }
    public short getFontIndex { return fontIdx; }
}

Then, call Apache POI methods to extract that data.

Now, the actual extraction of the data:

List<FormattingRun> formattingRuns = new ArrayList<FormattingRun>();
int numFormattingRuns = richTextString.numFormattingRuns();
for (int fmtIdx = 0; fmtIdx < numFormattingRuns; fmtIdx)
{
    int begin = richTextString.getIndexOfFormattingRun(fmtIdx);
    short fontIndex = richTextString.getFontOfFormattingRun(fmtIdx);

    // Walk the string to determine the length of the formatting run.
    int length = 0;
    for (int j = begin; j < richTextString.length(); j++)
    {
        short currFontIndex = richTextString.getFontAtIndex(j);
        if (currFontIndex == fontIndex)
            length++;
        else
            break;
    }
    formattingRuns.add(new FormattingRun(begin, length, fontIndex));
}

To store this data in the database, first recognize that there is a one-to-many relationship between a HSSFRichTextString and FormattingRun . So in whatever Oracle table you're planning on storing the rich text string data, you will need to create a foreign key relationship to another new table that stores the formatting run data. Something like this:

Table: rich_text_string
rts_id     NUMBER
contents   VARCHAR2(4000)

with rts_id being the primary key, and:

Table: rts_formatting_runs
rts_id     NUMBER
run_id     NUMBER
run_pos    NUMBER
run_len    NUMBER
font_index NUMBER

with (rts_id, run_id) being the primary key, and rts_id referring back to the rich_text_string table.

Using your favorite Java-to-database framework (JDBC, Hibernate, etc.), store the String value into contents in rich_text_string , and the associated FormattingRun object data into rt_formatting_runs .

Just be careful - the font index is only valid within the workbook. You'll need to store the font information from the HSSFWorkbook also, to give the font_index meaning.

It's not stored as a CLOB , but the data are arguably more meaningful stored this way.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM