简体   繁体   中英

R with tcltk/tcltk2: Improve slow performance when displaying big data.frame with TkTable?

Please see two edits below (added later)...

I have loaded a big data.frame into memory ( 2.7 mio rows and 7 columns - 74 MB of RAM ).

If I want to view the data using Tcl/Tk's Tktable widget via the tcltk2 package function tk2edit

  • it takes over 15 minutes till the window is displayed with the data
  • and about 7 GB of RAM (!) is consumed by R (incl. Tcl/Tk) en plus!

Example:

library(tcltk2)

my.data.frame <- data.frame(ID=1:2600000,
                            col1=rep(LETTERS,100000),
                            col2=rep(letters,1E5),
                            col3=26E5:1)       # about 40 MB of data

tk2edit(my.data.frame)

The basic problem seems to be that each cell of the data.frame must loaded into an tcl array via two nested loops ( see the code in this tktable question ).

The tcltk2 package's function tk2edit works the same way, over-simplified:

# my.data.frame contains a lot of rows...
for (i in 0:(dim(my.data.frame)[1])) {
        for (j in 0:(dim(my.data.frame)[2]-1)) {
                tclarray1[[i,j]] <- my.data.frame[i, j]
        }
}

Question: Is there any way to optimize displaying big data.frames with tktable, eg by avoiding the nested loops? I just want to view data (no editing required)...

tktable has the -variable option where you can set the tcl array variable that contains ALL the data of the table. So we "only" have to find way to create a tcl array from an R data.frame with "one call to tcl from R"...

PS: This is not a problem of the tcltk2 package but seems to be a general problem how to "bulk load" data of a data.frame into Tcl variables...

PS2: The good thing is that Tktable seems to be able to display such a lot of data efficiently (I can scroll and even edit cells without noticing any severe delays).


Edit 1 (09/01/2015): Adding pure Tcl/Tk benchmark results with Tktable and data in an array

I have prepared a simple benchmark in Tcl/Tk to measure the execution time and memory consumption of filling a similar Tktable :

#!/usr/bin/env wish

package require Tktable

set rows 2700000
set columns 4

for {set row 0} {$row <= $rows} {incr row} {
  for {set column 0} {$column < $columns} {incr column} {
    if {$row == 0} {
      set data($row,$column) Titel$column
    } else {
      set data($row,$column) R${row}C${column}
    }
  }
}

ttk::frame .fr

table .fr.table -rows $rows -cols $columns -titlerows 1 -titlecols 0 -height 5 -width 25 -rowheight 1 -colwidth 9 -maxheight 100 -maxwidth 400 -selectmode extended -variable data -xscrollcommand {.fr.xscroll set} -yscrollcommand {.fr.yscroll set}

scrollbar .fr.xscroll -command {.fr.table xview} -orient horizontal
scrollbar .fr.yscroll -command {.fr.table yview}

pack .fr -fill both -expand 1
pack .fr.xscroll -side bottom -fill x
pack .fr.yscroll -side right -fill y
pack .fr.table -side right -fill both -expand 1

Results:

  • Memory consumption: 3.2 GB
  • Time until the table is displayed: 15 sec.

Conclusion: Tcl/Tk arrays are wasting memory, but the performance is very good (the runtime of 15 minutes when using R with tcltk seem to be caused by R to Tcl/Tk communication overhead.

Test setup: Ubuntu 14.04 64 Bit with 16 GB RAM...


Edit 2 (10/01/2015): Adding pure Tcl/Tk benchmark results of ttk::treeview with data in a list

To compare the memory consumption of Tktable to ttk::treeview I wrote this code:

#!/usr/bin/env wish
set rows 2700000
set columns 4
set data {}
set colnames {}
for {set i 0} {$i < $columns} {incr i} {
  lappend colnames Title$i
}
for {set row 0} {$row <= $rows} {incr row} {
  set newrow {}
  for {set column 0} {$column < $columns} {incr column} {
      lappend newrow R${row}C${column}
  }
  lappend data $newrow
}

ttk::treeview .tv -columns $colnames -show headings -yscrollcommand {.sbY set} -xscrollcommand {.sbX set}
foreach Element $data {
   .tv insert {} end -values $Element
}
foreach column $colnames {
  .tv heading $column -text $column
}
ttk::scrollbar .sbY -command {.tv yview}
ttk::scrollbar .sbX -command {.tv xview} -orient horizontal
pack .sbY -side right -fill y
pack .sbX -side bottom -fill x
pack .tv -side left -fill both

Results:

  • Memory consumption: 2 GB (thereof data stored as list: 1.2 GB)
  • Time until the table is displayed: 15 sec.
  • Compare: 10 mio rows consume 7.2 GB of RAM but selecting a row takes serveral seconds (2 - 5) then (possible reason: Internal list traversal?)

Conclusion:

  • The treeview is more memory efficient than Tktable since it can use a list instead of an array.
  • For bigger data sizes (> a few million rows) the row selection is slow (the more at the end the slower!)

I have found one possible solution/workaround using Tktable in an "unbound" (command) mode .

With the command option of Tktable you can specify a function that is called each time a cell shall be displayed on the screen. This avoids "loading" all the data from R to Tcl at once improving the "start-up" time and significantly reduces the memory consumption caused by TCL's way of storing arrays and lists.

This way every time you scroll a series of function calls are done to ask for the content of the visible cells.

It works for me even with over 10 mio. rows!

Drawback: Calling an R function that returns a Tcl variable for each cell is still far from being efficient. If you scroll for the first time you can watch the cells being updated. Therefore I am still looking for a bulk data transfer solution between R and Tcl/Tk.

Any suggestions to improve the performance are welcome!

I have implemented a small demo (with 1 mio. rows and 21 columns consuming 1.2 GB of RAM) and added some buttons to test different features (like caching).

Note: The long start-up time is caused by creating the underlying test data, NOT by Tktable!

library(tcltk)
library(data.table)

# Tktable example with -command ("unbound" mode) ---------------------------
# Doc: http://tktable.sourceforge.net/tktable/doc/tkTable.html

NUM.ROWS <- 1E6
NUM.COLS <- 20

# generate a big data.frame - this will take a while but is required for the demo
dt.data <- data.table(ID = 1:NUM.ROWS)

for (i in 1:NUM.COLS) {
  dt.data[, (paste("Col",i)) := paste0("R", 1:NUM.ROWS, " C", i)]
}

# Fill one cell with a long text containing special control characters to test the Tktable behaviour
dt.data[3,3 := "This is a long text with backslash \\ and \"quotes\"!"]

tclRequire("Tktable")

t <- tktoplevel()

tkwm.protocol(t, "WM_DELETE_WINDOW", function() tkdestroy(t))

# Function to return the current row and column as "calculated" value (without an underlying data "model")
calculated.data <- function(C) {
  # Function arguments  for Tcl "substitutions":
  # See:   http://tktable.sourceforge.net/tktable/doc/tkTable.html
  #   %c the column of the triggered cell.
  #   %C A convenience substitution for %r,%c.
  #   %i 0 for a read (get) and 1 for a write (set). Otherwise it is the current cursor position in the cell.
  #   %r the row of the triggered cell.
  return(tclVar(C))  # this does work!
}

# Function to return the content of a data.table for the current row and colum
data.frame.data <- function(r, c) {
  if( r == "0")
    return(tclVar(names(dt.data)[as.integer(c)+1]))             # First row contains the column names
  else
    return(tclVar(as.character(dt.data[as.integer(r)+1, as.integer(c)+1, with = FALSE])))   # Other rows are data rows
}

frame <- ttklabelframe(t, text = "Data:")
# Add the table to the window environment to ensure killing it when the window is closed (= no more phantom calls to the data command handler)
# Cache = TRUE: This greatly enhances speed performance when used with -command but uses extra memory.
t$env$table <- tkwidget(frame, "table", rows = NUM.ROWS, cols = NUM.COLS, titlerows = 1, selecttype = "cell", selectmode = "extended", command = calculated.data, cache = TRUE, yscrollcommand = function(...) tkset(scroll.y, ...), xscrollcommand = function(...) tkset(scroll.x, ...))

scroll.x <- ttkscrollbar(frame, orient = "horizontal", command=function(...) tkxview(t$env$table,...))  # command that performs the scrolling
scroll.y <- ttkscrollbar(frame, orient = "vertical", command=function(...) tkyview(t$env$table,...))  # command that performs the scrolling

buttons <- ttkframe(t)
btn.read.only <- ttkbutton(buttons, text = "make read only", command = function() tkconfigure(t$env$table, state = "disabled"))
btn.read.write <- ttkbutton(buttons, text = "make writable", command = function() tkconfigure(t$env$table, state = "normal"))
btn.clear.cache <- ttkbutton(buttons, text = "clear cache", command = function() tcl(t$env$table, "clear", "cache"))
btn.bind.data.frame <- ttkbutton(buttons, text = "Fill cells from R data.table",
                                 command = function() {
                                   tkconfigure(t$env$table, command = data.frame.data, rows = nrow(dt.data), cols = ncol(dt.data), titlerows = 1)
                                   tcl(t$env$table, "clear", "cache")
                                   tkwm.title(t,"Cells are filled from an R data.table")
                                 })
btn.bind.calc.value <- ttkbutton(buttons, text = "Fill cells with calculated values",
                                 command = function() {
                                   tkconfigure(t$env$table, command = calculated.data, rows = 1E5, cols = 40)
                                   tcl(t$env$table, "clear", "cache")
                                   tkwm.title(t,"Cells are calculated values (to test the highest performance possible)")
                                 })

tkgrid(btn.read.only, row = 0, column = 1)
tkgrid(btn.read.write, row = 0, column = 2)
tkgrid(btn.clear.cache, row = 0, column = 3)
tkgrid(btn.bind.data.frame, row = 0, column = 5)
tkgrid(btn.bind.calc.value, row = 0, column = 6)

tkpack(frame, fill = "both", expand = TRUE)
tkpack(scroll.x, fill = "x", expand = FALSE, side = "bottom")
tkpack(scroll.y, fill = "y", expand = FALSE, side = "right")
tkpack(t$env$table, fill = "both", expand = TRUE, side = "left")
tkpack(buttons, side = "bottom")

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM