R with tcltk/tcltk2: Improve slow performance when displaying big data.frame with TkTable?
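Please see my two EDITs below (added later)...
I have loaded a big data.frame into memory (2.7 million rows and 7 columns, 74 MB RAM).
If I want to view the data with the Tktable widget of Tcl/Tk via the tcltk2 package function tk2edit, it takes about 15 minutes until the table is displayed.
Example: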
library(tcltk2)
my.data.frame <- data.frame(ID=1:2600000,
                            col1=rep(LETTERS,100000),
                            col2=rep(letters,1E5),
                            col3=26E5:1) # about 40 MB of data
tk2edit(my.data.frame)
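The basic problem seems to be that every cell of the data.frame has to be loaded into a Tcl array via two nested loops (see the code in this tktable question).
The function tk2edit of the tcltk2 package works the same way (oversimplified):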
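# my.data.frame contains a lot of rows...
tclarray1 <- tclArray()  # requires library(tcltk)
for (i in 0:nrow(my.data.frame)) {          # row 0 = title row with the column names
  for (j in 0:(ncol(my.data.frame) - 1)) {  # Tcl column indices are 0-based
    # one separate R -> Tcl call per cell (presumably the bottleneck)
    tclarray1[[i, j]] <- if (i == 0) names(my.data.frame)[j + 1] else as.character(my.data.frame[i, j + 1])
  }
}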
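Question: Is there any way to optimize displaying big data.frames with Tktable, e.g. by avoiding the nested loops? I only want to view the data (no editing required)...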
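Tktable has a -variable option to which you can assign a Tcl array variable containing all data of the table. So we "only" have to find a way to create a Tcl array from an R data.frame by "calling Tcl from R"...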
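A minimal sketch of such a bulk load, assuming that tcltk passes an R character vector to Tcl as a single Tcl list (which is how I understand as.tclObj() to work): all "row,col" keys and their values are built with vectorized R code (no nested loops) and handed to Tcl's "array set" command in one call. The helper names tcl_array_kv() and load_tcl_array() and the Tcl variable name ::tbldata are hypothetical, and I have not benchmarked this against the per-cell loop.

```r
# Sketch (not benchmarked): build all Tktable cell keys "row,col" and their
# values in vectorized R code and pass them to Tcl in ONE "array set" call.

# Hypothetical helper: returns c(key1, value1, key2, value2, ...)
tcl_array_kv <- function(df) {
  n <- nrow(df)
  p <- ncol(df)
  # Title row 0 holds the column names (for titlerows = 1)
  title_keys <- paste(0L, seq_len(p) - 1L, sep = ",")
  # Data rows 1..n, walked column by column (same order as unlist() below)
  data_keys <- paste(rep(seq_len(n), times = p),
                     rep(seq_len(p) - 1L, each = n), sep = ",")
  data_values <- unlist(lapply(df, as.character), use.names = FALSE)
  keys <- c(title_keys, data_keys)
  values <- c(names(df), data_values)
  # rbind() + as.vector() interleaves keys and values pairwise
  as.vector(rbind(keys, values))
}

# Hypothetical helper: fill the Tcl array 'varname' with a single R -> Tcl call.
# Requires the tcltk package (only used when this function is called).
load_tcl_array <- function(df, varname = "::tbldata") {
  tcltk::tcl("array", "set", varname, tcltk::as.tclObj(tcl_array_kv(df)))
  invisible(varname)
}
```

Usage (untested): load_tcl_array(my.data.frame), then create the table with tkwidget(frame, "table", variable = "::tbldata", rows = nrow(my.data.frame) + 1, cols = ncol(my.data.frame), titlerows = 1). The Tcl array itself will still need a lot of memory, and R temporarily holds all keys and values as strings.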
PS: This is not a problem of the tcltk2 package; it seems to be the general problem of how to "bulk load" the data of a data.frame into Tcl variables...
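PS2: The good news is that Tktable seems to be able to display huge amounts of data efficiently (I can scroll and even edit cells without noticing any serious delay).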
EDIT 1: Benchmark of pure Tcl/Tk with Tktable and the data in an array
I have prepared a simple benchmark in pure Tcl/Tk to measure the run time and memory consumption of a comparable Tktable setup:
#!/usr/bin/env wish
package require Tktable
set rows 2700000
set columns 4
for {set row 0} {$row <= $rows} {incr row} {
    for {set column 0} {$column < $columns} {incr column} {
        if {$row == 0} {
            set data($row,$column) Titel$column
        } else {
            set data($row,$column) R${row}C${column}
        }
    }
}
ttk::frame .fr
table .fr.table -rows $rows -cols $columns -titlerows 1 -titlecols 0 \
    -height 5 -width 25 -rowheight 1 -colwidth 9 -maxheight 100 -maxwidth 400 \
    -selectmode extended -variable data \
    -xscrollcommand {.fr.xscroll set} -yscrollcommand {.fr.yscroll set}
scrollbar .fr.xscroll -command {.fr.table xview} -orient horizontal
scrollbar .fr.yscroll -command {.fr.table yview}
pack .fr -fill both -expand 1
pack .fr.xscroll -side bottom -fill x
pack .fr.yscroll -side right -fill y
pack .fr.table -side right -fill both -expand 1
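Results:
Conclusion: Tcl/Tk arrays waste memory, but the performance is very good (the 15-minute run time when using R with tcltk seems to be caused by the communication overhead between R and Tcl/Tk).
Test setup: Ubuntu 14.04 64-bit with 16 GB RAM...
To compare the memory consumption of Tktable and ttk::treeview, I wrote the following code: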
#!/usr/bin/env wish
set rows 2700000
set columns 4
set data {}
set colnames {}
for {set i 0} {$i < $columns} {incr i} {
    lappend colnames Title$i
}
for {set row 0} {$row <= $rows} {incr row} {
    set newrow {}
    for {set column 0} {$column < $columns} {incr column} {
        lappend newrow R${row}C${column}
    }
    lappend data $newrow
}
ttk::treeview .tv -columns $colnames -show headings \
    -yscrollcommand {.sbY set} -xscrollcommand {.sbX set}
foreach Element $data {
    .tv insert {} end -values $Element
}
foreach column $colnames {
    .tv heading $column -text $column
}
ttk::scrollbar .sbY -command {.tv yview}
ttk::scrollbar .sbX -command {.tv xview} -orient horizontal
pack .sbY -side right -fill y
pack .sbX -side bottom -fill x
pack .tv -side left -fill both
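Results:
Conclusion: treeview is more memory-efficient than Tktable because it can use lists instead of arrays.
EDIT 2: I have found a possible solution/workaround: using Tktable in "unbound" (command) mode.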
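With the -command option of Tktable you can specify a function that is called each time a cell is displayed on the screen. This avoids loading all data from R into Tcl at once, which shortens the "startup" time and significantly reduces the memory consumption caused by the way Tcl stores arrays and lists.
This way, each time you scroll, a series of function calls requests the contents of the visible cells.
It works for me even with more than 10 million rows!
Drawback: Calling an R function that returns a Tcl variable for each cell is still far from efficient. When you scroll for the first time, you can watch the cells being updated. So I am still looking for a solution for bulk data transfer between R and Tcl/Tk.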
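One way to avoid an R callback per cell might be to transfer the data once per column and let a small Tcl procedure answer Tktable's -command requests, so that scrolling never calls back into R. This is a sketch under several assumptions: each column is stored as one Tcl list in a global variable (::tblcol0, ::tblcol1, ...), the column names go to ::tblcolnames, and the proc tbl_getcell returns the cell for row %r and column %c. All of these names are hypothetical, and I have not measured the memory use or speed of this variant. Tcl lists should be leaner than arrays, in line with the treeview comparison above.

```r
# Sketch: one R -> Tcl transfer per column instead of one per cell; a small
# Tcl proc then serves Tktable's -command requests without calling back into R.
# All Tcl names (::tblcolnames, ::tblcol<j>, tbl_getcell) are hypothetical.

# Tcl source of the lookup proc: row 0 = title row, row r >= 1 = data row r
tbl_getcell_proc <- function() {
  paste(
    "proc tbl_getcell {r c} {",
    "  if {$r == 0} { return [lindex $::tblcolnames $c] }",
    "  return [lindex [set ::tblcol$c] [expr {$r - 1}]]",
    "}",
    sep = "\n"
  )
}

# Requires the tcltk package and an existing Tktable widget 'table'
bind_table_to_columns <- function(table, df) {
  # drop = FALSE: keep even a length-one vector as a Tcl list
  tcltk::tcl("set", "::tblcolnames", tcltk::as.tclObj(names(df), drop = FALSE))
  for (j in seq_along(df)) {
    tcltk::tcl("set", paste0("::tblcol", j - 1L),
               tcltk::as.tclObj(as.character(df[[j]]), drop = FALSE))
  }
  tcltk::.Tcl(tbl_getcell_proc())  # define the lookup proc in Tcl
  # + 1 row for the title row; Tktable substitutes %r and %c itself
  tcltk::tkconfigure(table, command = "tbl_getcell %r %c",
                     rows = nrow(df) + 1, cols = ncol(df), titlerows = 1)
  tcltk::tcl(table, "clear", "cache")
  invisible(table)
}
```

In the demo below this would be called as bind_table_to_columns(t$env$table, dt.data), preferably with the table in state = "disabled", because tbl_getcell only handles reads.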
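Any suggestions to improve the performance are welcome!
I implemented a small demo (with 1 million rows and 21 columns, consuming 1.2 GB RAM) and added some buttons to test different features (e.g. caching).
Note: The long startup time is caused by creating the underlying test data, not by Tktable!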
library(tcltk)
library(data.table)
# Tktable example with -command ("unbound" mode) ---------------------------
# Doc: http://tktable.sourceforge.net/tktable/doc/tkTable.html
NUM.ROWS <- 1E6
NUM.COLS <- 20
# generate a big data.frame - this will take a while but is required for the demo
dt.data <- data.table(ID = 1:NUM.ROWS)
for (i in 1:NUM.COLS) {
  dt.data[, (paste("Col",i)) := paste0("R", 1:NUM.ROWS, " C", i)]
}
# Fill one cell with a long text containing special control characters to test the Tktable behaviour
dt.data[3,3 := "This is a long text with backslash \\ and \"quotes\"!"]
tclRequire("Tktable")
t <- tktoplevel()
tkwm.protocol(t, "WM_DELETE_WINDOW", function() tkdestroy(t))
# Function to return the current row and column as "calculated" value (without an underlying data "model")
calculated.data <- function(C) {
  # Function arguments for Tcl "substitutions":
  # See: http://tktable.sourceforge.net/tktable/doc/tkTable.html
  # %c the column of the triggered cell.
  # %C A convenience substitution for %r,%c.
  # %i 0 for a read (get) and 1 for a write (set). Otherwise it is the current cursor position in the cell.
  # %r the row of the triggered cell.
  return(tclVar(C)) # this does work!
}
# Function to return the content of a data.table for the current row and column
# (table row 0 is the title row, so table row r >= 1 shows data row r)
data.frame.data <- function(r, c) {
  if (r == "0")
    return(tclVar(names(dt.data)[as.integer(c) + 1])) # First row contains the column names
  else
    return(tclVar(as.character(dt.data[[as.integer(c) + 1]][as.integer(r)]))) # Other rows are data rows
}
frame <- ttklabelframe(t, text = "Data:")
# Add the table to the window environment to ensure killing it when the window is closed (= no more phantom calls to the data command handler)
# Cache = TRUE: This greatly enhances speed performance when used with -command but uses extra memory.
t$env$table <- tkwidget(frame, "table", rows = NUM.ROWS, cols = NUM.COLS, titlerows = 1,
                        selecttype = "cell", selectmode = "extended",
                        command = calculated.data, cache = TRUE,
                        yscrollcommand = function(...) tkset(scroll.y, ...),
                        xscrollcommand = function(...) tkset(scroll.x, ...))
scroll.x <- ttkscrollbar(frame, orient = "horizontal", command=function(...) tkxview(t$env$table,...)) # command that performs the scrolling
scroll.y <- ttkscrollbar(frame, orient = "vertical", command=function(...) tkyview(t$env$table,...)) # command that performs the scrolling
buttons <- ttkframe(t)
btn.read.only <- ttkbutton(buttons, text = "make read only", command = function() tkconfigure(t$env$table, state = "disabled"))
btn.read.write <- ttkbutton(buttons, text = "make writable", command = function() tkconfigure(t$env$table, state = "normal"))
btn.clear.cache <- ttkbutton(buttons, text = "clear cache", command = function() tcl(t$env$table, "clear", "cache"))
btn.bind.data.frame <- ttkbutton(buttons, text = "Fill cells from R data.table",
  command = function() {
    # nrow + 1: one extra table row for the title row
    tkconfigure(t$env$table, command = data.frame.data, rows = nrow(dt.data) + 1, cols = ncol(dt.data), titlerows = 1)
    tcl(t$env$table, "clear", "cache")
    tkwm.title(t, "Cells are filled from an R data.table")
  })
btn.bind.calc.value <- ttkbutton(buttons, text = "Fill cells with calculated values",
  command = function() {
    tkconfigure(t$env$table, command = calculated.data, rows = 1E5, cols = 40)
    tcl(t$env$table, "clear", "cache")
    tkwm.title(t, "Cells are calculated values (to test the highest performance possible)")
  })
tkgrid(btn.read.only, row = 0, column = 1)
tkgrid(btn.read.write, row = 0, column = 2)
tkgrid(btn.clear.cache, row = 0, column = 3)
tkgrid(btn.bind.data.frame, row = 0, column = 5)
tkgrid(btn.bind.calc.value, row = 0, column = 6)
tkpack(frame, fill = "both", expand = TRUE)
tkpack(scroll.x, fill = "x", expand = FALSE, side = "bottom")
tkpack(scroll.y, fill = "y", expand = FALSE, side = "right")
tkpack(t$env$table, fill = "both", expand = TRUE, side = "left")
tkpack(buttons, side = "bottom")
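The data.frame.data handler above converts a single cell per call. A sketch of an R-side block cache, under the assumption that a user mostly scrolls through neighbouring rows: the hypothetical make_block_reader() converts block_size rows of all columns to character in one step and serves further requests for that block from memory. It does not remove the per-cell R callback itself, it only makes each call cheaper, and it complements Tktable's own cache = TRUE.

```r
# Hypothetical helper (sketch): returns a function(r, c) for Tktable-style
# 0-based indices, where row 0 is the title row and row r >= 1 is data row r.
make_block_reader <- function(df, block_size = 500L) {
  cached_block <- -1L  # index of the block currently held in 'cache'
  cache <- NULL        # list of character vectors, one per column
  function(r, c) {
    r <- as.integer(r)  # Tcl substitutions arrive as strings
    c <- as.integer(c)
    if (r == 0L) return(names(df)[c + 1L])  # title row
    if (r > nrow(df)) return("")            # guard against out-of-range rows
    block <- (r - 1L) %/% block_size
    if (block != cached_block) {
      # Convert all columns of this block to character in one step
      first <- block * block_size + 1L
      last <- min(first + block_size - 1L, nrow(df))
      cache <<- lapply(df, function(col) as.character(col[first:last]))
      cached_block <<- block
    }
    cache[[c + 1L]][r - block * block_size]
  }
}
```

In the demo it could replace data.frame.data, e.g. reader <- make_block_reader(dt.data) together with command = function(r, c) tclVar(reader(r, c)), again with rows = nrow(dt.data) + 1 because of the title row.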