簡體   English   中英

篩選數據框中的每一行並對其進行手動分類

[英]Sift through each row in a dataframe and manually classify it

有人可以推薦一種有效的方法來篩選數據框中的每一行並對其進行手動分類嗎? 例如,我可能想將垃圾郵件與電子郵件,候選廣告,求職者或約會機構的個人資料分開(我知道Tinder可以通過向左或向右滑動來做到這一點)。

我的數據集足夠小,可以手動分類。 我想如果它更大,我可能只想手動分類它的一部分,以便訓練諸如Naive Bayes之類的機器學習算法來為我完成任務。

我將向您展示當前的功能,但這並不是一項特別原始的任務,因此,必須有一種較粗略的方法來完成此任務,而這已經有人想到了! (作為一個新手,R的強大功能給我留下了深刻的印象,但是當清除屏幕或捕獲按鍵之類的小任務變得不平凡時,R也給我留下了深刻的印象)

# Let us suppose I am using this built-in dataset to draw up a
# shortlist of where I might wish to go on holiday
df <- data.frame(state.x77);

# pp - define a task-specific pretty print function
pp <- function(row) {
    print(row); # Example dataset is simple enough to just print the entire row
}

# cls - clear the screen (this hack works on Windows but I've commented it for now)
cls <- function() {
    #system("powershell -ExecutionPolicy Bypass -command (New-Object -ComObject Wscript.Shell).SendKeys([string][char]12)");
}

# It would halve the number of keystrokes needed if I knew a way to read
# a single character
readcharacter <- readline;

sift <- function(df, pp)
{
    classification = rep('', nrow(df));

    for (nRow in 1:nrow(df))
    {
        cls();
        pp(df[nRow,]);
        cat("\nEnter 'a' to discard, 'd' to keep, 'q' to quit\n");

        char <- '';
        while (char != 'a' && char != 'd' && char != 'q') {
            char <- readcharacter();
        }

        if (char == 'q')
            break;

        classification[nRow] = char;
    }

    return(cbind(df,classification=classification));
}

result = sift(df, pp);

cls();
cat("Shortlist:\n");
print(row.names(result[result$classification=='d',]));

那么,StackOverflow社區對使用此Shiny應用程序解決問題的感覺如何? 我不希望看到Shiny在數據分析的早期階段使用-通常只有在我們有一些想要探索或動態呈現的結果時,它才會發揮作用。

學習Shiny既有趣又有用,但是如果可以找到一個不太復雜的答案,我會更喜歡它。

library(shiny);

#
# shortlist - function that allows us to shortlist through the rows in a data frame efficiently
#
shortlist <- function(df, sTitle, sRowName) {

    createUI <- function() {

        listHeading <- list(
                    textOutput(outputId = "Progress"),
                    tags$br(),
                    fluidRow(
                        column(width=1, sRowName),
                        column(width=9, textOutput(outputId = "RowName"))));

        listFields <- lapply(names(df), function(sFieldname) {

            return(fluidRow(
                column(width=1, sFieldname),
                column(width=9, textOutput(outputId = sFieldname))));
        });

        listInputs <- list(
                    tags$br(),
                    tags$table(
                        tags$tr(
                            tags$td(" "),
                            tags$td(actionButton(inputId="Up", label="W", disabled=TRUE, width="100%"))),
                        tags$tr(
                            tags$td(width="100px", actionButton(inputId="Discard", label="Discard, A", width="100%")),
                            tags$td(width="100px", actionButton(inputId="Down", label="S", disabled=TRUE, width="100%")),
                            tags$td(width="100px", actionButton(inputId="Keep", label="Keep, D", width="100%")))),
                        tags$script("

                            // JavaScript implemented keyboard shortcuts, including lots of conditions to
                            // ensure we're finished processing one keystroke before we start the next.

                            var bReady = false;

                            $(document).on('shiny:recalculating', function(event) {
                                bReady = false;
                            });

                            $(document).on('shiny:recalculated', function(event) {
                                setTimeout(function() {bReady = true;}, 500);
                            });

                            $(document).on('keypress', function(event) {

                                if (bReady) {

                                    switch(event.key.toLowerCase()) {
                                    case 'a':
                                        document.getElementById('Discard').click();
                                        bReady = false;
                                        break;
                                    case 'd':
                                        document.getElementById('Keep').click();
                                        bReady = false;
                                        break;
                                    }
                                }
                            });

                            // End of JavaScript

                        "));

        listPanel <- list(
                    title = sTitle,
                    tags$br(),
                    conditionalPanel(
                        condition = paste("input.Keep + input.Discard <", nrow(df)),
                        append(append(listHeading, listFields), listInputs)));

        listShortlist <- list(
                    tags$hr(),
                    tags$h4("Shortlist:"),
                    dataTableOutput(outputId="Shortlist"));

        ui <- do.call(fluidPage, append(listPanel, listShortlist));

        return(ui);
    }

    app <- shinyApp(ui = createUI(), server = function(input, output) {

        classification <- rep('', nrow(df));

        getRow <- reactive({

            return (input$Keep + input$Discard + 1);
        });

        classifyRow <- function(nRow, char) {

            if (nRow <= nrow(df)) {
                classification[nRow] <<- char;
            }

            # In interactive mode, automatically stop the app when we're finished
            if ( interactive() && nRow >= nrow(df) ) {
                stopApp(classification);
            }
        }

        observeEvent(input$Discard, {classifyRow(getRow() - 1, 'a')});
        observeEvent(input$Keep,    {classifyRow(getRow() - 1, 'd')});

        output$Progress = renderText({paste("Showing record", getRow(), "of", nrow(df))});
        output$RowName  = renderText({row.names(df)[getRow()]});

        lapply(names(df), function(sFieldname) {
            output[[sFieldname]] <- renderText({df[getRow(), sFieldname]});
        });

        output$Shortlist <- renderDataTable(options = list(paging = FALSE, searching = FALSE), {

            # Mention the 'keep' input to ensure this code is called when the 'keep' button
            # is pressed.  That way the shortlist gets updated when an item to be added to it.
            dummy <- input$Keep;

            # Construct the shortlist
            shortlist <- data.frame(row.names(df[classification == 'd',]));
            colnames(shortlist) <- sRowName;
            return(shortlist);
        });

    });

    if (interactive()) {
        classification <- runApp(app);

        return(cbind(df, classification = classification));
    } else {
        return(app);
    }
}

#
# And now some example code.
# Shortlist the built in state.x77 data set (let us suppose I am drawing up
# a shortlist of where I might wish to go on holiday)
#

df <- data.frame(state.x77);

result <- shortlist(df = df, "Choose states", "State");

if (interactive()) {
    cat("Shortlist:\n");
    print(row.names(result[result$classification == 'd',]));
} else {
    return (result);
}

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM