How to read a .txt file into R with different separators and run-on lines?

Question

I have a large .txt file in the following format, showing the date, user and product review for a large number of users:

    YYYY:MM:D1 @Username1: this is a product review
    YYYY:MM:D1 @Username2: this is also a product review
    YYYY:MM:D1 @Username3: this is also a product review that
    runs to the next line
    YYYY:MM:D1 @Username4: this here is also a product review

I would like to extract it into a dataframe with 3 columns, like this:

    date/time      username      comment
    yyyy/mm/dd     @Username1    this is a product review   
    yyyy/mm/dd     @Username2    this is also a product review   
    yyyy/mm/dd     @Username3    this is also a product review contained in the same row
    yyyy/mm/dd     @Username4    this here is also a product review

Using the standard base R command

    read.table("filename.txt", fill=TRUE)

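gives me a dataframe that treats every word of a product review as a separate column. It also splits reviews that are long enough to "run on" across lines into a new row, i.e.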

    V1          V2          V3          V4          V5          
    yy/mm/dd    Username1   this        is          a 
    product     review 
    ...

Any help is appreciated!

Answer 1

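You can approach this in a few different ways. One is to import the data into a single column and then split that column at the appropriate places, using tidyr::separate or base R's strsplit (data.table provides the related tstrsplit). Here is an example with tidyr: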

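    library(tidyr)  # provides separate() and re-exports the %>% pipe

    # Use a separator symbol that is unlikely to appear in the file,
    # to read each whole line into a single column (V1).
    # quote = "" and comment.char = "" stop apostrophes or "#" inside
    # a review from breaking the import.
    data <- read.table("filename.txt", sep = "^", quote = "",
                       comment.char = "", stringsAsFactors = FALSE)

    # First split the column at the " @" before the username, and then at
    # the first ": " after it; extra = "merge" keeps any later " @" or ": "
    # inside the review text instead of dropping it.
    data %>%
        separate(V1, into = c("Date", "User"), sep = " @", extra = "merge") %>%
        separate(User, into = c("User", "Review"), sep = ": ",
                 extra = "merge") -> data

    # If you want to add back the @-sign to the usernames:
    data$User <- paste0("@", data$User)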
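Note that the code above does not merge run-on lines: a continuation line such as `runs to the next line` becomes a row of its own, with `NA` in the other columns and a warning from `separate`. A minimal base-R sketch that first glues continuation lines back onto the preceding record and then splits each record with a regular expression (via `regexec`/`regmatches`, rather than `separate`) is shown below. It assumes every record starts with `YYYY:MM:DD @user: ` and that usernames contain no colon; the sample dates and the temporary file are hypothetical stand-ins for your own `filename.txt`.

```r
# Hypothetical sample records modelled on the question, written to a
# temporary file so the sketch runs on its own; use your own path instead.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "2021:03:01 @Username1: this is a product review",
  "2021:03:01 @Username2: this is also a product review",
  "2021:03:01 @Username3: this is also a product review that",
  "runs to the next line",
  "2021:03:01 @Username4: this here is also a product review"
), path)

raw_lines <- readLines(path)

# A line opens a new record if it starts with "date @user: "
# (assumed layout: YYYY:MM:DD, and no colon inside the username).
is_start <- grepl("^[0-9]{4}:[0-9]{2}:[0-9]{1,2} @[^:]+: ", raw_lines)

# cumsum() gives each continuation line the id of the record above it,
# so pasting per id glues run-on lines back onto their review.
record_id <- cumsum(is_start)
records <- unname(vapply(split(raw_lines, record_id), paste, character(1),
                         collapse = " "))

# Capture the date, the @username and the rest of the record.
parts <- regmatches(records, regexec("^([^ ]+) (@[^:]+): (.*)$", records))

reviews <- data.frame(
  date     = as.Date(vapply(parts, `[`, character(1), 2), format = "%Y:%m:%d"),
  username = vapply(parts, `[`, character(1), 3),
  comment  = vapply(parts, `[`, character(1), 4),
  stringsAsFactors = FALSE
)
print(reviews)
```

Each review, including the one that ran onto a second line, ends up in a single row. If a review can itself begin a line with something that looks like `date @user: `, the start pattern would need to be tightened.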

Answer 1: score 0, accepted, 2021-03-03 08:05:00