How to create a custom trade/law dictionary for text analysis in R

Question

I plan to do text analysis in R, much like sentiment analysis, but with my own custom dictionary that sorts words by a "trade" vs. "law" logic.

I have all the words for the dictionary in an Excel file. It looks like this:

> %
> 1 Trade
> 2 Law
> %
> business    1
> exchange    1
> industry    1
> rule        2
> settlement  2
> umpire      2
> court       2
> tribunal    2
> lawsuit     2
> bench       2
> courthouse  2
> courtroom   2

What steps do I need to take to convert this into a format R can use and apply it to my text corpus?

Thanks for your help!

Answer 1

Create a data.frame with 2 columns and store it somewhere, for example as an .rds file, a database object, or an Excel file, so you can load it whenever you need it.
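A minimal base-R sketch of that step, assuming the Excel sheet has been exported to plain text in the LIWC-style layout shown in the question (a header of category codes between two `%` lines, then one `word code` pair per line) and that the file is well-formed. The helper name `read_liwc_like()` is hypothetical. If you keep the data in Excel instead, a package such as readxl (`read_excel()`) can load a plain two-column sheet directly.

```r
# Hypothetical helper: turn a LIWC-style dictionary (category header between
# two "%" lines, then one "word code" pair per line) into a two-column
# data.frame. The header (1 = Trade, 2 = Law) is skipped; only the
# word/code rows after the second "%" are kept.
read_liwc_like <- function(lines) {
  lines  <- trimws(lines)
  lines  <- lines[lines != ""]
  delims <- which(lines == "%")
  body   <- lines[(delims[2] + 1):length(lines)]
  parts  <- strsplit(body, "[[:space:]]+")
  data.frame(
    word   = vapply(parts, `[`, character(1), 1),
    sector = as.integer(vapply(parts, `[`, character(1), 2)),
    stringsAsFactors = FALSE
  )
}

# In practice these lines would come from readLines() on your exported file
dic_lines <- c("%", "1 Trade", "2 Law", "%",
               "business 1", "exchange 1", "industry 1", "rule 2",
               "settlement 2", "umpire 2", "court 2", "tribunal 2",
               "lawsuit 2", "bench 2", "courthouse 2", "courtroom 2")
my_scoring_df <- read_liwc_like(dic_lines)

# Store it once as .rds and reload it whenever you need it
path <- tempfile(fileext = ".rds")
saveRDS(my_scoring_df, path)
reloaded <- readRDS(path)
```

The result has the same words and codes as the `my_scoring_df` in the Data section; in a real project you would save to a fixed path of your choosing rather than a `tempfile()`.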

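Once the data is in a data.frame, you can match it against the words in your text corpus using joins or dictionaries. In the scoring data.frame I used 1 and 2 to denote the sector (category), but you could use words instead.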
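As a rough illustration of the "dictionary" route and of using words instead of codes, a named lookup vector works in base R without any join. The tokens below are made up for the example, and the labels Trade and Law are taken from the header of the question's dictionary.

```r
# Scoring table as in the answer (word -> sector code)
my_scoring_df <- data.frame(
  word   = c("business", "exchange", "industry", "rule", "settlement", "umpire",
             "court", "tribunal", "lawsuit", "bench", "courthouse", "courtroom"),
  sector = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
  stringsAsFactors = FALSE
)

# Replace the numeric codes with readable labels
my_scoring_df$label <- factor(my_scoring_df$sector, levels = 1:2,
                              labels = c("Trade", "Law"))

# A named character vector acts as the dictionary: word -> label
lookup <- setNames(as.character(my_scoring_df$label), my_scoring_df$word)

tokens <- c("the", "court", "ordered", "the", "business", "owner")
hits <- lookup[tokens]      # NA for words that are not in the dictionary
hits <- hits[!is.na(hits)]  # keep only dictionary words
```

A join (as in the tidytext example) scales better for a large corpus, while a named vector is handy for quick checks on a few tokens.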

See the example using tidytext, but read up on sentiment analysis and use whichever package suits your needs.

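library(tidytext)
library(dplyr)

# my_scoring_df (word -> sector) is defined in the Data section below; run that first
text_df <- data.frame(id = 1:2,
                      text = c("The business is in the mining industry and has a settlement.",
                               "The court ordered the business owner to settle the lawsuit."),
                      stringsAsFactors = FALSE)  # keep text as character (the default since R 4.0)

text_df %>% 
  unnest_tokens(word, text) %>%   # one row per lowercase word
  inner_join(my_scoring_df)       # keep only words that appear in the dictionary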

Joining, by = "word"
  id       word sector
1  1   business      1
2  1   industry      1
3  1 settlement      2
4  2      court      2
5  2   business      1
6  2    lawsuit      2
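To turn matched tokens like these into a per-document score, in the spirit of sentiment analysis, one option is to count hits per sector and take a difference. The sketch below uses base R only: the lowercase, split-on-non-letters tokenizer is an assumption standing in for `unnest_tokens()`, and the "Law minus Trade" net score is just one illustrative choice.

```r
my_scoring_df <- data.frame(
  word   = c("business", "exchange", "industry", "rule", "settlement", "umpire",
             "court", "tribunal", "lawsuit", "bench", "courthouse", "courtroom"),
  sector = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
  stringsAsFactors = FALSE
)
text_df <- data.frame(
  id   = 1:2,
  text = c("The business is in the mining industry and has a settlement.",
           "The court ordered the business owner to settle the lawsuit."),
  stringsAsFactors = FALSE
)

# Crude tokenizer: lowercase, then split on anything that is not a letter
tokens <- lapply(tolower(text_df$text), function(s) {
  w <- strsplit(s, "[^a-z]+")[[1]]
  w[w != ""]
})
long <- data.frame(id   = rep(text_df$id, lengths(tokens)),
                   word = unlist(tokens), stringsAsFactors = FALSE)

# Keep only dictionary words (an inner join on "word")
matched <- merge(long, my_scoring_df, by = "word")

# Count hits per document and sector; positive = more Law, negative = more Trade
counts    <- table(id = matched$id, sector = matched$sector)
net_score <- counts[, "2"] - counts[, "1"]
```

Document 1 has two Trade words and one Law word (net -1), document 2 the reverse (net +1). Normalising by document length is a reasonable next step for texts of very different sizes.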

Data:

my_scoring_df <- structure(list(word = c("business", "exchange", "industry", "rule", 
"settlement", "umpire", "court", "tribunal", "lawsuit", "bench", 
"courthouse", "courtroom"), sector = c(1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L)), class = "data.frame", row.names = c(NA, 
-12L))

Answer 1
1 2020-06-10 14:56:12