简体   繁体   English

计算包含大量文本的Excel列中最常用的单词?

[英]Count the most frequently used words in an Excel column containing A LOT of text?

I have a large spreadsheet and I'd like to perform a word count on a specific column to figure out the most frequently used words. 我有一个大型电子表格,我想对特定列执行单词计数,以找出最常用的单词。 This column contains a very large amount of data and text. 此列包含大量数据和文本。

For example, "Employee was climbing a ladder to retrieve merchandise off the top shelf. The ladder began to sway and the employee lost his balance and fell. Injury to the right leg". 例如,“员工爬梯子从顶层货架上取回商品。梯子开始摇摆,员工失去平衡而跌倒。右腿受伤”。 There are about 1000 different records like this. 这样有大约1000种不同的记录。 I was hoping use a pivot table to figure out what the most frequently used words are throughout all the cells in this column. 我希望使用数据透视表来确定本专栏中所有单元格中最常用的单词。

I'm not sure how to do this. 我不知道该怎么做。 Can anyone assist in how to do this? 任何人都可以协助如何做到这一点?

Currently using the following code: 目前使用以下代码:

Option Explicit

Sub MakeWordList()
    Dim InputSheet As Worksheet
    Dim WordListSheet As Worksheet
    Dim PuncChars As Variant, x As Variant
    Dim i As Long, r As Long
    Dim txt As String
    Dim wordCnt As Long
    Dim AllWords As Range
    Dim PC As PivotCache
    Dim PT As PivotTable

    Application.ScreenUpdating = False
    Set InputSheet = ActiveSheet
    Set WordListSheet = Worksheets.Add(after:=Worksheets(Sheets.Count))
    WordListSheet.Range("A1") = "All Words"
    WordListSheet.Range("A1").Font.Bold = True
    InputSheet.Activate
    wordCnt = 2
    PuncChars = Array(".", ",", ";", ":", "'", "!", "#", _
        "$", "%", "&", "(", ")", " - ", "_", "--", "+", _
        "=", "~", "/", "\", "{", "}", "[", "]", """", "?", "*")
    r = 1

'   Loop until blank cell is encountered
    Do While Cells(r, 1) <> ""
'       covert to UPPERCASE
        txt = UCase(Cells(r, 1))
'       Remove punctuation
        For i = 0 To UBound(PuncChars)
            txt = Replace(txt, PuncChars(i), "")
        Next i
'       Remove excess spaces
        txt = WorksheetFunction.Trim(txt)
'       Extract the words
        x = Split(txt)
        For i = 0 To UBound(x)
            WordListSheet.Cells(wordCnt, 1) = x(i)
            wordCnt = wordCnt + 1
        Next i
    r = r + 1
    Loop

'   Create pivot table
    WordListSheet.Activate
    Set AllWords = Range("A1").CurrentRegion
    Set PC = ActiveWorkbook.PivotCaches.Add _
        (SourceType:=xlDatabase, _
        SourceData:=AllWords)
    Set PT = PC.CreatePivotTable _
        (TableDestination:=Range("C1"), _
        TableName:="PivotTable1")
    With PT
        .AddDataField .PivotFields("All Words")
        .PivotFields("All Words").Orientation = xlRowField
    End With
End Sub

Here's a quick and dirty macro (I'm feeling extra helpful today). 这是一个快速而肮脏的宏(我今天感觉更有帮助)。 Put this in your workbook module. 把它放在你的工作簿模块中。 Note: I'm assuming the sheet you will have active is the one with all the text in column A. 注意:我假设您将激活的工作表是包含A列中所有文本的工作表。

Sub Test()
Dim lastRow&, i&, tempLastRow&
Dim rawWS As Worksheet, tempWS As Worksheet

Set rawWS = ActiveSheet
Set tempWS = Sheets.Add
tempWS.Name = "Temp"
rawWS.Activate

'tempWS.Columns(1).Value = rawWS.Columns(1).Value
tempLastRow = 1

With rawWS
    .Columns("A:A").TextToColumns Destination:=Range("A1"), DataType:=xlDelimited, _
                                  TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=True, Tab:=True, _
                                  Semicolon:=False, Comma:=False, Space:=True, Other:=False, TrailingMinusNumbers:=True

    lastRow = .Cells(.Rows.Count, 1).End(xlUp).Row

    For i = lastRow To 1 Step -1
        .Rows(i).EntireRow.Copy
        tempWS.Range("A" & tempLastRow).PasteSpecial Paste:=xlPasteAll, Transpose:=True
        ' tempWS.Range ("A" & tempLastRow)
        tempLastRow = tempWS.Cells(tempWS.Rows.Count, 1).End(xlUp).Row + 1
    Next i
    Application.CutCopyMode = False
End With

With tempWS
    ' Now, let's get unique words and run a count
    .Range("A:A").Copy .Range("C:C")
    .Range("C:C").RemoveDuplicates Columns:=1, Header:=xlNo
    tempLastRow = .Cells(.Rows.Count, 3).End(xlUp).Row

    .Range(.Cells(1, 4), .Cells(tempLastRow, 4)).FormulaR1C1 = "=COUNTIF(C[-3],RC[-1])"
    .Sort.SortFields.Clear
    .Sort.SortFields.Add Key:=Range("D1:D1048576") _
                              , SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal
    With .Sort
        .SetRange Range("C1:D1048576")
        .Header = xlGuess
        .MatchCase = False
        .Orientation = xlTopToBottom
        .SortMethod = xlPinYin
        .Apply
    End With

End With

End Sub

Basically, it creates a new sheet, counts all the individual words, and puts the words (and count) in a column, sorted by most used. 基本上,它创建一个新工作表,计算所有单个单词,并将单词(和计数)放在一列中,按大多数使用排序。 You can tweak as needed. 您可以根据需要进行调整。

Note: I made this before you added your code. 注意:我在添加代码之前就这样做了。 It doesn't create a pivot table, but from what I understand you need, a Pivot Table would be overkill if you just need the most used words. 它不会创建一个数据透视表,但根据我的理解,如果你只需要最常用的单词,那么数据透视表会有点过分。 But, let me know if you need any edits or changes! 但是,如果您需要任何编辑或更改,请与我们联系!

Here's a routine to display each word and the number of times it appears (using Split and Collection s) 这是显示每个单词及其出现次数的例程(使用SplitCollection s)

Usage: CountTheWordsInRange Range("A1:A4") 用法: CountTheWordsInRange Range("A1:A4")

Sub CountTheWordsInRange(RangeToCheck As Range)

Dim wordList As New Collection
Dim keyList As New Collection
Dim c
For Each c In RangeToCheck
    Dim words As Variant
    words = Split(c, " ") 'Pick a delimiter
    For Each w In words
        Dim temp
        temp = -1
        On Error Resume Next
        temp = wordList(w)
        On Error GoTo 0
        If temp = -1 Then
            wordList.Add 1, Key:=w
            keyList.Add w, Key:=w
        Else
            wordList.Remove (w)
            keyList.Remove (w)
            wordList.Add temp + 1, w
            keyList.Add w, Key:=w
        End If
    Next w
Next c
'Here we can display the word counts
'KeyList is a collection that contains each word
'WordList is a collection that contains each amount
Dim x
For x = 1 To wordList.Count
    With Sheets("Sheet1")
        .Cells(x, "E").Value = keyList(x)  'Display Word in column "E"
        .Cells(x, "F").Value = wordList(x) 'Display Count in column "F"
    End With
Next x

End Sub

Results: 结果:

结果

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM