
What is the best way to calculate word frequency in VB.NET?

There are some good examples of how to calculate word frequency in C#, but none of them is comprehensive, and I really need one in VB.NET.

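My current approach is limited to one word per frequency count. What is the best way to change it so that I get a completely accurate word frequency list?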

wordFreq = New Hashtable()

Dim words As String() = Regex.Split(inputText, "(\W)")
For i As Integer = 0 To words.Length - 1
    If words(i) <> "" Then
        Dim realWord As Boolean = True
        For j As Integer = 0 To words(i).Length - 1
            If Char.IsLetter(words(i).Chars(j)) = False Then
                realWord = False
            End If
        Next j

        If realWord = True Then
            If wordFreq.Contains(words(i).ToLower()) Then
                wordFreq(words(i).ToLower()) += 1
            Else
                wordFreq.Add(words(i).ToLower, 1)
            End If
        End If
    End If
Next

Me.wordCount = New SortedList

For Each de As DictionaryEntry In wordFreq
    If wordCount.ContainsKey(de.Value) = False Then
        wordCount.Add(de.Value, de.Key)
    End If
Next

I'd prefer an actual code snippet, but a generic "oh yeah... use this and run it" would be fine too.
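The one-word-per-count limitation comes from the last loop: the SortedList is keyed by the count, so once a count is present, ContainsKey(de.Value) skips every later word with that same count. A minimal sketch of one fix, assuming the counts are already in a Hashtable like wordFreq above: copy the entries into a list and sort it by count, instead of using the count as a key. The module and function names (FrequencySorting, SortByFrequency) are hypothetical.

```vb
Imports System.Collections
Imports System.Collections.Generic

Module FrequencySorting

    ' Hypothetical helper: copies a word -> count Hashtable into a list sorted
    ' by descending count. Every word is kept, including words that share a count.
    Function SortByFrequency(ByVal wordFreq As Hashtable) As List(Of KeyValuePair(Of String, Integer))
        Dim result As New List(Of KeyValuePair(Of String, Integer))
        For Each de As DictionaryEntry In wordFreq
            result.Add(New KeyValuePair(Of String, Integer)(CStr(de.Key), CInt(de.Value)))
        Next

        ' Highest count first; ties are ordered alphabetically (ordinal comparison).
        result.Sort(Function(a, b)
                        Dim byCount As Integer = b.Value.CompareTo(a.Value)
                        If byCount <> 0 Then Return byCount
                        Return String.CompareOrdinal(a.Key, b.Key)
                    End Function)
        Return result
    End Function

End Module
```

Each entry's Key is the word and its Value is the count, so looping over the returned list produces a frequency list with no words dropped.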

This might be what you're looking for:

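    ' Requires LINQ (.NET 3.5 or later); Char.IsLetter takes a Char, so every
    ' character of the token is checked rather than passing the whole string.
    Dim Words = "Hello World ))))) This is a test Hello World"
    Dim CountTheWords = From str In Words.Split(" "c) _
                        Where str.Length > 0 AndAlso str.All(Function(c) Char.IsLetter(c)) _
                        Group By str Into Count()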

I just tested it, and it does work.

Edit! I added code to make sure it only counts letters, not symbols.
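If case should not matter and the output should be ordered, the same query can be extended. This is a sketch assuming .NET 3.5 or later; lower-casing the grouping key and sorting are additions not in the answer above, and the names LinqWordCount, CountWordsLinq, w and Freq are illustrative.

```vb
Imports System.Collections.Generic
Imports System.Linq

Module LinqWordCount

    ' Hypothetical extension of the LINQ query: groups on the lower-cased word
    ' and returns (word, count) pairs, most frequent first, ties alphabetical.
    Function CountWordsLinq(ByVal text As String) As List(Of KeyValuePair(Of String, Integer))
        Dim query = From str In text.Split(" "c)
                    Where str.Length > 0 AndAlso str.All(Function(c) Char.IsLetter(c))
                    Group By w = str.ToLower() Into Freq = Count()
                    Order By Freq Descending, w
                    Select New KeyValuePair(Of String, Integer)(w, Freq)
        Return query.ToList()
    End Function

End Module
```

For the sample sentence in the answer, this yields hello and world with a count of 2, followed by a, is, test and this with a count of 1.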

FYI: I found an article on how to use LINQ while targeting .NET 2.0. It's a bit hacky, but it might help someone: http://weblogs.asp.net/fmarguerie/archive/2007/09/05/linq-support-on-net-2-0.aspx

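Imports System.Collections.Generic

Public Class CountWords

    ' Splits the input into runs of letters and counts each lower-cased word.
    Public Function WordCount(ByVal str As String) As Dictionary(Of String, Integer)
        Dim ret As New Dictionary(Of String, Integer)
        Dim word As String = ""

        str = str.ToLower()
        ' Visit every character, starting at index 0.
        For index As Integer = 0 To str.Length - 1
            Dim ch As Char = str(index)
            If Char.IsLetter(ch) Then
                word &= ch
            ElseIf word.Length > 0 Then
                AddWord(ret, word)
                word = ""
            End If
        Next

        ' Count the final word when the text ends with a letter.
        If word.Length > 0 Then
            AddWord(ret, word)
        End If

        Return ret
    End Function

    Private Sub AddWord(ByVal counts As Dictionary(Of String, Integer), ByVal word As String)
        If counts.ContainsKey(word) Then
            counts(word) += 1
        Else
            counts(word) = 1
        End If
    End Sub

End Class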

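Then, for a quick demo app, create a WinForms application with a multiline TextBox named InputBox, a ListView named OutputList, and a Button named CountBtn. Add two columns to the ListView, named Word and Freq, and set its View to Details. Add an event handler for CountBtn, then use this code: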

Imports System.Windows.Forms.ListViewItem

Public Class MainForm

    Private WordCounts As CountWords = New CountWords

    Private Sub CountBtn_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles CountBtn.Click
        OutputList.Items.Clear()
        Dim ret As Dictionary(Of String, Integer) = Me.WordCounts.WordCount(InputBox.Text)
        For Each item As String In ret.Keys
            Dim litem As ListViewItem = New ListViewItem
            litem.Text = item
            ' The second column holds the frequency of this word.
            Dim csitem As ListViewSubItem = New ListViewSubItem(litem, ret.Item(item).ToString())

            litem.SubItems.Add(csitem)
            OutputList.Items.Add(litem)
        Next

        ' A width of -1 sizes each column to its longest item; doing it once
        ' after the loop avoids resizing on every row.
        Word.Width = -1
        Freq.Width = -1
    End Sub
End Class

You did a truly horrible thing making me write this in VB, and I will never forgive you.

:p

Good luck!

Edit

Fixed the empty-string bug and the case-sensitivity bug.

This might help:

Word frequency algorithm for natural language processing

Pretty close, but \w+ is a good regex to match on (it matches only word characters).

' Requires: Imports System.Text.RegularExpressions
Public Function CountWords(ByVal inputText As String) As Dictionary(Of String, Integer)
    Dim frequency As New Dictionary(Of String, Integer)

    ' Regex.Matches (not Regex.Match) returns every run of word characters.
    For Each wordMatch As Match In Regex.Matches(inputText, "\w+")
        Dim word As String = wordMatch.Value.ToLower()
        If frequency.ContainsKey(word) Then
            frequency(word) += 1
        Else
            frequency.Add(word, 1)
        End If
    Next
    Return frequency
End Function
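Note that \w also matches digits and the underscore, so "route_66" counts as a single word. If only letters should count, as the Char.IsLetter checks in the question do, a letters-only pattern such as \p{L}+ (a .NET Unicode-category class) is one option. A sketch under that assumption; the names LetterWordCount and CountLetterWords are hypothetical.

```vb
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module LetterWordCount

    ' Variant of the \w+ approach: \p{L}+ matches runs of Unicode letters only,
    ' so digits and underscores never become part of a "word".
    Function CountLetterWords(ByVal inputText As String) As Dictionary(Of String, Integer)
        Dim frequency As New Dictionary(Of String, Integer)
        For Each wordMatch As Match In Regex.Matches(inputText, "\p{L}+")
            Dim word As String = wordMatch.Value.ToLowerInvariant()
            Dim current As Integer
            ' TryGetValue leaves current at 0 for a word not seen yet.
            frequency.TryGetValue(word, current)
            frequency(word) = current + 1
        Next
        Return frequency
    End Function

End Module
```

With this pattern, "abc_1 abc Abc 42" produces a single entry, abc, with a count of 3.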
