
What is the fastest way to get words from a T-SQL table?

I have a SQL Server 2008 R2 table dbo.Forum_Posts with the columns Subject (nvarchar(255)) and Body (nvarchar(max)).

I want to get all words with length >= 3 from the Subject and Body columns and insert them into the table dbo.Search_Word (column Word, nvarchar(100)) and the table dbo.SearchItem (column Title, nvarchar(200)).

I also want to get the newly generated SearchWordsID (primary key, autoincrement, int) from dbo.Search_Word and SearchItemID (primary key, autoincrement, int) from dbo.SearchItem, and insert them into the table dbo.SearchItemWord (columns SearchWordsID (foreign key, int, not null) and SearchItemID (foreign key, int, not null)).

What is the fastest way to do this in T-SQL? Or do I have to use C#? Thanks in advance for any help.

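As requested, this version retains the ID, so you get a DISTINCT list of words per ID.

It differs slightly from the first answer, but is easily done with an Outer Apply.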

Note: you must edit the initial query, Select KeyID=[YourKeyID],Words=[YourField1]+' '+[YourField2] from [YourTable], to match your own table.
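For the table in the question, the initial query might look like the sketch below. The key column name PostID is an assumption (the question does not name the key of dbo.Forum_Posts), and the IsNull calls are added because concatenating a NULL Subject or Body would otherwise make the whole Words value NULL.

```sql
-- Sketch only: PostID is a hypothetical key column name.
Select KeyID = P.PostID
      ,Words = IsNull(P.Subject,'') + ' ' + IsNull(P.Body,'')   -- NULL + text would yield NULL
 From  dbo.Forum_Posts P
```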

Declare @String    varchar(max) = ''
Declare @Delimeter varchar(25)  = ' '

-- Generate and Strip special characters
Declare @StripChar table (Chr varchar(10));Insert Into @StripChar values ('.'),(','),('/'),('('),(')'),(':')  -- Add/Remove as needed

-- Generate Base Data and Expand via Outer Apply
Declare @XML xml
Set @XML = (
            Select A.KeyID
                  ,B.Word
             From ( Select KeyID=[YourKeyID],Words=[YourField1]+' '+[YourField2] from [YourTable]) A
             Outer Apply (
                          Select Word=split.a.value('.', 'varchar(150)') 
                           From  (Select Cast ('<x>' + Replace(A.Words, @Delimeter, '</x><x>')+ '</x>' AS XML) AS Data) AS A 
                           Cross Apply data.nodes ('/x') AS Split(a)
             ) B
 For XML RAW)

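-- Convert XML to varchar(max) for Global Search & Replace (could be promoted to Outer Apply)
-- Caveat: the expression reads @XML rather than @String, so the replacements do not accumulate across the rows of @StripChar
Select @String = Replace(Replace(cast(@XML as varchar(max)),Chr,' '),'  ',' ') From @StripChar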
Select @XML    = cast(@String as XML)

Select Distinct
       KeyID = t.col.value('@KeyID', 'int')
      ,Word  = t.col.value('@Word', 'varchar(150)')
 From  @XML.nodes('/row') AS t (col)
 Where Len(t.col.value('@Word', 'varchar(150)'))>3   -- use >=3 to also keep three-letter words, as the question asks
 Order By 1

Returns

KeyID   Word
0       UNDEF
0       Undefined
1       HIER
1       System
2       Control
2       UNDEF
3       JOBCONTROL
3       Market
3       Performance
...
87      Analyitics
87      Market
87      UNDEF
88      Branches
88      FDIC
88      UNDEF
...
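To get from a (KeyID, Word) result like the one above to the three tables in the question, something along these lines could work. This is a sketch under several assumptions, none of which come from the original post: the distinct pairs have been saved in a hypothetical staging table #PostWords, dbo.Forum_Posts has a key column called PostID, SearchItem.Title is filled from Subject, and dbo.Search_Word holds each word only once. The MERGE with an always-false join condition is used only because its OUTPUT clause can return source columns next to the new identity values, which a plain INSERT ... OUTPUT cannot.

```sql
-- Hypothetical staging table; fill it with the Select Distinct KeyID, Word result.
CREATE TABLE #PostWords (PostID int NOT NULL, Word nvarchar(100) NOT NULL);

-- 1) Add the words that dbo.Search_Word does not contain yet.
INSERT INTO dbo.Search_Word (Word)
SELECT DISTINCT pw.Word
FROM   #PostWords AS pw
WHERE  NOT EXISTS (SELECT 1 FROM dbo.Search_Word AS sw WHERE sw.Word = pw.Word);

-- 2) Add one SearchItem per post and record which new ID belongs to which post.
DECLARE @ItemMap table (PostID int NOT NULL, SearchItemID int NOT NULL);

MERGE dbo.SearchItem AS tgt
USING (SELECT PostID, Title = LEFT(Subject, 200)   -- Title is nvarchar(200), Subject is nvarchar(255)
       FROM   dbo.Forum_Posts) AS src
   ON 1 = 0                                        -- never matches, so every source row is inserted
WHEN NOT MATCHED THEN
    INSERT (Title) VALUES (src.Title)
OUTPUT src.PostID, inserted.SearchItemID INTO @ItemMap (PostID, SearchItemID);

-- 3) Link words to items through the generated IDs.
INSERT INTO dbo.SearchItemWord (SearchWordsID, SearchItemID)
SELECT DISTINCT sw.SearchWordsID, im.SearchItemID
FROM   #PostWords      AS pw
JOIN   dbo.Search_Word AS sw ON sw.Word   = pw.Word
JOIN   @ItemMap        AS im ON im.PostID = pw.PostID;
```

All three statements are set-based, so the generated IDs never have to be fetched row by row.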

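You will need T-SQL to insert into the tables. Your biggest challenge is splitting the posts into words.

My suggestion is to read the posts into C#, split each post into words (you can use the Split method to split on spaces or punctuation), filter the collection of words, and then perform the inserts from C#.

If you use Entity Framework or a similar ORM, you can avoid using T-SQL directly.

Unless you really want a pure SQL solution and are willing to spend time perfecting it, don't try to split your posts into words with T-SQL. And yes, it will be slow: T-SQL is not fast at string manipulation.

You could also look into full-text indexing, which I believe supports searching by keyword.

Maybe this will help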
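As a rough illustration of the full-text route, the sketch below creates a full-text index over both columns, runs a keyword search, and lists the indexed terms. The catalog name ForumCatalog, the key column PostID, and the unique index name PK_Forum_Posts are assumptions made for the example; a full-text index needs a single-column, unique, non-nullable key index, and the statements cannot run inside a user transaction.

```sql
-- Assumed names: ForumCatalog, PostID, PK_Forum_Posts.
CREATE FULLTEXT CATALOG ForumCatalog;

CREATE FULLTEXT INDEX ON dbo.Forum_Posts (Subject, Body)
    KEY INDEX PK_Forum_Posts
    ON ForumCatalog;

-- Keyword search across both indexed columns.
SELECT PostID, Subject
FROM   dbo.Forum_Posts
WHERE  CONTAINS((Subject, Body), N'market');

-- Terms the word breaker extracted, with the number of rows containing each.
SELECT display_term, document_count
FROM   sys.dm_fts_index_keywords(DB_ID(), OBJECT_ID(N'dbo.Forum_Posts'))
WHERE  LEN(display_term) >= 3
  AND  display_term <> N'END OF FILE';   -- marker row the DMV adds
```

The index is populated in the background, so the keyword list may be empty or incomplete right after creation.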

Declare @String varchar(max) = ''
Declare @Delimeter varchar(25)  = ' '

Select @String = @String + ' '+Words
  From (
         Select Words=[YourField1]+' '+[YourField2] from [YourTable]
       ) A

-- Generate and Strip special characters
Declare @StripChar table (Chr varchar(10));Insert Into @StripChar values ('.'),(','),('/'),('('),(')'),(':')  -- Add/Remove as needed
Select @String = Replace(Replace(@String,Chr,' '),'  ',' ') From @StripChar

-- Convert String into XML and Split Delimited String
Declare @Table Table (RowNr int Identity(1,1), String varchar(100))
Declare @XML xml = Cast('<x>' + Replace(@String,@Delimeter,'</x><x>')+'</x>' as XML)
Insert Into @Table Select String.value('.', 'varchar(max)') From @XML.nodes('x') as T(String)

-- Generate Final Results
Select Distinct String
 From  @Table
 Where Len(String)>3   -- use >=3 to also keep three-letter words, as the question asks
 Order By 1

Returns (sample)

    String
    ------------------
    Access
    Active
    Adminstrators
    Alternate
    Analyitics
    Applications
    Branches
    Cappelletti
    City
    Class
    Code
    Comments
    Contact
    Control
    Daily
    Data
    Date
    Definition
    Deleted
    Down
    Email
    FDIC
    Variables
    Weekly
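One thing to watch with the XML-based split: post text that contains &, < or > is likely to make the Cast to XML fail. A minimal sketch of a workaround, using a made-up sample string, is to escape those three characters before building the XML; value() decodes the entities again when the words are read back.

```sql
Declare @String    varchar(max) = 'R&D <beta> tests'   -- made-up sample text
Declare @Delimeter varchar(25)  = ' '

-- Escape the XML-reserved characters; '&' has to be handled first
Set @String = Replace(Replace(Replace(@String,'&','&amp;'),'<','&lt;'),'>','&gt;')

Declare @XML xml = Cast('<x>' + Replace(@String,@Delimeter,'</x><x>') + '</x>' as XML)

-- Each <x> node is one word; the entities are decoded on the way out
Select Word = T.X.value('.', 'varchar(150)')
 From  @XML.nodes('/x') as T(X)
```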
