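Every time a user searches for text on the website, the search text is logged to search_table. Sub-searches (the partial text typed along the way) are logged too. They are recorded with a trailing asterisk.

The goal is to find the most complete search text for each search the user made.

Ideally, the approach would be: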

        Group the ids = 1,4,6 and obtain id = 6
        Group the ids = 2,5,7 and obtain id = 7
        Group the ids = 3 and obtain id = 3
        Group the ids = 8,9 and obtain id = 9

SEARCH_TABLE

            id user   search_text
            --------------------
            1  user1  data manag*
            2  user1  confer*
            3  user1  incomplete sear*
            4  user1  data managem*
            5  user1  conference c*
            6  user1  data management
            7  user1  conference call
            8  user1  status in*
            9  user1  status information

The output should be:

        user  search_text
        ---------------------
        user1 data management
        user1 conference call
        user1 incomplete sear*
        user1 status information

Can you help?

===============>>#1 Votes: 0

Something like the following should do the job:

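SELECT * FROM 
    SEARCH_TABLE st 
    WHERE 
    NOT EXISTS (

    SELECT 1 FROM 
        SEARCH_TABLE st2 
        -- remove the asterisk and append %
        WHERE  st2.search_text LIKE replace(st.search_text,'*','')||'%'
        -- skip the row itself, which always matches its own pattern
        AND st2.id <> st.id
        -- compare only within one user ("user" is quoted because it is reserved in PostgreSQL)
        AND st2."user" = st."user"
    )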

This filters out every search that is a prefix of another search.
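
To try the idea without a database server, here is a minimal sketch that runs the same kind of NOT EXISTS filter in an in-memory SQLite database on the sample rows from the question. The column name `usr`, the function name `most_complete_searches`, and the use of SQLite are assumptions made for illustration. The query excludes the row itself and matches only within one user, so it does not rely on any other statement in this thread.

```python
import sqlite3

# Sample rows copied from the question's SEARCH_TABLE.
ROWS = [
    (1, "user1", "data manag*"),
    (2, "user1", "confer*"),
    (3, "user1", "incomplete sear*"),
    (4, "user1", "data managem*"),
    (5, "user1", "conference c*"),
    (6, "user1", "data management"),
    (7, "user1", "conference call"),
    (8, "user1", "status in*"),
    (9, "user1", "status information"),
]

# Keep a row only if no OTHER row of the same user starts with its text
# (asterisk stripped). "usr" is an assumed column name for this sketch.
QUERY = """
SELECT st.usr, st.search_text
FROM search_table st
WHERE NOT EXISTS (
    SELECT 1
    FROM search_table st2
    WHERE st2.usr = st.usr
      AND st2.id <> st.id
      AND st2.search_text LIKE replace(st.search_text, '*', '') || '%'
)
ORDER BY st.id
"""


def most_complete_searches(rows):
    """Load rows into an in-memory table and return the surviving searches."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE search_table (id INTEGER, usr TEXT, search_text TEXT)"
    )
    conn.executemany("INSERT INTO search_table VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for usr, text in most_complete_searches(ROWS):
        print(usr, text)
```

Two caveats with this sketch: a search text that itself contains `%` or `_` would be treated as a LIKE wildcard, and two identical rows would eliminate each other because each one matches the other's pattern.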

===============>>#2 Votes: 0

This may not be the most elegant approach, but you could give it a try:

   alter table your_table
   add group_id int 

   select [user], left(search_text, 5) as Group_Text, IDENTITY(int, 1,1) as Group_ID
   into #group_id_table
   from your_table
   group by [user], left(search_text, 5)
   order by [user], left(search_text, 5)

   update a
   set a.group_id = b.group_id
   from your_table as a
   join #group_id_table as b
   on a.[user] = b.[user]
   and left(a.search_text, 5) = b.group_text

   select [user], max(search_text), group_id
   from your_table
   group by [user], group_id
   order by [user], group_id

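This produced the expected result when I ran it, but of course, because group_id is derived from a fixed prefix length that you choose yourself, there may be problems there. Hope this helps.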
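
To make that caveat concrete, here is a small sketch in plain Python that mimics the grouping (key = user plus the first 5 characters, result = maximum text per group). The function name `group_by_prefix` and the extra row "data mining" are hypothetical and not part of the question's data. Python compares strings by code point while SQL Server uses the column's collation, so the exact winner of `max` may differ, but the collision itself does not depend on that.

```python
from collections import defaultdict


def group_by_prefix(rows, prefix_len=5):
    """Group (user, text) rows by user and text prefix, keep the max text."""
    groups = defaultdict(list)
    for user, text in rows:
        groups[(user, text[:prefix_len])].append(text)
    # One result per group, as in "select max(search_text) ... group by".
    return {key: max(texts) for key, texts in groups.items()}


rows = [
    ("user1", "data manag*"),
    ("user1", "data management"),
    ("user1", "data mining"),  # hypothetical unrelated search, same 5-char prefix
]

result = group_by_prefix(rows)
print(result)
```

Both "data management" and "data mining" share the prefix "data ", so they land in one group and only one of them can come out. A search shorter than the chosen length, such as "dat*", would have the opposite problem and end up in a group of its own.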

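===============>>#3 Votes: 0

Give this a try. I separate out the completed texts (together with their shorter parts), then find the longest part for each remaining record. Tested in Oracle since I don't have access to PostgreSQL right now, but I didn't use anything fancy, so it should work.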

with 
  --Contains all completed searches
  COMPLETE   as (select * from SEARCH_TABLE where SEARCH_TEXT not like '%*'),
  --Contains all searches that are incomplete and don't have a completed match
  INCOMPLETE as (
    select S.* 
    from SEARCH_TABLE S 
    left join COMPLETE C 
      on  S.USR = C.USR
      and C.SEARCH_TEXT like replace(S.SEARCH_TEXT, '*', '%')
    where C.ID is null
  ),
  --chains each incomplete search with every shorter pattern that matches it.
  --the left join keeps a search that has no shorter pattern, so it still reaches the final select.
  CHAINED_INC as (
    select LONGER.USR, LONGER.ID, LONGER.SEARCH_TEXT, SHORTER.SEARCH_TEXT SEARCH_TEXT_SHORT
    from INCOMPLETE LONGER 
    left join INCOMPLETE SHORTER
      on  LONGER.USR = SHORTER.USR
      and LONGER.SEARCH_TEXT like replace(SHORTER.SEARCH_TEXT, '*', '%')
      and LONGER.ID <> SHORTER.ID
  )
--if a text is not the shorter text for a different record, that means it's the longest text for that pattern.
select distinct T1.USR, T1.SEARCH_TEXT  
from CHAINED_INC T1 
left join CHAINED_INC T2
  on  T1.USR = T2.USR
  and T1.SEARCH_TEXT = T2.SEARCH_TEXT_SHORT
where T2.SEARCH_TEXT_SHORT is null
--finally, union back to the completed texts.
union all
select USR, SEARCH_TEXT from COMPLETE
;

Edit: removed ID from the select.
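
For readers who find the stages easier to follow outside SQL, here is a rough plain-Python sketch of the same three steps: collect the completed texts, keep the incomplete ones that have no completed match, then keep only the longest of those. The names `longest_searches` and `matches` are made up for this sketch, and it assumes the asterisk only ever appears at the end of the text, which is narrower than the `replace(..., '*', '%')` in the query.

```python
def matches(text, pattern):
    """True if text fits pattern; a trailing '*' in pattern means 'starts with'."""
    if pattern.endswith("*"):
        return text.startswith(pattern[:-1])
    return text == pattern


def longest_searches(rows):
    """rows: list of (id, user, text). Returns a list of (user, text)."""
    # Stage 1: completed searches carry no trailing asterisk.
    complete = [r for r in rows if not r[2].endswith("*")]

    # Stage 2: searches that no completed search of the same user matches.
    incomplete = [
        r for r in rows
        if not any(c[1] == r[1] and matches(c[2], r[2]) for c in complete)
    ]

    # Stage 3: drop every incomplete search that is a shorter pattern of
    # another incomplete search; the set mirrors "select distinct".
    longest = {
        (r[1], r[2])
        for r in incomplete
        if not any(
            o[1] == r[1] and o[0] != r[0] and matches(o[2], r[2])
            for o in incomplete
        )
    }

    # Finally, append the completed searches, as "union all" does.
    return sorted(longest) + [(c[1], c[2]) for c in complete]


sample = [
    (1, "user1", "data manag*"),
    (3, "user1", "incomplete sear*"),
    (4, "user1", "data managem*"),
    (6, "user1", "data management"),
]
print(longest_searches(sample))
```

A search that was never completed and has no other partial version, such as "incomplete sear*", survives stage 3 on its own, which is the case the expected output in the question requires.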

  Asked by lambda9; translated from Stack Overflow.
