How to get a specific word from a string column using SQL Server

Question

I have a table:

create table tblProduct
(
    ProductID int primary key identity(1000,1),
    ProductName varchar(100),
    ProductDescription nvarchar(max)
)

This table has 1000 records like these:

ProductID=1001  
ProductName='Apple i6'  
ProductDescription='Lorem Ipsum is simply dummy text of the printing and typesetting industry. **Product of USA** Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.'

ProductID=1002  
ProductName='Micromax Canvas'  
ProductDescription='Scrambled it to make a type specimen bookLorem Ipsum is simply dummy text of the printing and typesetting industry. **Product of INDIA** Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.'

ProductID=1003  
ProductName='Oppo Z3'  
ProductDescription='Lorem Ipsum is simply dummy text of the printing and typesetting industry. **Product of INDIA** Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.'

and so on.

Now I want to extract only the country name (no duplicates, no other words) from the ProductDescription column of tblProduct.

The output must look something like this:

Country Name - Total(Group By)  
USA - 1  
INDIA - 2

Note: "Product of XXX*" appears in almost all rows of the ProductDescription column.

*XXX is the country name.

Answer 1

You can use the string functions LEFT, RIGHT, PATINDEX and CHARINDEX:

WITH cte AS
(
  SELECT 
   r = RIGHT(ProductDescription, LEN(ProductDescription) -  
                               PATINDEX('%Product of%' ,ProductDescription) - 10)
  FROM #tblProduct
  WHERE PATINDEX('%Product of%' ,ProductDescription) > 0
)
SELECT country = LEFT(r, CHARINDEX(' ', r)-1), COUNT(*) AS Total
FROM cte
GROUP BY LEFT(r, CHARINDEX(' ', r)-1);

LiveDemo

But you have to think about corner cases:

A country name may contain multiple words, like Russian Federation; the query above keeps only the first word.
A country may appear under several names (US/USA/United States of America...); you will get multiple groups, so the data needs cleansing.
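
For the first corner case, one option is to read the name up to a closing marker instead of the first space. This is only a sketch: it assumes the country is always followed by the literal ** shown in the sample rows, that the data lives in tblProduct, and that a name never exceeds 100 characters. Adjust all three to your data.

```sql
-- Sketch: take everything between 'Product of ' and the closing marker,
-- so multi-word names such as 'Russian Federation' stay intact.
WITH tail AS
(
    -- 'Product of ' is 11 characters long, so the name starts 11
    -- characters after the match. 100 is an assumed upper bound.
    SELECT r = SUBSTRING(ProductDescription,
                         CHARINDEX(N'Product of ', ProductDescription) + 11,
                         100)
    FROM tblProduct
    WHERE CHARINDEX(N'Product of ', ProductDescription) > 0
),
named AS
(
    -- NULLIF turns "marker not found" (0) into NULL, so LEFT returns NULL
    -- instead of failing on a negative length.
    SELECT country = LEFT(r, NULLIF(CHARINDEX(N'**', r), 0) - 1)
    FROM tail
)
SELECT country, COUNT(*) AS Total
FROM named
WHERE country IS NOT NULL
GROUP BY country;
```

Rows where the marker is missing are dropped rather than raising an error, so it is worth checking how many rows fall out.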

Note: "Product of XXX*" appears in almost all rows of the ProductDescription column.

If you know all countries in advance, it is much easier. Just create a countries table:

name      master 
'U.S.'    'USA'
'USA'     'USA'
'India'   'India'

SELECT c.master, COUNT(*) AS total
FROM #tblProduct p
JOIN countries c
  ON p.ProductDescription LIKE '%Product of ' + c.name + '%'
GROUP BY c.master;
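
The lookup table used by that join could be created along these lines. The column types and the primary key are assumptions, not part of the original answer; the names are bracketed only as a precaution.

```sql
-- Hypothetical lookup table: maps each spelling found in the text
-- to one canonical name to group by.
CREATE TABLE countries
(
    [name]   nvarchar(100) NOT NULL PRIMARY KEY,  -- spelling as it appears in descriptions
    [master] nvarchar(100) NOT NULL               -- canonical country name
);

INSERT INTO countries ([name], [master])
VALUES (N'U.S.',  N'USA'),
       (N'USA',   N'USA'),
       (N'India', N'India');
```

Be careful with spellings that are prefixes of one another (for example US and USA): a single description would then match both rows and be counted twice. Counting COUNT(DISTINCT p.ProductID) per master is one way to guard against that.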

EDIT: if the country name is wrapped in <strong> tags (Product of <strong>USA</strong>):

WITH cte AS
(
SELECT r = RIGHT(ProductDescription, LEN(ProductDescription) -  PATINDEX('%Product of <strong>%' ,ProductDescription) - 18)
FROM #tblProduct
WHERE PATINDEX('%Product of <strong>%' ,ProductDescription) > 0
)
SELECT country = LEFT(r, CHARINDEX('<', r)-1), COUNT(*) AS Total
FROM cte
GROUP BY LEFT(r, CHARINDEX('<', r)-1);

LiveDemo2
