
Get first sentence in MySQL string by identifying the period, space, then capital letter of the next sentence

String1 = " Widgets Inc. is the largest widgets producer in the world. It's much bigger than McWidgets Inc."

String2 = " Fidgets Inc is the second largest fidgets producer. It's just behind McFidgets Inc. The CEO of this company loves synergy."

String3 = " Glorious Gagets Co. is considered blah blah jdfglmdslgmldfg. "

For all of the above scenarios, I would like to reliably select the first sentence only.

[EDIT]: Note that there are no real patterns in the sentences.

I would use:

SUBSTRING_INDEX(string, '. ', 1)

However, this would cause issues with the first and third strings, as the company name is sometimes followed by a '.' and sometimes not.
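To make the failure concrete, here is a quick check with the literals copied from String1 and String2 above (leading space dropped, apostrophes doubled for single-quoted SQL literals). The column aliases are only illustrative.

```sql
-- SUBSTRING_INDEX(str, delim, 1) returns everything before the first
-- occurrence of delim, so the first '. ' wins, wherever it is.
SELECT
  SUBSTRING_INDEX(
    'Widgets Inc. is the largest widgets producer in the world. It''s much bigger than McWidgets Inc.',
    '. ', 1) AS s1,   -- cut right after the company name: 'Widgets Inc'
  SUBSTRING_INDEX(
    'Fidgets Inc is the second largest fidgets producer. It''s just behind McFidgets Inc. The CEO of this company loves synergy.',
    '. ', 1) AS s2;   -- 'Fidgets Inc is the second largest fidgets producer'
```

Note that even in the case that works, the delimiter itself (including the closing period) is dropped from the result.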

My thought was to use something like SUBSTRING_INDEX(string, '. [A-Z]', 1), essentially telling it to look for the first '.' that is followed by a space and then any capital letter (i.e. the start of the next sentence), but my SQL-fu is not strong enough yet to figure out how to do that.

Any help would be appreciated!

When you have a fixed pattern, you can use LOCATE to find its index and then use SUBSTRING to cut the string off there. For the starting point you need a regular expression if you don't want to use functions or stored procedures, which you would also need for more complex patterns.

CREATE TABLE table1 (tex varchar(200));

INSERT INTO table1 VALUES
  ("Widgets Inc. is the largest widgets producer in the world. It's much bigger than McWidgets Inc."),
  ("Fidgets Inc is the largest fidgets producer in the world. It's much bigger than McFidgets Inc.");

-- Start at the first letter matched by [A-Z]; the length runs up to the last
-- character of the fixed phrase (22 characters long, hence LOCATE(...) + 21).
SELECT SUBSTRING(tex, REGEXP_INSTR(tex, '[A-Z]'), LOCATE('producer in the world.', tex) + 21) FROM table1;

| SUBSTRING(tex, REGEXP_INSTR(tex, '[A-Z]'), LOCATE('producer in the world.', tex) + 21) |
| :------------------------------------------------------------------------------------- |
| Widgets Inc. is the largest widgets producer in the world. |
| Fidgets Inc is the largest fidgets producer in the world. |

db<>fiddle here
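When there is no fixed phrase to anchor on, the "period, space, capital letter" idea from the question can be sketched with REGEXP_INSTR alone. This is a minimal sketch, assuming MySQL 8.0+ (where REGEXP_INSTR accepts a match_type argument) and the table1(tex) sample table; the first_sentence alias is just a name chosen for illustration.

```sql
-- Find the first '.' that is followed by a space and a capital letter,
-- and keep everything up to and including that period.
SELECT
  CASE
    -- '[.]' matches a literal period without needing backslash escapes.
    -- The 'c' match type forces case-sensitive matching, so [A-Z] means
    -- "capital letter" even under a case-insensitive collation.
    WHEN REGEXP_INSTR(TRIM(tex), '[.] [A-Z]', 1, 1, 0, 'c') > 0
      -- The match starts at the period, so LEFT(..., position) keeps it.
      THEN LEFT(TRIM(tex), REGEXP_INSTR(TRIM(tex), '[.] [A-Z]', 1, 1, 0, 'c'))
    -- No boundary found (single-sentence string): return the whole string.
    ELSE TRIM(tex)
  END AS first_sentence
FROM table1;
```

With the sample rows, "Inc. is" is skipped because the letter after the space is lowercase, and the cut should land after "world." instead. It is still a heuristic: a first sentence containing something like "Inc. The" would be split too early.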

OK, it looks like I have a workaround, short of actually identifying sentences in the requested manner, i.e. by somehow including a capital-letter check in the substring parameter.

I found a list of abbreviations that contain a period (e.g. Co., Inc., Ltd.) and hardcoded replacements without the period (Co, Inc, Ltd, etc.), then did the substring as normal. Not ideal, but it works.
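Roughly, the workaround looks like the sketch below. It assumes a table1(tex) table like the one in the other answer and uses only the three abbreviations named above; the real list was longer and is not reproduced here.

```sql
-- Strip the period from known abbreviations first, then take everything
-- before the first remaining '. ' as the first sentence.
SELECT
  SUBSTRING_INDEX(
    REPLACE(REPLACE(REPLACE(TRIM(tex),
      'Inc.', 'Inc'),
      'Co.',  'Co'),
      'Ltd.', 'Ltd'),
    '. ',   -- sentence delimiter; not included in the result
    1
  ) AS first_sentence
FROM table1;
```

Two caveats: REPLACE is case-sensitive, so differently cased spellings of an abbreviation need their own entries, and a first sentence that actually ends in one of the abbreviations (e.g. "... than McWidgets Inc. ") loses its boundary and will run on into the next sentence.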
