Extracting string that only exists in certain rows and is between 2 different delimiters

I have the following table and I am trying to extract a string that only exists in certain rows and sits between two different delimiters (a colon `:` and a comma `,`).

df:

| col1 | col2 |
|---|---|
| Patient 001 data retrieved: 9089800, John,Doe | CA |
| Hospital stay | AZ |
| Patient 002 data retrieved: 9123010, Steve,Doe | NY |
| Patient 003 data retrieved: 9034291, Alex,Doe | MI |
| Patient 004 information not found | VT |

df_final:

| col1 | col2 | result |
|---|---|---|
| Patient 001 data retrieved: 9089800, John,Doe | CA | 9089800 |
| Hospital stay | AZ | |
| Patient 002 data retrieved: 9123010, Steve,Doe | NY | 9123010 |
| Patient 003 data retrieved: 9034291, Alex,Doe | MI | 9034291 |
| Patient 004 information not found | VT | |

I understand that the way the data is currently stored is not efficient, but this is the dataset/task I have been given. Is there any way to work around this?

This is what I have so far, but it just retrieves the entire string for all rows. Not sure what I am doing wrong.

SELECT TOP 100 *, 
SUBSTRING(col1,CHARINDEX('data retrieved:',col1)+1,
        (((LEN(col1))-CHARINDEX(',', REVERSE(col1)))-CHARINDEX('data retrieved:',col1))) AS Result 
FROM df

A little bit more bulletproof:

trim(case when charindex(':', col1) <> 0 then
      case when charindex(',', col1, charindex(':', col1)+1) <> 0 then
          substring(col1, charindex(':', col1)+1, 
              charindex(',', col1, charindex(':', col1)+1) -
              charindex(':', col1) - 1
          ) 
      end
  end)
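
For context, here is a minimal sketch of how an expression like the one above might sit inside a full query. The table variable `@df` and its column sizes are assumptions made for illustration, with the sample rows copied from the question. The two `CROSS APPLY` steps only give names to the delimiter positions; they are a stylistic choice here, not part of the original answer. `LTRIM(RTRIM(...))` stands in for `trim`, which requires SQL Server 2017 or later.

```sql
-- Hypothetical stand-in for the real table, filled with the question's sample rows.
DECLARE @df TABLE (col1 varchar(100), col2 varchar(50));

INSERT INTO @df (col1, col2) VALUES
    ('Patient 001 data retrieved: 9089800, John,Doe',  'CA'),
    ('Hospital stay',                                  'AZ'),
    ('Patient 002 data retrieved: 9123010, Steve,Doe', 'NY'),
    ('Patient 003 data retrieved: 9034291, Alex,Doe',  'MI'),
    ('Patient 004 information not found',              'VT');

SELECT d.col1,
       d.col2,
       -- Only call SUBSTRING when both delimiters were found; otherwise the
       -- CASE has no matching branch and the result is NULL.
       result = LTRIM(RTRIM(
                    CASE WHEN p.colon_pos > 0 AND c.comma_pos > 0
                         THEN SUBSTRING(d.col1,
                                        p.colon_pos + 1,
                                        c.comma_pos - p.colon_pos - 1)
                    END))
FROM @df AS d
-- Position of the first colon (0 when there is none).
CROSS APPLY (SELECT CHARINDEX(':', d.col1) AS colon_pos) AS p
-- Position of the first comma after that colon (0 when either is missing).
CROSS APPLY (SELECT CASE WHEN p.colon_pos > 0
                         THEN CHARINDEX(',', d.col1, p.colon_pos + 1)
                         ELSE 0
                    END AS comma_pos) AS c;
```

With the sample rows, the three "data retrieved" rows should yield 9089800, 9123010 and 9034291, while the two rows without a colon yield NULL. Unlike the attempt in the question, the start position is taken from the colon itself rather than from the start of the phrase 'data retrieved:', and the end position is the first comma after the colon rather than the last comma in the string.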

Tired of extracting strings... left, right, patindex, charindex, ...

Here is an option that uses a helper function which accepts two unlike delimiters. In this case a : and a ,

Example

Declare @YourTable Table ([col1] varchar(50),[col2] varchar(50))
Insert Into @YourTable Values 
 ('Patient 001 data retrieved: 9089800, John,Doe','CA')
,('Hospital stay','AZ')
,('Patient 002 data retrieved: 9123010, Steve,Doe','NY')
,('Patient 003 data retrieved: 9034291, Alex,Doe','MI')
,('Patient 004 information not found','VT')
 
Select A.* 
      ,Result = B.RetVal
 From  @YourTable A
 Outer Apply [dbo].[tvf-Str-Extract-JSON](Col1,':',',') B

Results

[Image: query results]

The Function if Interested

CREATE FUNCTION [dbo].[tvf-Str-Extract-JSON] (@String nvarchar(max),@Delim1 nvarchar(100),@Delim2 nvarchar(100))
Returns Table 
As
Return (  

    Select RetSeq = row_number() over (order by RetSeq)
          ,RetVal = left(RetVal,charindex(@Delim2,RetVal)-1)
    From  (
            Select RetSeq = [Key]+1
                  ,RetVal = trim(Value)
             From  OpenJSON( N'["'+replace(string_escape(@String,'json'),@Delim1,'","')+N'"]' )

          ) C1
    Where charindex(@Delim2,RetVal)>1

)
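
To see what the function does internally, the following sketch runs only its inner splitting step on one sample string from the question. The variable names mirror the function's parameters; the sample value and the standalone form are assumptions for illustration. `STRING_ESCAPE` needs SQL Server 2016 or later, and `OPENJSON` additionally needs database compatibility level 130 or higher.

```sql
DECLARE @String nvarchar(max) = N'Patient 001 data retrieved: 9089800, John,Doe';
DECLARE @Delim1 nvarchar(100) = N':';

-- Replacing the first delimiter with "," turns the string into a JSON array
-- of the pieces on either side of it, which OPENJSON returns one per row.
SELECT [key],
       [value]
FROM   OPENJSON(N'["' + REPLACE(STRING_ESCAPE(@String, 'json'), @Delim1, N'","') + N'"]');
```

This should return two rows: key 0 with 'Patient 001 data retrieved' and key 1 with ' 9089800, John,Doe'. The function then trims each piece, keeps only those that contain the second delimiter after their first character, and returns the text to the left of it, which here leaves '9089800'. A string with no first delimiter produces a single piece; if that piece has no second delimiter either, the function returns no rows, and the Outer Apply in the example query then leaves Result as NULL.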

If you want to try without the TVF: https://dbfiddle.uk/Aw9qByxC
