How to extract text around a word in Excel or Python?
I have a text of several thousand lines that looks like this:
ksjd 234first special 34-37xy kjsbn
sde 89second special 22-23xh ewio
647red special 55fg dsk
uuire another special 98
another special 107r
green special 55-59 ewk
blue special 31-39jkl
I need to extract the word before "special" and the number (or number range) to its right. In other words, I want:
converted into a table:
A fast way to do this is to use regular expressions:
In [1]: import re
In [2]: text = '''234first special 34-37xy
...: 89second special 22-23xh
...: 647red special 55fg
...: another special 98
...: another special 107r
...: green special 55-59
...: blue special 31-39jkl'''
In [3]: [re.findall(r'\d*\s*(\S+)\s+(special)\s+(\d+(?:-\d+)?)', line)[0] for line in text.splitlines()]
Out[3]:
[('first', 'special', '34-37'),
('second', 'special', '22-23'),
('red', 'special', '55'),
('another', 'special', '98'),
('another', 'special', '107'),
('green', 'special', '55-59'),
('blue', 'special', '31-39')]
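The session above runs on a trimmed copy of the data, without the extra leading and trailing words. As a sketch for the untrimmed lines from the question, the pattern below anchors on a run of letters directly before "special" instead. The function name `extract_special` and the choice to return `None` for non-matching lines are assumptions made for illustration, not part of the original answer.

```python
import re

# A run of letters right before "special", then a number or a hyphenated range.
PATTERN = re.compile(r"([A-Za-z]+)\s+special\s+(\d+(?:-\d+)?)")


def extract_special(line):
    """Return (word, number_or_range) for a line, or None if it does not match."""
    match = PATTERN.search(line)
    return match.groups() if match else None


# The sample lines exactly as given in the question.
lines = [
    "ksjd 234first special 34-37xy kjsbn",
    "sde 89second special 22-23xh ewio",
    "647red special 55fg dsk",
    "uuire another special 98",
    "another special 107r",
    "green special 55-59 ewk",
    "blue special 31-39jkl",
]

rows = [extract_special(line) for line in lines]
for row in rows:
    print(row)
```

Returning `None` rather than raising keeps one malformed line from aborting a run over thousands of lines; filter those entries out before building the table.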
In Excel, you can use a formula to extract the text between two words as follows:
Select a blank cell, type the formula =MID(A1,SEARCH("KTE",A1)+3,SEARCH("feature",A1)-SEARCH("KTE",A1)-4) into it, then press Enter.
Drag the fill handle over the range you want to apply this formula to. Now only the text between "KTE" and "feature" is extracted.
Notes:
In this formula, A1 is the cell you want to extract text from.
KTE and feature are the words you want to extract text between.
The number 3 is the character length of KTE, and the number 4 is the character length of KTE plus one (the extra one drops the character directly before "feature", typically a space).
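For readers who want to check the arithmetic outside Excel, here is a rough Python mirror of the formula. The helper name `text_between` and the sample string are hypothetical; the sketch assumes plain ASCII text and lowers both strings because SEARCH ignores case.

```python
def text_between(text, start_word, end_word):
    """Roughly mirror MID(A1, SEARCH(start)+len(start), SEARCH(end)-SEARCH(start)-len(start)-1)."""
    lowered = text.lower()  # SEARCH is case-insensitive
    start = lowered.find(start_word.lower())
    end = lowered.find(end_word.lower())
    if start == -1 or end == -1:
        # Excel would show #VALUE! here.
        raise ValueError("both words must occur in the text")
    # Skip past start_word and stop one character before end_word.
    return text[start + len(start_word):end - 1]


sample = "KTE 1234 feature"  # hypothetical cell content
print(repr(text_between(sample, "KTE", "feature")))
```

As with the formula, the space right after KTE is kept and only the character before "feature" is dropped.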
In addition to what @RolandSmith wrote, here is a way of using regular expressions in Excel with VBA:
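Option Explicit

' Returns submatch number Index (1 = word, 2 = "special", 3 = number or range)
' of the first match in S, or an empty string if S does not match.
Function ExtractSpecial(S As String, Index As Long) As String
    Dim RE As Object, MC As Object
    ' Word, then "special", then a number or a hyphenated number range
    Const sPat As String = "([a-z]+)\s+(special)\s+(\d+(?:-\d+)?)"

    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .IgnoreCase = True
        .MultiLine = False
        .Pattern = sPat
        If .Test(S) = True Then
            Set MC = .Execute(S)
            ExtractSpecial = MC(0).SubMatches(Index - 1)
        End If
    End With
End Function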
The Index argument in this UDF selects which submatch (1st, 2nd or 3rd) of the first match is returned, so you can easily split the original string into your three desired components. For example, with the text in A1, =ExtractSpecial($A1,1) returns the word before "special" and =ExtractSpecial($A1,3) returns the number or range.
Since you write that you have "thousands of lines", you may prefer to run a macro. The macro will process the data much more quickly, but it is not dynamic: the results do not update when the source data changes. The macro below assumes your original data is in column A on Sheet2, and puts the results in columns C:E on the same worksheet. You can easily change these parameters:
Sub ExtractSpec()
    Dim RE As Object, MC As Object
    Dim wsSrc As Worksheet, wsRes As Worksheet, rRes As Range
    Dim vSrc As Variant, vRes As Variant
    Dim I As Long

    ' Source sheet, result sheet, and top-left cell of the results (C1)
    Set wsSrc = Worksheets("sheet2")
    Set wsRes = Worksheets("sheet2")
    Set rRes = wsRes.Cells(1, 3)

    ' Read column A into an array in one go
    ' (assumes at least two rows of data, so that vSrc is a 2D array)
    With wsSrc
        vSrc = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp))
    End With

    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .MultiLine = False
        .IgnoreCase = True
        ' Word, then "special", then a number or a hyphenated number range
        .Pattern = "([a-z]+)\s+(special)\s+(\d+(?:-\d+)?)"

        ' One result row per source row; rows without a match stay empty
        ReDim vRes(1 To UBound(vSrc), 1 To 3)
        For I = 1 To UBound(vSrc)
            If .Test(vSrc(I, 1)) = True Then
                Set MC = .Execute(vSrc(I, 1))
                vRes(I, 1) = MC(0).SubMatches(0)
                vRes(I, 2) = MC(0).SubMatches(1)
                vRes(I, 3) = MC(0).SubMatches(2)
            End If
        Next I
    End With

    ' Clear the target columns, then write all results at once
    Set rRes = rRes.Resize(UBound(vRes, 1), UBound(vRes, 2))
    With rRes
        .EntireColumn.Clear
        .Value = vRes
        .EntireColumn.AutoFit
    End With
End Sub
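If the lines live in a plain text file rather than a worksheet, the same batch approach can be sketched in Python by writing CSV that Excel can open. This is only an illustration under assumptions: the function name `lines_to_csv` is hypothetical, and like the macro it emits one three-column row per input line, left blank when there is no match.

```python
import csv
import io
import re

# Word, then "special", then a number or a hyphenated range (case-insensitive).
PATTERN = re.compile(r"([a-z]+)\s+(special)\s+(\d+(?:-\d+)?)", re.IGNORECASE)


def lines_to_csv(lines, out):
    """Write one CSV row of three columns per input line to the file-like object out."""
    writer = csv.writer(out)
    for line in lines:
        match = PATTERN.search(line)
        # Keep row alignment with the source: blank cells when nothing matches.
        writer.writerow(match.groups() if match else ["", "", ""])


# In-memory demonstration; a real run would pass an open file instead.
buffer = io.StringIO()
lines_to_csv(
    ["green special 55-59 ewk", "no match here", "647red special 55fg dsk"],
    buffer,
)
print(buffer.getvalue())
```

When writing to a real file, open it with `newline=""`, as the `csv` module documentation recommends, so that row endings are not doubled on Windows.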