What would be the most efficient way to parse the input from a specific column into separate columns?
I have a CSV file with a column named Message that contains input like the following, which I would like to split out properly. Please be aware that the snippet below does not look like this in Excel, which is where I currently need it to be formatted.
["CorrelationId: b99fb632-78cf-4910-ab23-4f69833ed2d9
Request for API: /api/acmsxdsreader/readpolicyfrompolicyassignment Caller:C2F023C52E2148C9C1D040FBFAC113D463A368B1 RequestedSchemas: {urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}VoicePolicy, {urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}OnlineVoiceRoutingPolicy, TenantId: 7a205197-8e59-487d-b9fa-3fc1b108f1e5"]
I would like to split it so that it looks like this (the column name is the text before each colon, and the column's contents are the text after it):
CorrelationID: b99fb632-78cf-4910-ab23-4f69833ed2d9
Request for API:
/api/acmsxdsreader/readpolicyfrompolicyassignment
Caller:C2F023C52E2148C9C1D040FBFAC113D463A368B1
RequestedSchemas: {urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}VoicePolicy, {urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}OnlineVoiceRoutingPolicy,
TenantId: 7a205197-8e59-487d-b9fa-3fc1b108f1e5
I have tried using Text to Columns in Excel, but it does not come out correctly.
What I would like to know is the best way to do this. I am currently writing a program in C# to try to parse it properly, but what I have does not work correctly.
For reference, here is my C# code, although I am open to any way of doing this:
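using System;
using Microsoft.VisualBasic.FileIO;

class Program
{
    static void Main(string[] args)
    {
        using (TextFieldParser parser = new TextFieldParser(@"C:\Users\t-maucal\Desktop\MachineLearningTestSets\CSVParse.csv"))
        {
            parser.TextFieldType = FieldType.Delimited;
            // Splits every line on single spaces: labels such as "Request for API" are
            // broken into pieces, and the commas between CSV columns are ignored
            parser.SetDelimiters(" ");
            while (!parser.EndOfData)
            {
                // Process row: print each space-separated piece on its own line
                string[] fields = parser.ReadFields();
                foreach (string field in fields)
                {
                    Console.WriteLine(field);
                }
            }
        }
    }
}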
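A minimal sketch of one possible fix, assuming the file is comma-separated, that the Message values are enclosed in double quotes (which is what Excel does when it saves a cell containing commas or line breaks), and that the header row has a column named exactly Message. It splits the CSV on commas rather than spaces, then splits the Message text on the five labels from the sample with a regular expression. MessageCsv, SplitMessage and ReadMessages are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using Microsoft.VisualBasic.FileIO;

public static class MessageCsv
{
    // Labels seen in the sample Message (assumed to occur at most once per message).
    const string Labels = "CorrelationId|Request for API|Caller|RequestedSchemas|TenantId";

    // "Label:" followed by the shortest run of text (line breaks included, via Singleline)
    // that ends just before the next "Label:" or at the end of the message.
    static readonly Regex LabelValue = new Regex(
        $@"(?<label>{Labels}):\s*(?<value>.*?)\s*(?=(?:{Labels}):|$)",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Splits one Message value into label -> value pairs (label lookups ignore case).
    public static Dictionary<string, string> SplitMessage(string message)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match m in LabelValue.Matches(message))
            result[m.Groups["label"].Value] = m.Groups["value"].Value.TrimEnd(',');
        return result;
    }

    // Reads a CSV whose header row contains a "Message" column and splits each Message.
    public static List<Dictionary<string, string>> ReadMessages(TextReader reader)
    {
        var rows = new List<Dictionary<string, string>>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");               // CSV columns are separated by commas
            parser.HasFieldsEnclosedInQuotes = true; // quoted fields may hold commas and line breaks

            string[] header = parser.ReadFields() ?? Array.Empty<string>();
            int messageIndex = Array.IndexOf(header, "Message");
            if (messageIndex < 0)
                throw new InvalidDataException("No 'Message' column in the header row.");

            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                rows.Add(SplitMessage(fields[messageIndex]));
            }
        }
        return rows;
    }
}
```

From the Main above, you could call ReadMessages(new StreamReader(path)) and write each dictionary's values out in whatever column order you need. The trailing comma after the last schema is trimmed; drop the TrimEnd call if you want to keep it.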
Using formulas, @cybernetic.nomad got most of the way there. To remove the labels from the data, you can try the following:
Put the label of each column (CorrelationId:, Request for API:, and so on) in cells B1:G1.
In B2, use the following formula:
=RIGHT(LEFT($A2,FIND(C$1,$A2)-1),LEN(LEFT($A2,FIND(C$1,$A2)-1))-(LEN(B1)+2))
In C2, use the following formula:
=RIGHT(MID($A2,FIND(C$1,$A2),FIND(D$1,$A2,FIND(C$1,$A2))-FIND(C$1,$A2)),LEN(MID($A2,FIND(C$1,$A2),FIND(D$1,$A2,FIND(C$1,$A2))-FIND(C$1,$A2)))-(LEN(C1)+1))
In D2, use the following formula:
=RIGHT(MID($A2,FIND(D$1,$A2),FIND(E$1,$A2,FIND(D$1,$A2))-FIND(D$1,$A2)),LEN(MID($A2,FIND(D$1,$A2),FIND(E$1,$A2,FIND(D$1,$A2))-FIND(D$1,$A2)))-(LEN(D1)+2))
In E2, use the following formula:
=RIGHT(MID($A2,FIND(E$1,$A2),FIND(F$1,$A2,FIND(E$1,$A2))-FIND(E$1,$A2,FIND(D$1,$A2))-1),LEN(MID($A2,FIND(E$1,$A2),FIND(F$1,$A2,FIND(E$1,$A2))-FIND(E$1,$A2,FIND(D$1,$A2))))-(LEN(E1)+2))
In F2, use the following formula:
=RIGHT(MID($A2,FIND(F$1,$A2),FIND(G$1,$A2,FIND(F$1,$A2))-FIND(F$1,$A2)),LEN(MID($A2,FIND(F$1,$A2),FIND(G$1,$A2,FIND(F$1,$A2))-FIND(F$1,$A2)))-(LEN(F1)+2))
In G2, use the following formula:
=RIGHT($A2,LEN($A2)-FIND(G$1,$A2)-LEN(G1))
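All of the formulas above follow one pattern: FIND locates a label from row 1, MID takes the text up to where the next label starts, and RIGHT with LEN drops the label itself. A rough C# equivalent of that idea, assuming the labels always appear in the order shown in the question (LabelSlicer and OrderedLabels are illustrative names). One difference worth knowing: Excel's FIND is case-sensitive (SEARCH is not), so the headers in row 1 must match the data exactly, e.g. CorrelationId: rather than CorrelationID:. This sketch compares case-insensitively instead.

```csharp
using System;
using System.Collections.Generic;

public static class LabelSlicer
{
    // Labels in the order they appear in the message (assumed fixed, like the row-1 headers).
    static readonly string[] OrderedLabels =
        { "CorrelationId:", "Request for API:", "Caller:", "RequestedSchemas:", "TenantId:" };

    public static List<string> Slice(string message)
    {
        var values = new List<string>();
        for (int i = 0; i < OrderedLabels.Length; i++)
        {
            // Like FIND(label, $A2), but case-insensitive
            int labelStart = message.IndexOf(OrderedLabels[i], StringComparison.OrdinalIgnoreCase);
            if (labelStart < 0)
            {
                values.Add(""); // label missing: leave this column empty
                continue;
            }
            // Skip the label itself, like RIGHT(..., LEN(...) - LEN(label))
            int valueStart = labelStart + OrderedLabels[i].Length;

            // Stop where the next label starts (like MID between two FIND results);
            // the last label, or one whose successor is missing, runs to the end of the text.
            int valueEnd = message.Length;
            if (i + 1 < OrderedLabels.Length)
            {
                int next = message.IndexOf(OrderedLabels[i + 1], valueStart, StringComparison.OrdinalIgnoreCase);
                if (next >= 0) valueEnd = next;
            }
            // Drop surrounding whitespace and line breaks, then a trailing comma
            values.Add(message.Substring(valueStart, valueEnd - valueStart).Trim().TrimEnd(','));
        }
        return values;
    }
}
```

Like the formulas, this depends on the label order; if a label can be missing or out of order, a pattern-based split such as the regular expression in the VBA answer is more forgiving.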
You can use a macro written in VBA.
I created a class module, renamed it cData, and gave it a property for each of your column headings.
Then I used regular expressions to separate the different properties from the data you provided, collected them into a Dictionary, and output the results to a separate worksheet in a specified order.
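The same design carries over to C# if you would rather stay with the program from the question: one object per row with a property per heading (the counterpart of cData), stored in a dictionary keyed by row number, then written out in a fixed column order. This is a hypothetical sketch: MessageRecord and ToCsv are made-up names, it writes CSV text instead of a worksheet, and it assumes the values have already been extracted (for example with a regular expression like the macro's).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical C# counterpart of the cData class: one property per column heading.
public class MessageRecord
{
    public string CorrelationId { get; set; } = "";
    public string RequestForApi { get; set; } = "";
    public string Caller { get; set; } = "";
    public string RequestedSchemas { get; set; } = "";
    public string TenantId { get; set; } = "";

    // Column headings, in the same order as the values returned by Values().
    public static readonly string[] Headers =
        { "Correlation ID", "Request for API", "Caller", "Requested Schemas", "Tenant ID" };

    public string[] Values() =>
        new[] { CorrelationId, RequestForApi, Caller, RequestedSchemas, TenantId };

    // Quotes a field that contains a comma, quote or line break, doubling inner quotes.
    static string Quote(string field) =>
        field.IndexOfAny(new[] { ',', '"', '\n', '\r' }) >= 0
            ? "\"" + field.Replace("\"", "\"\"") + "\""
            : field;

    public static string ToCsvLine(IEnumerable<string> fields) =>
        string.Join(",", fields.Select(Quote));

    // Header line first, then one line per record in key (row) order.
    public static IEnumerable<string> ToCsv(IDictionary<int, MessageRecord> records)
    {
        yield return ToCsvLine(Headers);
        foreach (var key in records.Keys.OrderBy(k => k))
            yield return ToCsvLine(records[key].Values());
    }
}
```

Passing the result to File.WriteAllLines(path, MessageRecord.ToCsv(records)) gives a file that Excel opens with one column per heading; the quoting keeps the comma-separated schema list in a single cell.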
I assumed that your named column headers are the information you are looking for and that, as in your text example, each category appears only once.
I also assumed that your data starts in B1.
Read the notes in the macro closely, and be sure to set the references as indicated in the Regular Module code.
Class Module
'Rename this module cData
Option Explicit

Private pCorrelationID As String
Private pRequestForApi As String
Private pCaller As String
Private pRequestedSchemas As String
Private pTenantID As String

Public Property Get CorrelationID() As String
    CorrelationID = pCorrelationID
End Property
Public Property Let CorrelationID(Value As String)
    pCorrelationID = Value
End Property

Public Property Get RequestForApi() As String
    RequestForApi = pRequestForApi
End Property
Public Property Let RequestForApi(Value As String)
    pRequestForApi = Value
End Property

Public Property Get Caller() As String
    Caller = pCaller
End Property
Public Property Let Caller(Value As String)
    pCaller = Value
End Property

Public Property Get RequestedSchemas() As String
    RequestedSchemas = pRequestedSchemas
End Property
Public Property Let RequestedSchemas(Value As String)
    pRequestedSchemas = Value
End Property

Public Property Get TenantID() As String
    TenantID = pTenantID
End Property
Public Property Let TenantID(Value As String)
    pTenantID = Value
End Property
Regular Module
'Set Reference to Microsoft Scripting Runtime
'Set Reference to Microsoft VBScript Regular Expressions 5.5
Option Explicit
'Case-insensitive text comparisons in this module, so that Select Case below
'matches "CorrelationId" / "TenantId" in the data against "CorrelationID" / "TenantID"
Option Compare Text

Sub ttcSpecial()
    Dim wsSrc As Worksheet, wsRes As Worksheet
    Dim vSrc As Variant, vRes As Variant
    Dim rRes As Range
    Dim dD As Dictionary
    Dim RE As RegExp, MC As MatchCollection, M As Match
    Dim cD As cData
    Dim myKey, I As Long, sTemp As String

    Set wsSrc = Worksheets("sheet1")
    Set wsRes = Worksheets("sheet2")
    Set rRes = wsRes.Cells(1, 1)

    'Read column B (from B1 down) into an array; a single cell is wrapped in a 1x1 array
    With wsSrc
        vSrc = .Range(.Cells(1, 2), .Cells(.Rows.Count, 2).End(xlUp))
        If Not IsArray(vSrc) Then
            sTemp = vSrc
            ReDim vSrc(1 To 1, 1 To 1)
            vSrc(1, 1) = sTemp
        End If
    End With

    Set RE = New RegExp
    With RE
        .Global = True
        .IgnoreCase = True
        .MultiLine = False
        .Pattern = "((?:CorrelationID|Request For API|Caller|RequestedSchemas|TenantID)):([\s\S]+?)(?=(?:CorrelationID|Request For API|Caller|RequestedSchemas|TenantID|$))"
    End With

    Set dD = New Dictionary
    dD.CompareMode = TextCompare

    'One cData object per source row, keyed by row number
    For I = 1 To UBound(vSrc, 1)
        Set cD = New cData
        With cD
            If RE.Test(vSrc(I, 1)) = True Then
                myKey = I
                Set MC = RE.Execute(vSrc(I, 1))
                For Each M In MC
                    Select Case M.SubMatches(0)
                        Case "CorrelationID"
                            .CorrelationID = M.SubMatches(1)
                        Case "Request for API"
                            .RequestForApi = M.SubMatches(1)
                        Case "Caller"
                            .Caller = M.SubMatches(1)
                        Case "RequestedSchemas"
                            .RequestedSchemas = M.SubMatches(1)
                        Case "TenantID"
                            .TenantID = M.SubMatches(1)
                    End Select
                Next M
                dD.Add Key:=myKey, Item:=cD
            End If
        End With
    Next I

    ReDim vRes(0 To dD.Count, 1 To 5)

    'Headers
    vRes(0, 1) = "Correlation ID"
    vRes(0, 2) = "Request for API"
    vRes(0, 3) = "Caller"
    vRes(0, 4) = "Requested Schemas"
    vRes(0, 5) = "Tenant ID"

    'Data rows, in the same column order as the headers
    I = 0
    For Each myKey In dD.Keys
        I = I + 1
        With dD(myKey)
            vRes(I, 1) = .CorrelationID
            vRes(I, 2) = .RequestForApi
            vRes(I, 3) = .Caller
            vRes(I, 4) = .RequestedSchemas
            vRes(I, 5) = .TenantID
        End With
    Next myKey

    'Write the array to sheet2 and format the header row
    Set rRes = rRes.Resize(UBound(vRes, 1) + 1, UBound(vRes, 2))
    With rRes
        .EntireColumn.Clear
        .Value = vRes
        With .Rows(1)
            .Font.Bold = True
            .HorizontalAlignment = xlCenter
        End With
        .EntireColumn.AutoFit
    End With
End Sub
[Screenshot: results from the text sample in the original question]
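The regex, simplified: the first group captures one of the five labels (SubMatches(0)), followed by a colon; [\s\S]+? then lazily captures one or more characters of any kind, line breaks included, as SubMatches(1); and the lookahead (?=...) ends that capture just before the next label or the end of the text without consuming it, so the next match can start at that label. .Global = True makes Execute return every match rather than only the first, and .IgnoreCase = True lets "Request For API" in the pattern match "Request for API" in the data.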
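To see what SubMatches(0) and SubMatches(1) actually hold, the macro's pattern can be run unchanged through .NET's Regex class. This assumes the two engines agree on this pattern, which they should, since it only uses groups, alternation, a lazy quantifier and a lookahead. Two things show up: group 1 keeps the case of the input (CorrelationId, not CorrelationID), so under VBA's default binary (case-sensitive) comparison it will not match Case "CorrelationID"; and group 2 keeps the space after the colon and the trailing space or line break, so you may want to trim the values. MacroPatternDemo is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MacroPatternDemo
{
    // The pattern from the macro, unchanged.
    public const string Pattern =
        @"((?:CorrelationID|Request For API|Caller|RequestedSchemas|TenantID)):([\s\S]+?)(?=(?:CorrelationID|Request For API|Caller|RequestedSchemas|TenantID|$))";

    // Returns (group 1, group 2) for every match: what SubMatches(0) and SubMatches(1) hold.
    public static List<(string Label, string Value)> Run(string text)
    {
        var pairs = new List<(string Label, string Value)>();
        foreach (Match m in Regex.Matches(text, Pattern, RegexOptions.IgnoreCase))
            pairs.Add((m.Groups[1].Value, m.Groups[2].Value));
        return pairs;
    }
}
```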