
What would be the most efficient way to parse the input from a specific column into separate columns?

I have a CSV file with a specific column, Message, that contains the following input, which I would like to separate out properly. Please be aware that the snippet below does not look like this in Excel, which is where I currently need it to be formatted.

    ["CorrelationId: b99fb632-78cf-4910-ab23-4f69833ed2d9
Request for API: /api/acmsxdsreader/readpolicyfrompolicyassignment Caller:C2F023C52E2148C9C1D040FBFAC113D463A368B1 RequestedSchemas: {urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}VoicePolicy, {urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}OnlineVoiceRoutingPolicy,  TenantId: 7a205197-8e59-487d-b9fa-3fc1b108f1e5"]

I would like to separate it out so that it looks like this. (The name of each column is the text before the colon, and the value in that column is the text after the colon.)

CorrelationID: b99fb632-78cf-4910-ab23-4f69833ed2d9
Request for API: 
/api/acmsxdsreader/readpolicyfrompolicyassignment
Caller:C2F023C52E2148C9C1D040FBFAC113D463A368B1
RequestedSchemas: {urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}VoicePolicy, {urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}OnlineVoiceRoutingPolicy,
TenantId: 7a205197-8e59-487d-b9fa-3fc1b108f1e5

I have tried using Text to Columns in Excel, but it does not come out correctly.

What I would like to know is the best way to do this. I am currently writing a program in C# to try to parse it out properly, but what I have does not work correctly.

For reference, here is my C# code. However, I am open to any way of doing this.

static void Main(string[] args)
    {
        using (TextFieldParser parser = new TextFieldParser(@"C:\Users\t-maucal\Desktop\MachineLearningTestSets\CSVParse.csv"))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(" ");
            while (!parser.EndOfData)
            {
                //Process row
                string[] fields = parser.ReadFields();
                foreach (string field in fields)
                {
                    Console.WriteLine(field);
                }
            }
        }
    }

(Images: original format of the Message column; expected result)

Using formulas, @cybernetic.nomad had it most of the way there. In order to remove the titles from the data, you can try these:

  1. Put the category label of each column (CorrelationId:, Request for API:, and so on) in cells B1:G1.

  2. In B2, use the following formula:

     =RIGHT(LEFT($A2,FIND(C$1,$A2)-1),LEN(LEFT($A2,FIND(C$1,$A2)-1))-(LEN(B1)+2)) 
  3. In C2, use the following formula:

     =RIGHT(MID($A2,FIND(C$1,$A2),FIND(D$1,$A2,FIND(C$1,$A2))-FIND(C$1,$A2)),LEN(MID($A2,FIND(C$1,$A2),FIND(D$1,$A2,FIND(C$1,$A2))-FIND(C$1,$A2)))-(LEN(C1)+1)) 
  4. In D2, use the following formula:

     =RIGHT(MID($A2,FIND(D$1,$A2),FIND(E$1,$A2,FIND(D$1,$A2))-FIND(D$1,$A2)),LEN(MID($A2,FIND(D$1,$A2),FIND(E$1,$A2,FIND(D$1,$A2))-FIND(D$1,$A2)))-(LEN(D1)+2)) 
  5. In E2, use the following formula:

     =RIGHT(MID($A2,FIND(E$1,$A2),FIND(F$1,$A2,FIND(E$1,$A2))-FIND(E$1,$A2,FIND(D$1,$A2))-1),LEN(MID($A2,FIND(E$1,$A2),FIND(F$1,$A2,FIND(E$1,$A2))-FIND(E$1,$A2,FIND(D$1,$A2))))-(LEN(E1)+2)) 
  6. In F2, use the following formula:

     =RIGHT(MID($A2,FIND(F$1,$A2),FIND(G$1,$A2,FIND(F$1,$A2))-FIND(F$1,$A2)),LEN(MID($A2,FIND(F$1,$A2),FIND(G$1,$A2,FIND(F$1,$A2))-FIND(F$1,$A2)))-(LEN(F1)+2)) 
  7. In G2, use the following formula:

     =RIGHT($A2,LEN($A2)-FIND(G$1,$A2)-LEN(G1)) 
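The find-and-slice idea behind these formulas can also be sketched outside Excel. The following is a minimal Python sketch, not part of the original answer. It assumes the header labels appear in the order shown in the question's sample; the names `HEADERS` and `split_by_headers` are made up for illustration.

```python
# Header labels as they appear in the question's sample (assumed order).
HEADERS = ["CorrelationId:", "Request for API:", "Caller:",
           "RequestedSchemas:", "TenantId:"]


def split_by_headers(message, headers=HEADERS):
    """Return {label: value} by slicing between consecutive header positions."""
    result = {}
    for i, header in enumerate(headers):
        start = message.find(header)  # plays the role of FIND(B$1,$A2)
        if start == -1:
            continue  # this header is missing from the row
        value_start = start + len(header)
        # The value ends where the next header begins, or at the end of the text.
        end = len(message)
        if i + 1 < len(headers):
            next_pos = message.find(headers[i + 1], value_start)
            if next_pos != -1:
                end = next_pos
        result[header.rstrip(":")] = message[value_start:end].strip()
    return result


sample = (
    "CorrelationId: b99fb632-78cf-4910-ab23-4f69833ed2d9\n"
    "Request for API: /api/acmsxdsreader/readpolicyfrompolicyassignment "
    "Caller:C2F023C52E2148C9C1D040FBFAC113D463A368B1 "
    "RequestedSchemas: {urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}VoicePolicy, "
    "{urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}OnlineVoiceRoutingPolicy,  "
    "TenantId: 7a205197-8e59-487d-b9fa-3fc1b108f1e5"
)

fields = split_by_headers(sample)
for label, value in fields.items():
    print(f"{label}: {value}")
```

Stripping whitespace from each slice avoids the per-column `+1` / `+2` adjustments that the formulas need, because some labels are followed by a space and some are not.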


You can use a macro written in VBA.

I created a class module, renamed it cData, and gave it a property for each of your column headings.

Then I used regular expressions to separate the different properties from the data you provided, collected them into a Dictionary, and output the results to a separate worksheet in a specified order.

I assumed that your named column headers are the information you are looking for and that, as in your text example, there is only a single instance of each category to be concerned with.

I also assumed that your data starts in B1.

Read the notes in the macro closely.

Be sure to set the references as indicated in the Regular Module code.

Class Module

'Rename this Module **cData**
Option Explicit
Private pCorrelationID As String
Private pRequestForApi As String
Private pCaller As String
Private pRequestedSchemas As String
Private pTenantID As String

Public Property Get CorrelationID() As String
    CorrelationID = pCorrelationID
End Property
Public Property Let CorrelationID(Value As String)
    pCorrelationID = Value
End Property

Public Property Get RequestForApi() As String
    RequestForApi = pRequestForApi
End Property
Public Property Let RequestForApi(Value As String)
    pRequestForApi = Value
End Property

Public Property Get Caller() As String
    Caller = pCaller
End Property
Public Property Let Caller(Value As String)
    pCaller = Value
End Property

Public Property Get RequestedSchemas() As String
    RequestedSchemas = pRequestedSchemas
End Property
Public Property Let RequestedSchemas(Value As String)
    pRequestedSchemas = Value
End Property

Public Property Get TenantID() As String
    TenantID = pTenantID
End Property
Public Property Let TenantID(Value As String)
    pTenantID = Value
End Property

Regular Module

'Set Reference to Microsoft Scripting Runtime
'Set Reference to Microsoft VBScript Regular Expressions 5.5
Option Explicit
Sub ttcSpecial()
    Dim wsSrc As Worksheet, wsRes As Worksheet
    Dim vSrc As Variant, vRes As Variant
    Dim rRes As Range
    Dim dD As Dictionary
    Dim RE As RegExp, MC As MatchCollection, M As Match
    Dim cD As cData
    Dim myKey, I As Long, sTemp As String

Set wsSrc = Worksheets("sheet1")
Set wsRes = Worksheets("sheet2")
    Set rRes = wsRes.Cells(1, 1)

With wsSrc
    vSrc = .Range(.Cells(1, 2), .Cells(.Rows.Count, 2).End(xlUp))
    If Not IsArray(vSrc) Then
        sTemp = vSrc
        ReDim vSrc(1 To 1, 1 To 1)
        vSrc(1, 1) = sTemp
    End If
End With

Set RE = New RegExp
With RE
    .Global = True
    .IgnoreCase = True
    .MultiLine = False
    .Pattern = "((?:CorrelationID|Request For API|Caller|RequestedSchemas|TenantID)):([\s\S]+?)(?=(?:CorrelationID|Request For API|Caller|RequestedSchemas|TenantID|$))"
End With


Set dD = New Dictionary
    dD.CompareMode = TextCompare

For I = 1 To UBound(vSrc, 1)
    Set cD = New cData
    With cD
    If RE.Test(vSrc(I, 1)) = True Then
        myKey = I
        Set MC = RE.Execute(vSrc(I, 1))
        For Each M In MC
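            'The regex ignores case, but Select Case compares strings
            'case-sensitively by default, so normalise the header first
            '(the sample data uses "CorrelationId" and "TenantId").
            Select Case UCase$(M.SubMatches(0))
                Case "CORRELATIONID"
                    .CorrelationID = M.SubMatches(1)
                Case "REQUEST FOR API"
                    .RequestForApi = M.SubMatches(1)
                Case "CALLER"
                    .Caller = M.SubMatches(1)
                Case "REQUESTEDSCHEMAS"
                    .RequestedSchemas = M.SubMatches(1)
                Case "TENANTID"
                    .TenantID = M.SubMatches(1)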
            End Select
        Next M

        dD.Add Key:=myKey, Item:=cD
    End If
    End With
Next I

ReDim vRes(0 To dD.Count, 1 To 5)

'Headers
    vRes(0, 1) = "Correlation ID"
    vRes(0, 2) = "Request for API"
    vRes(0, 3) = "Caller"
    vRes(0, 4) = "Requested Schemas"
    vRes(0, 5) = "Tenant ID"

I = 0
For Each myKey In dD.Keys
    I = I + 1
    With dD(myKey)
        vRes(I, 1) = .CorrelationID
        vRes(I, 2) = .RequestForApi
        vRes(I, 3) = .Caller
        vRes(I, 4) = .RequestedSchemas
        vRes(I, 5) = .TenantID
    End With
Next myKey

Set rRes = rRes.Resize(UBound(vRes, 1) + 1, UBound(vRes, 2))
With rRes
    .EntireColumn.Clear
    .Value = vRes
    With .Rows(1)
        .Font.Bold = True
        .HorizontalAlignment = xlCenter
    End With
    .EntireColumn.AutoFit
End With

End Sub

Results from the text sample in the original question


A simplified explanation of the regex

  • Match any of the column headers
  • Match everything that starts after the colon
    • up to but NOT including another column header, or the end of the string
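For readers who want to try the pattern outside VBA, here is a minimal Python sketch of the same three steps. It is an illustration rather than the answerer's code: `HEADER_ALT`, `PATTERN` and `parse_message` are hypothetical names, and the lookahead additionally requires the colon after the next header, which is a small tightening compared with the VBA pattern.

```python
import re

# Alternation of the column headers, matched case-insensitively as in the macro.
HEADER_ALT = r"CorrelationID|Request for API|Caller|RequestedSchemas|TenantID"

PATTERN = re.compile(
    rf"({HEADER_ALT}):"             # 1. match any of the column headers
    rf"([\s\S]+?)"                  # 2. lazily match everything after the colon
    rf"(?=(?:{HEADER_ALT}):|$)",    # 3. stop before the next header or the end
    re.IGNORECASE,
)


def parse_message(message):
    """Return {lower-cased header: trimmed value} for one Message cell."""
    return {m.group(1).lower(): m.group(2).strip()
            for m in PATTERN.finditer(message)}


sample = (
    "CorrelationId: b99fb632-78cf-4910-ab23-4f69833ed2d9\n"
    "Request for API: /api/acmsxdsreader/readpolicyfrompolicyassignment "
    "Caller:C2F023C52E2148C9C1D040FBFAC113D463A368B1 "
    "RequestedSchemas: {urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}VoicePolicy, "
    "{urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}OnlineVoiceRoutingPolicy,  "
    "TenantId: 7a205197-8e59-487d-b9fa-3fc1b108f1e5"
)

parsed = parse_message(sample)
for header, value in parsed.items():
    print(f"{header}: {value}")
```

Lower-casing the captured header before using it as a key keeps the lookup independent of how the header is capitalised in the data.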

That's a lot of error-prone work. Just use CsvHelper by Josh Close. It's an excellent package that's fast and easy to use.
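The general point, letting a CSV library deal with quoting and embedded line breaks before any splitting of the Message text, can be illustrated with a short sketch. This one uses Python's standard `csv` module as a stand-in and does not show CsvHelper's API; the two-column layout (`Id`, `Message`) and the `read_messages` helper are assumptions made for the example.

```python
import csv
import io

# Hypothetical two-column CSV; the quoted Message field spans two lines.
csv_text = (
    'Id,Message\r\n'
    '1,"CorrelationId: b99fb632-78cf-4910-ab23-4f69833ed2d9\n'
    'Request for API: /api/acmsxdsreader/readpolicyfrompolicyassignment"\r\n'
)


def read_messages(stream, column="Message"):
    """Yield the raw text of one column; the csv module handles the quoting."""
    for row in csv.DictReader(stream):
        yield row[column]


# newline="" leaves line endings untouched so the reader sees them as written.
messages = list(read_messages(io.StringIO(csv_text, newline="")))
print(len(messages))
print(messages[0])
```

Each Message value arrives as one string with its line break intact, so it can then be split into columns by header.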
