Parse text into key/value pairs or JSON

I have text in the following format, and I was wondering what the best approach would be to create a user object from it, with the fields as its properties.

I don't know regular expressions that well. I was looking at the string methods in C#, particularly IndexOf and LastIndexOf, but I think that would get too messy, as there are approximately 15 fields.

I am trying to do this in C#.

Some characteristics:

  1. The keys/fields are fixed and known beforehand, so I know that I have to look for things like title, company, etc.
  2. The address part is single-valued, and it is followed by some multi-valued fields.
  3. A multi-valued field may or may not end with a comma (,).
  4. There are one or two line breaks between the fields, e.g. "Country" is followed by two line breaks before we encounter "Interest".

    Title: Mr
    Company: abc capital
    Address1: 42 mystery lane
    Zip: 112312
    Country: Ireland
    Interest: Biking, Swimming, Hiking,
    Topic of Interest: Europe, Asia, Capital

This will split the data into key/value pairs and store them in a dictionary. You may have to modify it further to meet additional requirements.

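    var dictionary = data
        .Split(
            new[] { "\r\n", "\n" },   // accept both Windows and Unix line endings
            StringSplitOptions.RemoveEmptyEntries)
        .Select(x => x.Split(new[] { ':' }, 2))   // split at the first colon only
        .Where(parts => parts.Length == 2)         // skip lines that have no key
        .ToDictionary(
            k => k[0].Trim(),
            v => v[1].Trim());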
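Since the question asks for a user object rather than a dictionary, here is a minimal sketch of how the parsed pairs could be mapped onto one. The `User` class, its property names, and `UserParser.FromText` are hypothetical and only mirror the keys in the sample text; with roughly 15 fields you would add the remaining properties in the same way.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: the properties mirror the keys shown in the sample text.
public class User
{
    public string Title { get; set; }
    public string Company { get; set; }
    public string Address1 { get; set; }
    public string Zip { get; set; }
    public string Country { get; set; }
    public List<string> Interests { get; set; } = new List<string>();
    public List<string> TopicsOfInterest { get; set; } = new List<string>();
}

public static class UserParser
{
    public static User FromText(string data)
    {
        // Keys are compared case-insensitively so "title" and "Title" both work.
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        // Splitting on both '\r' and '\n' and dropping empty entries copes with
        // one or two line breaks between fields.
        foreach (var line in data.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries))
        {
            var parts = line.Split(new[] { ':' }, 2);   // split at the first colon only
            if (parts.Length < 2) continue;             // ignore lines without a key
            fields[parts[0].Trim()] = parts[1].Trim();
        }

        return new User
        {
            Title = Get(fields, "Title"),
            Company = Get(fields, "Company"),
            Address1 = Get(fields, "Address1"),
            Zip = Get(fields, "Zip"),
            Country = Get(fields, "Country"),
            Interests = SplitList(Get(fields, "Interest")),
            TopicsOfInterest = SplitList(Get(fields, "Topic of Interest")),
        };
    }

    // Returns null when the key is missing from the input.
    private static string Get(Dictionary<string, string> fields, string key)
        => fields.TryGetValue(key, out var value) ? value : null;

    // Splits a comma-separated value and drops the empty item left by a trailing comma.
    private static List<string> SplitList(string value)
        => (value ?? string.Empty)
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(item => item.Trim())
            .Where(item => item.Length > 0)
            .ToList();
}
```

Calling `UserParser.FromText(data)` on the sample input should give a `User` whose `Interests` list holds the three entries without an empty trailing item.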

I'd probably go with something like this:

    private Dictionary<string, IEnumerable<string>> ParseValues(string providedValues)
    {
        Dictionary<string, IEnumerable<string>> parsedValues = new Dictionary<string, IEnumerable<string>>();

        // Environment.NewLine depends on the platform ("\r\n" on Windows, "\n" on Unix);
        // adjust the separators if your input uses different line endings.
        string[] lines = providedValues.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            // Split at the first colon only, so values that contain ':' stay intact.
            string[] lineSplit = line.Split(new char[] { ':' }, 2);
            string key = lineSplit[0].Trim();
            // Removing empty entries avoids an empty value on the "Interest" line,
            // where 'Hiking' is followed by a comma and nothing else.
            IEnumerable<string> values = lineSplit[1].Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries).Select(x => x.Trim()).ToList();
            parsedValues.Add(key, values);
        }

        return parsedValues;
    }

Or, if you subscribe to the notion that readability and maintainability are not as cool as a great big chain of calls:

    private static Dictionary<string, IEnumerable<string>> ParseValues(string providedValues)
    {
        return providedValues
            .Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
            .Select(x => x.Split(':'))
            .ToDictionary(
                key => key[0].Trim(),
                value => value[1].Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries).Select(x => x.Trim()));
    }
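The question title also mentions JSON. As a rough sketch, a dictionary shaped like the one returned by `ParseValues` could be serialized with `System.Text.Json`, assuming that library is available (it ships with .NET Core 3.0 and later, and is a NuGet package otherwise). `JsonExport` and `ToJson` are made-up names for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public static class JsonExport
{
    // Hypothetical helper: turns the parsed key/values dictionary into a JSON string.
    public static string ToJson(Dictionary<string, IEnumerable<string>> parsedValues)
    {
        // Materialize the (possibly lazy) sequences so the serializer sees plain lists.
        var materialized = parsedValues.ToDictionary(
            pair => pair.Key,
            pair => pair.Value.ToList());

        // Each key becomes a JSON property holding an array of strings.
        return JsonSerializer.Serialize(materialized);
    }
}
```

For a dictionary that holds only the "Interest" entry, this produces `{"Interest":["Biking","Swimming","Hiking"]}`.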

I strongly recommend getting more familiar with regular expressions for cases like this. Parsing semi-structured text with regular expressions is easy and logical.

For example, this pattern (this and the following ones are just variants; there are many ways to do it, depending on what you need):

title:\s*(.*)\s+comp.*?:\s*(.*)\s+addr.*?:\s*(.*)\s+zip:\s*(.*)\s+country:\s*(.*)\s+inter.*?:\s*(.*)\s+topic.*?:\s*(.*)

gives this result (when matched case-insensitively, since the keys in the sample are capitalized):

1.  Mr
2.  abc capital
3.  42 mystery lane
4.  112312
5.  Ireland
6.  Biking, Swimming, Hiking,
7.  Europe, Asia, Capital

or, more open to anything:

\s(.*?):\s(.*)

parses your input into nice groups like this:

Match 1
1.  Title
2.  Mr
Match 2
1.  Company
2.  abc capital
Match 3
1.  Address1
2.  42 mystery lane
Match 4
1.  Zip
2.  112312
Match 5
1.  Country
2.  Ireland
Match 6
1.  Interest
2.  Biking, Swimming, Hiking,
Match 7
1.  Topic of Interest
2.  Europe, Asia, Capital

I am not familiar with C# (and its regex dialect); I just wanted to awaken your interest...
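For what it's worth, here is a minimal sketch of how the generic key/value idea could be applied in C# with `System.Text.RegularExpressions`. The pattern is an adapted variant, not the exact one above: it is anchored per line with `RegexOptions.Multiline` and tolerates a trailing `\r`, because in .NET `$` matches before `\n` only. `RegexFieldParser` and `Parse` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexFieldParser
{
    // Group 1: the key (everything before the first colon on the line).
    // Group 2: the value (rest of the line, without surrounding spaces or a trailing '\r').
    private static readonly Regex FieldPattern = new Regex(
        @"^[ \t]*([^:\r\n]+?)[ \t]*:[ \t]*(.*?)[ \t]*\r?$",
        RegexOptions.Multiline);

    public static Dictionary<string, string> Parse(string data)
    {
        var result = new Dictionary<string, string>();

        // Blank lines and lines without a colon simply produce no match.
        foreach (Match match in FieldPattern.Matches(data))
        {
            result[match.Groups[1].Value] = match.Groups[2].Value;
        }

        return result;
    }
}
```

The multi-valued fields come back as a single string here (including any trailing comma), so they would still need a `Split(',')` afterwards.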
