C# custom file parsing with 2 delimiters and different record types

I have a (not quite valid) CSV file that contains rows of multiple types. Any record could be one of about 6 different types, and each type has a different number of properties. The first part of any row contains the timestamp and the type of record, followed by a standard CSV of the data.

Example

1456057920 PERSON, Ted Danson, 123 Fake Street, 555-123-3214, blah
1476195120 PLACE, Detroit, Michigan, 12345
1440581532 THING, Bucket, Has holes, Not a good bucket

And to make matters more complex, I need to be able to do different things with the records depending on certain criteria. So a PERSON type can be automatically inserted into a DB without user input, but a THING type would be displayed on screen for the user to review and approve before it is added to the DB and the parse continues, etc.

Normally, I would use a library like CsvHelper to map the records to a type, but in this case, since the types could be different and the first part uses a space instead of a comma, I don't know how to do that with a standard CSV library. So currently, what I do on each loop iteration is:

  1. Split the string on commas.
  2. Split the first array item on the space.
  3. Use a switch statement to determine the type and create the object.
  4. Put that object into a List of type object.
  5. Get confused as to where to go next, because I now have a list of various types and will have to use yet another switch or if to determine the next steps.
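For reference, steps 1 and 2 above might be sketched roughly as follows. The `LineSplitter` name and the tuple shape are made up for illustration, and the plain comma split assumes no quoted fields that contain commas.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class LineSplitter
{
    // Hypothetical helper: turns "1456057920 PERSON, Ted Danson, ..." into
    // a timestamp, a record type, and the remaining trimmed fields.
    public static (long Timestamp, string Type, string[] Fields) Split(string line)
    {
        // Step 1: split on commas (naive; quoted commas are not handled).
        string[] parts = line.Split(',');

        // Step 2: the first item is "timestamp TYPE", separated by a space.
        string[] header = parts[0].Trim()
            .Split(new[] { ' ' }, 2, StringSplitOptions.RemoveEmptyEntries);
        if (header.Length != 2)
        {
            throw new FormatException("Expected 'timestamp TYPE' before the first comma.");
        }

        long timestamp = long.Parse(header[0], CultureInfo.InvariantCulture);
        string[] fields = parts.Skip(1).Select(p => p.Trim()).ToArray();
        return (timestamp, header[1], fields);
    }
}
```

The returned type string is what the switch in step 3 would branch on.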

I don't really know for sure if I will actually need that List, but I have a feeling the user will want the ability to manually flip through records in the file.

By this point, this is starting to make for very long, confusing code, and my gut feeling tells me there has to be a cleaner way to do this. I thought maybe using Type.GetType(string) would help simplify the code some, but this seems like it might be terribly inefficient in a loop with 10k+ records and might make things even more confusing. I then thought maybe making some interfaces might help, but I'm not the greatest at using interfaces in this context and I seem to end up in about this same situation.

So what would be a more manageable way to parse this file? Are there any C# parsing libraries out there that would be able to handle something like this?

You can implement an IRecord interface that has a Timestamp property and a Process method (perhaps others as well). Then, implement a concrete type for each type of record.

  1. Use a switch statement to determine the type, then create and populate the correct concrete type.

  2. Place each object in a List of IRecord.
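A minimal sketch of that arrangement, assuming only two of the record types and guessing at field names from the sample rows (`Name`, `Address`, `Phone`, `Description` and the `RecordParser` class are all illustrative, not taken from the question). The `Process` bodies are placeholders for the real DB insert and review steps.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public interface IRecord
{
    long Timestamp { get; }
    void Process();
}

public sealed class PersonRecord : IRecord
{
    public long Timestamp { get; set; }
    public string Name { get; set; } = "";
    public string Address { get; set; } = "";
    public string Phone { get; set; } = "";

    // Placeholder: PERSON rows would be inserted into the DB directly.
    public void Process() => Console.WriteLine($"Insert person: {Name}");
}

public sealed class ThingRecord : IRecord
{
    public long Timestamp { get; set; }
    public string Name { get; set; } = "";
    public string Description { get; set; } = "";

    // Placeholder: THING rows would be shown to the user for approval.
    public void Process() => Console.WriteLine($"Needs review: {Name}");
}

public static class RecordParser
{
    public static IRecord Parse(string line)
    {
        string[] parts = line.Split(',');
        string[] header = parts[0].Trim().Split(' ');
        long timestamp = long.Parse(header[0], CultureInfo.InvariantCulture);

        // Missing trailing fields become empty strings instead of throwing.
        string Field(int i) => i < parts.Length ? parts[i].Trim() : "";

        switch (header[1])
        {
            case "PERSON":
                return new PersonRecord
                {
                    Timestamp = timestamp, Name = Field(1),
                    Address = Field(2), Phone = Field(3)
                };
            case "THING":
                return new ThingRecord
                {
                    Timestamp = timestamp, Name = Field(1), Description = Field(2)
                };
            default:
                throw new NotSupportedException($"Unknown record type: {header[1]}");
        }
    }

    public static List<IRecord> ParseAll(IEnumerable<string> lines)
    {
        var records = new List<IRecord>();
        foreach (string line in lines)
        {
            if (!string.IsNullOrWhiteSpace(line)) records.Add(Parse(line));
        }
        return records;
    }
}
```

With this in place the caller needs no second switch: a loop such as `foreach (IRecord r in RecordParser.ParseAll(lines)) r.Process();` dispatches to the right behavior through the interface.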

After that you can do whatever you need. Some examples:

Loop through each item and call Process() to handle it.

Use LINQ's .OfType<{concrete type}>() to segment the list. (Warning: each call traverses the entire list, so with 10k records, segmenting by every concrete type means one full pass per type and may be slow.)

Use an overridden ToString method to give a single text representation of the IRecord.

If using WPF, you can define a DataTemplate for each concrete type and bind an ItemsControl derivative to a collection of IRecords; your "detail" display (e.g. ListItem or a separate ContentControl) will then automagically display the item using the correct DataTemplate.
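The OfType suggestion above, together with a single-pass alternative, could look roughly like this. The record classes are stripped-down stand-ins and `RecordSegmenter` is a hypothetical name; `ToLookup` is swapped in here as one way to avoid a separate traversal per type, which the answer itself does not mention.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface IRecord
{
    long Timestamp { get; }
}

public sealed class PersonRecord : IRecord
{
    public long Timestamp { get; set; }
    public string Name { get; set; } = "";
}

public sealed class ThingRecord : IRecord
{
    public long Timestamp { get; set; }
    public string Name { get; set; } = "";
}

public static class RecordSegmenter
{
    // One full pass over the list per call; simple when only one or two
    // types are ever pulled out.
    public static List<T> Only<T>(IEnumerable<IRecord> records) where T : IRecord
        => records.OfType<T>().ToList();

    // A single pass that buckets every record by its runtime type.
    public static ILookup<Type, IRecord> ByType(IEnumerable<IRecord> records)
        => records.ToLookup(r => r.GetType());
}
```

Indexing the lookup, for example `byType[typeof(ThingRecord)]`, yields the records of that type in their original order, and an empty sequence for a type that never appeared.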

Continuing from my comment: well, that depends. What you described is actually pretty good for starters. You can of course expand it into a series of factories, one for each object type, so that you move from an explicit switch to searching for the first factory that can parse a line. That might prove useful if you expect to add more object types in the future: you just add another factory for the new kind of object. It is up to you whether these objects should share a common interface. An interface is generally used to define a behavior, and that doesn't seem to be the case here. Maybe you should just use a Dictionary instead? You need to ask yourself whether you actually need strongly typed objects here. Maybe what you need is a simple class with an ObjectType property and a Dictionary of properties, with some helper methods for easy typed property access, like GetBool, GetInt, or a generic Get.
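One way to read both suggestions together is sketched below: a list of factories that is searched for the first one able to parse a line, each producing a loosely typed record backed by a Dictionary. Every name here (`GenericRecord`, `IRecordFactory`, `SchemaFactory`, `FactoryParser`) and every field name is an assumption made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Loosely typed record: a type tag plus named string properties.
public sealed class GenericRecord
{
    private readonly Dictionary<string, string> _properties;

    public long Timestamp { get; }
    public string ObjectType { get; }

    public GenericRecord(long timestamp, string objectType,
                         Dictionary<string, string> properties)
    {
        Timestamp = timestamp;
        ObjectType = objectType;
        _properties = properties;
    }

    // Converts the stored string on demand; throws if the key is missing
    // or the text cannot be converted to T.
    public T Get<T>(string name) =>
        (T)Convert.ChangeType(_properties[name], typeof(T), CultureInfo.InvariantCulture);

    public int GetInt(string name) => Get<int>(name);
    public bool GetBool(string name) => Get<bool>(name);
}

public interface IRecordFactory
{
    bool CanParse(string line);
    GenericRecord Parse(string line);
}

// Handles one record type; field names are assigned by position.
public sealed class SchemaFactory : IRecordFactory
{
    private readonly string _type;
    private readonly string[] _fieldNames;

    public SchemaFactory(string type, params string[] fieldNames)
    {
        _type = type;
        _fieldNames = fieldNames;
    }

    public bool CanParse(string line)
    {
        int space = line.IndexOf(' ');
        int comma = line.IndexOf(',');
        return space > 0 && comma > space
            && line.Substring(space + 1, comma - space - 1).Trim() == _type;
    }

    public GenericRecord Parse(string line)
    {
        int space = line.IndexOf(' ');
        int comma = line.IndexOf(',');
        long timestamp = long.Parse(line.Substring(0, space), CultureInfo.InvariantCulture);
        string[] values = line.Substring(comma + 1).Split(',');

        var properties = new Dictionary<string, string>();
        for (int i = 0; i < _fieldNames.Length && i < values.Length; i++)
        {
            properties[_fieldNames[i]] = values[i].Trim();
        }
        return new GenericRecord(timestamp, _type, properties);
    }
}

public static class FactoryParser
{
    // Supporting a new record type means adding one entry here.
    private static readonly IRecordFactory[] Factories =
    {
        new SchemaFactory("PERSON", "Name", "Address", "Phone", "Notes"),
        new SchemaFactory("PLACE", "City", "State", "PostalCode"),
        new SchemaFactory("THING", "Name", "Description", "Notes"),
    };

    public static GenericRecord Parse(string line)
    {
        IRecordFactory factory = Factories.FirstOrDefault(f => f.CanParse(line));
        if (factory == null)
        {
            throw new NotSupportedException("No factory can parse: " + line);
        }
        return factory.Parse(line);
    }
}
```

The trade-off is that property names become strings checked at run time rather than members checked by the compiler, which is the question the answer asks you to weigh.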
