The basic question is: how can I reference multiple contiguous columns of a single row in a DataTable as a two-dimensional array that can be processed with For-Next loops? Here's the background:
The program in question loads data from a .csv file where each line/row contains basic identity information about a person, followed by their numeric answers to two dozen questions. The program cycles through each line of the .csv file and identifies the five other lines which have the highest number of exact answer matches to the current line.
A DataTable seems to be the best structure to read the .csv file into, but I am not sure how to reference the last x columns of each row as an array of the form answer(person, question).
In case this seems either really easy or totally impractical, I should make the following disclaimer: the program code is already written and working, but I'm in the process of re-coding it from QuickBASIC 4 (yes, I did say QB4...) to VB.NET. The program is basically a dating program, and I've been running it once a year for the last 20 years or so, with the local school selling the matches as a fundraiser. It's gotten to the point where neither Windows 7 nor the latest patched version of Windows XP will run QB4, so I downloaded VS Express for Desktop and am using this as an opportunity to learn VB.NET. I've done a lot of (non-windowed) VBScript application scripting, but some really light dabbling in VB6 is my only experience with traditional VB. As everyone here is already well aware, file I/O in .NET is very different from that of VB6 or earlier. That's what I'm fighting now…
.....
To answer Zohar's question/comment:
Below is a sample of the .csv file format. The actual file is several hundred lines long, but all identical in form. Names and phone numbers have been changed for privacy. The fields are, in order:
LastName
FirstName
Phone# (if given, placeholders if not)
Sex (1=M;2=F)
Answer to Question 1 (1-4)
Answer to Question 2 (1-4)
....
Answer to Question 24 (1-4)
Mouse,Mickey,xxx-xxxx,1,2,3,3,2,3,1,3,4,2,1,4,3,1,1,2,1,2,1,1,1,2,1,1,4
Mouse,Minnie,555-9931,2,1,3,1,2,1,2,3,3,3,4,4,2,4,1,2,3,4,4,4,1,2,1,1,4
Duck,Donald,555-7024,1,2,3,4,2,4,3,4,2,2,1,4,2,4,1,2,1,1,2,1,3,2,1,1,1
McDuck,Scrooge,555-4824,1,2,3,3,2,1,2,4,3,2,4,4,2,4,1,4,2,2,4,4,3,2,1,1,4
GoodWitch,Wendy,xxx-xxxx,2,2,2,4,2,1,2,4,4,3,4,2,2,1,1,2,1,1,4,4,4,4,1,3,1
The reason for the two-dimensional array is to create a single-variable database of answers by user and question number. See below for the sorting portion of the actual existing QB4 code. The two-dimensional array below that I'm trying to bring to VB.NET standards is StudentAnswer(matchFrom, question).
For matchFrom = 1 To numberSheets
    '
    'The following section of code finds the top maximumToMatch groups of n
    'matching questions per sheet
    '
    For x = 1 To maximumToMatch
        topMatches(x) = 0
        sheetsMatched(x) = 0
    Next x
    For matchTo = 1 To numberSheets
        If StudentSex(matchFrom) <> StudentSex(matchTo) Then
            numberMatched(matchTo) = 0
            highMatch = 0
            For question = 1 To numberQuestions
                If StudentAnswer(matchFrom, question) = StudentAnswer(matchTo, question) Then
                    numberMatched(matchTo) = numberMatched(matchTo) + 1
                End If
            Next question
            If numberMatched(matchTo) = topMatches(1) Then
                sheetsMatched(1) = sheetsMatched(1) + 1
            End If
            If numberMatched(matchTo) > topMatches(1) Then
                match = maximumToMatch
                done = False
                Do
                    If numberMatched(matchTo) = topMatches(match) Then
                        sheetsMatched(match) = sheetsMatched(match) + 1
                        done = True
                    End If
                    If numberMatched(matchTo) > topMatches(match) Then
                        For x = 1 To match - 1
                            topMatches(x) = topMatches(x + 1)
                            sheetsMatched(x) = sheetsMatched(x + 1)
                        Next x
                        topMatches(match) = numberMatched(matchTo)
                        sheetsMatched(match) = 1
                        done = True
                    Else
                        match = match - 1
                    End If
                Loop Until done
            End If
        Else
            numberMatched(matchTo) = 0
        End If
    Next matchTo
    ...
    <additional code to narrow it down to a fixed number of sheet matches>
Next matchFrom
And to anticipate two other likely questions:
The existing code is written to match M to F and vice versa. I'd like to make that more flexible during the re-write, but it's a rural area and I'm not really sure they're ready for that yet...
The data file is in .csv format because a formal data entry front-end was never written for the program. That's been perpetually on the To-Do list, but in the meantime, and since it's only run once a year, Excel has been my friend... If all goes well I'll design a data entry screen during the VB.NET re-write.
Thanks in advance to everyone who takes the time to read this.
Now that LINQ is available for querying lists of objects (since .NET 3.5, so it is not new), a DataTable may not be your best option, since your data comes from a CSV file rather than a database (although LINQ can also be used with databases).
So, this isn't really an answer to your original question, but if you have input something like this:
Joe User,1,2,3,4,5
Jane User,2,2,3,4,6
Jack User,3,4,5,2,8
Jill User,5,3,1,8,6
You could define a class to store the data:
Public Class UserInfo
    Public Property Name As String
    Public Property Answers As List(Of Integer) = New List(Of Integer)()

    ' Counts the questions on which both users gave the same answer.
    Public Function MatchRating(other As UserInfo) As Integer
        Dim rating As Integer = 0
        ' Compare only the positions both lists have, in case a line is short.
        For i = 0 To Math.Min(Me.Answers.Count, other.Answers.Count) - 1
            If Me.Answers(i) = other.Answers(i) Then
                rating += 1
            End If
        Next
        Return rating
    End Function
End Class
You could then read the CSV data into a list of UserInfo objects:
Dim users = File.ReadLines("Data.csv").Select(
    Function(line)
        Dim parts = line.Split(","c)
        Dim user = New UserInfo() With {.Name = parts(0)}
        user.Answers.AddRange(parts.Skip(1).Select(Function(str) CInt(str)))
        Return user
    End Function
).ToList()
You could then find the best matches with something like this, which loops through the users and uses a LINQ query to find the top 5 matches based on the number of answers matched (the UserInfo.MatchRating function), skipping any with no matches (rating > 0):
For Each user In users
    Console.WriteLine("{0}:{1}", user.Name, String.Join(",", user.Answers))
    Dim bestMatches = From u In users
                      Where u IsNot user
                      Let rating = u.MatchRating(user)
                      Where rating > 0
                      Order By rating Descending
                      Take 5
                      Select New With {.Name = u.Name, .Rating = rating}
    For Each match In bestMatches
        Console.WriteLine(" Match: {0}, rating: {1}", match.Name, match.Rating)
    Next
Next
You will need to add properties to the UserInfo class for your actual identity information and adjust the code to match.
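As one possible way to make that adjustment, here is a minimal sketch for the file layout described in the question (LastName, FirstName, Phone, Sex, then the answers). The names Person, Parse, Matching and BestMatches are hypothetical and not part of the code above, and the sketch assumes that no field contains a quoted or embedded comma. The Sex comparison mirrors the StudentSex(matchFrom) <> StudentSex(matchTo) check in the QB4 code.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical class for the real layout: LastName, FirstName, Phone, Sex, answers.
Public Class Person
    Public Property LastName As String
    Public Property FirstName As String
    Public Property Phone As String
    Public Property Sex As Integer          ' 1 = M, 2 = F, as in the .csv file
    Public Property Answers As List(Of Integer) = New List(Of Integer)()

    ' Builds a Person from one .csv line (assumes no quoted or embedded commas).
    Public Shared Function Parse(line As String) As Person
        Dim parts = line.Split(","c)
        Dim p = New Person() With {
            .LastName = parts(0),
            .FirstName = parts(1),
            .Phone = parts(2),
            .Sex = CInt(parts(3))
        }
        ' Everything after the first four fields is an answer.
        p.Answers.AddRange(parts.Skip(4).Select(Function(s) CInt(s)))
        Return p
    End Function

    ' Counts the questions on which both people gave the same answer.
    Public Function MatchRating(other As Person) As Integer
        Dim rating = 0
        For i = 0 To Math.Min(Answers.Count, other.Answers.Count) - 1
            If Answers(i) = other.Answers(i) Then rating += 1
        Next
        Return rating
    End Function
End Class

Public Module Matching
    ' Returns up to howMany best matches for subject among people with a different Sex value.
    Public Function BestMatches(subject As Person,
                                people As IEnumerable(Of Person),
                                howMany As Integer) As List(Of Person)
        Return (From other In people
                Where other IsNot subject AndAlso other.Sex <> subject.Sex
                Let rating = other.MatchRating(subject)
                Where rating > 0
                Order By rating Descending
                Take howMany
                Select other).ToList()
    End Function
End Module
```

With this in place, the list could be built with File.ReadLines("Data.csv").Select(AddressOf Person.Parse).ToList(), and BestMatches(p, people, 5) called for each person p. Ties at the cut-off are not treated specially here, unlike the sheetsMatched counting in the QB4 version.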
You will also need to make sure that your project options are set appropriately and that you have the appropriate imports/references, e.g.:
Option Explicit On
Option Infer On
Option Strict On
Imports System.IO
For reference, the output of my test was as follows (poor Jack; I suppose you may also need to adjust for sexual preference, e.g. Where u IsNot user AndAlso u.Sex <> user.Sex):
Joe User:1,2,3,4,5
Match: Jane User, rating: 3
Jane User:2,2,3,4,6
Match: Joe User, rating: 3
Match: Jill User, rating: 1
Jack User:3,4,5,2,8
Jill User:5,3,1,8,6
Match: Jane User, rating: 1
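If a DataTable is still preferred, the original question can be approached by copying the trailing columns of every row into an Integer(,) array once, and then indexing that array from the For-Next loops. The following is only a sketch under some assumptions: LoadCsv and ToAnswerArray are hypothetical helper names, every line is assumed to have the same number of fields with no quoted commas, the project needs a reference to System.Data, and the resulting array is zero-based rather than one-based as in the QB4 code.

```vbnet
Imports System
Imports System.Data
Imports System.IO

Public Module AnswerArray
    ' Hypothetical helper: loads the .csv file into a DataTable of string columns.
    Public Function LoadCsv(path As String) As DataTable
        Dim table As New DataTable()
        For Each line In File.ReadLines(path)
            Dim parts = line.Split(","c)
            ' Create generic column names from the first line read.
            If table.Columns.Count = 0 Then
                For c = 0 To parts.Length - 1
                    table.Columns.Add("Col" & c.ToString(), GetType(String))
                Next
            End If
            Dim row = table.NewRow()
            For c = 0 To parts.Length - 1
                row(c) = parts(c)
            Next
            table.Rows.Add(row)
        Next
        Return table
    End Function

    ' Copies the columns from firstAnswerColumn to the last column of each row
    ' into answers(person, question); both indexes are zero-based.
    Public Function ToAnswerArray(table As DataTable,
                                  firstAnswerColumn As Integer) As Integer(,)
        Dim questionCount = table.Columns.Count - firstAnswerColumn
        Dim answers(table.Rows.Count - 1, questionCount - 1) As Integer
        For person = 0 To table.Rows.Count - 1
            For question = 0 To questionCount - 1
                answers(person, question) =
                    CInt(table.Rows(person)(firstAnswerColumn + question))
            Next
        Next
        Return answers
    End Function
End Module
```

For the layout in the question, where the answers start at the fifth field, the call would be ToAnswerArray(LoadCsv("Data.csv"), 4). The comparison from the QB4 loop then becomes answers(matchFrom, question) = answers(matchTo, question), with both loops running from 0 to the upper bound instead of from 1.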