
Find out join-able datatypes between two tables

I have two views, V1 and V2, and I know their column names and respective data types. Is there a way to find out which columns (by data type) could serve as a join condition between V1 and V2?

Example:

V1 ->

ID: Integer

Name: varchar

DOB: Date

V2->

ID: BIGINT

Salary: REAL

Sex: BOOLEAN

So if I want to perform a join, I need to return to the user:

V1 -> ID (Integer) can be joined with V2 -> ID, Salary. (Sex cannot be included, since no join can be performed on a boolean datatype.)

Similarly, V1 -> Name (varchar) can be joined with V2 -> ID, Salary.

So in the end I need JSON like: { ID : ID,Salary } {Name : ID,Salary}

Is there some way I could determine whether two datatypes are joinable or not?

Thanks.

OK, so you don't actually have a data model that defines how the two tables go together. This sounds like one of those "user side reporting tools" where one is supposed to be able to arbitrarily join data in table form.

That's all nice and good, but it means that there are no hard and fast rules on how to come up with potential join criteria. In a situation like this, you will need rules of thumb, aka 'heuristics'.

Such heuristics have been implemented in many tools. They usually don't follow any "standard" but mostly (at least in what I've seen) try to go with common sense. One such rule surely is:

"When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck"

What I mean is: you try to match data types that go together, at least on a domain level. So date/time columns can go together with other date/time columns, money columns with other money columns, and address column sets with other address column sets.

This kind of matching doesn't help with finding connections that make sense, but it can help with weeding out those that cannot possibly make any sense. Unfortunately, this approach is quite a bit more involved than just looking at the technical data type used to store the data. For example, a date can be stored in almost any kind of data type. Dates stored in text columns are pretty common, so you would need to try to figure out when this is the case.

Hints for that could be the column name (does it contain 'date', 'dt', 'day', etc.?) or the actual contents (format matches 'YYYY-MM-DD' or 'DDMMYY' or '....').
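To make the two kinds of hints a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the function names, the list of name fragments (I added 'dob'), the two formats taken from the examples above, and the 0.8 threshold are all made up and would need tuning against real data.

```python
import re
from datetime import datetime

# Hypothetical hint lists; extend them for your own data.
NAME_HINTS = ("date", "dt", "day", "dob")
DATE_FORMATS = ("%Y-%m-%d", "%d%m%y")  # 'YYYY-MM-DD' and 'DDMMYY'


def name_suggests_date(column_name):
    """True if one of the name's tokens (split on non-letters) is a known hint."""
    tokens = re.split(r"[^a-z]+", column_name.lower())
    return any(token in NAME_HINTS for token in tokens)


def looks_like_date(value):
    """True if the string parses with at least one of the assumed formats."""
    if not isinstance(value, str):
        return False
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def guess_is_date(column_name, samples, threshold=0.8):
    """Combine both hints: a hinting name lowers the bar for the contents."""
    values = [v.strip() for v in samples if isinstance(v, str) and v.strip()]
    if not values:
        # No usable contents, so the name is all we have to go on.
        return name_suggests_date(column_name)
    ratio = sum(looks_like_date(v) for v in values) / len(values)
    required = threshold / 2 if name_suggests_date(column_name) else threshold
    return ratio >= required
```

Matching whole name tokens instead of substrings avoids flagging a column like 'width' just because it contains 'dt', at the price of missing names such as 'birthdate'. That trade-off is typical for this kind of guessing.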

Similar hints could exist for other column types, and successfully predicting the right data domain is one of the main tasks of data exploration tools. As data is most often messy, this is not a simple task.

Coming back to your original question: there's no HANA feature that does this for you, and there's little middle ground between looking only at technical data types (probably too simplistic to cover anything beyond naive test cases) and an extensive set of heuristics for guessing the right semantic domain of a column.
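For completeness, the naive end of that spectrum could look roughly like the Python sketch below. It groups technical types into families and treats a pair of columns as a join candidate when their families are listed as compatible. The family table and the compatibility pairs are my assumptions, chosen so that the result matches the example in the question (text joins with numeric, boolean joins with nothing). They are not the conversion rules of HANA or any other database.

```python
import json

# Assumed type families; NOT the conversion rules of any specific database.
TYPE_FAMILY = {
    "INTEGER": "numeric", "BIGINT": "numeric", "REAL": "numeric",
    "DECIMAL": "numeric", "VARCHAR": "text", "NVARCHAR": "text",
    "DATE": "datetime", "TIMESTAMP": "datetime", "BOOLEAN": "boolean",
}

# Assumed compatible family pairs. "boolean" is deliberately absent,
# following the question's statement that booleans cannot be joined.
COMPATIBLE = {
    frozenset(["numeric"]),
    frozenset(["text"]),
    frozenset(["datetime"]),
    frozenset(["numeric", "text"]),
}


def join_candidates(left, right):
    """Map each left column to the right columns it could be joined with."""
    result = {}
    for left_name, left_type in left.items():
        left_family = TYPE_FAMILY.get(left_type.upper())
        result[left_name] = [
            right_name
            for right_name, right_type in right.items()
            if left_family is not None
            and frozenset([left_family, TYPE_FAMILY.get(right_type.upper())])
            in COMPATIBLE
        ]
    return result


v1 = {"ID": "Integer", "Name": "varchar", "DOB": "Date"}
v2 = {"ID": "BIGINT", "Salary": "REAL", "Sex": "BOOLEAN"}
print(json.dumps(join_candidates(v1, v2)))
# prints {"ID": ["ID", "Salary"], "Name": ["ID", "Salary"], "DOB": []}
```

Note that this only tells you which joins are technically plausible. It still happily pairs ID with Salary, which is exactly the kind of connection that makes no sense on a domain level.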
