简体   繁体   中英

Graph or relational database is better for this tree structure?

I'm new with graph databases and need a recommendation for this scenario. I have people who like the categories (only leaves on the tree), the other nodes (parents) do not get "like". Then I calculate a score for all the connections of a specific user to other users. This is a graphic example:

在此处输入图片说明

I try with neo4j and I have no problems (very small data set), but I'm afraid of performance with a lot of users. I've test with following query:

MATCH (n:Person)-[:LIKES*]->()-[r:SUB*0..4]-()<-[:LIKES*]-(m:Person)
WHERE n.name='Gabriel' and n<>m
WITH n.name as user, m.name as connection, 1.0/(length(r)+1)*count(r) as score
RETURN user, connection, sum(score)

I have also thought about doing it in a relational database, save 3 fields for category likes (cat1,cat2,cat3) and then make 3 self joins, looking for matches in the different category levels. Something like that (starting on user 1 and trying to match with others):

select l2.user_id, sum(
case 
    when (l1.cat2 = l2.cat2 and l1.cat3 = l2.cat3) then 1
    when (l1.cat2 = l2.cat2) then 0.25 
    else 0.05 
end)
from likes l1
inner join likes l2 on l1.cat1 = l2.cat1 and l2.user_id <> 1
where l1.user_id = 1 
group by l2.user_id

but I also read that you should avoid doing self joins.

I clarify that I am looking for read performance, writing does not matter. My goal is that it works well with 1 million users with 10 likes each one. I listen to any kind of opinion, thanks!

TLDR; IMO a relational database would be better since your looking at how one thing relates to another, ie how many likes (of teams) a person has. You can easily update metadata on users, teams, or sports without worrying about messing up your analytic queries. Plus you can easily expand to add sport types like college, high school, or the like again without worrying that your previous set-up might get messed up.

BUT, I'll confess that I've never used a graph database before. :)


Doing a relation database might look something like this:

I like to call these info tables since they give just the information, some call them reference tables too, on a specific item: Sport (Sport_ID, Sport_Name, [etc]...) Sport Names like Football, Basketball, etc Example view:

╔══════════╦════════════╦═════╗
║ Sport_ID ║ Sport_Name ║ ... ║
╠══════════╬════════════╬═════╣
║        1 ║ Football   ║ ... ║
║        2 ║ Basketball ║ ... ║
║     ...  ║ ...        ║ ... ║
╚══════════╩════════════╩═════╝

Team (Team_ID, Team_Name, Home_State,[etc]...) -- Teams would be all teams, regardless of what sport type they were in. Example view:

╔═════════╦═════════════╦═════╗
║ Team_ID ║  Team_Name  ║ ... ║
╠═════════╬═════════════╬═════╣
║       1 ║ Boca Junior ║ ... ║
║       2 ║ River Plate ║ ... ║
║       3 ║ Spurs       ║ ... ║
║     ... ║ ...         ║ ... ║
╚═════════╩═════════════╩═════╝ 

User (User_ID, User_First_Name, [etc]...) -- All user ONLY specific information would go here. Example view:

╔═════════╦═════════════════╦═════╗
║ User_ID ║ User_First_Name ║ ... ║
╠═════════╬═════════════════╬═════╣
║       1 ║ Mario           ║ ... ║
║       2 ║ Gabriel         ║ ... ║
║       3 ║ Juana           ║ ... ║
║       4 ║ Raul            ║ ... ║
║     ... ║ ...             ║ ... ║
╚═════════╩═════════════════╩═════╝

Then you'd create the relation tables to make the connections between the sports, teams, and users.

Sports_Team (Sport_ID, Team_ID) -- Here you'd show which team played which sport. Example View:

╔══════════╦═════════╦═════╗
║ Sport_ID ║ Team_ID ║ ... ║
╠══════════╬═════════╬═════╣
║        1 ║       1 ║ ... ║
║        1 ║       2 ║ ... ║
║        2 ║       3 ║ ... ║
║      ... ║     ... ║ ... ║
╚══════════╩═════════╩═════╝

Team_User_Likes (Team_ID, User_ID) -- Here you'd show which person like which teams played which sport. Example View:

╔═════════╦═════════╦═════╗
║ Team_ID ║ User_ID ║ ... ║
╠═════════╬═════════╬═════╣
║       1 ║       1 ║ ... ║
║       2 ║       2 ║ ... ║
║       2 ║       3 ║ ... ║
║       3 ║       3 ║ ... ║
║     ... ║     ... ║ ... ║
╚═════════╩═════════╩═════╝

Now all you have to do to get a score of how many teams a user likes is:

SELECT tul.User_ID
     , COUNT(tul.Team_ID) AS Likes
  FROM team_user_likes tul
 GROUP
    BY tul.User_ID

And if you want the user metadata, like their names, you can throw this query into a CTE and then use the user table to join to the CTE table.

This might look and sound complicated but it will make it easier to edit/update user/team/sport information. You'll be able to do some interesting analytics like how many user like/prefer one sport over another using the like data without having to worry about affecting the relational tables, or which team of each sport is the majority favorite.

Plus this should scale easily, depending on what relational database you use. And say you wanted to start adding high school, college, etc. sports, you could just add a sport_type table then create a sport_sport_type relation table to make the connection of which sports are professional or one of the others. Viola, you can then do analytics by sport type without worry about how it affects your previous set up.

I prefer relations databases because they seem to keep things tidier. That all being said, I have never used a graph database. But considering that your seeing how one thing relates to another, ie how many teams a person likes, my opinion is that you should go with a relational database.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM