
Constructing a personalized Facebook-like Newsfeed: SQL, MongoDB?

I'm building a Facebook-like newsfeed. It is assembled from many SQL tables, and each data type has its own layout in the view. It's already becoming very heavy to load, and I was hoping to make it even more complex...

Here's what I do now:

User model:

    def updates(more_options = {})
      (games_around({}, more_options) +
        friends_statuses({}, more_options).sort! { |a, b| b.updated_at <=> a.updated_at }.slice(0, 35) +
        friends_stats({ :limit => 10 }, more_options) +
        friends_badges({ :limit => 3 }, more_options)
      ).sort! { |a, b| b.updated_at <=> a.updated_at }
    end

Example for the Badges data:

    def friends_badges(options = { :limit => 3 }, more_options = {})
      # Latest badges earned by the players around this user.
      # merge (not merge!) so the default options hash isn't mutated between calls.
      Reward.find(:all, options.merge(
        :conditions => ["rewards.user_id IN (?)", players_around({}, more_options).collect { |p| p.id }],
        :joins      => [:user, :badge],
        :order      => "rewards.created_at DESC"
      ))
    end

Newsfeed View:

    <% for update in @current_user.updates %>
      <% if update.class.name == "Status" %>
        <% @status = update %>
        <%= render :partial => "users/statuses/status_line", :locals => { :status => update } %>
      <% elsif update.class.name == "Game" %>
        <%= render :partial => "games/game_newsfeed_line", :locals => { :game => update } %>
      <% elsif update.class.name == "Stat" %>
        <%= render :partial => "stats/stat_newsfeed_line", :locals => { :stat => update } %>
      <% elsif update.class.name == "Reward" %>
        <%= render :partial => "badges/badge_newsfeed_line", :locals => { :reward => update } %>
      <% end %>
    <% end %>

The options I thought about:

  • Building a "Feed" table and preprocessing most of the updates for each user with a background job, most likely an hourly cron. I would store the entire HTML for each update. (A rough sketch of such a table follows this list.)
  • Keeping the initial structure but caching each update separately (right now I have no caching at all).
  • Switching to MongoDB for faster database access.
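
For the first option, here is a minimal sketch of what such a denormalized feed table could look like. The FeedItem model, its columns, and the FeedBuilder class are hypothetical names for illustration, not part of my current code (and it stores a reference to the subject rather than pre-rendered HTML, though an extra text column would work the same way):

    # Hypothetical denormalized feed table: one row per (recipient, update),
    # written ahead of time so the newsfeed page becomes a single indexed query.
    #
    # create_table :feed_items do |t|
    #   t.integer  :user_id         # the recipient of the update
    #   t.string   :subject_type    # "Status", "Game", "Stat", "Reward"
    #   t.integer  :subject_id
    #   t.datetime :created_at
    # end
    # add_index :feed_items, [:user_id, :created_at]

    class FeedItem < ActiveRecord::Base
      belongs_to :user
      belongs_to :subject, :polymorphic => true
    end

    # Fan-out step, run from a cron or background job instead of at request time.
    class FeedBuilder
      def self.rebuild_for(user)
        user.updates.each do |update|
          FeedItem.create!(
            :user_id      => user.id,
            :subject_type => update.class.name,
            :subject_id   => update.id,
            :created_at   => update.updated_at
          )
        end
      end
    end

The newsfeed page would then read the 35 latest FeedItem rows for the current user and render each subject with the existing partials.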

I have to say I'm not really an expert; Rails made the first steps easy, but now, with more than 150 SQL queries per page load, I feel it's getting out of control and needs an expert's point of view...

What would you do?

Thanks for your help,


Your code doesn't tell me a lot; I think it'd be helpful if you could lay out your data structure in plain JSON / SQL.

Anyway, I'd serialize each user's stream to MongoDB. I wouldn't store the HTML in the database, for various reasons (at least not at that level of the software); instead, save the relevant data in a (possibly polymorphic) collection. Fetching the newsfeed is then very easy, indexing is straightforward, and so on. The view structure would essentially not change. If you later want to change the HTML, that is easy as well.
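
As a rough sketch of that collection, in Mongoid-style syntax (every field name here is illustrative, not prescriptive):

    # One document per (recipient, update); the payload holds whatever
    # data the corresponding partial needs to render the entry.
    class FeedEntry
      include Mongoid::Document

      field :user_id,      :type => Integer   # recipient of this entry
      field :subject_type, :type => String    # "Status", "Game", "Stat", "Reward"
      field :subject_id,   :type => Integer
      field :payload,      :type => Hash      # denormalized data for the view
      field :created_at,   :type => Time

      index({ :user_id => 1, :created_at => -1 })
    end

    # Fetching a newsfeed is then a single indexed query:
    # FeedEntry.where(:user_id => user.id).order_by(:created_at => :desc).limit(35)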

The downside is that this duplicates a lot of data. If people can have lots of followers, this may become a problem. Using arrays of user ids instead of a single user id might help (if the information is the same for all followers; a variant of this is sketched below), but it's also limited.
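
The array-of-recipients variant could look like this (again, illustrative names only):

    # One shared document per update, addressed to many recipients at once.
    # MongoDB matches array containment automatically, so the query stays simple.
    class SharedFeedEntry
      include Mongoid::Document

      field :recipient_ids, :type => Array   # every user who should see this entry
      field :subject_type,  :type => String
      field :subject_id,    :type => Integer
      field :created_at,    :type => Time

      index({ :recipient_ids => 1, :created_at => -1 })
    end

    # SharedFeedEntry.where(:recipient_ids => user.id).order_by(:created_at => :desc).limit(35)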

For very large association problems, caching is the only way out. The way I understand it, the magic in both Facebook and Twitter is that they don't hit the database very often and keep a lot of data in RAM. If you're associating billions of items, doing that is a challenge even in RAM.
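
In Rails terms, the simplest step in that direction is wrapping the assembled feed in Rails.cache backed by memcached (the cache key and expiry below are arbitrary):

    # Keep the assembled feed in the cache store so repeated page loads
    # don't touch the database at all until the entry expires.
    def cached_updates
      Rails.cache.fetch("newsfeed/user/#{id}", :expires_in => 5.minutes) do
        updates   # the existing method that hits the SQL tables
      end
    end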

The updates should be written continuously rather than on an hourly basis. Suppose you have a lot of traffic and the hourly update takes 30 min: the worst case is then a 90 min delay (up to 60 min waiting for the next run to start, plus 30 min of processing). If you process changes just-in-time, for example from a model callback that enqueues the write, you can probably cut this to 5 min.
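
A sketch of that just-in-time write, assuming Status belongs_to a user, User has a friends association, and the FeedEntry collection sketched above (in production, push the fan-out onto a background queue such as Delayed::Job or Resque instead of running it inline):

    # Write feed entries as soon as the source record is created,
    # instead of waiting for an hourly batch job.
    class Status < ActiveRecord::Base
      after_create :fan_out_to_friends

      private

      # Inline here to keep the sketch self-contained; in real code,
      # enqueue this and let a worker do the writes.
      def fan_out_to_friends
        user.friends.each do |friend|
          FeedEntry.create!(
            :user_id      => friend.id,
            :subject_type => "Status",
            :subject_id   => id,
            :created_at   => created_at
          )
        end
      end
    end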

At some point you'll have to make assumptions, use caching, and apply some heuristics. Some examples:

  • The more recent a tweet, the more traffic it will see. It has a higher chance of being retweeted, and it's seen much more often. Keep it in RAM.
  • Your Facebook timeline page for 1991 is probably not going to change on a daily basis, so it's a candidate for long-term output caching (see the fragment-caching sketch after this list).
  • Current Facebook activity is likely to see lots of writes. Output caching won't help much here; again, the object should be kept in RAM.
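
For the long-term output-caching case, a fragment-caching sketch (the cache key, instance variables, and partial path are made up for illustration; this assumes the archived year really is immutable):

    <%# An old, effectively immutable timeline page can be output-cached indefinitely. %>
    <%# The key only needs to identify the user and the year. %>
    <% cache("timeline/#{@user.id}/1991") do %>
      <%= render :partial => "timelines/year", :locals => { :updates => @updates_for_1991 } %>
    <% end %>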
