NoSQL at Twitter: Why and How They Use Scribe, Hadoop/Pig, HBase, Cassandra, and FlockDB for Data Analytics


Here’s some interesting NoSQL stuff, guys: a presentation by Kevin Weil (@kevinweil), Analytics Lead at Twitter, about how Twitter uses NoSQL for analytics.

About the presentation

Collecting data (Scribe)
Storing and Analyzing data (Hadoop)
Rapid Learning over Big Data (Pig)

… and Cassandra, HBase, and FlockDB.

For those who are not familiar with these technologies:

Scribe – A log collection framework built on Thrift, developed and open sourced by Facebook
Hadoop – An Apache software framework that supports data-intensive distributed applications
Pig – A platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.
Cassandra – An open source distributed database management system
HBase – An open source, non-relational, distributed database modeled after Google’s BigTable and written in Java
FlockDB – An open source, distributed, fault-tolerant graph database for managing data at web scale

2 comments

  1. Over time, Twitter has been very open about the technologies they are using, and they have also open sourced some of the tools they created. If you check this timeline, http://nosql.mypopescu.com/tagged/twitter, you’ll see not only what Twitter has been using, but also how things evolved. In 2010, I invited Twitter’s Ryan King to give a presentation about the various NoSQL projects they’ve been using or experimenting with. You can watch it at http://www.infoq.com/presentations/NoSQL-at-Twitter-by-Ryan-King

    hope you’ll enjoy these

  2. Most social networks (like Facebook) have stopped using MySQL as their main database and switched to Cassandra or another NoSQL database. We can consider this change a big boost for Cassandra, the open source data store that was originally developed by Facebook to solve the inbox search problem and to be fast, reliable, and able to handle read and write requests at the same time.
    source: Why does large Social Network projects switch to use Cassandra instead of Mysql?
