2008-10-06 01:01:20 Chris in Internet!
Hadoop at Facebook
The picture above shows Facebook's founder Mark Zuckerberg, only 24 years old (born May 14, 1984).
He adopted Hadoop, an Apache project, as the framework for the entire distributed parallel-processing backend.
From the Engineering @ Facebook note: http://www.facebook.com/note.php?note_id=16121578919
Hadoop
With tens of millions of users and more than a billion page views every day, Facebook ends up accumulating massive amounts of data. One of the challenges we have faced since the early days is developing a scalable way of storing and processing all these bytes, since using this historical data is a very big part of how we can improve the user experience on Facebook. This can only be done by empowering our engineers and analysts with easy-to-use tools to mine and manipulate large data sets.
About a year back we began playing around with an open source project called Hadoop. Hadoop provides a framework for large scale parallel processing using a distributed file system and the map-reduce programming paradigm. Our hesitant first steps of importing some interesting data sets into a relatively small Hadoop cluster were quickly rewarded as developers latched on to the map-reduce programming model and started doing interesting projects that were previously impossible due to their massive computational requirements. Some of these early projects have matured into publicly released features (like the Facebook Lexicon) or are being used in the background to improve user experience on Facebook (by improving the relevance of search results, for example).
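To make the map-reduce paradigm concrete, here is a minimal sketch of a Hadoop Streaming job in Python that counts page views per URL from tab-separated log lines. The input layout, script names, and sample invocation are illustrative assumptions, not Facebook's actual pipeline.

```python
#!/usr/bin/env python
# mapper.py -- a minimal Hadoop Streaming mapper (illustrative sketch).
# Assumes each input line is a tab-separated log record whose first
# field is a URL; emits "url<TAB>1" for every page view seen.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if fields and fields[0]:
        print("%s\t%d" % (fields[0], 1))
```

```python
#!/usr/bin/env python
# reducer.py -- a minimal Hadoop Streaming reducer (illustrative sketch).
# Hadoop sorts mapper output by key, so counts for the same URL arrive
# as a contiguous run; we sum each run and emit "url<TAB>total".
import sys

current_url, count = None, 0
for line in sys.stdin:
    url, value = line.rstrip("\n").split("\t", 1)
    if url == current_url:
        count += int(value)
    else:
        if current_url is not None:
            print("%s\t%d" % (current_url, count))
        current_url, count = url, int(value)
if current_url is not None:
    print("%s\t%d" % (current_url, count))
```

Because the scripts only read stdin and write stdout, the same pattern works in any language, which is the flexibility the post describes below.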
We have come a long way from those initial days. Facebook has multiple Hadoop clusters deployed now, with the biggest having about 2,500 CPU cores and 1 petabyte of disk space. We are loading over 250 gigabytes of compressed data (over 2 terabytes uncompressed) into the Hadoop file system every day and have hundreds of jobs running each day against these data sets. The list of projects using this infrastructure has proliferated, from those generating mundane statistics about site usage to others used to fight spam and determine application quality. An amazingly large fraction of our engineers have run Hadoop jobs at some point (which is also a great testament to the quality of technical talent here at Facebook).
The rapid adoption of Hadoop at Facebook has been aided by a couple of key decisions. First, developers are free to write map-reduce programs in the language of their choice. Second, we have embraced SQL as a familiar paradigm to address and operate on large data sets. Most data stored in Hadoop's file system is published as tables. Developers can explore the schemas and data of these tables much as they would with a good old database. When they want to operate on these data sets, they can use a small subset of SQL to specify the required data. Operations on data sets can be written as map and reduce scripts, using standard query operators (like joins and group-bys), or as a mix of the two. Over time, we have added classic data warehouse features like partitioning, sampling, and indexing to this environment. This in-house data warehousing layer over Hadoop is called Hive, and we are looking forward to releasing an open-source version of the project in the near future.
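The equivalence this paragraph relies on, that a standard query operator like GROUP BY decomposes into a map step and a reduce step, can be sketched in a few lines of plain Python. The table name, columns, and sample rows here are hypothetical; Hive's actual interface had not been released at the time of this post.

```python
# A toy, local illustration (no Hadoop required) of how a SQL aggregate
#   SELECT country, COUNT(*) FROM page_views GROUP BY country
# decomposes into map and reduce steps. "page_views", "country", and the
# sample rows are made-up examples, not a real schema.
from itertools import groupby
from operator import itemgetter

rows = [
    {"url": "/home", "country": "US"},
    {"url": "/photos", "country": "BR"},
    {"url": "/home", "country": "US"},
]

# Map: emit one (key, 1) pair per row, keyed by the GROUP BY column.
mapped = [(row["country"], 1) for row in rows]

# Shuffle/sort: Hadoop groups pairs by key between the two phases;
# locally, sorting plus groupby achieves the same effect.
mapped.sort(key=itemgetter(0))

# Reduce: sum the values for each key to produce COUNT(*).
for country, pairs in groupby(mapped, key=itemgetter(0)):
    print("%s\t%d" % (country, sum(v for _, v in pairs)))
# Prints:
#   BR    1
#   US    2
```

A SQL layer like the Hive described above generates this kind of plan automatically, while hand-written map and reduce scripts, as in the earlier sketch, remain the escape hatch for operations the SQL subset cannot express.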
At Facebook, it is incredibly important that we use the information generated by and from our users to make decisions about improvements to the product. Hadoop has enabled us to make better use of the data at our disposal. So we'd like to take this opportunity to say, "Thank you" to all the people who have contributed to this awesome open-source project.
Joydeep is a Facebook Engineer