
Hadoop is Big, I mean really BIG

Hadoop, for those not up on it, is a top-level Apache project. That puts it right up there with the Apache Web Server in terms of importance.

Hadoop allows you to distribute data-intensive applications across many nodes. A cluster can scale out to thousands of machines and sift through petabytes of data very quickly.
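To make that concrete, here is a toy sketch of the MapReduce model Hadoop implements, in plain Python run on a single machine. The function names (`mapper`, `reducer`, `run_job`) are illustrative, not Hadoop's actual API; on a real cluster the map, shuffle, and reduce phases run in parallel across many nodes.

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in an input line.
    for word in line.split():
        yield word.lower(), 1

def reducer(word, counts):
    # Reduce phase: aggregate all the values emitted for one key.
    return word, sum(counts)

def run_job(lines):
    # Shuffle/sort: group mapper output by key. Hadoop does this step
    # over the network, between the map and reduce phases.
    groups = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            groups[word].append(count)
    return dict(reducer(w, c) for w, c in sorted(groups.items()))

if __name__ == "__main__":
    data = ["big data is big", "hadoop chews through big data"]
    print(run_job(data))  # → {'big': 3, 'chews': 1, 'data': 2, ...}
```

The same split into a stateless map function and a per-key reduce function is what lets Hadoop fan the work out across a cluster: any node can run any mapper on any chunk of input.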

Deepak Singh has collected some massive stats on extreme Hadoop use by some of the leading Web companies:


  • 36 PB of uncompressed data
  • 2250 machines
  • 23,000 cores
  • 32 GB of RAM per machine
  • Processing 80-90 TB/day


  • 70 PB of data in HDFS
  • 170 PB spread across the globe
  • 34000 servers
  • Processing 3 PB per day
  • 120 TB flow through Hadoop every day


  • 7 TB/day into HDFS


  • 120 Billion relationships
  • 82 Hadoop jobs daily (IIRC)
  • 16 TB of intermediate data