Wednesday, 17 February 2016

HDFS In Numbers

Hadoop makes use of the Hadoop Distributed File System (HDFS). It differs from other distributed file systems in that it is specifically designed for:

1) Large data sets (typical files are gigabytes to terabytes in size)
2) Deployment on low-cost commodity hardware (so it scales cheaply to hundreds or thousands of machines)

Each server machine stores part of the file system's data: files are split into large blocks, and each block is replicated across several machines.
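You can see this layout directly through the Hadoop Java API. Below is a minimal sketch that asks HDFS which machines hold each block of a file; it assumes a cluster is configured via core-site.xml (fs.defaultFS pointing at HDFS), and the path /data/example.txt is a placeholder for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        // Connect to the default file system named in core-site.xml
        // (fs.defaultFS), assumed here to point at an HDFS cluster.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Placeholder path; substitute a file that exists on your cluster.
        Path path = new Path("/data/example.txt");
        FileStatus status = fs.getFileStatus(path);

        // Each BlockLocation lists the DataNodes holding one block,
        // showing how a single file is spread across machines.
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("Offset " + block.getOffset()
                    + ", length " + block.getLength()
                    + ", hosts " + String.join(", ", block.getHosts()));
        }
    }
}

For a multi-gigabyte file this prints one line per block, each typically listing several hosts, which is the replication at work.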

A more detailed introduction to the design of HDFS can be found in the Apache HDFS Architecture guide.

The Hadoop API itself is documented in the project's Javadoc.
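As a taste of that API, here is a hedged sketch that writes a small file to HDFS and reads it back through org.apache.hadoop.fs.FileSystem; the path /tmp/hello.txt and the file contents are made up for illustration, and the same cluster configuration as above is assumed.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Placeholder path used only for this example.
        Path path = new Path("/tmp/hello.txt");

        // Write a small file; create() returns a stream backed by HDFS.
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write("Hello, HDFS!\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read it back line by line.
        try (FSDataInputStream in = fs.open(path);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}

Note that the code never mentions blocks or DataNodes: the FileSystem abstraction hides the distribution, so reading and writing look just like ordinary Java I/O.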
 
