Cheap secondary storage options
We are looking at cheap storage for hosting backups and archives. The driving force is cost: NetApp is too expensive.
The following options are worth considering:
— Amazon S3 web service. Since it would only be used for backup/archive, the cost should not be too high, though it still needs to be analyzed. Another potential issue is legal/privacy…
Technically, the best way is probably to use an S3 file system driver, so that backup and access are transparent to the apps and existing apps don’t need to be modified. One example of such a driver:
- Fuse over Amazon (s3fs): http://code.google.com/p/s3fs/wiki/FuseOverAmazon
Note that the Hadoop project provides two file systems that use S3: http://wiki.apache.org/hadoop/AmazonS3 . However, it seems you have to go through Hadoop to access these file systems, so they are not accessible to normal apps.
— GFS-like distributed file systems, so that we can use cheap commodity Intel hardware to build the storage cluster. Currently there are two open source GFS-like DFS implementations:
- CloudStore (formerly Kosmos File System / KFS): http://kosmosfs.sourceforge.net/. Quoted from the web site: “Web-scale applications require a scalable storage infrastructure to process vast amounts of data. CloudStore (formerly, Kosmos filesystem) is an open-source high performance distributed filesystem designed to meet such an infrastructure need”. It’s written in C++ and can be mounted as a file system via FUSE on Linux.
- Hadoop HDFS: part of the Hadoop Core project. http://hadoop.apache.org/core/docs/current/hdfs_design.html. Hadoop is developed in Java. There is also some effort to make HDFS mountable on Linux systems: http://wiki.apache.org/hadoop/MountableHDFS.
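
The mountable options above (s3fs, CloudStore via FUSE, and the mountable HDFS effort) share the same appeal: once the store is mounted, a backup is just an ordinary file copy and the app needs no storage-specific code. A minimal Python sketch of that idea follows. The function name `backup_file` and the directory layout are hypothetical, and a temporary directory stands in for the real mount point so the sketch runs without any mount in place.

```python
import shutil
import tempfile
from pathlib import Path


def backup_file(source: Path, backup_root: Path) -> Path:
    """Copy one file into the backup tree using ordinary file-system calls.

    backup_root is assumed to sit on a FUSE mount (hypothetical, e.g.
    /mnt/backup); nothing here is specific to S3, CloudStore, or HDFS.
    """
    backup_root.mkdir(parents=True, exist_ok=True)
    target = backup_root / source.name
    # Content-only copy; the mounted driver would handle the remote I/O.
    shutil.copyfile(source, target)
    return target


if __name__ == "__main__":
    # A temporary directory plays the role of the mount point.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "report.txt"
        src.write_text("archive me\n")
        copied = backup_file(src, Path(tmp) / "mount" / "archive")
        print(copied)
```

Whether a given driver behaves well enough for this (latency, partial writes, metadata support) would still need to be checked per option.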

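For the S3 cost analysis mentioned under the first option, a rough monthly estimate can be sketched as below. S3 bills for storage, data transfer, and requests; the helper `estimate_monthly_cost` is hypothetical, and every price in the example call is a made-up placeholder, not an actual Amazon rate. Real figures would have to come from the current price list.

```python
def estimate_monthly_cost(
    stored_gb: float,
    transferred_gb: float,
    requests: int,
    price_per_gb_stored: float,
    price_per_gb_transferred: float,
    price_per_1000_requests: float,
) -> float:
    """Rough monthly bill: storage + transfer + requests (all prices assumed)."""
    storage = stored_gb * price_per_gb_stored
    transfer = transferred_gb * price_per_gb_transferred
    request_cost = (requests / 1000) * price_per_1000_requests
    return storage + transfer + request_cost


if __name__ == "__main__":
    # Placeholder workload and prices, for illustration only.
    cost = estimate_monthly_cost(
        stored_gb=1000,
        transferred_gb=100,
        requests=50_000,
        price_per_gb_stored=0.10,
        price_per_gb_transferred=0.05,
        price_per_1000_requests=0.01,
    )
    print(f"estimated monthly cost: {cost:.2f}")
```

For a backup/archive workload the storage term should dominate, since data is written once and rarely read back, which is why the cost is expected to stay moderate.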