From d759884b339dad3faceb6be91ddefc7e85894ea3 Mon Sep 17 00:00:00 2001 From: Siddharth Ravikumar Date: Tue, 22 Mar 2016 09:21:37 -0400 Subject: Whitespace and Cosmetic fixes. --- report/chapters/2-lit-r.tex | 168 ++++++++++++++++++++++---------------------- 1 file changed, 85 insertions(+), 83 deletions(-) (limited to 'report/chapters/2-lit-r.tex') diff --git a/report/chapters/2-lit-r.tex b/report/chapters/2-lit-r.tex index 4e26923..28e24c7 100644 --- a/report/chapters/2-lit-r.tex +++ b/report/chapters/2-lit-r.tex @@ -5,32 +5,33 @@ The idea of unifying the storage provided by multiple Internet file storage providers and storing all the content in an encrypted form is -not new. In the past, computer researchers and programmers have devised different -methods to use multiple file storage providers' storage space. This -chapter gives an overview of the work done by Yeo et al. in unifying -the storage provided by Dropbox, Box, Google Drive and Skydrive on -Android devices \cite{yeo}(Section \ref{2-yeo-sec}); SkyCDS, a content -delivery service, by Gonzalez et al., which uses publish/subscribe -overlay paradigm and stores the content across multiple cloud storage -providers such that only part of the content (in encrypted form) is -stored on each file storage provider \cite{skycds}(Section -\ref{2-skycds-sec}); and, lastly, \verb+git-annex+, by Joey -Hess\cite{person:joeyh}, that allows one to version control and keep -track of large files with a possibility of encrypting files that are -stored in ``special remotes'' -- storage provided by Internet file -storage providers (Section \ref{2-gitannex-sec}). +not new. In the past, computer researchers and programmers have +devised different methods to use multiple file storage providers' +storage space. This chapter gives an overview of the work done by Yeo +et al. in unifying the storage provided by Dropbox, Box, Google Drive +and Skydrive on Android devices \cite{yeo}(Section \ref{2-yeo-sec}); +SkyCDS, a content delivery service, by Gonzalez et al., which uses +publish/subscribe overlay paradigm and stores the content across +multiple cloud storage providers such that only part of the content +(in encrypted form) is stored on each file storage provider +\cite{skycds}(Section \ref{2-skycds-sec}); and, lastly, +\verb+git-annex+, by Joey Hess\cite{person:joeyh}, that allows one to +version control and keep track of large files with a possibility of +encrypting files that are stored in ``special remotes'' -- storage +provided by Internet file storage providers (Section +\ref{2-gitannex-sec}). \section{Multi Cloud Storage Prototype}\label{2-yeo-sec} -In the paper ``Leveraging client-side storage techniques for -enhanced use of multiple consumer cloud storage services on +In the paper ``Leveraging client-side storage techniques for enhanced +use of multiple consumer cloud storage services on resource-constrained mobile devices'', Yeo et al. show their Android mobile application, a prototype, which unifies storage provided by Dropbox, Box, Google Drive and SkyDrive. The application allows the user to store all their information in a single location on their -phone and it uses erasure coding \cite{weatherspoon} to split each file -into \verb`n + k` fragments and spreads the encrypted fragments across -storage provided by the file storage providers. All basic file +phone and it uses erasure coding \cite{weatherspoon} to split each +file into \verb`n + k` fragments and spreads the encrypted fragments +across storage provided by the file storage providers. All basic file operations -- Create, Rename, Update, Delete (CRUD) -- are possible. Information about the files stored in the unified location is stored in a SQLite database. Unlike combox, which depends the file @@ -51,30 +52,30 @@ feature in a ``resource constrained'' device where storage is expensive. Yeo et al. propose methods for achieving data de-duplication; file -compression based on file type; intelligent pre-fetching -and caching of file fragments and ``automatic restoration in -exploiting file-versioning''. These features were not implemented in -the prototype Android application and there is possibility of Yeo et +compression based on file type; intelligent pre-fetching and caching +of file fragments and ``automatic restoration in exploiting +file-versioning''. These features were not implemented in the +prototype Android application and there is possibility of Yeo et al. implementing these features in the future. -It becomes apparent that Yeo et al. work is of immense importance. This is particularly true when -we taking into consideration the research done by Yang et al., which -found that 59\% of the users who use ``cloud storage service'' access -the service through a smart phone and 42.2\% users access it for -audio/video \cite{yang}. The research by Yang et al. -suggests a trend of users' preference for small hand-held computers -over laptops and desktops. +It becomes apparent that Yeo et al. work is of immense +importance. This is particularly true when we taking into +consideration the research done by Yang et al., which found that 59\% +of the users who use ``cloud storage service'' access the service +through a smart phone and 42.2\% users access it for audio/video +\cite{yang}. The research by Yang et al. suggests a trend of users' +preference for small hand-held computers over laptops and desktops. \section{SkyCDS}\label{2-skycds-sec} SkyCDS, by Gonzalez et al., is a content delivery system that splits -and spreads the content across multiple file storage -providers \cite{skycds}. According to Gonzalez et al., the main reason -for designing and developing SkyCDS was to prevent content providers -from getting locked into just one file storage provider and to -minimize loss when a file storage provider goes out of business or if -there is temporary outage in the storage service provided by the file -storage provider. +and spreads the content across multiple file storage providers +\cite{skycds}. According to Gonzalez et al., the main reason for +designing and developing SkyCDS was to prevent content providers from +getting locked into just one file storage provider and to minimize +loss when a file storage provider goes out of business or if there is +temporary outage in the storage service provided by the file storage +provider. In SkyCDS, the content delivery to subscribers of the content is segregated into two distinct layers -- Metadata Flow Layer and the @@ -91,11 +92,12 @@ responsible for publishing the content using the ``delivery workflow'' When content has to be dispersed to $k$ file storage providers, the content is split into $n$ chunks, $n > k$. This file splitting seems to produce 66.7\% of redundancy overhead \cite{skycds}. This file -splitting scheme also looks very similar to erasure coding, but Gonzalez et -al. don't explicitly state that the content splitting scheme is indeed -``erasure coding''. The splitting of content is done by the ``delivery -workflow'' engine which is invoked when the publisher triggers the -action to publish the respective content to subscribers. +splitting scheme also looks very similar to erasure coding, but +Gonzalez et al. don't explicitly state that the content splitting +scheme is indeed ``erasure coding''. The splitting of content is done +by the ``delivery workflow'' engine which is invoked when the +publisher triggers the action to publish the respective content to +subscribers. To evaluate the effectiveness of SkyCDS, Gonzalez et al. state that they've done a case study using the data obtained from the European @@ -110,9 +112,9 @@ space and reliability. \verb+git-annex+ allows one to version controlled large files that are not usually feasible to version control under -\verb+git+\cite{program:git}. \verb+git-annex+ checks in the name -and other meta-data about the files in git and stores the actual -content under \verb+.git/annex+ directory. When a file is added to +\verb+git+\cite{program:git}. \verb+git-annex+ checks in the name and +other meta-data about the files in git and stores the actual content +under \verb+.git/annex+ directory. When a file is added to \verb+git-annex+, a symlink of the file is created in place of the file and the content of the file itself is stored under the \verb+.git/annex+ directory. @@ -148,7 +150,7 @@ add deb-nicholson-80s.medium.webm ok ↳ ls -l ... -lrwxrwxrwx 1 rsd rsd 207 May 5 2015 deb-nicholson-80s.medium.webm +lrwxrwxrwx 1 rsd rsd 207 May 5 2015 deb-nicholson-80s.medium.webm -> ../.git/annex/objects/3j/vG/SHA256E-s108196923--7de9484ee96908268e 21b451eb9805552c32b44da08e70ee861332c87352944f.webm/SHA256E-s10819692 3--7de9484ee96908268e21b451eb9805552c32b44da08e70ee861332c87352944f.w @@ -162,12 +164,12 @@ ebm } Now, the file \verb+deb-nicholson-80s.medium.webm+ is checked into -\verb+git-annex+ and the command \verb+git annex sync+ can be issued to sync the -repository to other \verb+git-annex+ repositories. It must be noted -here that when the repository is synced, the file content itself is -not transferred to the other \verb+git-annex+ repositories; only the -file's name and its meta-data that is stored in a separate git branch -called \verb+git-annex+ are +\verb+git-annex+ and the command \verb+git annex sync+ can be issued +to sync the repository to other \verb+git-annex+ repositories. It must +be noted here that when the repository is synced, the file content +itself is not transferred to the other \verb+git-annex+ repositories; +only the file's name and its meta-data that is stored in a separate +git branch called \verb+git-annex+ are transferred\cite{documentation:git-annex-hworks}. In order to create a copy of a given file in another git annex repository, \verb+git annex get /path/to/filename.ext+ has to done. @@ -180,36 +182,36 @@ storage providers. At the time of writing this report, services: {\scriptsize -\begin{itemize} -\item Amazon S3 -\item Amazon Glacier -\item Internet Archive via S3 -\item Box.com -\item Google drive -\item Google Cloud Storage -\item Mega.co.nz -\item SkyDrive -\item OwnCloud -\item Flickr -\item IMAP -\item Usenet -\item chef-vault -\item hubiC -\item pCloud -\item ipfs -\item Ceph -\item Blackblaze's B2 -\end{itemize} + \begin{itemize} + \item Amazon S3 + \item Amazon Glacier + \item Internet Archive via S3 + \item Box.com + \item Google drive + \item Google Cloud Storage + \item Mega.co.nz + \item SkyDrive + \item OwnCloud + \item Flickr + \item IMAP + \item Usenet + \item chef-vault + \item hubiC + \item pCloud + \item ipfs + \item Ceph + \item Blackblaze's B2 + \end{itemize} } All data pushed to file storage provider's servers can optionally be encrypted using one's GPG key. For instance, to encrypt data that is -pushed to the Amazon S3 special remote, the following command is -used \cite{docs:git-annex-as3}: +pushed to the Amazon S3 special remote, the following command is used +\cite{docs:git-annex-as3}: \begin{verbatim} $ git annex initremote cloud type=S3 keyid=2512E3C7 -initremote cloud (encryption setup with gpg key C910D9222512E3C7) +initremote cloud (encryption setup with gpg key C910D9222512E3C7) (checking bucket) (creating bucket in US) (gpg) ok $ git annex describe cloud "at Amazon's US datacenter" describe cloud ok @@ -222,16 +224,16 @@ size \verb+N+, to do that we do: \begin{verbatim} $ git annex initremote cloud type=S3 chunk=1MiB keyid=2512E3C7 -initremote cloud (encryption setup with gpg key C910D9222512E3C7) +initremote cloud (encryption setup with gpg key C910D9222512E3C7) (checking bucket) (creating bucket in US) (gpg) ok $ git annex describe cloud "at Amazon's US datacenter" describe cloud ok \end{verbatim} -Upon completion, each file that has to be pushed to the Amazon S3 special -remote is divided into 1MiB chunks, each chunk is encrypted using the -GPG key \verb+2512E3C7+ and the encrypted chunks are finally pushed to -the Amazon S3 remote. It must be noted here that unlike the Multi -Cloud Storage Prototype or SkyCDS or combox, in \verb+git-annex+ when -we are using file chunking all the chunks go to the same location -- -in this case, the Amazon S3 remote. +Upon completion, each file that has to be pushed to the Amazon S3 +special remote is divided into 1MiB chunks, each chunk is encrypted +using the GPG key \verb+2512E3C7+ and the encrypted chunks are finally +pushed to the Amazon S3 remote. It must be noted here that unlike the +Multi Cloud Storage Prototype or SkyCDS or combox, in \verb+git-annex+ +when we are using file chunking all the chunks go to the same location +-- in this case, the Amazon S3 remote. -- cgit v1.2.3