author     Siddharth Ravikumar <sravik@bgsu.edu>  2016-02-26 08:36:26 -0500
committer  Siddharth Ravikumar <sravik@bgsu.edu>  2016-02-26 08:36:26 -0500
commit     2c136728999d7451d8eef2f202a08ec7bc524136 (patch)
tree       25663b1b028dd008517773183bdb6d9cce026216 /report/chapters/3-lit-r.tex
parent     f20eb79289341ed649345a30aacd7cd07ba2e135 (diff)
Moved around chapters.
Chapter 3 -> Chapter 2
Chapter 4 -> Chapter 3
Chapter 5 -> Chapter 4
Diffstat (limited to 'report/chapters/3-lit-r.tex')
-rw-r--r-- report/chapters/3-lit-r.tex 234
1 files changed, 0 insertions, 234 deletions
diff --git a/report/chapters/3-lit-r.tex b/report/chapters/3-lit-r.tex
deleted file mode 100644
index be8b99c..0000000
--- a/report/chapters/3-lit-r.tex
+++ /dev/null
@@ -1,234 +0,0 @@
-\chapter{Background and Literature Review}
-
-\epigraph{Books serve to show a man that those original thoughts of
- his aren't very new after all}{\textit{Abraham Lincoln}}
-
-The idea of unifying the storage provided by multiple Internet file
-storage providers and storing all the content in encrypted form is
-not new; researchers and programmers have devised different methods
-to use the storage space of multiple file storage providers. This
-chapter gives an overview of three such efforts: the work done by Yeo
-et al. in unifying the storage provided by Dropbox, Box, Google Drive
-and SkyDrive on Android devices\cite{yeo} (Section \ref{3-yeo-sec});
-SkyCDS, a content delivery service by Gonzalez et al., which uses a
-publish/subscribe overlay paradigm and stores the content across
-multiple ``cloud'' storage providers such that only part of the
-content (in encrypted form) is stored on each ``cloud'' storage
-provider\cite{skycds} (Section \ref{3-skycds-sec}); and lastly,
-\verb+git-annex+, by Joey Hess\cite{person:joeyh}, which allows one to
-version control and keep track of large files, with the option of
-encrypting files that are stored in ``special remotes'', that is,
-storage provided by Internet file storage providers (Section
-\ref{3-gitannex-sec}).
-
-\section{Multi Cloud Storage Prototype}\label{3-yeo-sec}
-
-In their paper ``Leveraging client-side storage techniques for
-enhanced use of multiple consumer cloud storage services on
-resource-constrained mobile devices'', Yeo et al. present a prototype
-Android application that unifies the storage provided by Dropbox, Box,
-Google Drive and SkyDrive. The application lets users keep all their
-files in a single location on their phone; it uses erasure
-coding\cite{weatherspoon} to split each file into \verb`n + k`
-fragments and spreads the encrypted fragments across the storage
-offered by the file storage providers. All basic file operations
-(Create, Read, Update and Delete, or CRUD) are supported. Information
-about the files stored in the unified location is kept in a SQLite
-database. Unlike combox, which depends on each file storage provider's
-client to sync file fragments/shards to that provider's server, the
-Android application developed by Yeo et al. takes on the
-responsibility of syncing file fragments/shards to each file storage
-provider itself, and it uses the OAuth 2.0\cite{protocal:oauth2}
-protocol for authorization.
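-
-To make the idea of splitting a file into $n + k$ fragments concrete,
-the following Python sketch implements the simplest possible erasure
-code, a single XOR parity fragment ($k = 1$): the file is cut into $n$
-equal-sized data fragments plus one parity fragment, and any $n$ of the
-$n + 1$ fragments suffice to rebuild it. This is an illustration under
-that assumption, not necessarily the scheme used by JigDFS or by Yeo et
-al.; codes with $k > 1$ tolerate the loss of more than one fragment.
-The function names are hypothetical.
-
-\begin{verbatim}
-from functools import reduce
-
-
-def xor_bytes(a: bytes, b: bytes) -> bytes:
-    """Byte-wise XOR of two equal-length byte strings."""
-    return bytes(x ^ y for x, y in zip(a, b))
-
-
-def encode(data: bytes, n: int) -> list:
-    """Split data into n data fragments plus one XOR parity fragment."""
-    size = -(-len(data) // n)              # ceiling division
-    padded = data.ljust(size * n, b"\0")   # pad to n equal fragments
-    frags = [padded[i * size:(i + 1) * size] for i in range(n)]
-    return frags + [reduce(xor_bytes, frags)]
-
-
-def decode(frags: list, length: int) -> bytes:
-    """Rebuild the data; at most one fragment may be missing (None)."""
-    missing = [i for i, f in enumerate(frags) if f is None]
-    if len(missing) > 1:
-        raise ValueError("single parity tolerates one lost fragment")
-    if missing:
-        # The XOR of all surviving fragments equals the lost fragment.
-        present = [f for f in frags if f is not None]
-        frags = list(frags)
-        frags[missing[0]] = reduce(xor_bytes, present)
-    return b"".join(frags[:-1])[:length]   # drop parity and padding
-\end{verbatim}
-
-In this sketch each of the $n + 1$ fragments would be encrypted and
-uploaded to a different provider, so the file would survive the loss
-of any single provider.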
-
-For encrypting file fragments, they use AES-256; the key used for
-encryption is derived from the user's password with the Password-Based
-Key Derivation Function 2 (PBKDF2)\cite{kaliski}. For erasure coding
-they use the JigDFS library\cite{jigdfs}. The Android application is
-able to do ``progressive streaming'' of media files: large media files
-can be streamed in real time from the file storage providers' servers
-instead of being downloaded in full first, an attractive feature on a
-``resource-constrained'' device where storage is expensive.
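-
-The key-derivation step can be sketched with \verb+hashlib.pbkdf2_hmac+
-from Python's standard library, which implements PBKDF2. The
-underlying hash (SHA-256), the iteration count and the 16-byte random
-salt are assumptions chosen for illustration, not parameters reported
-by Yeo et al. The 32-byte output has the size of an AES-256 key; the
-AES encryption itself is omitted because Python's standard library
-does not include AES.
-
-\begin{verbatim}
-import hashlib
-import os
-
-ITERATIONS = 200_000  # assumed work factor, not taken from Yeo et al.
-
-
-def derive_key(password: str, salt: bytes,
-               iterations: int = ITERATIONS) -> bytes:
-    """Derive a 256-bit AES key from a password (PBKDF2-HMAC-SHA256)."""
-    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
-                               salt, iterations, dklen=32)
-
-
-# The salt is random per user and is stored alongside the data (it is
-# not secret) so that the same key can be re-derived from the password.
-salt = os.urandom(16)
-key = derive_key("correct horse battery staple", salt)
-\end{verbatim}
-
-In this sketch the key itself is never stored; it can be re-derived
-on demand from the password and the salt.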
-
-Yeo et al. also propose methods for data de-duplication, compression
-of file fragments/shards based on the type of the file, intelligent
-pre-fetching and caching of file fragments, and ``automatic
-restoration in exploiting file-versioning''; these features were not
-implemented in the prototype Android application, though Yeo et al.
-may implement them in the future.
-
-The importance of the work by Yeo et al. becomes clear when we take
-into consideration the research done by Yang et al., which found that
-59\% of the users of ``cloud storage services'' access the service
-through a smartphone and that 42.2\% of users access audio/video
-content\cite{yang}. The research by Yang et al. suggests a trend of
-users preferring small hand-held computers over laptops and desktops.
-
-\section{SkyCDS}\label{3-skycds-sec}
-
-SkyCDS, by Gonzalez et al., is a content delivery system that splits
-and spreads the content across multiple ``cloud'' storage
-providers\cite{skycds}. According to Gonzalez et al., the main reason
-for designing and developing SkyCDS was to prevent content providers
-from getting locked into just one ``cloud'' storage provider and to
-minimize losses when a ``cloud'' storage provider goes out of business
-or suffers a temporary outage of its storage service.
-
-In SkyCDS, the delivery of content to its subscribers is split into
-two distinct layers: the Metadata Flow Layer and the Content Flow
-Layer. Publishers and subscribers interact mainly with the Metadata
-Flow Layer, which controls and keeps track of what content is
-published and lets subscribers subscribe to content published in the
-content delivery system. The Content Flow Layer is where the content
-is stored, across multiple ``cloud'' storage providers. The publisher
-publishes content using the ``delivery workflow'' (part of the Content
-Flow Layer), and the subscriber uses the ``retrieve workflow'' to
-access the subscribed content.
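-
-The division of labour between the two layers can be illustrated with
-a toy Python model: the Metadata Flow Layer only records what has been
-published and who subscribed to it, while the Content Flow Layer only
-stores chunks at the providers. The class and method names, the
-in-memory dictionaries standing in for ``cloud'' storage providers and
-the round-robin chunk placement are all assumptions of this sketch; it
-does not reproduce SkyCDS's actual protocols or dispersal scheme.
-
-\begin{verbatim}
-from collections import defaultdict
-
-
-class ContentFlowLayer:
-    """Stores content chunks at several simulated providers."""
-
-    def __init__(self, providers):
-        self.stores = {p: {} for p in providers}
-
-    def deliver(self, name, data, chunk_size):
-        """'Delivery workflow': split content, spread chunks round-robin."""
-        providers = list(self.stores)
-        placement = []
-        for number, start in enumerate(range(0, len(data), chunk_size)):
-            chunk = data[start:start + chunk_size]
-            provider = providers[number % len(providers)]
-            self.stores[provider][(name, number)] = chunk
-            placement.append((provider, number))
-        return placement
-
-    def retrieve(self, name, placement):
-        """'Retrieve workflow': fetch the chunks and reassemble them."""
-        return b"".join(self.stores[p][(name, n)] for p, n in placement)
-
-
-class MetadataFlowLayer:
-    """Tracks publications and subscriptions; never holds content."""
-
-    def __init__(self):
-        self.catalog = {}                    # name -> chunk placement
-        self.subscribers = defaultdict(set)  # name -> subscriber ids
-
-    def publish(self, name, placement):
-        self.catalog[name] = placement
-
-    def subscribe(self, subscriber, name):
-        self.subscribers[name].add(subscriber)
-        return self.catalog[name]
-
-
-content = ContentFlowLayer(["provider-a", "provider-b", "provider-c"])
-metadata = MetadataFlowLayer()
-metadata.publish("image-001",
-                 content.deliver("image-001", b"satellite!", 4))
-placement = metadata.subscribe("org-2", "image-001")
-image = content.retrieve("image-001", placement)
-\end{verbatim}
-
-The subscriber obtains only the placement from the metadata layer and
-then fetches the bytes directly from the content layer.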
-
-When content has to be dispersed to $k$ ``cloud'' storage providers,
-it is split into $n$ chunks, $n > k$; this splitting produces a
-redundancy overhead of 66.7\%\cite{skycds}. The splitting scheme looks
-very similar to erasure coding, but Gonzalez et al. do not explicitly
-state that it is indeed ``erasure coding''. The content is split by
-the ``delivery workflow'' engine, which is invoked when the publisher
-triggers the action to publish the content to subscribers.
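-
-The 66.7\% figure can be related to the splitting parameters with a
-short calculation. Suppose, as an assumption for illustration, that
-content of size $S$ is encoded into $n$ chunks of size $S/m$ each, any
-$m$ of which suffice to rebuild it. The chunks then occupy $nS/m$ in
-total, and the redundancy overhead is
-\[
-  \frac{nS/m - S}{S} = \frac{n}{m} - 1 .
-\]
-One choice of parameters consistent with the reported overhead is
-$n = 5$ and $m = 3$, which gives $5/3 - 1 = 2/3 \approx 66.7\%$;
-SkyCDS's actual parameters may differ.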
-
-To evaluate the effectiveness of SkyCDS, Gonzalez et al. report a case
-study using data (content) obtained from the European Space Astronomy
-Centre (ESAC) for the Soil Moisture and Ocean Salinity (SMOS) mission.
-In this study, a group of organizations on two different continents
-used SkyCDS to share satellite images with each other. According to
-Gonzalez et al., the study showed SkyCDS to be a viable option for
-content delivery with respect to performance, cost of ``cloud''
-storage space and reliability.
-
-\section{git-annex}\label{3-gitannex-sec}
-
-\verb+git-annex+ allows one to version control large files that are
-usually not feasible to version control under
-\verb+git+\cite{program:git}. \verb+git-annex+ checks the names of
-the files and other metadata into git and stores the actual content
-under the \verb+.git/annex+ directory: when a file is added to
-\verb+git-annex+, the file is replaced by a symlink and its content is
-moved under the \verb+.git/annex+ directory.
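-
-The mechanism can be approximated in a few lines of Python. The
-sketch below builds a key in the style of \verb+git-annex+'s
-\verb+SHA256E+ backend (\verb+SHA256E-s+, the file size, \verb+--+,
-the SHA-256 digest and the file extension, the same shape as the key
-in the listing further down), moves the content under
-\verb+.git/annex/objects+ and leaves a relative symlink in its place.
-It is a simplification: the real \verb+git-annex+ also inserts hash
-directories (such as \verb+3j/vG+), write-protects the content, treats
-multi-part extensions differently and records the change in git. The
-function names are hypothetical.
-
-\begin{verbatim}
-import hashlib
-import os
-from pathlib import Path
-
-
-def sha256e_key(path: Path) -> str:
-    """Key in the style of the SHA256E backend (simplified)."""
-    digest = hashlib.sha256()
-    with open(path, "rb") as f:
-        for block in iter(lambda: f.read(1 << 20), b""):
-            digest.update(block)
-    size = path.stat().st_size
-    return f"SHA256E-s{size}--{digest.hexdigest()}{path.suffix}"
-
-
-def annex_add(repo: Path, path: Path) -> Path:
-    """Move a file's content into the annex, leave a symlink behind."""
-    key = sha256e_key(path)
-    # Flat layout: git-annex itself adds hash directories here.
-    obj = repo / ".git" / "annex" / "objects" / key / key
-    obj.parent.mkdir(parents=True, exist_ok=True)
-    os.replace(path, obj)  # move the content into the object store
-    path.symlink_to(os.path.relpath(obj, path.parent))
-    return obj
-\end{verbatim}
-
-Reading through the symlink still yields the original content, while
-git only has to track the small link; mode \verb+120000+, seen in the
-\verb+git commit+ output below, is git's mode for symlinks.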
-
-For instance, say a file called
-\verb+deb-nicholson-80s.medium.webm+ was downloaded from the Internet
-into a \verb+git-annex+ repository:
-
-\begin{verbatim}
-↳ git status
-On branch master
-Untracked files:
- (use "git add <file>..." to include in what will be committed)
-
- deb-nicholson-80s.medium.webm
-
-↳ ls -l
-total 105708
-...
--rw-r--r-- 1 rsd rsd 108196923 May 5 2015 deb-nicholson-80s.medium.webm
-...
-\end{verbatim}
-
-When this file is added to \verb+git-annex+ with \verb+git annex add+,
-the file turns into a symlink to a file under the \verb+.git/annex+
-directory:
-
-{\small
-\begin{verbatim}
-↳ git annex add deb-nicholson-80s.medium.webm
-add deb-nicholson-80s.medium.webm ok
-(recording state in git...)
-
-↳ ls -l
-...
-lrwxrwxrwx 1 rsd rsd 207 May 5 2015 deb-nicholson-80s.medium.webm -> ../.git/an
-nex/objects/3j/vG/SHA256E-s108196923--7de9484ee96908268e21b451eb9805552c32b44da08e7
-0ee861332c87352944f.webm/SHA256E-s108196923--7de9484ee96908268e21b451eb9805552c32b4
-4da08e70ee861332c87352944f.webm
-
-↳ git commit -m "Added video/deb-nicholson-80s.medium.webm"
-[master efa1775] Added video/deb-nicholson-80s.medium.webm
- 1 file changed, 1 insertion(+)
- create mode 120000 video/deb-nicholson-80s.medium.webm
-\end{verbatim}
-}
-
-Now the file \verb+deb-nicholson-80s.medium.webm+ is checked into
-\verb+git-annex+, and we can run \verb+git annex sync+ to sync the
-repository with other \verb+git-annex+ repositories. Note that when
-the repository is synced, the file content itself is not transferred
-to the other \verb+git-annex+ repositories; only the file's name (the
-committed symlink) and its metadata, which is kept in a separate git
-branch called \verb+git-annex+, are
-transferred\cite{documentation:git-annex-hworks}. To create a copy of
-a given file in another \verb+git-annex+ repository,
-\verb+git annex get /path/to/filename.ext+ has to be run in that
-repository.
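-
-The difference between \verb+git annex sync+ and \verb+git annex get+
-can be modelled with a toy Python class in which each repository has
-a location log (standing in for the \verb+git-annex+ branch, which
-records which repositories hold which content) and an object store
-(standing in for \verb+.git/annex/objects+). Syncing merges only the
-logs; getting copies content from a repository that the log lists as
-holding it. The class, its methods and the key \verb+example-key+ are
-hypothetical, and conflicts, network access and everything else the
-real tool handles are ignored.
-
-\begin{verbatim}
-class Repo:
-    """Toy git-annex repository (hypothetical, heavily simplified)."""
-
-    def __init__(self, name):
-        self.name = name
-        self.log = {}      # key -> names of repos holding the content
-        self.objects = {}  # key -> content present in this repo
-
-    def add(self, key, content):
-        self.objects[key] = content
-        self.log.setdefault(key, set()).add(self.name)
-
-    def sync(self, other):
-        """Like 'git annex sync': merge location logs, copy no content."""
-        for key in self.log.keys() | other.log.keys():
-            merged = self.log.get(key, set()) | other.log.get(key, set())
-            self.log[key] = set(merged)
-            other.log[key] = set(merged)
-
-    def get(self, key, remotes):
-        """Like 'git annex get': fetch content from a repo holding it."""
-        holders = self.log.get(key, set())
-        for remote in remotes:
-            if remote.name in holders and key in remote.objects:
-                self.add(key, remote.objects[key])
-                return
-        raise LookupError(f"no reachable repository has {key}")
-
-
-key = "example-key"  # placeholder, not a real git-annex key
-laptop, server = Repo("laptop"), Repo("server")
-laptop.add(key, b"video")
-laptop.sync(server)        # server learns that laptop holds the key,
-                           # but server.objects is still empty
-server.get(key, [laptop])  # only now is the content copied over
-\end{verbatim}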
-
-\verb+git-annex+ has a feature called ``special
-remotes''\cite{documentation:git-annex-sremotes} that allows one to
-push (copy) data checked into \verb+git-annex+ to storage provided by
-``cloud'' storage providers. At the time of writing this report,
-\verb+git-annex+ supports pushing data to the following file storage
-services:
-
-{\scriptsize
-\begin{itemize}
-\item Amazon S3
-\item Amazon Glacier
-\item Internet Archive via S3
-\item Box.com
-\item Google Drive
-\item Google Cloud Storage
-\item Mega.co.nz
-\item SkyDrive
-\item ownCloud
-\item Flickr
-\item IMAP
-\item Usenet
-\item chef-vault
-\item hubiC
-\item pCloud
-\item IPFS
-\item Ceph
-\item Backblaze's B2
-\end{itemize}
-}
-
-All data pushed to a file storage provider's servers can optionally be
-encrypted with one's GPG key. For instance, to encrypt data that is
-pushed to the Amazon S3 special remote, the following commands are
-used\cite{docs:git-annex-as3}:
-
-\begin{verbatim}
-$ git annex initremote cloud type=S3 keyid=2512E3C7
-initremote cloud (encryption setup with gpg key C910D9222512E3C7) (checking bucket) (creating bucket in US) (gpg) ok
-$ git annex describe cloud "at Amazon's US datacenter"
-describe cloud ok
-\end{verbatim}
-
-where \verb+2512E3C7+ is the ID of the GPG key used to encrypt data
-pushed to the Amazon S3 special remote. It is also possible to store
-each file pushed to a remote as a set of chunks of size \verb+N+; for
-example, to use 1MiB chunks:
-
-\begin{verbatim}
-$ git annex initremote cloud type=S3 chunk=1MiB keyid=2512E3C7
-initremote cloud (encryption setup with gpg key C910D9222512E3C7) (checking bucket) (creating bucket in US) (gpg) ok
-$ git annex describe cloud "at Amazon's US datacenter"
-describe cloud ok
-\end{verbatim}
-
-With this setup, each file that is pushed to the Amazon S3 special
-remote is divided into 1MiB chunks, each chunk is encrypted with the
-GPG key \verb+2512E3C7+, and the encrypted chunks are then pushed to
-the Amazon S3 remote. Note that, unlike in the Multi Cloud Storage
-Prototype, SkyCDS or combox, when file chunking is used in
-\verb+git-annex+ all the chunks go to the same location, in this case
-the Amazon S3 remote.
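-
-The arithmetic behind chunking is straightforward: a file of $S$ bytes
-stored with chunk size $C$ yields $\lceil S/C \rceil$ chunks, so the
-108196923-byte video from the earlier example would be stored as
-$\lceil 108196923 / 1048576 \rceil = 104$ chunks. The Python sketch
-below splits data into fixed-size chunks and writes every chunk to a
-single simulated remote (a dictionary standing in for the S3 bucket),
-mirroring the behaviour described above. It omits the per-chunk GPG
-encryption, its function names are hypothetical, and it does not
-reproduce how \verb+git-annex+ names or stores chunks.
-
-\begin{verbatim}
-MIB = 1024 * 1024  # chunk=1MiB in the initremote example above
-
-
-def chunk_count(size: int, chunk_size: int = MIB) -> int:
-    """Number of chunks for a file of `size` bytes (ceiling division)."""
-    return -(-size // chunk_size)
-
-
-def store_chunked(name: str, data: bytes, remote: dict,
-                  chunk_size: int = MIB) -> None:
-    """Split data into fixed-size chunks; all go to the same remote."""
-    for number, start in enumerate(range(0, len(data), chunk_size), 1):
-        remote[(name, number)] = data[start:start + chunk_size]
-
-
-def retrieve_chunked(name: str, remote: dict) -> bytes:
-    """Reassemble a file by reading its chunks in order."""
-    numbers = sorted(n for (key, n) in remote if key == name)
-    return b"".join(remote[(name, n)] for n in numbers)
-
-
-print(chunk_count(108196923))  # the video from the earlier example
-\end{verbatim}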