-rw-r--r-- report/chapters/1-intr.tex | 47
-rw-r--r-- report/chapters/2-lit-r.tex | 55
-rw-r--r-- report/chapters/3-arch-d.tex | 98
-rw-r--r-- report/chapters/4-testing.tex | 62
-rw-r--r-- report/chapters/5-con-f.tex | 33
5 files changed, 148 insertions, 147 deletions
diff --git a/report/chapters/1-intr.tex b/report/chapters/1-intr.tex
index 94c805b..7ebcdec 100644
--- a/report/chapters/1-intr.tex
+++ b/report/chapters/1-intr.tex
@@ -8,7 +8,7 @@ data/information on their servers and at the same time there is a lot
of evidence of governments and other powerful organizations being able
to access information/data stored on the Internet companies'
computers\cite{website:wikileaks-spyfiles}. Also, most companies add a
-standard clause in their privacy policy that allow them to disclose
+standard clause in their privacy policy that allows them to disclose
information about users or information stored/created by users to
``third parties'':
@@ -21,7 +21,7 @@ information about users or information stored/created by users to
Policy\cite{website:dropbox-privacy}
\end{quote}
-In this type of world, it did be good to have a program that would
+In this type of world, it would be good to have a program that would
encrypt all the data/information before storing it on the storage
provided by Internet companies. combox aims to be one such program
which not only encrypts but stores only a part of the encrypted
@@ -35,14 +35,14 @@ combox.
\section{What is combox?}\label{1-sec-cb}
-combox allows the user to store all their files in the ``combox
+combox allows the user to store all of their files in the ``combox
directory'' and combox picks each file stored in the combox directory,
-splits them into N shards, encrypts each of the N shards and spreads
-the shards to N node directories. A ``node directory'' is the
+splits them into $N$ shards, encrypts each of the $N$ shards and spreads
+the shards to $N$ node directories. A ``node directory'' is the
directory of the file storage provider (Dropbox directory is a node
directory). Figure \ref{fig:1-combox-overview-0} illustrates how a file
called \verb+strunk-white.pdf+ is split, encrypted and spread across
-N node directories; shards \verb+strunk-white.pdf.shard0+ to
+$N$ node directories; shards \verb+strunk-white.pdf.shard0+ to
\verb+strunk-white.pdf.shardN+ are encrypted.
\begin{figure}[h]
@@ -78,17 +78,17 @@ N node directories; shards \verb+strunk-white.pdf.shard0+ to
\end{figure}
combox does not sync encrypted shards stored in the node directories
-to the respective file storage providers' data store and it depends on
+to the respective file storage providers' data store. Instead, it depends on
the respective file storage provider's client program to sync the
shards.
combox can be used on all of the user's computers. For instance, the
user can install combox on their second computer and combox will
reconstruct the file from the encrypted shards stored in the node
-directories into the combox directory on their second computer; figure
+directories into the combox directory on their second computer; Fig.
\ref{fig:1-combox-overview-1} illustrates this. Here too, combox
depends on the client program of the respective file storage provider
-to sync shards to/from the file storage provider's data store to/from
+to sync shards to/from the file storage provider's data store and to/from
the respective node directory on the user's computer.
\begin{figure}[h]
@@ -131,7 +131,7 @@ and Dropbox.
\section{How is combox different from Combo-Box?}\label{1-sec-cb-diff}
-Combo-Box by Wesley Vollmar\cite{vollmar-combo-box} was the first
+Combo-Box by Wesley Vollmar \cite{vollmar-combo-box} was the first
implementation of the idea of storing encrypted shards of a file on
storage provided by different file storage providers and depending on the
file storage provider's client to sync shards to their respective data
@@ -143,23 +143,23 @@ enumerated below:
runs on GNU/Linux and OS X and is not compatible with Microsoft
Windows as of version 0.2.3.
\item[File splitting] Combo-Box splits a file into shards based on the
- space available on each node directory\cite{vollmar-combo-box},
+ space available on each node directory \cite{vollmar-combo-box},
while combox is not yet cognizant about space left on each node
- directory and splits the file into N equal shards, where N is equal
+ directory and splits the file into $N$ equal shards, where $N$ is equal
to the number of node directories.
\item[User Interface] Combo-Box is a graphical application while
- combox is mostly a command-line program; combox's configuration
+ combox is mostly a command-line program. combox's configuration
wizard has a graphical interface. The configuration wizard has a
command-line interface too for users who like TUI.
\item[Database] Combo-Box uses a traditional SQL database with two
tables to keep track of files' shards, files' hash, files' last
``sync time'' and for ``security and stability'' uses stored
procedures that retrieve/store information in the
- database\cite{vollmar-combo-box}.
+ database \cite{vollmar-combo-box}.
combox on the other hand uses a key-value data store to track the
files stored in the combox directory using the pickleDB
- library\cite{pylib:pickledb}. The key-value data store is a JSON
+ library \cite{pylib:pickledb}. The key-value data store is a JSON
file and all access to this data store is done through an instance
of \verb+combox.silo.ComboxSilo+
class\footnote{https://git.ricketyspace.net/combox/tree/combox/silo.py?id=fb7fdd218\#n29}
@@ -171,7 +171,7 @@ enumerated below:
created/moved/modified/deleted on another computer.
\item[Installation] Combo-Box uses the proprietary
- InstallShield\cite{nonfree-installshield} to install the program,
+ InstallShield \cite{nonfree-installshield} to install the program,
setup shortcuts and registry settings\cite{vollmar-combo-box}.
combox is a python package, it can either be installed through
@@ -186,8 +186,8 @@ enumerated below:
must be in the same locations on all the computers.
combox stores its configuration at
- \verb+$HOME/.combox/config.yaml+; the configuration file is not
- shared on computers on which the user runs combox; this makes it
+ \verb+$HOME/.combox/config.yaml+. The configuration file is not
+ shared on computers on which the user runs combox. This makes it
possible to keep the combox directory and the directories of the
file storage providers' (node directories) in different locations on
each computer. The configuration file is a YAML file and can be
@@ -203,16 +203,15 @@ Installing and running combox is relatively easy for Unix users:
$ combox
\end{verbatim}
-For detailed information on installing combox, see
+For detailed information on installing combox, see \\
https://ricketyspace.net/combox/setup/.
\subsection{Caveats}
-combox is extremely event-driven and depends on file-system events to
-do the right thing when a file is created/modified/moved/deleted, so
+combox is extremely event-driven and depends on filesystem events to
+take the correct action when a file is created/modified/moved/deleted, so
the user must make sure to start combox before starting the file
storage providers' client programs that sync encrypted shards to the
-respective node directories; on GNU/Linux distributions this can be
+respective node directories. On GNU/Linux distributions this can be
automated through the distribution's start-up system (most GNU/Linux
-distributions seem to use \verb+systemd+\cite{website:systemd} these
-days).
+distributions seem to use \verb+systemd+\cite{website:systemd}).
diff --git a/report/chapters/2-lit-r.tex b/report/chapters/2-lit-r.tex
index 0c1780d..4e26923 100644
--- a/report/chapters/2-lit-r.tex
+++ b/report/chapters/2-lit-r.tex
@@ -5,16 +5,16 @@
The idea of unifying the storage provided by multiple Internet file
storage providers and storing all the content in an encrypted form is
-not new, computer researchers and programmers have devised different
+not new. In the past, computer researchers and programmers have devised different
methods to use multiple file storage providers' storage space. This
chapter gives an overview of the work done by Yeo et al. in unifying
the storage provided by Dropbox, Box, Google Drive and Skydrive on
-Android devices\cite{yeo}(Section \ref{2-yeo-sec}); SkyCDS, a content
+Android devices \cite{yeo}(Section \ref{2-yeo-sec}); SkyCDS, a content
delivery service, by Gonzalez et al., which uses publish/subscribe
overlay paradigm and stores the content across multiple cloud storage
providers such that only part of the content (in encrypted form) is
-stored on each file storage provider\cite{skycds}(Section
-\ref{2-skycds-sec}); lastly, \verb+git-annex+, by Joey
+stored on each file storage provider \cite{skycds}(Section
+\ref{2-skycds-sec}); and, lastly, \verb+git-annex+, by Joey
Hess\cite{person:joeyh}, that allows one to version control and keep
track of large files with a possibility of encrypting files that are
stored in ``special remotes'' -- storage provided by Internet file
@@ -22,13 +22,13 @@ storage providers (Section \ref{2-gitannex-sec}).
\section{Multi Cloud Storage Prototype}\label{2-yeo-sec}
-In their paper ``Leveraging client-side storage techniques for
+In the paper ``Leveraging client-side storage techniques for
enhanced use of multiple consumer cloud storage services on
resource-constrained mobile devices'', Yeo et al. show their Android
mobile application, a prototype, which unifies storage provided by
Dropbox, Box, Google Drive and SkyDrive. The application allows the
user to store all their information in a single location on their
-phone and it uses erasure coding\cite{weatherspoon} to split each file
+phone and it uses erasure coding \cite{weatherspoon} to split each file
into \verb`n + k` fragments and spreads the encrypted fragments across
storage provided by the file storage providers. All basic file
operations -- Create, Rename, Update, Delete (CRUD) -- are
@@ -53,15 +53,15 @@ expensive.
Yeo et al. propose methods for achieving data de-duplication; file
compression based on file type; intelligent pre-fetching
and caching of file fragments and ``automatic restoration in
-exploiting file-versioning''; these features were not implemented in
+exploiting file-versioning''. These features were not implemented in
the prototype Android application and there is a possibility of Yeo et
al. implementing these features in the future.
-It becomes apparent that Yeo et al.' work is of immense importance when
-we take into consideration the research done by Yang et al., which
+It becomes apparent that Yeo et al.'s work is of immense importance. This is particularly true when
+we take into consideration the research done by Yang et al., which
found that 59\% of the users who use ``cloud storage service'' access
the service through a smart phone and 42.2\% users access it for
-audio/video\cite{yang}. The research by Yang et al. definitely
+audio/video \cite{yang}. The research by Yang et al.
suggests a trend of users' preference for small hand-held computers
over laptops and desktops.
@@ -69,7 +69,7 @@ over laptops and desktops.
SkyCDS, by Gonzalez et al., is a content delivery system that splits
and spreads the content across multiple file storage
-providers\cite{skycds}. According to Gonzalez et al., the main reason
+providers \cite{skycds}. According to Gonzalez et al., the main reason
for designing and developing SkyCDS was to prevent content providers
from getting locked into just one file storage provider and to
minimize loss when a file storage provider goes out of business or if
@@ -89,9 +89,9 @@ responsible for publishing the content using the ``delivery workflow''
``retrieve workflow'' to get access to the subscribed content.
When content has to be dispersed to $k$ file storage providers, the
-content is split into $n$ chunks, $n > k$, this file splitting seems
-to produce 66.7\% of redundancy overhead\cite{skycds}; this file
-splitting scheme looks very similar to erasure coding, but Gonzalez et
+content is split into $n$ chunks, $n > k$. This file splitting seems
+to produce 66.7\% of redundancy overhead \cite{skycds}. This file
+splitting scheme also looks very similar to erasure coding, but Gonzalez et
al. don't explicitly state that the content splitting scheme is indeed
``erasure coding''. The splitting of content is done by the ``delivery
workflow'' engine which is invoked when the publisher triggers the
@@ -110,7 +110,7 @@ space and reliability.
\verb+git-annex+ allows one to version control large files that are
not usually feasible to version control under
-\verb+git+\cite{program:git}. \verb+git-annex+, checks in the name
+\verb+git+\cite{program:git}. \verb+git-annex+ checks in the name
and other meta-data about the files in git and stores the actual
content under \verb+.git/annex+ directory. When a file is added to
\verb+git-annex+, a symlink of the file is created in place of the
@@ -148,10 +148,11 @@ add deb-nicholson-80s.medium.webm ok
↳ ls -l
...
-lrwxrwxrwx 1 rsd rsd 207 May 5 2015 deb-nicholson-80s.medium.webm -> ../.git/an
-nex/objects/3j/vG/SHA256E-s108196923--7de9484ee96908268e21b451eb9805552c32b44da08e7
-0ee861332c87352944f.webm/SHA256E-s108196923--7de9484ee96908268e21b451eb9805552c32b4
-4da08e70ee861332c87352944f.webm
+lrwxrwxrwx 1 rsd rsd 207 May 5 2015 deb-nicholson-80s.medium.webm
+-> ../.git/annex/objects/3j/vG/SHA256E-s108196923--7de9484ee96908268e
+21b451eb9805552c32b44da08e70ee861332c87352944f.webm/SHA256E-s10819692
+3--7de9484ee96908268e21b451eb9805552c32b44da08e70ee861332c87352944f.w
+ebm
↳ git commit -m "Added video/deb-nicholson-80s.medium.webm"
[master efa1775] Added video/deb-nicholson-80s.medium.webm
@@ -161,7 +162,7 @@ nex/objects/3j/vG/SHA256E-s108196923--7de9484ee96908268e21b451eb9805552c32b44da0
}
Now, the file \verb+deb-nicholson-80s.medium.webm+ is checked into
-\verb+git-annex+ and we can now do a \verb+git annex sync+ to sync the
+\verb+git-annex+ and the command \verb+git annex sync+ can be issued to sync the
repository to other \verb+git-annex+ repositories. It must be noted
here that when the repository is synced, the file content itself is
not transferred to the other \verb+git-annex+ repositories; only the
@@ -203,12 +204,13 @@ services:
All data pushed to file storage provider's servers can optionally be
encrypted using one's GPG key. For instance, to encrypt data that is
-pushed to the Amazon S3 special remote, following command is
-used\cite{docs:git-annex-as3}:
+pushed to the Amazon S3 special remote, the following command is
+used \cite{docs:git-annex-as3}:
\begin{verbatim}
$ git annex initremote cloud type=S3 keyid=2512E3C7
-initremote cloud (encryption setup with gpg key C910D9222512E3C7) (checking bucket) (creating bucket in US) (gpg) ok
+initremote cloud (encryption setup with gpg key C910D9222512E3C7)
+ (checking bucket) (creating bucket in US) (gpg) ok
$ git annex describe cloud "at Amazon's US datacenter"
describe cloud ok
\end{verbatim}
@@ -220,15 +222,16 @@ size \verb+N+, to do that we do:
\begin{verbatim}
$ git annex initremote cloud type=S3 chunk=1MiB keyid=2512E3C7
-initremote cloud (encryption setup with gpg key C910D9222512E3C7) (checking bucket) (creating bucket in US) (gpg) ok
+initremote cloud (encryption setup with gpg key C910D9222512E3C7)
+ (checking bucket) (creating bucket in US) (gpg) ok
$ git annex describe cloud "at Amazon's US datacenter"
describe cloud ok
\end{verbatim}
-with that each file that has to be pushed to the Amazon S3 special
+Upon completion, each file that has to be pushed to the Amazon S3 special
remote is divided into 1MiB chunks, each chunk is encrypted using the
GPG key \verb+2512E3C7+ and the encrypted chunks are finally pushed to
-the Amazon S3 remote. It is must be noted here that unlike the Multi
+the Amazon S3 remote. It must be noted here that unlike the Multi
Cloud Storage Prototype or SkyCDS or combox, in \verb+git-annex+ when
we are using file chunking all the chunks go to the same location --
in this case, the Amazon S3 remote.
diff --git a/report/chapters/3-arch-d.tex b/report/chapters/3-arch-d.tex
index d7bc632..46cbe4b 100644
--- a/report/chapters/3-arch-d.tex
+++ b/report/chapters/3-arch-d.tex
@@ -11,23 +11,18 @@
combox consists of two main components -- the combox directory and the
node directories. The combox directory is the place where the user
-stores all the files; the node directories are the directories under
+stores all of their files; the node directories are the directories under
which encrypted shards of the files (in the combox directory) are
scattered to. A node directory is the file storage provider's
-directory, for instance, the Dropbox directory and the Google Drive
+directory. For instance, the Dropbox directory and the Google Drive
directory are node directories.
-When a file \verb+humans.txt+ is created in the combox directory,
+When a file, \verb+humans.txt+, is created in the combox directory,
combox splits \verb+humans.txt+ into \verb+N+ shards, where \verb+N+
-is the number of node directories; if there are two node directories
+is the number of node directories. If there are two node directories
(Dropbox directory and Google Drive directory), then 2 shards are
created. Each shard of the file is then encrypted and the encrypted
-shards are spread evenly across the node directories; if there are two
-node directories -- Dropbox directory and Google Drive directory --
-combox will create two encrypted shards of file \verb+humans.txt+ --
-\verb+humans.txt.shard0+, \verb+humans.txt.shard1+ -- and place one
-encrypted shard under the Dropbox directory and the other encrypted
-shard under the Google Drive directory. Now, the Dropbox client and
+shards are spread evenly across the node directories. Now, the Dropbox client and
the Google client will sync the respective shards that were placed under
their directories to their respective data store.
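
The splitting step described in this hunk can be sketched in Python, the language combox is written in. This is a minimal illustration and not combox's own code: the function names `split_into_shards`, `glue_shards` and `shard_names` are hypothetical, the shards are assumed to be contiguous byte ranges of near-equal size, and the AES encryption of each shard is left out.

```python
from typing import List


def split_into_shards(data: bytes, n: int) -> List[bytes]:
    """Split data into n contiguous shards of near-equal size (hypothetical helper)."""
    if n < 1:
        raise ValueError("need at least one node directory")
    base, extra = divmod(len(data), n)
    shards, start = [], 0
    for i in range(n):
        # The first `extra` shards carry one additional byte each.
        size = base + (1 if i < extra else 0)
        shards.append(data[start:start + size])
        start += size
    return shards


def glue_shards(shards: List[bytes]) -> bytes:
    """Reassemble the original content from shards given in order."""
    return b"".join(shards)


def shard_names(filename: str, n: int) -> List[str]:
    """Name shards the way the report does: humans.txt.shard0, humans.txt.shard1, ..."""
    return ["{}.shard{}".format(filename, i) for i in range(n)]
```

With two node directories, `split_into_shards(content, 2)` yields two shards that would each be encrypted and placed in a different node directory; `glue_shards` reverses the split once the shards are decrypted.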
@@ -52,28 +47,28 @@ for file modification, deletion and rename/move.
The combox configuration wizard triggers automatically when combox
finds that it is not configured. The combox configuration wizard
-setups up the combox directory; asks the user to point to the location
-of the node directories; reads the key (passphrase) to be used to
+configures the combox directory; asks the user to point to the location
+of the node directories; and reads the key (passphrase) to be used to
encrypt file shards that are spread across the node directories. The
combox configuration is written to
-\verb+$HOME/.combox/config.yaml+; this YAML configuration file can be
+\verb+$HOME/.combox/config.yaml+. This YAML configuration file can be
manually edited by the user.
The
\verb+config_cb+\footnote{https://git.ricketyspace.net/combox/tree/combox/config.py?id=fb7fdd21\#n90}
function in the \verb+combox.config+ module is responsible for
carrying out the combox configuration. Prior to version \verb+0.2.0+,
-the combox configuration was purely done through the CLI, from
-\verb+0.2.0+ on wards, by default, the combox configuration done
+the combox configuration was purely done through the Command Line Interface (CLI). From
+\verb+0.2.0+ onwards, by default, the combox configuration is done
through a graphical interface; it is still possible to configure
combox through the CLI with the \verb+--cli+ switch.
A demo of combox configuration using the graphical interface on
-GNU/Linux can be viewed at
-\url{https://ricketyspace.net/combox/combox-config-gui-glued-gnu.webm};
-the same demo of combox configuration using the graphical interface on
-OS X can be viewed at
-\url{https://ricketyspace.net/combox/combox-config-gui-glued-osx.webm}.
+GNU/Linux can be viewed
+\url{https://ricketyspace.net/combox/combox-config-gui-glued-gnu.webm}{here}.
+The same demo of combox configuration using the graphical interface on
+OS X can be viewed
+\url{https://ricketyspace.net/combox/combox-config-gui-glued-osx.webm}{here}.
\subsection{combox directory monitor}\label{sec:3-combox-cdirm}
@@ -110,11 +105,11 @@ and store the hash of file under its new name.
Node directory monitor is an instance of
\verb+combox.events.NodeDirMonitor+\footnote{https://git.ricketyspace.net/combox/tree/combox/events.py?id=fb7fdd21\#n352}
-monitoring a node directory. When changes are made to the node
+that monitors a node directory. When changes are made to the node
directory, the node directory monitor is responsible for correctly
detecting the type of change and doing the right thing at that
instance of time. Each node directory has a dedicated node directory
-monitor; if there are 2 node directories, then combox will instantiate
+monitor. If there are 2 node directories, then combox will instantiate
2 node directory monitors.
When an encrypted shard is created in the node directory due to a file
@@ -174,30 +169,30 @@ directory.
\subsection{combox data store}\label{sec:3-combox-db}
-To keep it simple, stupid, combox tracks bare minimum information
-about the files, stored in the combox directory, and depends on file
+To ``keep it simple, stupid'', combox tracks bare minimum information
+about the files that are stored in the combox directory, depending on file
system events to do the right thing when changes take place in the
combox directory.
-The only information that is stored in the combox data store, about a
-file in the combox directory is its SHA-512 hash; The SHA-512 hash of
+The only information that is stored in the combox data store with regards to a
+file in the combox directory is its SHA-512 hash. The SHA-512 hash of
a file is enough information to detect changes in the file. In the
-data store, there is also four dictionaries -- \verb+file_moved+,
+data store, there are also four dictionaries -- \verb+file_moved+,
\verb+file_deleted+, \verb+file_created+, \verb+file_modified+ --
-which tracks the number of shards of a file that was
+which track the number of shards of a file that were
moved/deleted/created/modified due to the respective file being
-moved/deleted/created/modified on another computer; these four
+moved/deleted/created/modified on another computer. These four
dictionaries are primarily used by the \verb+NodeDirMonitor+ to detect
remote file movement/deletion/creation/modification and triggering
file reconstruction from the encrypted shards at the right time.
-The data store is a JSON file on the disk, stored by default at
+The data store is a JSON file on the disk, stored by default at \\
\verb+$HOME/.combox/silo.db+. The
\verb+combox.silo.ComboxSilo+\footnote{https://git.ricketyspace.net/combox/tree/combox/silo.py?id=v0.2.2\#n29}
is the sole interface to read from and write to the data store. The
data store is primarily accessed and modified by the combox directory
monitor (\verb+ComboxDirMonitor+) and the node directory monitor
-(\verb+NodeDirMonitor+) through a shared \verb+threading.Lock+ that ensures that only
+(\verb+NodeDirMonitor+) through a shared \verb+threading.Lock+ that ensures that only
one entity\footnote{An entity can be the combox directory monitor or
one of the node directory monitors} can access/modify the database
at a time.
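
The two ideas in this hunk, detecting change through a SHA-512 hash and serializing access through a shared `threading.Lock`, can be sketched as follows. This is an illustration under stated assumptions, not the `combox.silo.ComboxSilo` implementation: the class `TinySilo`, its methods and the flat JSON layout are hypothetical, and pickleDB is replaced by the standard `json` module to keep the sketch self-contained.

```python
import hashlib
import json
import threading


def sha512_of(path, block_size=65536):
    """Return the SHA-512 hex digest of the file at path, read in blocks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


class TinySilo:
    """Hypothetical stand-in for a hash store; not combox's ComboxSilo."""

    def __init__(self, db_path, lock):
        self.db_path = db_path
        self.lock = lock  # shared by every monitor that touches the store

    def _load(self):
        try:
            with open(self.db_path) as fh:
                return json.load(fh)
        except FileNotFoundError:
            return {}

    def update(self, path):
        """Record the current hash of path in the JSON store."""
        with self.lock:
            db = self._load()
            db[path] = sha512_of(path)
            with open(self.db_path, "w") as fh:
                json.dump(db, fh)

    def stale(self, path):
        """True if path is unknown or its content differs from the stored hash."""
        with self.lock:
            return self._load().get(path) != sha512_of(path)
```

A single `threading.Lock()` instance would be passed to every monitor's silo, so that only one of them reads or writes the JSON file at a time.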
@@ -220,7 +215,7 @@ Below is an illustration of the structure of the combox data store:
The \verb+combox.silo.ComboxSilo+, which is the sole interface to read
from and write to the database, uses the pickleDB
-library\cite{pylib:pickledb}. The pickleDB is a very basic key-value
+library \cite{pylib:pickledb}. The pickleDB is a very basic key-value
store which allows one to store information in the JSON format.
It must be noted that the combox data store on each computer is
@@ -229,8 +224,7 @@ combox data store located in other computers.
\section{combox modules overview}
-combox is spread into modules that have functions and/or classes. As
-of \verb+2016-02-04+ combox is considerably a small program:
+combox is spread into modules that have functions and/or classes. Currently, combox is a fairly small program consisting of the following files:
\begin{verbatim}
$ wc -l combox/*.py
@@ -248,7 +242,7 @@ $ wc -l combox/*.py
\end{verbatim}
This section gives an overview of each of the combox modules with
-extreme brevity:
+extreme brevity.
\begin{description}
\item[combox.cbox]\footnote{https://git.ricketyspace.net/combox/tree/combox/cbox.py?id=fb7fdd21}
@@ -334,30 +328,30 @@ spread them across node directories (Google Drive and Dropbox) and
decrypt, glue shards and put them back to the combox directory when a
file is created/modified/deleted/moved in another computer. The plan
was to use external libraries to accomplish things that fell outside
-the realm of the ``core functionality of combox''; the main reason
+the realm of the ``core functionality of combox''. The main reason
behind this decision was to not indulge in trying to solve problems
that others have already solved.
-The \verb+watchdog+\cite{pylib:watchdog} library was chosen for file
-monitoring; this library is compatible with Unix, Unix-like systems
+Accordingly, the \verb+watchdog+\cite{pylib:watchdog} library was chosen for file
+monitoring. This library is compatible with Unix, Unix-like systems
and Microsoft Windows. The \verb+pycrypto+
-library\cite{pylib:pycrypto} was used for encrypting data; combox uses
+library \cite{pylib:pycrypto} was used for encrypting data. Combox uses
AES encryption scheme to encrypt file shards. The
-\verb+pickleDB+\cite{pylib:pickledb} library was used to store
+\verb+pickleDB+ \cite{pylib:pickledb} library was used to store
information about files in the combox directory.
Looking back, the decision to use external libraries reduced the
complexity of combox, reduced the time to complete the initial working
-version of combox and made it possible to spend more than 3 months
+version of combox, and made it possible to spend more than 3 months
just testing and fixing issues in combox.
\section{Operating system compatibility}\label{3-os-compat}
-combox was developed on a GNU/Linux machine, a conscious effort was
-made to write in an operating system independent way. The top criteria
+combox was developed on a GNU/Linux machine. A conscious effort was
+made to write the software in an operating system independent way. The top criteria
for choosing a library to use in combox was that it had to be
compatible on \emph{all} of the three major computing
-platforms\footnote{GNU/Linux, OS X and, Microsoft Windows}.
+platforms \footnote{GNU/Linux, OS X, and Microsoft Windows}.
Prior to the \verb+0.1.0+ release, combox was tested on OS X (See
chapter \ref{ch:4}) and OS X specific issues that were found were
@@ -369,14 +363,14 @@ compatible with Microsoft Windows out of the box. it was found that:
\begin{itemize}
\item Setting up the paraphernalia to run combox was
- non-trivial\cite{doc:combox-setup-windoze}.
-\item The unit tests for the \verb+combox.file+ module royally failed.
+ non-trivial \cite{doc:combox-setup-windoze}.
+\item The unit tests for the \verb+combox.file+ module failed on the Windows Operating System.
\end{itemize}
At the time of writing the report, combox is at version \verb+0.2.3+
and it is not compatible with Microsoft Windows. Comprehensive
documentation for setting up the development environment for combox on
-Microsoft Windows was written\cite{doc:combox-setup-windoze} to make
+Microsoft Windows was written \cite{doc:combox-setup-windoze} to make
it less cumbersome for anyone who would want to work on making combox
compatible with Microsoft Windows.
@@ -401,9 +395,9 @@ Finally install combox with:
python setup.py install
\end{verbatim}
-Python has a package registry called CheeseShop\footnote{code name for
- Python Package Index, see https://wiki.python.org/moin/CheeseShop};
-all packages registered at the CheeseShop can be installed using
+Python has a package registry called CheeseShop \footnote{code name for
+ Python Package Index, see https://wiki.python.org/moin/CheeseShop}.
+All packages registered at the CheeseShop can be installed using
\verb+pip+ -- Python's platform independent package management
system\cite{py:pip} -- with:
@@ -421,7 +415,7 @@ can now easily get a copy of combox on their machine with:
pip install combox
\end{verbatim}
-All versions of combox that is available through the CheeseShop are
+All versions of combox that are available through the CheeseShop are
digitally signed using the following GPG key:
\begin{verbatim}
@@ -433,4 +427,4 @@ sub 4096R/09CECEDB 2014-09-08 [expires: 2017-09-07]
All versions of combox's source are also available as a compressed
\verb+TAR+ ball and as a \verb+ZIP+ archive; they can be downloaded
-from \url{https://ricketyspace.net/combox/releases.html}.
+from \url{https://ricketyspace.net/combox/releases.html}{here}.
diff --git a/report/chapters/4-testing.tex b/report/chapters/4-testing.tex
index fea4cbc..bf57e73 100644
--- a/report/chapters/4-testing.tex
+++ b/report/chapters/4-testing.tex
@@ -5,8 +5,8 @@
\section{Unit testing}\label{sec:4-unit-testing}
-The \verb+nose+\cite{pylib:nose} testing framework was used to write
-unit tests for the functions and classes part of the
+The \verb+nose+ \cite{pylib:nose} testing framework was used to write
+unit tests for the functions and classes that are part of the
\verb+combox.config+, \verb+combox.crypto+, \verb+combox.events+,
\verb+combox.file+, \verb+combox.silo+ and \verb+combox._version+
modules. Unit tests were not written for \verb+combox.cbox+,
@@ -30,17 +30,17 @@ of inputs.
Unit tests greatly helped in testing the compatibility of combox on OS
X. Before the \verb+v0.1.0+ release, combox's node directory monitor
always assumed that a file's first shard (\verb+shard0+) is always
-available; while this assumption did not create any problems on
-GNU/Linux, on OS X, this assumption made the node directory monitor to
-behave erratically -- this issue (bug \#4) was immediately found when
+available. While this assumption did not create any problems on
+GNU/Linux, on OS X this assumption made the node directory monitor
+behave erratically. This issue (bug \#4) was immediately found when
the unit tests were run for the first time on OS X. Another instance
-where unit tests helped was just before the \verb+v0.2.0+ release;
-major changes, including the introduction of file locks in the
+where unit tests helped was just before the \verb+v0.2.0+ release.
+Major changes, including the introduction of file locks in the
\verb+ComboxDirMonitor+, were made to the \verb+combox.events+. When
the unit tests were run on OS X, two tests failed, revealing a difference
in behavior of watchdog\cite{pylib:watchdog} on GNU/Linux and OS X on
file
-creation\footnote{https://git.ricketyspace.net/combox/commit/?id=8c86e7c28738c66c0e04ae7886b44dbcdfc6369exo};
+creation \footnote{https://git.ricketyspace.net/combox/commit/?id=8c86e7c28738c66c0e04ae7886b44dbcdfc6369exo};
without unit tests, there is a high probability that this bug would
never have been found by now.
@@ -59,8 +59,8 @@ these bugs were found when manually testing combox.
\section{Manual testing}\label{sec:4-manual-testing}
The unit tests for the \verb+combox.events+ module tested the
-correctness of the \verb+ComboxDirMonitor+ and \verb+NodeDirMonitor+
-independently; in order to comprehensively test the correctness of
+correctness of the \\ \verb+ComboxDirMonitor+ and \verb+NodeDirMonitor+
+independently. In order to comprehensively test the correctness of
both \verb+ComboxDirMonitor+ and \verb+NodeDirMonitor+, it was
required to manually test combox running on more than one
computer. Several bugs were found and fixed while doing manual
@@ -69,9 +69,9 @@ testing.
Three different types of setups were used to manually test combox. The
first kind of setup has two GNU/Linux machines each using combox to
sync files between each other with Dropbox and Google Drive being the
-nodes; the second kind of setup has a GNU/Linux machine and a OS X
+nodes. The second kind of setup has a GNU/Linux machine and an OS X
machine each using combox to sync files between each other with
-Dropbox and Google Drive being the nodes; the third kind of setup has
+Dropbox and Google Drive being the nodes. The third kind of setup has
a GNU/Linux machine and an OS X machine each using combox to sync files
between each other with Dropbox, Google Drive and a USB stick as
nodes.
@@ -100,8 +100,8 @@ combox was run on two GNU/Linux machines and a file was alternatively
created/modified/renamed/deleted on one of the GNU/Linux machines and
it was verified that the respective file was also
created/modified/renamed/deleted on the other GNU/Linux machine. One
-of the GNU/Linux machine (\verb+lyra)+ was a virtual machine running
-Debian GNU/Linux stable (version 8.x); the other GNU/Linux machine
+of the GNU/Linux machines (\verb+lyra+) was a virtual machine running
+Debian GNU/Linux stable (version 8.x). The other GNU/Linux machine
(\verb+grus+) was a physical machine running Debian GNU/Linux
testing. The node directories to scatter the files' shards were the
Dropbox directory and Google Drive directory. The official Dropbox
@@ -144,7 +144,7 @@ data store.
\verb+.dropbox.cache+ directory on this computer.
\end{itemize}
- All of the above behavior of the Dropbox client royally broke
+ All of the above behavior of the Dropbox client broke
combox. Commits from \verb+3d714c5+ to
\verb+6e1133f+\footnote{https://git.ricketyspace.net/combox/log/?qt=range\&q=3d714c5..6e1133f}
fixed combox by making it aware of Dropbox's client behavior.
@@ -152,10 +152,8 @@ data store.
\subsubsection{Demo}
-Demo of combox being used on two GNU/Linux machines can be viewed at
-\url{https://ricketyspace.net/combox/combox-2-gnus.webm}.
-
-\verb+lyra+ (virtual machine) and \verb+grus+ (bare-metal) are the two
+A demo of combox being used on two GNU/Linux machines can be viewed at
+\url{https://ricketyspace.net/combox/combox-2-gnus.webm}. \verb+lyra+ (virtual machine) and \verb+grus+ (bare-metal) are the two
GNU/Linux machines being used for the demo.
Description of what happens in the demo follows:
@@ -277,7 +275,7 @@ Google Drive directory to Google Drive's data store on GNU/Linux.
\subsubsection{Demo}
-Demo of combox being used on a GNU/Linux machine and OS X machine can
+A demo of combox being used on a GNU/Linux machine and an OS X machine can
be viewed at \url{https://ricketyspace.net/combox/combox-gnu-osx.webm}.
\verb+lyra+ is the GNU/Linux (virtual) machine and
@@ -359,7 +357,7 @@ files stored in combox directory.
\subsubsection{Demo}
-Demo of combox being used with a USB stick as the third node can be
+A demo of combox being used with a USB stick as the third node can be
viewed at \url{https://ricketyspace.net/combox/combox-usb-node-demo.webm}.
\verb+grus+ is the GNU/Linux machine and \verb+dhcp-129-1-66-1+ is the
@@ -442,29 +440,29 @@ Description of what happens in the demo follows:
\section{Stress testing}
-Large number of files of different sizes were dumped to the combox
+A large number of files of different sizes were dumped to the combox
directory at one-second intervals to see how combox responds to
-high load. The file dump size was varied from \verb+424.798190MiB+ (27
-files) to \verb+10800.000000MiB+ (180 files); the average time taken
+high load. The file dump size was varied from \verb+424.80MiB+ (27
+files) to \verb+10,800.00MiB+ (180 files). The average time taken
to split a file and the total time to process all files were
calculated for each dump.
Stress testing was first done on \verb+2015-11-08+. In mid November
-2015, the \verb+ComboxDirMonitor+ was drastically modified to make it
+2015, the \\ \verb+ComboxDirMonitor+ was drastically modified to make it
use the file Lock shared by the instances of
-\verb+NodeDirMonitor+\footnote{https://git.ricketyspace.net/combox/commit/?id=5aa1ba0c1dcad62931ba27bb66bf115233086d6c};
-the hunch was that this change in \verb+ComboxDirMonitor+ directly
+\verb+NodeDirMonitor+\footnote{https://git.ricketyspace.net/combox/commit/?id=5aa1ba0c1dcad62931ba27bb66bf115233086d6c}.
+The hypothesis was that this change in \verb+ComboxDirMonitor+ directly
affected the performance of combox and therefore the results
obtained from stress testing on \verb+2015-11-08+ would no longer be
-valid. Stress testing was again done on \verb+2016-01-16+; the results
+valid. Stress testing was again done on \verb+2016-01-16+. The results
of this stress test are in sections \ref{4-st-424} to
-\ref{4-st-10800}, section \ref{4-st-tu} gives information about the
+\ref{4-st-10800}. Section \ref{4-st-tu} gives information about the
tools used for stress testing, section \ref{4-st-o} contains the
observations and comparisons between this stress test and the one done
-on \verb+2015-11-08+, lastly section \ref{4-st-if} reveals the issues
+on \verb+2015-11-08+, and, lastly, section \ref{4-st-if} reveals the issues
that were found with combox by virtue of doing the stress tests.
-\subsection{flac dump (27 files - 424.798190MiB)}\label{4-st-424}
+\subsection{flac dump (27 files - 424.80MiB)}\label{4-st-424}
\begin{center}
\begin{table}[h]
@@ -579,7 +577,7 @@ avg. time to split and encrypt a file & 3423.087539ms\\
\subsection{Tools used}\label{4-st-tu}
The \verb+dump+ script\footnote{https://git.ricketyspace.net/combox-paper/plain/dumper/dump} was used to dump files to
-the combox directory between one second intervals; a night of Emacs
+the combox directory at one-second intervals. A night of Emacs
Lisp indulgence made it possible to quickly slurp the required data
from the combox output and calculate the average time to split and
encrypt a file and the total amount of time taken to process the files
diff --git a/report/chapters/5-con-f.tex b/report/chapters/5-con-f.tex
index 0273c5e..f6193ec 100644
--- a/report/chapters/5-con-f.tex
+++ b/report/chapters/5-con-f.tex
@@ -9,15 +9,15 @@
combox is at a stage where it can be used as a tool to use the storage
provided by two file storage providers -- Google Drive and Dropbox --
such that only part of each file in the encrypted form is stored on
-the data store of the file storage providers; this method of storing
-files on file storage providers makes it difficult but not impossible
+the data store of the file storage providers. This method of storing
+files on file storage providers makes it difficult, but not impossible,
for file storage providers or ``third parties'' to gain access to the
user's personal files.
combox is at version 0.2.3. It is a Python package licensed under the
GNU General Public License version 3 or later. It is compatible with
GNU/Linux and OS X. The program is considered to be in ``alpha'' stage
-and must be used for experimental use only, it is not recommended to
+and must be used for experimental purposes only. It is not recommended to
store critical files on storage provided by file storage providers
using combox. Individuals who wish to try combox would want to look at
\url{https://ricketyspace.net/combox/setup/} to get the program
@@ -29,7 +29,7 @@ repository is also mirrored at
\url{https://bitbucket.org/bgsucodeloverslab/combox/src} and
\url{http://rsiddharth.ninth.su/git/cb.git/}.
-There are a lot of things that can be done to improve combox, what
+There are a lot of things that can be done to improve combox, and what
follows is a non-exhaustive list of things to do in the future:
\begin{itemize}
@@ -37,25 +37,29 @@ follows is a non-exhaustive list of things to do in the future:
directory. At the moment, combox reads the amount of free space
available on each node directory (file storage provider's directory)
when configuring combox on a computer but does not use this
- information to reckon the space left in each node directory.
+ information to reckon the space left in each node directory. The major issue here is how to determine what space is available without interacting with a service provider's API or asking the end user.
+
+
\item Re-think \verb+combox.events+ module. This module was written
with the assumption that combox will be the only one to make changes
to the node directories. This assumption was found to be not true
when manually testing combox with node clients (Google Drive and
Dropbox client that sync files to/from the respective node
- directories to/from their respective data stores); both the Google
+ directories to/from their respective data stores). Both the Google
Drive and the Dropbox client make modifications to the Google Drive
and Dropbox directory respectively whenever pulling a modified shard
 from their data store to the user's computer. This behavior broke
combox and major changes were made to the \verb+combox.events+
module to make it understand the node client's behavior in the node
- directory; these changes, increased the complexity of the classes
- defined in the \verb+combox.events+; it would be great to re-think
+ directory. These changes increased the complexity of the classes
+ defined in the \verb+combox.events+ module. It would be great to re-think
this module in such a way that it reduces its complexity.
+
+\item Evaluate if more information needs to be tracked about each file in
- the combox directory; at the moment, combox only keeps track of the
+ the combox directory. At the moment, combox only keeps track of the
SHA-256 hash of each file stored in the combox directory.
-\item Support more file storage providers; for this, ideally no code
+
+\item Support more file storage providers. For this, ideally no code
 needs to be written to support a new file storage provider.
combox must be tested with the new file storage provider's directory
as a node directory. If the new file storage provider's client (that
@@ -64,24 +68,27 @@ follows is a non-exhaustive list of things to do in the future:
then the \verb+combox.events.NodeDirMonitor+ must be accordingly
updated to make combox cognizant about the file storage provider
client's non-standard behavior.
+
\item Make unit tests more modular. At the moment, there are some unit
test functions that test more than one usecase/facet of a function
- or class; for instance, the \verb+test_CDM+ test method, part of the
+ or class. For instance, the \verb+test_CDM+ test method, part of the
 \verb+tests.events_test.TestEvents+ test class, tests the
correctness of the \verb+combox.events.ComboxDirMonitor+ for file
creation, deletion, rename and modification; this method would
 ideally be broken down into four test methods.
+
\item Make combox Python 3 compatible. The \verb+2to3+ program (which
 has been part of the standard Python library since Python version 2.6) and
the \verb+six+ library can be used to achieve this. See Appendix
\ref{a-python3c} for more information on this.
+
\item Support Microsoft Windows. The way to make combox compatible
- with Windows will be to run unit tests on Windows, the failing tests
+ with Windows will be to run unit tests on Windows. The failing tests
 might give pointers to which parts of combox need to be
changed/updated in order for it to be compatible with
Windows. Individuals interested in making combox compatible with
Windows might find
- \url{https://ricketyspace.net/combox/setup/#windows} useful; it
+ \url{https://ricketyspace.net/combox/setup/#windows} useful. It
contains information about setting up the development environment
for combox on Windows.
\end{itemize}