combox-paper

notes and other things concerning combox
git clone git://git.ricketyspace.net/combox-paper.git

commit b4b007cca5a8d01c784949847339540fdf84be81
parent 59f8cb5c5bb1b41dffa68993c5b142016fc74206
Author: Harrison Renny <rennyh@bgsu.edu>
Date:   Fri, 18 Mar 2016 15:12:04 -0400

Updates and changes based on Dr. Green's feedback.

Diffstat:
 report/chapters/1-intr.tex    | 47 +++++++++++++++++++++++------------------------
 report/chapters/2-lit-r.tex   | 55 +++++++++++++++++++++++++++++--------------------------
 report/chapters/3-arch-d.tex  | 98 +++++++++++++++++++++++++++++++++++++------------------------------------------
 report/chapters/4-testing.tex | 62 ++++++++++++++++++++++++++++++--------------------------------
 report/chapters/5-con-f.tex   | 33 ++++++++++++++++++++-------------
5 files changed, 148 insertions(+), 147 deletions(-)

diff --git a/report/chapters/1-intr.tex b/report/chapters/1-intr.tex @@ -8,7 +8,7 @@ data/information on their servers and at the same time there is a lot of evidence of governments and other powerful organizations being able to access information/data stored on the Internet companies' computers\cite{website:wikileaks-spyfiles}. Also, most companies add a -standard clause in their privacy policy that allow them to disclose +standard clause in their privacy policy that allows them to disclose information about users or information stored/created by users to ``third parties'': @@ -21,7 +21,7 @@ information about users or information stored/created by users to Policy\cite{website:dropbox-privacy} \end{quote} -In this type of world, it did be good to have a program that would +In this type of world, it would be good to have a program that would encrypt all the data/information before storing it on the storage provided by Internet companies. combox aims to be one such program which not only encrypts but stores only a part of the encrypted @@ -35,14 +35,14 @@ combox. \section{What is combox?}\label{1-sec-cb} -combox allows the user to store all their files in the ``combox +combox allows the user to store all of their files in the ``combox directory'' and combox picks each file stored in the combox directory, -splits them into N shards, encrypts each of the N shards and spreads -the shards to N node directories. A ``node directory'' is the +splits them into $N$ shards, encrypts each of the $N$ shards and spreads +the shards to $N$ node directories. A ``node directory'' is the directory of the file storage provider (Dropbox directory is a node directory). Figure \ref{fig:1-combox-overview-0}, illustrates how a file called \verb+strunk-white.pdf+ is split, encrypted and spread across -N node directories; shards \verb+strunk-white.pdf.shard0+ to +$N$ node directories; shards \verb+strunk-white.pdf.shard0+ to \verb+strunk-white.pdf.shardN+ are encrypted. 
\begin{figure}[h] @@ -78,17 +78,17 @@ N node directories; shards \verb+strunk-white.pdf.shard0+ to \end{figure} combox does not sync encrypted shards stored in the node directories -to the respective file storage providers' data store and it depends on +to the respective file storage providers' data store. Instead, it depends on the respective file storage provider's client program to sync the shards. combox can be used on all of the user's computers. For instance, the user can install combox on their second computer and combox will reconstruct the file from the encrypted shards stored in the node -directories into the combox directory on their second computer; figure +directories into the combox directory on their second computer; Fig. \ref{fig:1-combox-overview-1} illustrates this. Here too, combox depends on the client program of the respective file storage provider -to sync shards to/from the file storage provider's data store to/from +to sync shards to/from the file storage provider's data store and to/from the respective node directory on the user's computer. \begin{figure}[h] @@ -131,7 +131,7 @@ and Dropbox. \section{How is combox different from Combo-Box?}\label{1-sec-cb-diff} -Combo-Box by Wesley Vollmar\cite{vollmar-combo-box} was the first +Combo-Box by Wesley Vollmar \cite{vollmar-combo-box} was the first implementation of the idea of storing encrypted shards of a file on storage provided different file storage providers and depending on the file storage provider's client to sync shards to their respective data @@ -143,23 +143,23 @@ enumerated below: runs on GNU/Linux and OS X and is not compatible with Microsoft Windows as of version 0.2.3. 
\item[File splitting] Combo-Box splits a file into shards based on the - space available on each node directory\cite{vollmar-combo-box}, + space available on each node directory \cite{vollmar-combo-box}, while combox is not yet cognizant about space left on each node - directory and splits the file into N equal shards, where N is equal + directory and splits the file into $N$ equal shards, where $N$ is equal to the number of node directories. \item[User Interface] Combo-Box is a graphical application while - combox is mostly a command-line program; combox's configuration + combox is mostly a command-line program. combox's configuration wizard has a graphical interface. The configuration wizard has a command-line interface too for users who like TUI. \item[Database] Combo-Box uses a traditional SQL database with two tables to keep track of files' shards, files' hash, files' last ``sync time'' and for ``security and stability'' uses stored procedures that retrieve/store information in the - database\cite{vollmar-combo-box}. + database \cite{vollmar-combo-box}. combox on the other hand uses a key-value data store to track the files stored in the combox directory using the pickleDB - library\cite{pylib:pickledb}. The key-value data store is a JSON + library \cite{pylib:pickledb}. The key-value data store is a JSON file and all access to this data store is done through an instance of \verb+combox.silo.ComboxSilo+ class\footnote{https://git.ricketyspace.net/combox/tree/combox/silo.py?id=fb7fdd218\#n29} @@ -171,7 +171,7 @@ enumerated below: create/moved/modified/deleted on another computer. \item[Installation] Combo-Box uses the proprietary - InstallShield\cite{nonfree-installshield} to install the program, + InstallShield \cite{nonfree-installshield} to install the program, setup shortcuts and registry settings\cite{vollmar-combo-box}. 
combox is a python package, it can either be installed through @@ -186,8 +186,8 @@ enumerated below: must be in the same locations on all the computers. combox stores its configuration at - \verb+$HOME/.combox/config.yaml+; the configuration file is not - shared on computers on which the user runs combox; this makes it + \verb+$HOME/.combox/config.yaml+. The configuration file is not + shared on computers on which the user runs combox. This makes it possible to keep the combox directory and the directories of the file storage providers' (node directories) in different locations on each computer. The configuration file is a YAML file and can be @@ -203,16 +203,15 @@ Installing and running combox is relatively easy for Unix users: $ combox \end{verbatim} -For detailed information on installing combox, see +For detailed information on installing combox, see \\ https://ricketyspace.net/combox/setup/. \subsection{Caveats} -combox is extremely event-driven and depends on file-system events to -do the right thing when a file is created/modified/moved/deleted, so +combox is extremely event-driven and depends on filesystem events to +do the correct action when a file is created/modified/moved/deleted, so the user must make sure to start combox before starting the file storage providers' client programs that sync encrypted shards to the -respective node directories; on GNU/Linux distributions this can be +respective node directories. On GNU/Linux distributions this can be automated through the distribution's start-up system (most GNU/Linux -distributions seem to use \verb+systemd+\cite{website:systemd} these -days). +distributions seem to use \verb+systemd+\cite{website:systemd}). 
diff --git a/report/chapters/2-lit-r.tex b/report/chapters/2-lit-r.tex @@ -5,16 +5,16 @@ The idea of unifying the storage provided by multiple Internet file storage providers and storing all the content in an encrypted form is -not new, computer researchers and programmers have devised different +not new. In the past, computer researchers and programmers have devised different methods to use multiple file storage providers' storage space. This chapter gives an overview of the work done by Yeo et al. in unifying the storage provided by Dropbox, Box, Google Drive and Skydrive on -Android devices\cite{yeo}(Section \ref{2-yeo-sec}); SkyCDS, a content +Android devices \cite{yeo}(Section \ref{2-yeo-sec}); SkyCDS, a content delivery service, by Gonzalez et al., which uses publish/subscribe overlay paradigm and stores the content across multiple cloud storage providers such that only part of the content (in encrypted form) is -stored on each file storage provider\cite{skycds}(Section -\ref{2-skycds-sec}); lastly, \verb+git-annex+, by Joey +stored on each file storage provider \cite{skycds}(Section +\ref{2-skycds-sec}); and, lastly, \verb+git-annex+, by Joey Hess\cite{person:joeyh}, that allows one to version control and keep track of large files with a possibility of encrypting files that are stored in ``special remotes'' -- storage provided by Internet file @@ -22,13 +22,13 @@ storage providers (Section \ref{2-gitannex-sec}). \section{Multi Cloud Storage Prototype}\label{2-yeo-sec} -In their paper ``Leveraging client-side storage techniques for +In the paper ``Leveraging client-side storage techniques for enhanced use of multiple consumer cloud storage services on resource-constrained mobile devices'', Yeo et al. show their Android mobile application, a prototype, which unifies storage provided by Dropbox, Box, Google Drive and SkyDrive. 
The application allows the user to store all their information in a single location on their -phone and it uses erasure coding\cite{weatherspoon} to split each file +phone and it uses erasure coding \cite{weatherspoon} to split each file into \verb`n + k` fragments and spreads the encrypted fragments across storage provided by the file storage providers. All basic file operations -- Create, Rename, Update, Delete (CRUD) -- are @@ -53,15 +53,15 @@ expensive. Yeo et al. propose methods for achieving data de-duplication; file compression based on file type; intelligent pre-fetching and caching of file fragments and ``automatic restoration in -exploiting file-versioning''; these features were not implemented in +exploiting file-versioning''. These features were not implemented in the prototype Android application and there is possibility of Yeo et al. implementing these features in the future. -It becomes apparent that Yeo et al.' work is of immense importance when -we take into consideration the research done by Yang et al., which +It becomes apparent that Yeo et al. work is of immense importance. This is particularly true when +we taking into consideration the research done by Yang et al., which found that 59\% of the users who use ``cloud storage service'' access the service through a smart phone and 42.2\% users access it for -audio/video\cite{yang}. The research by Yang et al. definitely +audio/video \cite{yang}. The research by Yang et al. suggests a trend of users' preference for small hand-held computers over laptops and desktops. @@ -69,7 +69,7 @@ over laptops and desktops. SkyCDS, by Gonzalez et al., is a content delivery system that splits and spreads the content across multiple file storage -providers\cite{skycds}. According to Gonzalez et al., the main reason +providers \cite{skycds}. 
According to Gonzalez et al., the main reason for designing and developing SkyCDS was to prevent content providers from getting locked into just one file storage provider and to minimize loss when a file storage provider goes out of business or if @@ -89,9 +89,9 @@ responsible for publishing the content using the ``delivery workflow'' ``retrieve workflow'' to get access to the subscribed content. When content has to be dispersed to $k$ file storage providers, the -content is split into $n$ chunks, $n > k$, this file splitting seems -to produce 66.7\% of redundancy overhead\cite{skycds}; this file -splitting scheme looks very similar to erasure coding, but Gonzalez et +content is split into $n$ chunks, $n > k$. This file splitting seems +to produce 66.7\% of redundancy overhead \cite{skycds}. This file +splitting scheme also looks very similar to erasure coding, but Gonzalez et al. don't explicitly state that the content splitting scheme is indeed ``erasure coding''. The splitting of content is done by the ``delivery workflow'' engine which is invoked when the publisher triggers the @@ -110,7 +110,7 @@ space and reliability. \verb+git-annex+ allows one to version controlled large files that are not usually feasible to version control under -\verb+git+\cite{program:git}. \verb+git-annex+, checks in the name +\verb+git+\cite{program:git}. \verb+git-annex+ checks in the name and other meta-data about the files in git and stores the actual content under \verb+.git/annex+ directory. When a file is added to \verb+git-annex+, a symlink of the file is created in place of the @@ -148,10 +148,11 @@ add deb-nicholson-80s.medium.webm ok ↳ ls -l ... 
-lrwxrwxrwx 1 rsd rsd 207 May 5 2015 deb-nicholson-80s.medium.webm -> ../.git/an -nex/objects/3j/vG/SHA256E-s108196923--7de9484ee96908268e21b451eb9805552c32b44da08e7 -0ee861332c87352944f.webm/SHA256E-s108196923--7de9484ee96908268e21b451eb9805552c32b4 -4da08e70ee861332c87352944f.webm +lrwxrwxrwx 1 rsd rsd 207 May 5 2015 deb-nicholson-80s.medium.webm +-> ../.git/annex/objects/3j/vG/SHA256E-s108196923--7de9484ee96908268e +21b451eb9805552c32b44da08e70ee861332c87352944f.webm/SHA256E-s10819692 +3--7de9484ee96908268e21b451eb9805552c32b44da08e70ee861332c87352944f.w +ebm ↳ git commit -m "Added video/deb-nicholson-80s.medium.webm" [master efa1775] Added video/deb-nicholson-80s.medium.webm @@ -161,7 +162,7 @@ nex/objects/3j/vG/SHA256E-s108196923--7de9484ee96908268e21b451eb9805552c32b44da0 } Now, the file \verb+deb-nicholson-80s.medium.webm+ is checked into -\verb+git-annex+ and we can now do a \verb+git annex sync+ to sync the +\verb+git-annex+ and the command \verb+git annex sync+ can be issued to sync the repository to other \verb+git-annex+ repositories. It must be noted here that when the repository is synced, the file content itself is not transferred to the other \verb+git-annex+ repositories; only the @@ -203,12 +204,13 @@ services: All data pushed to file storage provider's servers can optionally be encrypted using one's GPG key. 
For instance, to encrypt data that is -pushed to the Amazon S3 special remote, following command is -used\cite{docs:git-annex-as3}: +pushed to the Amazon S3 special remote, the following command is +used \cite{docs:git-annex-as3}: \begin{verbatim} $ git annex initremote cloud type=S3 keyid=2512E3C7 -initremote cloud (encryption setup with gpg key C910D9222512E3C7) (checking bucket) (creating bucket in US) (gpg) ok +initremote cloud (encryption setup with gpg key C910D9222512E3C7) + (checking bucket) (creating bucket in US) (gpg) ok $ git annex describe cloud "at Amazon's US datacenter" describe cloud ok \end{verbatim} @@ -220,15 +222,16 @@ size \verb+N+, to do that we do: \begin{verbatim} $ git annex initremote cloud type=S3 chunk=1MiB keyid=2512E3C7 -initremote cloud (encryption setup with gpg key C910D9222512E3C7) (checking bucket) (creating bucket in US) (gpg) ok +initremote cloud (encryption setup with gpg key C910D9222512E3C7) + (checking bucket) (creating bucket in US) (gpg) ok $ git annex describe cloud "at Amazon's US datacenter" describe cloud ok \end{verbatim} -with that each file that has to be pushed to the Amazon S3 special +Upon completion, each file that has to be pushed to the Amazon S3 special remote is divided into 1MiB chunks, each chunk is encrypted using the GPG key \verb+2512E3C7+ and the encrypted chunks are finally pushed to -the Amazon S3 remote. It is must be noted here that unlike the Multi +the Amazon S3 remote. It must be noted here that unlike the Multi Cloud Storage Prototype or SkyCDS or combox, in \verb+git-annex+ when we are using file chunking all the chunks go to the same location -- in this case, the Amazon S3 remote. diff --git a/report/chapters/3-arch-d.tex b/report/chapters/3-arch-d.tex @@ -11,23 +11,18 @@ combox consists of two main components -- the combox directory and the node directories. 
The combox directory is the place where the user -stores all the files; the node directories are the directories under +stores all of their files; the node directories are the directories under which encrypted shards of the files (in the combox directory) are scattered to. A node directory is the file storage provider's -directory, for instance, the Dropbox directory and the Google Drive +directory. For instance, the Dropbox directory and the Google Drive directory are node directories. -When a file \verb+humans.txt+ is created in the combox directory, +When a file, \verb+humans.txt+, is created in the combox directory, combox splits \verb+humans.txt+ into \verb+N+ shards, where \verb+N+ -is the number of node directories; if there are two node directories +is the number of node directories. If there are two node directories (Dropbox directory and Google Drive directory), then 2 shards are created. Each shard of the file is then encrypted and the encrypted -shards are spread evenly across the node directories; if there are two -node directories -- Dropbox directory and Google Drive directory -- -combox will create two encrypted shards of file \verb+humans.txt+ -- -\verb+humans.txt.shard0+, \verb+humans.txt.shard1+ -- and place one -encrypted shard under the Dropbox directory and the other encrypted -shard under the Google Drive directory. Now, the Dropbox client and +shards are spread evenly across the node directories. Now, the Dropbox client and the Google client will sync the respective shards that was place under their directories to their respective data store. @@ -52,28 +47,28 @@ for file modification, deletion and rename/move. The combox configuration wizard triggers automatically when combox finds that it is not configured. 
The combox configuration wizard -setups up the combox directory; asks the user to point to the location -of the node directories; reads the key (passphrase) to be used to +configures the combox directory; asks the user to point to the location +of the node directories; and reads the key (passphrase) to be used to encrypt file shards that are spread across the node directories. The combox configuration is written to -\verb+$HOME/.combox/config.yaml+; this YAML configuration file can be +\verb+$HOME/.combox/config.yaml+. This YAML configuration file can be manually edited by the user. The \verb+config_cb+\footnote{https://git.ricketyspace.net/combox/tree/combox/config.py?id=fb7fdd21\#n90} function in the \verb+combox.config+ module is responsible for carrying out the combox configuration. Prior to version \verb+0.2.0+, -the combox configuration was purely done through the CLI, from -\verb+0.2.0+ on wards, by default, the combox configuration done +the combox configuration was purely done through the Command Line Interface (CLI). From +\verb+0.2.0+ on wards, by default, the combox configuration is done through a graphical interface; it is still possible to configure combox through the CLI with the \verb+--cli+ switch. A demo of combox configuration using the graphical interface on -GNU/Linux can be viewed at -\url{https://ricketyspace.net/combox/combox-config-gui-glued-gnu.webm}; -the same demo of combox configuration using the graphical interface on -OS X can be viewed at -\url{https://ricketyspace.net/combox/combox-config-gui-glued-osx.webm}. +GNU/Linux can be viewed +\url{https://ricketyspace.net/combox/combox-config-gui-glued-gnu.webm}{here}. +T he same demo of combox configuration using the graphical interface on +OS X can be viewed +\url{https://ricketyspace.net/combox/combox-config-gui-glued-osx.webm}{here}. \subsection{combox directory monitor}\label{sec:3-combox-cdirm} @@ -110,11 +105,11 @@ and store the hash of file under its new name. 
Node directory monitor is an instance of \verb+combox.events.NodeDirMonitor+\footnote{https://git.ricketyspace.net/combox/tree/combox/events.py?id=fb7fdd21\#n352} -monitoring a node directory. When changes are made to the node +that monitors a node directory. When changes are made to the node directory, the node directory monitor is responsible for correctly detecting the type of change and doing the right thing at that instance of time. Each node directory has a dedicated node directory -monitor; if there are 2 node directories, then combox will instantiate +monitor. If there are 2 node directories, then combox will instantiate 2 node directory monitors. When an encrypted shard is created in the node directory due to a file @@ -174,30 +169,30 @@ directory. \subsection{combox data store}\label{sec:3-combox-db} -To keep it simple, stupid, combox tracks bare minimum information -about the files, stored in the combox directory, and depends on file +To ``keep it simple, stupid'', combox tracks bare minimum information +about the files that are stored in the combox directory, depending on file system events to do the right thing when changes takes place in the combox directory. -The only information that is stored in the combox data store, about a -file in the combox directory is its SHA-512 hash; The SHA-512 hash of +The only information that is stored in the combox data store with regards to a +file in the combox directory is its SHA-512 hash. The SHA-512 hash of a file is enough information to detect changes in the file. 
In the -data store, there is also four dictionaries -- \verb+file_moved+, +data store, there are also four dictionaries -- \verb+file_moved+, \verb+file_deleted+, \verb+file_created+, \verb+file_modified+ -- -which tracks the number of shards of a file that was +which track the number of shards of a file that wer moved/deleted/created/modified due the respective file being -moved/deleted/created/modified on another computer; these four +moved/deleted/created/modified on another computer. These four dictionaries are primarily used by the \verb+NodeDirMonitor+ to detect remote file movement/deletion/creation/modification and triggering file reconstruction from the encrypted shards at the right time. -The data store is a JSON file on the disk, stored by default at +The data store is a JSON file on the disk, stored by default at \\ \verb+$HOME/.combox/silo.db+. The \verb+combox.silo.ComboxSilo+\footnote{https://git.ricketyspace.net/combox/tree/combox/silo.py?id=v0.2.2\#n29} is the sole interface to read from and write to the data store. The data store is primarily accessed and modified by the combox directory monitor (\verb+ComboxDirMonitor+) and the node directory monitor -(\verb+NodeDirMonitor+) through a shared \verb+threading.Lock+ that ensures that only +(\verb+NodeDirMonitor+) through a shared \verb+threading. Lock+ that ensures that only one entity\footnote{An entity can be the combox directory monitor or one of the node directory monitors} can access/modify the database at a time. @@ -220,7 +215,7 @@ Below is an illustration of the structure of the combox data store: The \verb+combox.silo.ComboxSilo+, which is the sole interface to read from and write to the database, uses the pickleDB -library\cite{pylib:pickledb}. The pickleDB is a very basic key-value +library \cite{pylib:pickledb}. The pickleDB is a very basic key-value store which allows one to store information in the JSON format. 
It must be noted that the combox data store on each computer is @@ -229,8 +224,7 @@ combox data store located in other computers. \section{combox modules overview} -combox is spread into modules that have functions and/or classes. As -of \verb+2016-02-04+ combox is considerably a small program: +combox is spread into modules that have functions and/or classes. Currently, combox is considerably a small program consisting of the following files: \begin{verbatim} $ wc -l combox/*.py @@ -248,7 +242,7 @@ $ wc -l combox/*.py \end{verbatim} This section gives an overview of each of the combox modules with -extreme brevity: +extreme brevity. \begin{description} \item[combox.cbox]\footnote{https://git.ricketyspace.net/combox/tree/combox/cbox.py?id=fb7fdd21} @@ -334,30 +328,30 @@ spread them across node directories (Google Drive and Dropbox) and decrypt, glue shards and put them back to the combox directory when a file is created/modified/deleted/moved in another computer. The plan was to use external libraries to accomplish things that fell outside -the realm of the ``core functionality of combox''; the main reason +the realm of the ``core functionality of combox''. The main reason behind this decision was to not indulge in trying to solve problems that others have already solved. -The \verb+watchdog+\cite{pylib:watchdog} library was chosen for file -monitoring; this library is compatible with Unix, Unix-like systems +Accordingly, the \verb+watchdog+\cite{pylib:watchdog} library was chosen for file +monitoring. This library is compatible with Unix, Unix-like systems and Microsoft Windows. The \verb+pycrypto+ -library\cite{pylib:pycrypto} was used for encrypting data; combox uses +library \cite{pylib:pycrypto} was used for encrypting data. Combox uses AES encryption scheme to encrypt file shards. The -\verb+pickleDB+\cite{pylib:pickledb} library was used to store +\verb+pickleDB+ \cite{pylib:pickledb} library was used to store information about files in the combox directory. 
Looking back, the decision to use external libraries reduced the complexity of combox, reduced the time to complete the initial working -version of combox and made it possible to spend more than 3 months +version of combox, and made it possible to spend more than 3 months just testing and fixing issues in combox. \section{Operating system compatibility}\label{3-os-compat} -combox was developed on a GNU/Linux machine, a conscious effort was -made to write in an operating system independent way. The top criteria +combox was developed on a GNU/Linux machine. A conscious effort was +made to write the software in an operating system independent way. The top criteria for choosing a library to use in combox was that it had to be compatible on \emph{all} of the three major computing -platforms\footnote{GNU/Linux, OS X and, Microsoft Windows}. +platforms \footnote{GNU/Linux, OS X and, Microsoft Windows}. Prior to the \verb+0.1.0+ release, combox was tested on OS X (See chapter \ref{ch:4}) and OS X specific issues that were found were @@ -369,14 +363,14 @@ compatible with Microsoft Windows out of the box. it was found that: \begin{itemize} \item Setting up the paraphernalia to run combox was - non-trivial\cite{doc:combox-setup-windoze}. -\item The unit tests for the \verb+combox.file+ module royally failed. + non-trivial \cite{doc:combox-setup-windoze}. +\item The unit tests for the \verb+combox.file+ module failed on the Windows Operating System. \end{itemize} At the time of writing the report, combox is at version \verb+0.2.3+ and it is not compatible with Microsoft Windows. Comprehensive documentation for setting up the development environment for combox on -Microsoft Windows was written\cite{doc:combox-setup-windoze} to make +Microsoft Windows was written \cite{doc:combox-setup-windoze} to make it less cumbersome for anyone who would want to work on making combox compatible with Microsoft Windows. 
@@ -401,9 +395,9 @@ Finally install combox with: python setup.py install \end{verbatim} -Python has a package registry called CheeseShop\footnote{code name for - Python Package Index, see https://wiki.python.org/moin/CheeseShop}; -all packages registered at the CheeseShop can be installed using +Python has a package registry called CheeseShop \footnote{code name for + Python Package Index, see https://wiki.python.org/moin/CheeseShop}. +All packages registered at the CheeseShop can be installed using \verb+pip+ -- Python's platform independent package management system\cite{py:pip} -- with: @@ -421,7 +415,7 @@ can now easily get a copy of combox on their machine with: pip install combox \end{verbatim} -All versions of combox that is available through the CheeseShop are +All versions of combox that are available through the CheeseShop are digitally signed using the following GPG key: \begin{verbatim} @@ -433,4 +427,4 @@ sub 4096R/09CECEDB 2014-09-08 [expires: 2017-09-07] All versions of combox's source are also available as a compressed \verb+TAR+ ball and as a \verb+ZIP+ archive; they can be downloaded -from \url{https://ricketyspace.net/combox/releases.html}. +from \url{https://ricketyspace.net/combox/releases.html}{here}. diff --git a/report/chapters/4-testing.tex b/report/chapters/4-testing.tex @@ -5,8 +5,8 @@ \section{Unit testing}\label{sec:4-unit-testing} -The \verb+nose+\cite{pylib:nose} testing framework was used to write -unit tests for the functions and classes part of the +The \verb+nose+ \cite{pylib:nose} testing framework was used to write +unit tests for the functions and classes that are part of the \verb+combox.config+, \verb+combox.crypto+, \verb+combox.events+, \verb+combox.file+, \verb+combox.silo+ and \verb+combox._version+ modules. Unit tests were not written for \verb+combox.cbox+, @@ -30,17 +30,17 @@ of inputs. Unit tests greatly helped in testing the compatibility of combox on OS X. 
Before the \verb+v0.1.0+ release, combox's node directory monitor always assumed that a file's first shard (\verb+shard0+) is always -available; while this assumption did not create any problems on -GNU/Linux, on OS X, this assumption made the node directory monitor to -behave erratically -- this issue (bug \#4) was immediately found when +available. While this assumption did not create any problems on +GNU/Linux, on OS X this assumption made the node directory monitor to +behave erratically. This issue (bug \#4) was immediately found when the unit tests were run for the first time on OS X. Another instance -where unit tests helped was just before the \verb+v0.2.0+ release; -major changes, including the introduction of file locks in the +where unit tests helped was just before the \verb+v0.2.0+ release. +Major changes, including the introduction of file locks in the \verb+ComboxDirMonitor+, were made to the \verb+combox.events+. When the unit tests were run OS X, two tests failed, revealing a difference in behavior of watchdog\cite{pylib:watchdog} on GNU/Linux and OS X on file -creation\footnote{https://git.ricketyspace.net/combox/commit/?id=8c86e7c28738c66c0e04ae7886b44dbcdfc6369exo}; +creation \footnote{https://git.ricketyspace.net/combox/commit/?id=8c86e7c28738c66c0e04ae7886b44dbcdfc6369exo}; without unit tests, there is a high probability that this bug would never have been found by now. @@ -59,8 +59,8 @@ these bugs were found when manually testing combox. \section{Manual testing}\label{sec:4-manual-testing} The unit tests for the \verb+combox.events+ module tested the -correctness of the \verb+ComboxDirMonitor+ and \verb+NodeDirMonitor+ -independently; in order to comprehensively test the correctness of +correctness of the \\ \verb+ComboxDirMonitor+ and \verb+NodeDirMonitor+ +independently. 
In order to comprehensively test the correctness of both \verb+ComboxDirMonitor+ and \verb+NodeDirMonitor+, it was required to manually test combox running on more than one computer. Several bugs were found and fixed while doing manual @@ -69,9 +69,9 @@ testing. Three different types of setups were used to manually test combox. The first kind of setup has two GNU/Linux machines each using combox to sync files between each other with Dropbox and Google Drive being the -nodes; the second kind of setup has a GNU/Linux machine and a OS X +nodes. The second kind of setup has a GNU/Linux machine and a OS X machine each using combox to sync files between each other with -Dropbox and Google Drive being the nodes; the third kind of setup has +Dropbox and Google Drive being the nodes. The third kind of setup has a GNU/Linux machine and OS X machine each using combox to sync files between each other with Dropbox, Google Drive and a USB stick as nodes. @@ -100,8 +100,8 @@ combox was run on two GNU/Linux machines and a file was alternatively created/modified/renamed/deleted on one of the GNU/Linux machine and it was verified if the respective file was also created/modified/renamed/deleted on the other GNU/Linux machine. One -of the GNU/Linux machine (\verb+lyra)+ was a virtual machine running -Debian GNU/Linux stable (version 8.x); the other GNU/Linux machine +of the GNU/Linux machines, (\verb+lyra)+, was a virtual machine running +Debian GNU/Linux stable (version 8.x). The other GNU/Linux machine (\verb+grus+) was a physical machine running Debian GNU/Linux testing. The node directories to scatter the files' shards were the Dropbox directory and Google Drive directory. The official Dropbox @@ -144,7 +144,7 @@ data store. \verb+.dropbox.cache+ directory on this computer. \end{itemize} - All of the above behavior of the Dropbox client royally broke + All of the above behavior of the Dropbox client broke combox. 
Commits from \verb+3d714c5+ to \verb+6e1133f+\footnote{https://git.ricketyspace.net/combox/log/?qt=range\&q=3d714c5..6e1133f} fixed combox by making it aware of Dropbox's client behavior.
@@ -152,10 +152,8 @@ data store.
 \subsubsection{Demo}
-Demo of combox being used on two GNU/Linux machines can be viewed at
-\url{https://ricketyspace.net/combox/combox-2-gnus.webm}.
-
-\verb+lyra+ (virtual machine) and \verb+grus+ (bare-metal) are the two
+A demo of combox being used on two GNU/Linux machines can be viewed at
+\url{https://ricketyspace.net/combox/combox-2-gnus.webm}. \verb+lyra+ (virtual machine) and \verb+grus+ (bare-metal) are the two
 GNU/Linux machines being used for the demo. Description of what happens in the demo follows:
@@ -277,7 +275,7 @@ Google Drive directory to Google Drive's data store on GNU/Linux.
 \subsubsection{Demo}
-Demo of combox being used on a GNU/Linux machine and OS X machine can
+A demo of combox being used on a GNU/Linux machine and OS X machine can
 be viewed at \url{https://ricketyspace.net/combox/combox-gnu-osx.webm} \verb+lyra+ is the GNU/Linux (virtual) machine and
@@ -359,7 +357,7 @@ files stored in combox directory.
 \subsubsection{Demo}
-Demo of combox being used with a USB stick as the third node can be
+A demo of combox being used with a USB stick as the third node can be
 viewed at \url{https://ricketyspace.net/combox/combox-usb-node-demo.webm} \verb+grus+ is the GNU/Linux machine and \verb+dhcp-129-1-66-1+ is the
@@ -442,29 +440,29 @@ Description of what happens in the demo follows:
 \section{Stress testing}
-Large number of files of different sizes were dumped to the combox
+A large number of files of different sizes were dumped to the combox
 directory at one-second intervals to see how combox responds to
-high load. The file dump size was varied from \verb+424.798190MiB+ (27
-files) to \verb+10800.000000MiB+ (180 files); the average time taken
+high load.
The file dump size was varied from \verb+424.80MiB+ (27
+files) to \verb+10,800.00MiB+ (180 files). The average time taken
 to split a file and the total time to process all files were calculated for each dump. Stress testing was first done on \verb+2015-11-08+. In mid November
-2015, the \verb+ComboxDirMonitor+ was drastically modified to make it
+2015, the \\ \verb+ComboxDirMonitor+ was drastically modified to make it
 use the file Lock shared by the instances of
-\verb+NodeDirMonitor+\footnote{https://git.ricketyspace.net/combox/commit/?id=5aa1ba0c1dcad62931ba27bb66bf115233086d6c};
-the hunch was that this change in \verb+ComboxDirMonitor+ directly
+\verb+NodeDirMonitor+\footnote{https://git.ricketyspace.net/combox/commit/?id=5aa1ba0c1dcad62931ba27bb66bf115233086d6c}.
+The hypothesis was that this change in \verb+ComboxDirMonitor+ directly
 affected the performance of combox and therefore the results that were obtained from stress testing on \verb+2015-11-08+ would no longer be
-valid. Stress testing was again done on \verb+2016-01-16+; the results
+valid. Stress testing was again done on \verb+2016-01-16+. The results
 of this stress test are in sections \ref{4-st-424} to
-\ref{4-st-10800}, section \ref{4-st-tu} gives information about the
+\ref{4-st-10800}. Section \ref{4-st-tu} gives information about the
 tools used for stress testing, section \ref{4-st-o} contains the observations and comparisons between this stress test and the one done
-on \verb+2015-11-08+, lastly section \ref{4-st-if} reveals the issues
+on \verb+2015-11-08+, and, lastly, section \ref{4-st-if} reveals the issues
 that were found with combox by virtue of doing the stress tests.
-\subsection{flac dump (27 files - 424.798190MiB)}\label{4-st-424}
+\subsection{flac dump (27 files - 424.80MiB)}\label{4-st-424}
 \begin{center} \begin{table}[h]
@@ -579,7 +577,7 @@ avg.
time to split and encrypt a file & 3423.087539ms\\
 \subsection{Tools used}\label{4-st-tu} The \verb+dump+ script\footnote{https://git.ricketyspace.net/combox-paper/plain/dumper/dump} was used to dump files to
-the combox directory between one second intervals; a night of Emacs
+the combox directory at one-second intervals. A night of Emacs
 Lisp indulgence made it possible to quickly slurp the required data from the combox output and calculate the average time to split and encrypt a file and the total amount of time taken to process the files
diff --git a/report/chapters/5-con-f.tex b/report/chapters/5-con-f.tex
@@ -9,15 +9,15 @@
 combox is at a stage where it can be used as a tool to use the storage provided by two file storage providers -- Google Drive and Dropbox -- such that only part of each file in the encrypted form is stored on
-the data store of the file storage providers; this method of storing
-files on file storage providers makes it difficult but not impossible
+the data store of the file storage providers. This method of storing
+files on file storage providers makes it difficult, but not impossible,
 for file storage providers or ``third parties'' to gain access to the user's personal files. combox is at version 0.2.3; it is a Python package licensed under the GNU General Public License version 3 or later. It is compatible with GNU/Linux and OS X. The program is considered to be in ``alpha'' stage
-and must be used for experimental use only, it is not recommended to
+and must be used for experimental use only. It is not recommended to
 store critical files on storage provided by file storage providers using combox. Individuals who wish to try combox would want to look at \url{https://ricketyspace.net/combox/setup/} to get the program
@@ -29,7 +29,7 @@ repository is also mirrored at
 \url{https://bitbucket.org/bgsucodeloverslab/combox/src} and \url{http://rsiddharth.ninth.su/git/cb.git/}.
-There are a lot of things that can be done to improve combox, what
+There are a lot of things that can be done to improve combox, and what
 follows is a non-exhaustive list of things to do in the future: \begin{itemize}
@@ -37,25 +37,29 @@ follows is a non-exhaustive list of things to do in the future:
 directory. At the moment, combox reads the amount of free space available on each node directory (file storage provider's directory) when configuring combox on a computer but does not use this
- information to reckon the space left in each node directory.
+ information to reckon the space left in each node directory. The major issue here is how to determine what space is available without interacting with a service provider's API or asking the end user.
+
+
 \item Re-think \verb+combox.events+ module. This module was written with the assumption that combox will be the only one to make changes to the node directories. This assumption was found to be not true when manually testing combox with node clients (Google Drive and Dropbox client that sync files to/from the respective node
- directories to/from their respective data stores); both the Google
+ directories to/from their respective data stores). Both the Google
 Drive and the Dropbox client make modifications to the Google Drive and Dropbox directory respectively whenever pulling a modified shard from their data store to the user's computer; this behavior broke combox and major changes were made to the \verb+combox.events+ module to make it understand the node client's behavior in the node
- directory; these changes, increased the complexity of the classes
- defined in the \verb+combox.events+; it would be great to re-think
+ directory. These changes increased the complexity of the classes
+ defined in the \verb+combox.events+. It would be great to re-think
 this module in such a way that it reduces its complexity.
+
 \item Evaluate if more information needs to be tracked about each file in
- the combox directory; at the moment, combox only keeps track of the
+ the combox directory. At the moment, combox only keeps track of the
 SHA-256 hash of each file stored in the combox directory.
-\item Support more file storage providers; for this, ideally no code
+
+\item Support more file storage providers. For this, ideally no code
 needs to be written for supporting a new file storage provider; combox must be tested with the new file storage provider's directory as a node directory. If the new file storage provider's client (that
@@ -64,24 +68,27 @@ follows is a non-exhaustive list of things to do in the future:
 then the \verb+combox.events.NodeDirMonitor+ must be accordingly updated to make combox cognizant of the file storage provider client's non-standard behavior.
+
 \item Make unit tests more modular. At the moment, there are some unit test functions that test more than one usecase/facet of a function
- or class; for instance, the \verb+test_CDM+ test method, part of the
+ or class. For instance, the \verb+test_CDM+ test method, part of the
 \verb+tests.events_test.TestEvents+ test class tests the correctness of the \verb+combox.events.ComboxDirMonitor+ for file creation, deletion, rename and modification; this method would ideally be broken down into four test methods.
+
 \item Make combox Python 3 compatible. The \verb+2to3+ program (which is part of the standard Python library since Python version 2.6) and the \verb+six+ library can be used to achieve this. See Appendix \ref{a-python3c} for more information on this.
+
 \item Support Microsoft Windows. The way to make combox compatible
- with Windows will be to run unit tests on Windows, the failing tests
+ with Windows will be to run unit tests on Windows. The failing tests
 might give pointers to what parts of combox need to be changed/updated in order for it to be compatible with Windows.
Individuals interested in making combox compatible with Windows might find
- \url{https://ricketyspace.net/combox/setup/#windows} useful; it
+ \url{https://ricketyspace.net/combox/setup/#windows} useful. It
 contains information about setting up the development environment for combox on Windows. \end{itemize}
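
The future-work item on per-file tracking notes that combox currently records only the SHA-256 hash of each file in the combox directory. A minimal Python sketch of that kind of bookkeeping is given here for illustration. It is not taken from combox's source: the function names `file_sha256` and `has_changed` and the `chunk_size` parameter are hypothetical.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`.

    Hypothetical helper for illustration; not combox's actual code.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large file is never held in memory whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path, recorded_hash):
    """Return True when the file's current hash differs from the recorded one."""
    return file_sha256(path) != recorded_hash
```

A hash alone can tell that a file's content differs from what was last recorded, but not when or on which machine it changed, which is one reason the item above asks whether more information should be tracked.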