From 2c136728999d7451d8eef2f202a08ec7bc524136 Mon Sep 17 00:00:00 2001 From: Siddharth Ravikumar Date: Fri, 26 Feb 2016 08:36:26 -0500 Subject: Moved around chapters. Chapter 3 -> Chapter 2 Chapter 4 -> Chapter 3 Chapter 5 -> Chapter 4 --- report/chapters/3-arch-d.tex | 504 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 504 insertions(+) create mode 100644 report/chapters/3-arch-d.tex (limited to 'report/chapters/3-arch-d.tex') diff --git a/report/chapters/3-arch-d.tex b/report/chapters/3-arch-d.tex new file mode 100644 index 0000000..59997e9 --- /dev/null +++ b/report/chapters/3-arch-d.tex @@ -0,0 +1,504 @@ +\chapter{Architecture and Design} + +\epigraph{In general, when modeling phenomena in science and + engineering, we begin with simplified, incomplete models. As we + examine things in greater detail, these simple models become + inadequate and must be replaced by more refined + models.}{\textit{Structure and Interpretation of Computer Programs, + Section 1.1.5}\cite{sicp}} + +\section{Structure of combox} + +combox consists of two main components -- the combox directory and the +node directories. The combox directory is the place where the user +stores all her files; the node directories are the directories under +which encrypted shards of the files (in the combox directory) are +scattered to. A node directory is the file storage provider's +directory, for instance, the Dropbox directory and the Google Drive +directory are node directories. + +When a file \verb+file.ext+ is created in the combox directory, combox +splits the \verb+file.ext+ into \verb+N+ shards, where \verb+N+ is the +number of node directories; if there are two node directories (Dropbox +directory and Google Drive driver), then 2 shards are created. Each +shard of the file is then encrypted and the encrypted shards are +spread evenly across the node directories; if there are two node +directories -- Dropbox directory and Google Drive directory -- combox +will create two encrypted shards of file \verb+file.ext+ -- +\verb+file.ext.shard0+, \verb+file.ext.shard1+ -- and place one +encrypted shard under the Dropbox directory and the other encrypted +shard under the Google Drive directory. Now, the Dropbox client and +the Google client will sync the respective shards that was place under +their directories to their servers. + +\begin{figure}[h] +\includegraphics[scale=0.6]{4-combox-structure} +\caption{High level view of combox on two computers.} +\label{fig:4-combox-structure} +\end{figure} + +Now, we can move to another computer and start combox on it. First, +the node clients (Dropbox client and the Google Drive client) will +sync the new encrypted shards to their respective directories. Once +the encrypted shards are synced to the node directories, combox will +pick the encrypted shards -- \verb+file.ext.shard0+, +\verb+file.ext.shard1+ -- decrypt them and reconstruct into +\verb+file.ext+ and place in the respective location under the combox +directory; figure \ref{fig:4-combox-structure} illustrates this. The +process is similar for file modification, deletion and rename/move. + +\subsection{combox configuration}\label{sec:4-combox-config} + +combox configuration triggers automatically when combox finds that it +is not configured on this computer. The combox configuration setups up +the combox directory; asks the user to point to the location of the +node directories; reads the key (passphrase) to be used to encrypt +file shards that are spread across the node directories. The combox +configuration is written to +\verb+$HOME/.combox/config.yaml+; this YAML configuration file can be +manually edited by the user. + +The \verb+config_cb+ function in the \verb+combox.config+ module is +responsible for carrying out the combox configuration. Prior to +version \verb+0.2.0+, the combox configuration was purely done through +the CLI, from \verb+0.2.0+ onwards, be default, the combox +configuration done through a graphical interface; it is still possible +to configure combox through the CLI with the \verb+--cli+ switch. + +A demo of combox configuration using the graphical interface on +GNU/Linux can be viewed at +\url{https://ricketyspace.net/combox/combox-config-gui-glued-gnu.webm}; +the same demo of combox configuration using the graphical interface on +OS X can be viewed at +\url{https://ricketyspace.net/combox/combox-config-gui-glued-osx.webm}. + +\subsection{combox directory monitor}\label{sec:4-combox-cdirm} + +combox directory monitor is an instance of +\verb+combox.events.ComboxDirMonitor+ monitoring the combox directory +for changes. When changes are made to the combox directory, the combox +directory monitor is responsible for correctly detecting the type of +change and doing the right thing at that instance of time. + +When a file is created in the combox directory, the combox directory +monitor will take that file, split it into \verb+N+ (equal to the +number of node directories) shards, encrypt the shards, spread the +encrypted shards to the node directories, and finally store the hash +of the file in the local combox database. + +When a file is modified in the combox directory, the combox directory +monitor will take that modified file, split it into \verb+N+ (equal to +the number of node directories) shards, encrypt the shards, spread the +encrypted shards to the node directories, and finally update the hash +of the file in the local combox database. + +When a file is deleted in the combox directory, the combox directory +monitor will remove the encrypted shards of the file in the node +directories and get rid of the file's hash from the local combox +database. + +When a file is moved/renamed in the combox directory, the combox +directory monitor will move/rename encrypted in the node directories, +the file's hash from the local combox database and store the hash of +file under its new name. + +\subsection{Node directory monitor}\label{sec:4-combox-nodirm} + +Node directory monitor is an instance of +\verb+combox.events.NodeDirMonitor+ monitoring a node directory. When +changes are made to the node directory, the node directory monitor is +responsible for correctly detecting the type of change and doing the +right thing at that instance of time. Each node directory has a +dedicated node directory monitor; if there are 2 node directories, +then combox will instantiate 2 node directory monitors. + +When an encrypted shard is created in the node directory due to a file +created on another computer, the node directory first checks if the +respective file' encrypted shard(s) has/have arrived in other node +directory/directories. If all encrypted shards have arrived, then the +node directory takes all the encrypted shards, decrypts them, +reconstructs the file and creates the file in the combox directory of +this computer and stores the hash of the newly created file in the +local combox database. If the all the encrypted shards have not +arrived, then the node directory does not do anything. It must be +observed here that the node directory monitor of the last node +directory which gets the encrypted shard will be the one to perform +the file reconstruction and creation. + +When an encrypted shard is modified in the node directory due to a +file modified on another computer, the node directory first checks if +the respective file' modified encrypted shard(s) has/have arrived in +other node directory/directories. If all modified encrypted shards +have arrived, then the node directory takes all the modified encrypted +shards, decrypts them, reconstructs the file and puts the modified +version of the file in the combox directory of this computer and +updates the file's hash in the local combox database. If the all the +modified encrypted shards have not arrived, then the node directory +does not do anything. It must be observed here that the node directory +monitor of the last node directory which gets the modified encrypted +shard will be the one to perform the file reconstruction and will place +the modified file in the combox directory. + +When an encrypted shard is deleted in the node directory due to a file +deleted on another computer, the node directory first checks if the +respective file' encrypted shard(s) has/have been deleted in other +node directory/directories. If all encrypted shards have been deleted +from the node directories, then the node directory deletes the file in +the combox directory of this computer and removes its information from +the local combox database. If all encrypted shards have not been +deleted, then the node directory does not do anything. It must be +observed here that the node directory monitor of the last node +directory in which the encrypted shard is deleted will be the one to +delete the file from the combox directory. + +When an encrypted shard is moved/renamed in the node directory due to +a file moved/renamed on another computer, the node directory first +checks if the respective file' moved/renamed encrypted shard(s) +has/have arrived in other node directory/directories. If all +moved/renamed encrypted shards have arrived, then the node directory +takes all the moved/renamed encrypted shards, decrypts them, +reconstructs the moved/renamed file and puts the moved/renamed the file +in the combox directory of this computer and stores the hash under the +file' new name in the local combox database. If the all the +moved/renamed encrypted shards have not arrived, then the node +directory does not do anything. It must be observed here that the node +directory monitor of the last node directory which gets the +moved/renamed encrypted shard will be the one to perform the file +reconstruction and will place the moved/renamed file in the combox +directory. + +\subsection{Database structure}\label{sec:4-combox-db} + +To keep it simple, stupid, I decide to maintain bare minimum +information about files, stored in the combox directory, and depend on +file system events to do the right thing when changes takes place in +the combox directory. + +The only information that is stored in the database, about a file in +the combox directory is its SHA-512 hash; The SHA-512 hash of a file +is enough information to detect in the file. In the database, there +also four dictionaries -- \verb+file_moved+, \verb+file_deleted+, +\verb+file_created+, \verb+file_modified+ -- which tracks the number +of shards of a file that was moved/deleted/created/modified due the +respective file being moved/deleted/created/modified on another +computer; these four dictionaries are primarily used by the +\verb+NodeDirMonitor+ to detect remote file +movement/deletion/creation/modification and triggering file +reconstruction from shards at the right time. + +The database is a JSON file on the disk, stored by default at +\verb+$HOME/.combox/silo.db+. The +\verb+combox.silo.ComboxSilo+\cite{combox-src:silo.ComboxSilo} is the +sole interface to read from and write to database. The database is +primarily accessed and modified by the combox directory monitor +(\verb+ComboxDirMonitor+) and the node directory monitor +(\verb+NodeDirMonitor+) through a shared Lock\cite{py:threading.Lock} +that ensures that only one entity\footnote{An entity can be the combox + directory monitor or one of the node directory monitors} can +access/modify the database at a time. + +Below is an illustration of the structure of the combox database: + +\begin{verbatim} +{ + "/home/rsd/combox/ipsum.txt": "e3206df2bb2b3091103ab9d...", + "/home/rsd/combox/tk-shot-osx.png": "7fcf1b44c15dd95e0...", + "/home/rsd/combox/thgttg-21st.png": "0040eedfc3eeab546...", + "/home/rsd/combox/lorem.txt": "5851dd7a4870ff165facb71...", + "/home/rsd/combox/the-red-star.jpg": "4b818126d882e552...", + "file_moved": {}, + "file_deleted": {}, + "file_created": {}, + "file_modified": {}, +} +\end{verbatim} + +The \verb+combox.silo.ComboxSilo+, which is the sole interface to read +from and write to the database, uses the pickleDB +library\cite{pylib:pickledb}. The pickleDB is a very basic key-value +store which allows one to store information in the JSON format; if I +would have not found this library or if this library was never by +Harrison Erd, I've would have written something very similar to this +library as part of combox to realize the basic key-value storage that +is needed to track the hashes of the files stored in the combox +directory. + +It must be noted that the combox database stored on each computer is +independent and does not communicate or make transactions with the +combox databases located in other computers. + +\section{combox modules overview} + +combox is spread into modules that have functions and/or classes. As +of \verb+2016-02-04+ combox is considerably a small program: + +\begin{verbatim} +$ wc -l combox/*.py + 144 combox/cbox.py + 178 combox/config.py + 241 combox/crypto.py + 891 combox/events.py + 541 combox/file.py + 454 combox/gui.py + 0 combox/__init__.py + 71 combox/log.py + 278 combox/silo.py + 29 combox/_version.py + 2827 total +\end{verbatim} + +This section gives an overview of each of the combox modules with +extreme brevity: + +\begin{description} +\item[combox.cbox] This module contains \verb+run_cb+ function runs + combox; it creates an instance \verb+threading.Lock+ for database + access and a shared \verb+threading.Lock+ for the + \verb+combox.events.ComboxDirMonitor+ and + \verb+combox.events.NodeDirMonitor+; it initializes an instance + \verb+combox.events.ComboxDirMonitor+ that monitors the combox + directory and an instance of \verb+combox.events.NodeDirMonitor+ for + each node directory for monitoring the node directories. This + modules also houses the \verb+main+ function that parses commandline + arguments, starts combox configuration if needed or loads the combox + configuration file to start running combox. +\item[combox.config] Accomodates two import functions -- + \verb+config_cb+ and \verb+get_nodedirs+. The \verb+config_cb+ is + the combox configuration function that allows the user to configure + combox; this function was designed in a such way that it was + possible to use for both CLI and GUI methods of configuring + combox. The \verb+get_nodedirs+ function returns, as a list, the + paths of the node directories; this function use used in numerous + places in other combox modules. +\item[combox.crypto] This has functions for encrypting and decrypting + data; encrypting and decrypting shards (\verb+encrypt_shards+ and + \verb+decrypt_shards+); a function for splitting a file into shards, + encrypting those shards and spreading them across node directories + (\verb+split_and_encrypt+); a function for decrypting the shards + from the node directories, reconstructing the file from the + decrypted shards and put the file back to the combox directory + (\verb+decrypt_and_glue+). Functions \verb+split_and_encrypt+ and + \verb+decrypt_and_glue+ are the two functions that that are + extensively used by the \verb+combox.events+ module; all other + functions in this module are pretty much helper functions are + \verb+split_and_encrypt+ and \verb+decrypt_and_glue+ functions and + are not used by other modules. +\item[combox.events] This module took the most time to write and test + and it is the most complex module in combox at the time of writing + this report. It contains just two classes -- \verb+ComboxDirMonitor+ + and \verb+NodeDirMonitor+. The \verb+ComboxDirMonitor+ inherits the + \verb+watchdog.events.LoggingEventHandler+ and is responsible for + monitoring for changes in the combox directory and doing the right + thing when change happens in the combox directory. The + \verb+NodeDirMonitor+ also inherits + \verb+watchdog.events.LoggingEventHandler+ and similarly responsible + for monitoring a node directory and doing the right thing when a + change happens in the node directory; subjectively, + \verb+NodeDirMonitor+ is slightly more complex than the + \verb+ComboxDirMonitor+. +\item[combox.file] This is the second largest module in combox. It + contains utility functions for reading, writing, moving + files/directiores, hashing files, splitting a file into shards, glue + shards into a file, manipulating directories inside combox and node + directories. +\item[combox.gui] Contains the \verb+ComboxConfigDialog+ class; it is + the graphical interface for configuring combox. The class uses the + Tkinter library\cite{pylib:tkinter} for spawing graphical + elements. Other graphical libraries include PyQt\cite{pylib:qt} + were considered Tkinter was chosen over others because it works on + all Unix systems and Microsoft's Windows and it is part of the core + python (version 3). +\item[combox.log] All the messages to \verb+stdout+ and \verb+stderr+ + are sent through the functions \verb+log_i+ and \verb+log_e+ + functions defined in this module. +\item[combox.silo] Contains the \verb+ComboxSilo+ class which is the + canonical interface for combox for managing information about the + files in the combox directory. Internally, the \verb+ComboxSilo+ + class uses the pickleDB library\cite{pylib:pickledb}. +\item[combox.\_version] This is \emph{private} module that contains + variables that contain the value of the present version and release + of combox. The \verb+get_version+ function in this module returns + the full version number; this function used by \verb+setup.py+. +\end{description} + +\section{Language choice} + +Back in October of 2014, I was learning to write in Python and when I +had to start working on combox, I chose to write combox in Python. In +my first commit to the combox repository, I had to say this about +Python: + +\begin{verbatim} +commit 2def977472b2e77ee88c9177f2d03f12b0263eb0 +Author: rsiddharth +Date: Wed Oct 29 23:24:58 2014 -0400 + + Initial commit: File splitter & File gluer done. + + ... + + I like to write python FWIW. But after reading a dialect of Lisp when + I come back to python, it does not look very beautiful. I guess I'm + pretty convinced that there is no language that can ape the beauty of + Lisp. +\end{verbatim} + +If I were to write that commit message today (\verb+2016-02-04+), I +would've phrased my reflections about Python differently. While I've +not found a language that is as intrinsically beautiful as Lisp, I +think it is not quite right to compare Lisp and Python. Python is a +very readable language and it tends to be very accessible to +beginners. Also, it is hard to write unreadable Python code. + +\section{DRY} + +The core functionality of combox is to split, encrypt file shards, +spread them across node directories (Google Drive and Dropbox) and +decrypt, glue shards and put them back to the combox directory when a +file is created/modified/deleted/moved in another computer. The plan +was to use external libraries to accomplish things that fell outside +the realm of what I consider the ``core functionality of combox''; the +main reason behind this decision was to duly be an indolent programmer +and not indulge in trying to solve problems that others have already +solved. + +The \verb+watchdog+\cite{pylib:watchdog} library was chosen for file +monitoring; this library is compatible with Unix systems and +Windows. The \verb+pycrypto+ library\cite{pylib:pycrypto} was used for +encrypting data; combox uses AES encryption scheme to encrypt file +shards. The \verb+pickleDB+\cite{pylib:pickledb} library was used to +store information about files in the combox directory; this library is +not very clean, but, it was what I exactly looking for, if there was +no \verb+pickleDB+, I would've most probably written something similar +to it and made it as part of combox. + +Looking back, the decision to use external libraries reduced the +complexity of combox, reduced the time to complete the initial working +version of combox and made it possible to spend more than 3 months +just testing and fixing issues in combox. + +\section{Operating system compatibility}\label{4-os-compat} + +combox was developed on a GNU/Linux machine, a conscious effort was +made to write in an operating system independent way. The top criteria +for choosing a library to use in combox was that it had to be +compatible on \emph{all} of the three major computing platforms in +2014-2016\footnote{GNU/Linux, OS X and, Windows}. + +As we were nearing the \verb+0.1.0+ release, combox was tested on OS X +(See chapter \ref{ch:5}) and OS X specific issues that were found +eventually were eventually fixed. The initial \verb+0.1.0+ release was +compatible with GNU/Linux and OS X. + +After the initial release of combox, we wanted to see if combox would +be compatible with Windows. We found that: + +\begin{itemize} +\item Setting up the parapharnalia to run combox was + non-trivial\cite{doc:combox-setup-windoze}. +\item The unit tests for the \verb+combox.file+ module royally failed. +\end{itemize} + +At the time of writing the report, combox is in version \verb+0.2.2+ +and it still not compatible with Windows. Comprehensive documentation +of setting up the development environment for combox on Windows was +written\cite{doc:combox-setup-windoze} to make it less cumbersome for +anyone who would want to work on making combox compatible with +Windows. + +\section{combox as a python package}\label{4-pypi} + +Before version \verb+0.2.0+, the canonical way to install combox was +to pull the source from the \verb+git+ repository with: + +\begin{verbatim} + git clone git://ricketyspace.net/combox.git +\end{verbatim} + +Then, do: + +\begin{verbatim} + cd combox +\end{verbatim} + +Finally install combox with: + +\begin{verbatim} + python setup.py install +\end{verbatim} + +Yes, installing combox on a machine was indeed non-trivial. + +Python has a package registry called CheeseShop\footnote{code name for + Python Package Index, see https://wiki.python.org/moin/CheeseShop}; +all packages registered at the CheeseShop can be installed using +\verb+pip+ -- Python's platform independent package managment +system\cite{py:pip} -- with: + +\begin{verbatim} + pip install packagename +\end{verbatim} + +To make it easier for (python) users to install combox on their +machine, an effort was made to make it a python +package\cite{py:package-guide}. From version \verb+0.2.0+, combox has +been registered python package at the CheeseShop. (Python) users can +now easily get a copy of combox on their machine with: + +\begin{verbatim} + pip install combox +\end{verbatim} + +All versions of combox that is available through the CheeseShop are +digitally signed using the following GPG key: + +\begin{verbatim} +pub 4096R/00B252AF 2014-09-08 [expires: 2017-09-07] + Key fingerprint = C174 1162 CEED 5FE8 9954 A4B9 9DF9 7838 00B2 52AF +uid Siddharth Ravikumar (sravik) +sub 4096R/09CECEDB 2014-09-08 [expires: 2017-09-07] +\end{verbatim} + +All versions of combox's source are also available as a compressed +\verb+TAR+ ball and as a \verb+ZIP+ archive; they can be downloaded +from \url{https://ricketyspace.net/combox/releases.html}. + +\section{With the benefit of hindsight}\label{4-hindsight} + +combox's node monitor (\verb+combox.events.NodeDirMonitor+) was +written with the assumption that the node monitor will be the only +entity that will be making changes to the node directory that it is +monitoring. When started testing combox with node clients (Dropbox +client and Google Drive client), we observed that the node clients +made changes to the node directory when a file was +created/modified/renamed/deleted; for instance, when a shard, in the +Dropbox node directory, was modified on a remote computer, the Dropbox +client would first pull the newer version of the shard under the +\verb+.dropbox.cache+ directory as a temprorary file, move the older +version of the shard under \verb+.dropbox.cache+ as a backup, and +finally move the latest version of the shard, stored as a temprorary +file under the \verb+.dropbox.cache+ directory, to the respective +location in the Dropbox node directory; when a shard, in the Google +Drive node directory, was remotely modified on a remote computer, the +Google Drive client would delete the older version of the shard from +the Google Drive node directory and then create the newer version of +the shard in the respective location under the Google Drive node +directory. Since combox did not know about the node client's +behaviour, it confused combox and broke it royally; we had to make +major changes to the \verb+combox.eventns.NodeDirMonitor+ class to +make combox aware of the node client's behavior, this eventually +brutally obliterated the simplicity of the +\verb+combox.eventns.NodeDirMonitor+ class which I was proud of. + +I'm not sure how I would have written the \verb+combox.events+ module +if I had known about the Dropbox and Google Drive client's behaviour +before writing the \verb+combox.events.NodeDirMonitor+ or the +\verb+combox.events.ComboxDirMonitor+ classes. Looking back, if there one +thing I would want to re-think/redo, it is the \verb+combox.events+ +module. + +The most important lesson I'm taking away from the experience of +writing combox is the insight of how easy it is to ruthless crush the +simplicity of a program due to unforeseen use cases. + +\verb+<3+ -- cgit v1.2.3