In our system, each data store is isolated: holding someone’s raw user data from the database doesn’t let you identify or access their file metadata. Similarly, a raw, randomly chosen entry from the file metadata store doesn’t tell you whose it is or give you access to the file data it describes. Finally, a raw piece of encrypted file data doesn’t let you identify whose it is or what its decrypted contents are. These three entities are only indirectly linked, and they must be accessed in the proper order to reach the file data.
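A minimal sketch can make this isolation concrete. Everything here is an assumption for illustration: the store names, the 16-byte random IDs, the record layout, and the `seal`/`unseal` helpers are hypothetical, and the SHA-256 keystream with an HMAC tag is a toy stand-in for a real authenticated cipher such as AES-GCM. It is not secure and should not be read as our production code. What it shows is the shape: each store is keyed by opaque IDs, and the only pointer to the next store lives inside ciphertext, so entries can only be followed in order.

```python
import hashlib
import hmac
import secrets


def _subkeys(key: bytes) -> tuple[bytes, bytes]:
    # Derive separate encryption and MAC keys from one key.
    return hashlib.sha256(b"enc" + key).digest(), hashlib.sha256(b"mac" + key).digest()


def _stream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Toy keystream: SHA-256(key || nonce || counter). Illustration only, NOT secure.
    return b"".join(hashlib.sha256(key + nonce + i.to_bytes(8, "big")).digest()
                    for i in range(n // 32 + 1))[:n]


def seal(key: bytes, data: bytes) -> bytes:
    # Encrypt-then-MAC; output is nonce || ciphertext || tag.
    enc, mac = _subkeys(key)
    nonce = secrets.token_bytes(16)
    ct = bytes(a ^ b for a, b in zip(data, _stream(enc, nonce, len(data))))
    return nonce + ct + hmac.new(mac, nonce + ct, hashlib.sha256).digest()


def unseal(key: bytes, blob: bytes) -> bytes:
    # Verify the tag before decrypting; a wrong key fails loudly.
    enc, mac = _subkeys(key)
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(mac, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("wrong key or tampered data")
    return bytes(a ^ b for a, b in zip(ct, _stream(enc, nonce, len(ct))))


# Three isolated stores (hypothetical layout).
user_store: dict[str, bytes] = {}        # username -> sealed pointer into metadata store
metadata_store: dict[bytes, bytes] = {}  # random ID -> sealed metadata record
file_store: dict[bytes, bytes] = {}      # random ID -> sealed file data


def put_file(user: str, user_key: bytes, name: str, contents: bytes) -> None:
    file_key, meta_key = secrets.token_bytes(32), secrets.token_bytes(32)
    file_id, meta_id = secrets.token_bytes(16), secrets.token_bytes(16)
    file_store[file_id] = seal(file_key, contents)
    # The metadata record holds the filename plus the pointer and key for the file data.
    metadata_store[meta_id] = seal(meta_key, name.encode() + b"\0" + file_id + file_key)
    # The user entry holds only a sealed pointer and key for the metadata record.
    user_store[user] = seal(user_key, meta_id + meta_key)


def read_file(user: str, user_key: bytes) -> tuple[str, bytes]:
    # Traversal must go in order: user entry -> metadata record -> file data.
    pointer = unseal(user_key, user_store[user])
    meta_id, meta_key = pointer[:16], pointer[16:]
    record = unseal(meta_key, metadata_store[meta_id])
    name, file_id, file_key = record[:-49].decode(), record[-48:-32], record[-32:]
    return name, unseal(file_key, file_store[file_id])


alice_key = secrets.token_bytes(32)
put_file("alice", alice_key, "notes.txt", b"hello")
print(read_file("alice", alice_key))  # ('notes.txt', b'hello')
```

To keep the sketch short, each user holds a single file; a real user entry would point at the root of a file tree rather than at one record.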
The links between these stores, and the keys needed to follow them, are kept encrypted within each “previous” data store. When an account is created, we generate a public/private key pair for the user, along with a symmetric key used to begin traversal of their file system. The file system traversal key is encrypted with the user’s private key and then stored, and the private key is encrypted with the user’s password and then stored.
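The account-creation steps can be sketched under stated assumptions. Python’s standard library has no public-key cryptography, so a random 32-byte secret stands in for the private key and the public half is omitted; PBKDF2-HMAC-SHA256 with a 200,000-iteration work factor is an assumed choice for turning the password into a wrapping key. The function names and record fields are hypothetical, and `seal`/`unseal` are toy encrypt-then-MAC helpers built from SHA-256 and HMAC, standing in for a real authenticated cipher; they are not secure.

```python
import hashlib
import hmac
import secrets


def _subkeys(key: bytes) -> tuple[bytes, bytes]:
    # Derive separate encryption and MAC keys from one key.
    return hashlib.sha256(b"enc" + key).digest(), hashlib.sha256(b"mac" + key).digest()


def _stream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Toy keystream: SHA-256(key || nonce || counter). Illustration only, NOT secure.
    return b"".join(hashlib.sha256(key + nonce + i.to_bytes(8, "big")).digest()
                    for i in range(n // 32 + 1))[:n]


def seal(key: bytes, data: bytes) -> bytes:
    enc, mac = _subkeys(key)
    nonce = secrets.token_bytes(16)
    ct = bytes(a ^ b for a, b in zip(data, _stream(enc, nonce, len(data))))
    return nonce + ct + hmac.new(mac, nonce + ct, hashlib.sha256).digest()


def unseal(key: bytes, blob: bytes) -> bytes:
    enc, mac = _subkeys(key)
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(mac, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("wrong key or tampered data")
    return bytes(a ^ b for a, b in zip(ct, _stream(enc, nonce, len(ct))))


PBKDF2_ITERATIONS = 200_000  # illustrative work factor (assumption); tune for your hardware


def password_key(password: str, salt: bytes) -> bytes:
    # Stretch the password into a 32-byte wrapping key; the password itself is never stored.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, PBKDF2_ITERATIONS)


def create_account(password: str) -> dict[str, bytes]:
    # Stand-in for the private half of a key pair (stdlib has no public-key crypto).
    private_key = secrets.token_bytes(32)
    traversal_key = secrets.token_bytes(32)  # symmetric key that starts file system traversal
    salt = secrets.token_bytes(16)
    return {
        "salt": salt,
        # Traversal key wrapped under the (stand-in) private key.
        "wrapped_traversal_key": seal(private_key, traversal_key),
        # Private key wrapped under a key derived from the password.
        "wrapped_private_key": seal(password_key(password, salt), private_key),
    }


def unlock_traversal_key(record: dict[str, bytes], password: str) -> bytes:
    # Password -> private key -> traversal key; each step fails if the previous key is wrong.
    private_key = unseal(password_key(password, record["salt"]), record["wrapped_private_key"])
    return unseal(private_key, record["wrapped_traversal_key"])


record = create_account("correct horse battery staple")
traversal_key = unlock_traversal_key(record, "correct horse battery staple")
print(len(traversal_key))  # 32
```

The stored record holds only a salt and two wrapped keys, never the password, so the traversal key is unreachable without the password.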
We also generate a “challenge key” from the answers to the user’s security questions, which can be used in place of their password. This is why we can’t recover an account without the user’s security answers: without either the user’s password or their challenge key, we can’t decrypt any metadata keys and therefore can’t access any metadata.
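Recovery through security answers can be sketched as a second, independently wrapped copy of the private key. The normalization rule (lowercase, collapsed whitespace), the separator, the PBKDF2 parameters, and all names here are hypothetical; `seal`/`unseal` are toy encrypt-then-MAC helpers rather than a secure cipher, and a random 32-byte value stands in for the private key.

```python
import hashlib
import hmac
import secrets


def _subkeys(key: bytes) -> tuple[bytes, bytes]:
    return hashlib.sha256(b"enc" + key).digest(), hashlib.sha256(b"mac" + key).digest()


def _stream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Toy keystream: SHA-256(key || nonce || counter). Illustration only, NOT secure.
    return b"".join(hashlib.sha256(key + nonce + i.to_bytes(8, "big")).digest()
                    for i in range(n // 32 + 1))[:n]


def seal(key: bytes, data: bytes) -> bytes:
    enc, mac = _subkeys(key)
    nonce = secrets.token_bytes(16)
    ct = bytes(a ^ b for a, b in zip(data, _stream(enc, nonce, len(data))))
    return nonce + ct + hmac.new(mac, nonce + ct, hashlib.sha256).digest()


def unseal(key: bytes, blob: bytes) -> bytes:
    enc, mac = _subkeys(key)
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(mac, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("wrong key or tampered data")
    return bytes(a ^ b for a, b in zip(ct, _stream(enc, nonce, len(ct))))


ITERATIONS = 200_000  # illustrative work factor (assumption)


def derive(secret: str, salt: bytes) -> bytes:
    # PBKDF2 stretches a low-entropy secret (password or answers) into a 32-byte key.
    return hashlib.pbkdf2_hmac("sha256", secret.encode(), salt, ITERATIONS)


def challenge_secret(answers: list[str]) -> str:
    # Hypothetical normalization: lowercase, collapse whitespace, join with a separator.
    return "\x1f".join(" ".join(a.lower().split()) for a in answers)


def create_account(password: str, answers: list[str]) -> dict[str, bytes]:
    private_key = secrets.token_bytes(32)  # stand-in for the user's private key
    pw_salt, ch_salt = secrets.token_bytes(16), secrets.token_bytes(16)
    return {
        "pw_salt": pw_salt,
        "ch_salt": ch_salt,
        # Two independently wrapped copies of the same private key.
        "by_password": seal(derive(password, pw_salt), private_key),
        "by_challenge": seal(derive(challenge_secret(answers), ch_salt), private_key),
    }


def unlock(record: dict[str, bytes], password: str) -> bytes:
    return unseal(derive(password, record["pw_salt"]), record["by_password"])


def recover(record: dict[str, bytes], answers: list[str], new_password: str) -> None:
    # The challenge key stands in for the password; with wrong answers this unseal fails.
    key = derive(challenge_secret(answers), record["ch_salt"])
    private_key = unseal(key, record["by_challenge"])
    record["pw_salt"] = secrets.token_bytes(16)
    record["by_password"] = seal(derive(new_password, record["pw_salt"]), private_key)


record = create_account("old password", ["Springfield", "Rex"])
original = unlock(record, "old password")
recover(record, ["  springfield ", "REX"], "new password")
print(unlock(record, "new password") == original)  # True
```

Because only salts and wrapped keys are stored, a wrong set of answers simply fails the tag check; there is no plaintext copy of the private key to fall back on.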
The file system traversal key is the first key used to unlock the path to any file system item. That path runs through what we consider sensitive data, such as filenames and the keys to the actual file data. Importantly, this means that while many users may share a single deduplicated piece of file data, each user has their own unique path to that data and their own metadata key to begin file system traversal.
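The deduplication point can be illustrated with two users sharing one stored blob. This sketch flattens the file system to a single level, so each user’s traversal key seals their metadata entry directly; a real tree would have more hops. The names, the record layout, and the toy `seal`/`unseal` helpers (not a secure cipher) are assumptions, and how duplicates are detected is deliberately left out.

```python
import hashlib
import hmac
import secrets


def _subkeys(key: bytes) -> tuple[bytes, bytes]:
    return hashlib.sha256(b"enc" + key).digest(), hashlib.sha256(b"mac" + key).digest()


def _stream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Toy keystream: SHA-256(key || nonce || counter). Illustration only, NOT secure.
    return b"".join(hashlib.sha256(key + nonce + i.to_bytes(8, "big")).digest()
                    for i in range(n // 32 + 1))[:n]


def seal(key: bytes, data: bytes) -> bytes:
    enc, mac = _subkeys(key)
    nonce = secrets.token_bytes(16)
    ct = bytes(a ^ b for a, b in zip(data, _stream(enc, nonce, len(data))))
    return nonce + ct + hmac.new(mac, nonce + ct, hashlib.sha256).digest()


def unseal(key: bytes, blob: bytes) -> bytes:
    enc, mac = _subkeys(key)
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(mac, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("wrong key or tampered data")
    return bytes(a ^ b for a, b in zip(ct, _stream(enc, nonce, len(ct))))


file_store: dict[bytes, bytes] = {}      # blob ID -> sealed file data (stored once)
metadata_store: dict[bytes, bytes] = {}  # entry ID -> sealed per-user metadata


def store_blob(contents: bytes) -> tuple[bytes, bytes]:
    # Store the file data once and return (blob_id, file_key).
    blob_id, file_key = secrets.token_bytes(16), secrets.token_bytes(32)
    file_store[blob_id] = seal(file_key, contents)
    return blob_id, file_key


def link(traversal_key: bytes, name: str, blob_id: bytes, file_key: bytes) -> bytes:
    # Each user gets a private metadata entry, sealed under their own traversal key.
    entry_id = secrets.token_bytes(16)
    metadata_store[entry_id] = seal(traversal_key, blob_id + file_key + name.encode())
    return entry_id


def open_file(traversal_key: bytes, entry_id: bytes) -> tuple[str, bytes]:
    record = unseal(traversal_key, metadata_store[entry_id])
    blob_id, file_key, name = record[:16], record[16:48], record[48:].decode()
    return name, unseal(file_key, file_store[blob_id])


blob_id, file_key = store_blob(b"quarterly report")
alice_key, bob_key = secrets.token_bytes(32), secrets.token_bytes(32)
alice_entry = link(alice_key, "report.pdf", blob_id, file_key)
bob_entry = link(bob_key, "q3-final.pdf", blob_id, file_key)
print(len(file_store), len(metadata_store))  # 1 2
print(open_file(alice_key, alice_entry))     # ('report.pdf', b'quarterly report')
print(open_file(bob_key, bob_entry))         # ('q3-final.pdf', b'quarterly report')
```

Both users reach the same file data, but each does so through their own metadata entry, which neither can open with the other’s key.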
Managing the indirect relationships between these three core components of our architecture is very important to us. We want to be as secure as possible in order to earn our users’ trust, while still providing powerful features within our privacy model. Innovating on what it means to have storage in the cloud is what we’ve based our past, present, and future on, and we want to keep doing so while still being able to look our customers in the eye and tell them that their data is not only secure, but also private.