The term "cloud computing" has taken on a number of different definitions since it became a marketing phenomenon. For some users it is a very big hard drive that stores their documents and email. For others it is an unlimited supply of consumer-grade computers they can create and destroy at will. For still others it is a virtual storefront that handles purchases, sales, and other transactions.
This document provides an overview of the development of RCloud, a Platform as a Service model. The goal of this project is threefold. First, it will allow data scientists to quickly develop and deploy analyses over a cluster of machines. Second, it will allow these analyses to be disseminated in the form of visualizations, code, and documents that can be shared with other data scientists and business people alike. Third, these disseminated materials will allow executives to better understand business opportunities and make data-driven decisions.
The RCloud computing infrastructure refers to the software that is used to manage computing resources. It is designed to be scalable, meaning that the run-time for large-scale computations can be kept to a minimum simply by applying more computing resources to the calculation. This will allow RCloud to continue processing larger and larger amounts of data simply by adding hardware resources. The infrastructure is also designed to be elastic, meaning that computing resources can be added to a calculation while it is being performed, providing further speed-up, or removed from it, freeing those resources for other calculations.
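The difference between scalability and elasticity can be illustrated with a small toy model. The sketch below is not RCloud code and does not reflect its actual interfaces. The names `ElasticJob`, `resize`, and `steps_needed` are hypothetical, and the model assumes an idealized calculation made of equal-sized, independent work units, with each worker finishing one unit per time step.

```python
from dataclasses import dataclass


@dataclass
class ElasticJob:
    """Toy model of a calculation split into equal, independent work units.

    Hypothetical illustration only; it ignores communication and
    scheduling overhead, which real clusters always incur.
    """
    remaining_units: int
    workers: int
    elapsed_steps: int = 0

    def resize(self, workers: int) -> None:
        """Change the worker count while the job is running (elasticity)."""
        if workers < 1:
            raise ValueError("a running job needs at least one worker")
        self.workers = workers

    def step(self) -> None:
        """Advance one time step; each worker completes one unit."""
        self.remaining_units = max(0, self.remaining_units - self.workers)
        self.elapsed_steps += 1

    def run_to_completion(self) -> int:
        """Run until no work remains and return the total steps taken."""
        while self.remaining_units > 0:
            self.step()
        return self.elapsed_steps


def steps_needed(units: int, workers: int) -> int:
    """Steps needed for a job whose worker count is fixed up front."""
    return ElasticJob(units, workers).run_to_completion()


if __name__ == "__main__":
    # Scalability: doubling the workers halves the run-time in this model.
    print(steps_needed(1000, 10), steps_needed(1000, 20))

    # Elasticity: start with 10 workers, then add 30 more mid-calculation.
    job = ElasticJob(remaining_units=1000, workers=10)
    for _ in range(20):
        job.step()
    job.resize(40)
    print(job.run_to_completion())
```

In this idealized model, run-time falls in direct proportion to the number of workers. A real system would see smaller gains, because coordinating workers and moving data has its own cost.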