We love open source software and open data, and try to use tools available to the wider community whenever possible. The data science stack underlying our platforms and analyses includes the following:
Python has become one of the most popular languages for scientific calculations and data analysis, and we love it too! We’re currently using the Anaconda Python 3 distribution, which is packaged up by the kind folks at Continuum Analytics.
Jupyter is an interactive scripting and analysis framework that makes playing with data fun. It grew out of IPython, which was born right here in Boulder, Colorado at CU, in the hands of a physics grad student named Fernando Pérez.
Under the hood, NumPy and SciPy are the libraries that really make Python hum for data analysis and scientific computation. They build on decades' worth of high-performance computing libraries, and make that functionality much more accessible than, say… Fortran 77.
Not that any of us would know anything about that. No sirree.
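To give a flavor of what that accessibility looks like, here is a minimal sketch that solves a small linear system with NumPy. The matrix and vector are made-up illustrative values; `np.linalg.solve` hands the actual work to a compiled LAPACK routine.

```python
import numpy as np

# Hypothetical system of equations (values invented for illustration):
#   3x +  y = 9
#    x + 2y = 8
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# One call; the heavy lifting happens in compiled LAPACK code.
x = np.linalg.solve(A, b)

# Check the solution by plugging it back in (vectorized, no Python loops).
residual = A @ x - b
print(x, residual)
```

The same few lines scale to systems with thousands of unknowns, which is roughly where hand-written loops stop being fun.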
Pandas provides easy-to-use heterogeneous data frames for interactive data manipulation and visualization, even on very large data sets.
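As a rough sketch of what "heterogeneous" means in practice, the data frame below mixes strings, floats, and dates in a single table. The column names and values are hypothetical, invented purely for illustration.

```python
import pandas as pd

# Hypothetical records: every column has its own data type.
records = pd.DataFrame({
    "site": ["A", "B", "C", "D"],
    "category": ["coal", "gas", "coal", "wind"],
    "capacity": [500.0, 250.0, 300.0, 100.0],
    "online": pd.to_datetime(
        ["1975-06-01", "2003-01-15", "1982-09-30", "2011-04-20"]
    ),
})

# Aggregate a numeric column by a text column.
capacity_by_category = records.groupby("category")["capacity"].sum()

# Filter rows on a date column.
recent = records[records["online"] >= pd.Timestamp("2000-01-01")]

print(capacity_by_category)
print(recent["site"].tolist())
```

Grouping, filtering, and summarizing like this is most of what interactive data exploration comes down to, and it reads the same in a Jupyter notebook as in a script.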
For flexible, static two-dimensional visualizations, we use Matplotlib, which is tightly integrated with all of the tools above. We’re also excited to get more familiar with Plotly for interactive online data presentation.
High-level machine learning and clustering techniques have become very accessible and powerful. The scikit-learn (sklearn) suite of Python tools makes it much easier to automatically associate records in different datasets that don’t share common IDs, and to extract interesting patterns from datasets that might otherwise be too large or complex to tackle.
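One way to associate records that share no common ID is to compare their names as text. The sketch below is only an illustration of that idea, not a description of our production approach: it turns names into character n-gram TF-IDF vectors and picks the nearest neighbor by cosine distance. The organization names are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Two hypothetical datasets naming the same entities slightly differently.
left = ["Acme Power Company", "Blue River Electric Coop", "Sunrise Wind LLC"]
right = ["Sunrise Wind, L.L.C.", "ACME Power Co.", "Blue River Elec. Cooperative"]

# Character n-grams tolerate abbreviations and punctuation differences.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 3), lowercase=True)
vectorizer.fit(left + right)
left_vecs = vectorizer.transform(left)
right_vecs = vectorizer.transform(right)

# For each record on the left, find the closest record on the right.
nn = NearestNeighbors(n_neighbors=1, metric="cosine").fit(right_vecs)
distances, indices = nn.kneighbors(left_vecs)

matches = {name: right[i] for name, i in zip(left, indices[:, 0])}
for name, match in matches.items():
    print(f"{name!r} -> {match!r}")
```

On real data you would also want a distance threshold and a manual review step, since the nearest neighbor is not always a true match.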
SQLAlchemy provides a database-agnostic Python programming environment that lets us build complex data structures linked directly to an underlying database through an advanced Object Relational Mapper (ORM).
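Here is a minimal sketch of that idea, assuming SQLAlchemy 1.4 or newer. The `Owner` and `Asset` classes are hypothetical examples, and an in-memory SQLite database stands in for a real server so the snippet runs anywhere.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Owner(Base):
    """Hypothetical parent table."""
    __tablename__ = "owners"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    assets = relationship("Asset", back_populates="owner")


class Asset(Base):
    """Hypothetical child table, linked to Owner by a foreign key."""
    __tablename__ = "assets"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    owner_id = Column(Integer, ForeignKey("owners.id"))
    owner = relationship("Owner", back_populates="assets")


# Only this URL changes when pointing at a different database backend.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Build linked Python objects; the ORM writes the rows and foreign keys.
    session.add(Owner(name="Example Co",
                      assets=[Asset(name="first"), Asset(name="second")]))
    session.commit()

    # Read the objects back and follow the relationship like an attribute.
    stored = session.execute(select(Owner)).scalars().one()
    asset_names = sorted(asset.name for asset in stored.assets)

print(asset_names)
```

The class definitions never mention SQLite, which is the database-agnostic part: the same models can be created in another backend by changing the engine URL.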
For large local data sets, PostgreSQL is a fully featured open source database that seeks to implement the full SQL standard, which means lots of native support for specific data types.
Can you fall in love with a text editor? Apparently the answer is yes! Does that make us dorks? Who cares!