Posts

Linear Regression with Python

Introduction
Our first insight into machine learning will be through the simplest model: linear regression. The goal in regression problems is to predict the value of a continuous response variable. First we'll examine simple linear regression, which models the relationship between a response variable and one explanatory variable. Next, we will discuss polynomial regression and regularization methods.

Simple model
Linear regression tries to minimize the residual sum of squares between the observed responses in the dataset and the responses predicted by the linear approximation. Mathematically it solves a problem of the form:

$$\min_{w} \lVert Xw - y \rVert_2^2$$

where X holds the explanatory variables, y the observed responses, and w the coefficients being fitted.

We'll demonstrate the process using the toy diabetes dataset included in scikit-learn. For more details about the loading process, take a look at the previous article about loading datasets in Python.

    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn import datasets, linear_model
    # sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in sklearn.model_selection
    from sklearn.model_selection import train_test_split
    ...
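To make "minimizing the residual sum of squares" concrete for the one-variable case, here is a minimal sketch in plain Python. It is not the post's scikit-learn code: the function names and the five data points are made up for illustration, and the fit uses the standard closed-form solution for a single explanatory variable.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residual_sum_of_squares(xs, ys, slope, intercept):
    """Sum of squared gaps between observed and predicted responses."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Made-up toy data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_simple_linear_regression(xs, ys)
print("slope:", round(slope, 4), "intercept:", round(intercept, 4))
print("RSS:", round(residual_sum_of_squares(xs, ys, slope, intercept), 4))
```

Nudging the slope or intercept away from the fitted values can only increase the residual sum of squares, which is exactly the property the model is defined by.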

Loading datasets using Python

Introduction
Before we proceed with any kind of machine learning problem, we need to get the data on which we'll operate. We can of course generate data by hand, but this won't get us far, as it is too tedious and lacks the diversity we may require. There are numerous sources of real data we can use, and if none of them satisfies one's needs, there are some popular artificial generators that create datasets according to preset parameters. scikit-learn provides plenty of functions to load and fetch popular datasets as well as to generate artificial data. All of these can be found in the sklearn.datasets package.

Toy Datasets
scikit-learn ships with some small toy datasets, which give data scientists a playground to experiment with a new algorithm and check the correctness of their code before applying it to real-world-sized data. Let's load and render one of the most common datasets, the iris dataset:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklea...
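As a rough illustration of what "creating datasets according to preset parameters" means, the sketch below generates noisy points around a straight line using only the standard library. The function `make_linear_dataset` and its parameters are hypothetical names invented for this example; scikit-learn's own generators in sklearn.datasets (such as make_regression) are more general and are not reproduced here.

```python
import random


def make_linear_dataset(n_samples, slope, intercept, noise_std, seed=None):
    """Generate (xs, ys) with ys = slope * xs + intercept + Gaussian noise.

    Hypothetical helper for illustration only, not a scikit-learn function.
    """
    rng = random.Random(seed)  # a private generator keeps results reproducible
    xs = [rng.uniform(0.0, 10.0) for _ in range(n_samples)]
    ys = [slope * x + intercept + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys


# Same parameters and seed always yield the same dataset.
xs, ys = make_linear_dataset(n_samples=50, slope=2.0, intercept=1.0,
                             noise_std=0.5, seed=42)
print(len(xs), "samples generated")
```

Fixing the seed is the design choice that matters most here: it lets you rerun an experiment on exactly the same artificial data.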

Python for Data Scientists - scikit-learn

Introduction
In the previous posts we've covered the basics of data analysis. Now the gloves are off and here come the big guns: a machine learning library called scikit-learn. scikit-learn has become one of the most popular open source machine learning libraries for Python. It provides algorithms for machine learning tasks including classification, regression, dimensionality reduction, clustering and many more. It also provides modules for extracting features, processing data and evaluating models.

Installation
scikit-learn depends on both NumPy and SciPy, which we've already discussed. So make sure to upgrade both to the latest version before installing the package, which is done, of course, using the Python package manager:

    pip install scikit-learn

Conclusion
scikit-learn covers a very broad spectrum of data science fields, each deserving a dedicated discussion. And this is exactly what we're going to do for the next couple of sessions, diving deeper into each...

What are Bitcoin and cryptocurrencies?

To understand cryptocurrency, it’s best to start with the most popular and in many ways the simplest of these networks: Bitcoin.

Bitcoin is the world’s first completely decentralized, open-source, peer-to-peer digital currency. A short decade ago, knowledge of it was confined to a handful of hobbyists on Internet forums. Today, the bitcoin economy is larger than the economies of some of the world’s smaller nations. The value of a bitcoin (or BTC) has grown and fluctuated greatly, from pennies to many thousands of dollars.

No Third Parties
Until Bitcoin’s invention in 2008 by the unidentified programmer known as Satoshi Nakamoto, online transactions always required a trusted third-party intermediary. For example, if Alice wanted to send $100 to Bob over the Internet, she would have had to rely on a third-party service like PayPal or MasterCard. Intermediaries like PayPal keep a ledger of account holders’ balances. When Alice sends Bob $100, PayPal deducts the amount from her account ...
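The intermediary's ledger described above can be pictured with a small sketch. Everything here is an assumption made for illustration: the class name `CentralLedger`, the starting balances, and in particular the step that credits Bob, which the truncated excerpt does not spell out. It is in no way how PayPal or any real payment service is implemented.

```python
class CentralLedger:
    """Toy model of a trusted third party tracking account balances."""

    def __init__(self, balances):
        # Copy so later changes to the caller's dict cannot alter the ledger.
        self._balances = dict(balances)

    def balance(self, account):
        """Return the account's balance; unknown accounts hold 0."""
        return self._balances.get(account, 0)

    def transfer(self, sender, recipient, amount):
        """Move whole dollars from sender to recipient if funds allow."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        if self.balance(sender) < amount:
            raise ValueError("insufficient funds")
        # Deduct from the sender, then credit the recipient (assumed step).
        self._balances[sender] = self.balance(sender) - amount
        self._balances[recipient] = self.balance(recipient) + amount


# Made-up starting balances, in whole dollars.
ledger = CentralLedger({"Alice": 150, "Bob": 20})
ledger.transfer("Alice", "Bob", 100)
print("Alice:", ledger.balance("Alice"), "Bob:", ledger.balance("Bob"))
```

The point of the sketch is that only the ledger's owner can change balances, so both parties have to trust it.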

Python for Data Scientists - Matplotlib

Introduction
Sure, with both pandas and SciPy you can perform some superb data analysis. And with IPython, working sure became much easier. But how about presenting your results? Today we'll talk about Matplotlib, our presentation package. Making plots and static or interactive visualizations is one of the most important tasks in data analysis. It may be a part of the exploratory process; for example, helping identify outliers, needed data transformations, or coming up with ideas for models.

Installation
Installation of matplotlib is easy. If you don't have it preinstalled as part of your Python distribution, just install it manually using the Python package manager:

    pip install matplotlib

Usage
Since we're already familiar with IPython, I'll only cover its usage, as this is the preferable way of writing data analysis procedures. In console mode, graphs are plotted in a separate, newly created window each time you render a plot. In web mode, it's b...

JavaScript Continuous Integration with TravisCI

Last time we talked about automating JavaScript testing with Grunt.js, and even though we pretty much exhausted the topic, there is one thing left. The provided solution worked well for a solo developer or maybe a small team, but imagine you work with a dozen developers, each constantly pushing commits. Forcing all of them to run an automated script on each commit is no trifle. Continuous integration comes to the rescue. It runs predefined build scripts, in our case Grunt.js, on each predefined event, usually on each push.

TravisCI
As usual, we'll start a new topic with the easiest implementation to get you started with the technology. Once you master the basics, we'll continue with more advanced tools in the next article. Today we'll talk about TravisCI and set up continuous integration for the code from our last article, focusing only on the needed changes. I've copied the code into a new Git repository. ...

EcmaScript6 with TypeScript and Grunt.js

ECMAScript 6 is nearly here. In fact I can already taste it, and so will you with TypeScript. TypeScript is an open source language and compiler written by Microsoft and running on Node.js. The language is based on the evolving ES6 spec but adds support for types and interfaces, and it compiles to JavaScript (ES3 or ES5, depending on a flag). In fact it's a very interesting shift for Microsoft to make something useful for the open source community, so before you boo me, have a look at it, as it's not so bad.

Introduction
Microsoft has put together a great video introducing TypeScript, and since one video replaces a million words, let's start with it.

Using TypeScript
As TypeScript is built on top of Node.js, installing it is as easy as breathing:

    npm install -g typescript

Compiling the files is done through the tsc command; however, this is not how respectable developers work. We'll be using Grunt.js to compile our TypeScript files during the build phase. Let...