Posts

Showing posts from November, 2014

Python for Data Scientists - Pandas

Image
Introduction Having learnt NumPy and SciPy in previous articles, let's discuss our next package, called pandas. Pandas provides rich data structures and functions designed to make working with structured data fast, easy, and expressive. It is, as you will see, one of the critical ingredients enabling Python to be a powerful and productive data analysis environment. Pandas combines the high performance array-computing features of NumPy with the flexible data manipulation capabilities of spreadsheets and relational databases (such as SQL), mingling DataFrame - a two-dimensional tabular, column-oriented data structure with both row and column labels. It provides sophisticated indexing functionality to make it easy to reshape, slice and dice, perform aggregations, and select subsets of data. For users of the R language for statistical computing, the DataFrame name will be familiar, as the object was named after the similar R data.frame object. However the functionality