Reading Data Analysis With Open Source Tools



Data Analysis with Open Source Tools

Recently I have been reading the book Data Analysis with Open Source Tools, by Philipp K. Janert. This is a book for anyone who works with data. I picked it up partly because of the big data buzz and partly because of the work I do at Sinar Project. It is not a book on big data, though, just on data analysis in general.

Unlike other books that deal with data, this one doesn't cover specific tools in depth. It uses a few tools as examples, but the focus is on experience and methods for doing data analysis.

The book has 18 chapters and covers a variety of topics, from statistics to simulation, clustering, and so on. Each chapter has enough theory to understand the topic, enough insight from the author's experience, and enough hands-on demos for us to pick things up from there.

While this book doesn't cover tools comprehensively, the methods and insights in it make it worth keeping as a reference for doing data analysis.

Adventure in Overriding Python Str

Recently I was given a project where we override the Python str object. I would rather not say why, because it is job related, but the process is pretty general and pretty interesting.

So here is an example.
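A minimal sketch of the idea, not the code from my job. The class name BleepedString and the bleep method come from the notes below; the BAD_WORDS list and the replace-with-asterisks rule are assumptions made up for illustration.

```python
class BleepedString(str):
    # Hypothetical word list; what actually gets bleeped is an assumption.
    BAD_WORDS = ("darn", "heck")

    def __new__(cls, value=""):
        # str is immutable, so the value must be fixed here, not in __init__.
        return super().__new__(cls, cls.bleep(value))

    @classmethod
    def bleep(cls, text):
        # Replace each listed word with asterisks of the same length.
        for word in cls.BAD_WORDS:
            text = text.replace(word, "*" * len(word))
        return text

    def __add__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        # Concatenate as plain str, then rebuild so the result is bleeped too.
        return type(self)(str.__add__(self, self.bleep(other)))

    def __iadd__(self, other):
        # Optional: without this, += would fall back to __add__ anyway.
        return self.__add__(other)

    def __repr__(self):
        return self.bleep(str.__repr__(self))

    def __str__(self):
        return self.bleep(str.__str__(self))


s = BleepedString("darn it")
print(s)        # **** it
s += " heck"
print(repr(s))  # '**** it ****'
```

Rebuilding the result through type(self) in __add__ means a word split across the two operands is still caught, because __new__ bleeps the joined text again.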

A few things to take note of here.

  1. Notice the __new__ method? In the Python object lifecycle, the first thing called is actually __new__, not __init__. __new__ creates and returns the instance, and __init__ then runs on whatever __new__ returned, as long as that is an instance of the class. Because str is immutable, the value has to be fixed at creation time, so in __new__ I create a str for the current class cls with the bleeped value.
  2. Now, if the object is not set up yet inside __new__, how can it reach the bleep method? Enter classmethod. A classmethod is bound to a class, not to an instance. This is useful if you want to call a function before an instance has been created.
  3. A classmethod binds to the class it is called on, so when it is called through a child class it receives the child class, not the parent it inherited from. Its first parameter is the class, and it is declared by putting the @classmethod decorator on a method.
  4. Because we also want any string appended to a BleepedString object to be bleeped, we need to override __add__, so that a string appended via the + operator is bleeped too. Interestingly, str does not define __iadd__ at all, and += falls back to __add__ when __iadd__ is missing. I still add an __iadd__ method so that the += behavior is explicit.
  5. Notice that inside __add__ I use the bleep method like a normal method. That is the reason I put a @classmethod decorator on bleep: I need it to be available before anything is instantiated, while still being callable like a normal method on an object.
  6. Actually, overriding just __add__ and __iadd__ is not enough to cover the str behavior. The + and += operations do not go through __repr__, but __repr__ is what the interactive prompt and containers use to display the object, which is why I override __repr__ as well and make it return the bleeped string.
  7. Then we need to override __str__ so that it returns a bleeped string.
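
The lifecycle and binding points in the notes above can be checked in isolation. This is a small sketch with made-up names (Base, Child, label, OnlyAdd, calls), none of which come from the real project: it records the order in which __new__ and __init__ run, shows that a classmethod called through a child class receives the child class, and shows that += works on a str subclass that defines only __add__.

```python
calls = []  # records the order of lifecycle hooks


class Base(str):
    def __new__(cls, value):
        calls.append("__new__")
        # The classmethod is usable here, before any instance exists.
        return super().__new__(cls, cls.label(value))

    def __init__(self, value):
        calls.append("__init__")

    @classmethod
    def label(cls, value):
        # cls is the class the call went through, not necessarily Base.
        return f"{cls.__name__}:{value}"


class Child(Base):
    pass


class OnlyAdd(str):
    # No __iadd__ here; += is expected to fall back to __add__.
    def __add__(self, other):
        return OnlyAdd(str.__add__(self, other).upper())


c = Child("x")
print(calls)                     # ['__new__', '__init__']
print(str(c))                    # Child:x
print(hasattr(str, "__iadd__"))  # False

u = OnlyAdd("a")
u += "b"
print(type(u).__name__, str(u))  # OnlyAdd AB
```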

This is just a modified version of what I used at work, which has a very different and more useful function. But it captures the same idea: overriding the default string behavior and return value.