Category Archives: python

Ping Pong with gevent, socket-io and flask

Recently one of my projects gave me some experience with gevent-socketio. While the code is open source, it can be complicated, so the example I show here, while not serious, will hopefully make things easier to see.

You will need these dependencies: gevent, gevent-socketio and flask.

Before we can create a view for socket-io, you will need to define the namespace to be used.

The gevent-socketio library uses a Namespace as the view logic: all the logic lives in a namespace. On top of namespaces there are also mixins that provide extra functionality; gevent-socketio ships with BroadcastMixin and RoomsMixin, which are useful.

line 3: We create our namespace by subclassing the Namespace class.

line 4: This is a method defined by us. We prefix on_ to an event that we want to process, in this case the ping event. The event will be sent from the socketio client. The parameter attack is essentially the payload that we receive, and we don't need to do any conversion from JSON to a dict, as the library handles it for us.

lines 7 and 9: self.emit will emit an event to the socketio client; again the dict is the payload that will be sent, no conversion to JSON necessary.

Now we can finally define a view

This is the view for socketio. app is just your standard flask app.

line 7: the route necessary for the socketio client to connect.

line 9: this is the same across socketio apps; the differences here are the namespace '/pingpong' and the Namespace class PingPong, which you will need to define yourself.

Now to serve this guy

line 6: this is the line that serves the app. It will serve all the views, including the non-socketio ones.

I will skip the view that renders the main page and the HTML, and focus on the javascript alone.

line 2: This is how you define a socket: io.connect('/namespace').

line 10: We bind a javascript click event. To send an event we use socket.emit(event, data); data will be a javascript object and event a string. Notice the on_ping method on the Namespace we defined above.

line 3: When the socketio server emits a pong event, this event handler will catch it and do something with it in the callback. Again, don't worry about conversion; the data is converted into a javascript object.

Hopefully this makes things slightly clearer. By the way, the example is on github.


Python Malaysia Meetup Postmortem

A.K.A How to run a geek event in Malaysia

After yesterday's Python Malaysia meetup, here are a few things I want to try out, or keep doing, for future events:

  • Malaysians tend to be late, so always set the time a bit earlier, at least 30 minutes ahead; that is about how late Malaysians usually are
  • Half of the users did not attend, even though they had registered on eventbrite. What I want to try next time is charging money for the event. The money will go towards pizza
  • The event was a bit bland; some suggested a full day just for the Python User Group meetup. I try to run it like most other python user groups, one topic per meetup, and bigger events are also harder to run. What I might test out is having a lightning talk at each meetup
  • We had our networking session in a mamak and ended up using 2 rows of tables. That is not a bad thing, though tables kind of limit movement; maybe next time we order pizza (see point 2). It is nicer when everyone has a chance to talk to each other
  • Python Malaysia needs a proper website; not everyone uses facebook, even though most go from the facebook event page to the eventbrite page. Still, it is a nice thing to have
  • Location matters!!!!! ITrain is just the right place to have an event: in the middle of the city, accessible via LRT. Car parking can be a problem though. The location helps bring more people in

By the way, there will be another meetup, but that is next month.

Bills Watcher Malaysia

Recently I got involved in the Open Data Movement in Malaysia, and one of my recent projects is called Bill Watcher. It is a webapp that broadcasts, via twitter and RSS, the bills being debated and those passed recently.

Main Page
Bills Detail

Basically the data is scraped from the malaysian parliament website and loaded into a sqlite database for now. Which database does not really matter, because I do it via sqlalchemy, which makes it easy to move to another database later. The app just reads from the database via sqlalchemy and renders the pages, using 960.gs to make them look nice.

The feature set of this app is pretty small: the pdf is shown in an iframe and there is no login. The fancy sharing features are the twitter and facebook buttons and RSS. Commenting will be provided by disqus, once I figure out where to put it. Javascript is only used for the twitter and facebook buttons.

I consider this the MVP: small basic features to be extended. Features will be added as requested, but not all will be added. Also, not a lot of information is available on the parliament bills page, so features will depend on the effort needed to extract the data from other sources, which is actually not easy. Otherwise, we will try our best to get requested features added.

What's next: we are going to host it live soon. Then we will add disqus and finalize the twitter notifications. To get your hands dirty now, go to the github link; I will transfer it to the sinar repo soon, which needs a bit of updating across repos.
A bill was recently debated intensely, which shows how much we don't know about the decision process in this country, even though it is all there on the parliament site, a site that is not easy to use or navigate.

Adventure in Bottle( the web framework)

So I have been scraping data online for some time. While scraperwiki has an API that allows third-party apps to get the data in json/xml form, I think I can make it even easier, because a scraperwiki query involves running a sql query on the sqlite datastore. So I took the opportunity to learn a new python web framework.

The framework only needs to handle requests and spit out data in json (maybe xml later). It does not need templates, since the output is json. It does not need an ORM, since the data is most probably scraped from somewhere else. It does not need sessions, since it is meant to be used by a library, and the data is open anyway.
The first framework I tried out is Bottle.
The first thing I noticed is how little setup I had to do, coming from a django background, which is well known for its big settings file. Just install it using 'pip install bottle'.
Essentially there is just an application, defined with the Bottle() object and passed to the run function. By default bottle already has a default application, so you don't strictly need your own; I just put it there to show that it exists.
Another thing I noticed is that there are no url routes in a separate file. A route decorator is added to any function that I want to serve in the web app. The route is part of the application (the Bottle() object), and I can limit the type of requests allowed on it, like POST/GET. I found this approach pretty clean; it reduces boilerplate compared to django views.
Another thing to notice: I do not specify a response method/object (as in django). That is another nice thing about bottle. If the function returns a dict, the response will be json; if a string, the mimetype is text, and so on. There is no need to build a response object explicitly.

Finally, to run the app, just run the file with python (any python file that calls the bottle run function), and you have a webapp.

For this project I didn't try the templates, but from the docs a template is specified with a view decorator, which I think is nice, though I don't need it now. From the docs, it looks pretty clean.

Because bottle is a micro framework, there is no management script like django's and no ORM; I use sqlalchemy here. There is no session support either. Interestingly, I don't feel that I missed anything; in fact, it is pretty pleasant to use. Sessions will definitely bite me if I ever have to implement login, but a solution is in the documentation.

Overall, it is a fun framework to use, even for a small project like this. The documentation is pretty good. I might use it for future projects.

Using Python Function with sqlite

Note: you can find the docs on the python docs page.

This is more of an experience report. Not too long ago, I scraped the parliament website for profiles of Members of Parliament; you can find the result here.

The thing is, as I used the data from the sqlite database I downloaded from the site, I realized that the title is part of the name of the MPs. So one would get "XXX , Y.B Tuan", where Y.B Tuan is the title.

That makes a query like 'select Parti from swdata where Nama=name' hard, and that is precisely the query I need for another project.

On the other hand, the sqlite3 module, which comes with the python standard library since 2.5, actually has a method called Connection.create_function.

So I wrote a little function called get_name; the example below shows how it works.

import sqlite3

def get_name(name):
    return name.split(',')[0]

s = sqlite3.connect('dbname')
# attach the python function: sql name, number of arguments, callable
s.create_function('get_name', 1, get_name)
# and use it
result = s.execute('select get_name(Nama) from swdata')

Just define a python function, make sure it returns a datatype that is compatible with sqlite, and attach it with create_function. Now you can use it in your sqlite queries in python.
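The steps above can be run end to end against an in-memory database; the table and column names below mirror the scraped data described in the post, while the row itself is made up:

```python
import sqlite3

def get_name(name):
    # drop the honorific that follows the comma, e.g. "XXX , Y.B Tuan" -> "XXX"
    return name.split(',')[0].strip()

conn = sqlite3.connect(':memory:')
conn.execute('create table swdata (Nama text, Parti text)')
conn.execute("insert into swdata values ('Samy , Y.B Tuan', 'Parti X')")

# attach the python function: name in sql, number of arguments, callable
conn.create_function('get_name', 1, get_name)

rows = conn.execute('select get_name(Nama) from swdata').fetchall()
print(rows[0][0])  # Samy
```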

Hope this is useful for someone. CHEERS

A little plug: this is something we are trying to work on in a little group called Sinar Project, and it is still at an early stage.

A scraper running on the cloud

I have been writing scrapers for some time, as you can see in some of my old posts here.

Recently, thanks to Kaeru, I was introduced to scraperwiki. This is basically a service that runs your scrapers in the cloud, with additional benefits:

  • It runs in the cloud
  • It provides infrastructure to store the data, in the form of a sqlite database, which you can download
  • It provides an easy way to dump data as excel
  • It provides infrastructure to expose the data as an API
  • Somebody can fork the scraper and enhance it
  • A web based IDE, so you just write your scraper in the browser
  • Everybody can see the code of the public scrapers
  • Scheduled tasks
One very cool thing about scraperwiki is that it supports a large set of third-party libraries, and it supports Ruby and PHP as well as Python. The API for scraperwiki is pretty extensive: it covers its own scrapers, geocoding functions, views for the data hosted on scraperwiki, etc.
My only concern is that if I want to move my scraper out of the service, I will need to rewrite the saving function. But the data can be downloaded anyway, and I use python, so it is not that big of a deal.
Below is a scraper that I have written on scraperwiki. While it is mostly a work in progress, it shows how one looks.

Python Dateutil Redux

Not too long ago, I covered one use of python dateutil on this blog.

The library itself is pretty nifty in other cases as well, in this case date differences. In the python standard library, datetime.timedelta is used to find the difference between dates, but it only counts up to days. In my case, I want to count in years.

That is where dateutil comes in. It has a module called relativedelta, which does count up to years. Using it is just a matter of importing it:

from datetime import date
from dateutil.relativedelta import relativedelta

date_diff = relativedelta(date(2012, 6, 15), date(2010, 1, 1))
print(date_diff)  # relativedelta(years=+2, months=+5, days=+14)

As you can see, it counts up to years, and months too, which is useful if you want to find the difference between dates beyond just days.

Python Web Scraping

There are times when information on a government website is very useful, but unfortunately the data comes in the form of a web page; it could be worse, it could be in a PDF. So it can be a pain when we want to use the information in a program but there is no API.

On the other hand, python is a pretty powerful language. It comes with many libraries, including ones that can be used to make HTTP requests. Introducing urllib2, part of the standard library. Using it to download data from a website takes only a few lines of code:

import urllib2

# url is the address of the page to fetch
page = urllib2.urlopen(url)
html = page.read()

The problem then is that you get a whole blob of HTML, which is a bit hard to process. Python has a few third-party libraries for this; the one I use is Beautiful Soup. Beautiful Soup is nice in that it is very forgiving of bad markup in HTML, so you don't need to worry about badly formed pages and can focus on getting things done. The library itself can also parse XML, among other things.

To use Beautiful Soup,

from BeautifulSoup import BeautifulSoup

page = "html goes here"
soup = BeautifulSoup(page)
value = soup.findAll('div')
print(value[0].text)

But you need to get the html first don’t you?

import urllib2
from BeautifulSoup import BeautifulSoup

page = urllib2.urlopen(url)
soup = BeautifulSoup(page)
value = soup.findAll('div')
print(value[0].text)

To use it, just download the data using urllib2 and pass it to Beautiful Soup. It is pretty easy to use, to me anyway. Note that urllib2 is reorganized in python 3, so the code will need some modification there.
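For a rough idea of what a Python 3 version looks like without any third-party library, here is a sketch using only the standard library (urllib.request replaces urllib2, and html.parser stands in for Beautiful Soup; the HTML string here is made up, fed in directly instead of fetched from a live page):

```python
from html.parser import HTMLParser

class DivTextParser(HTMLParser):
    """Collect the text contained in every top-level <div>."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many <div>s we are currently inside
        self.divs = []    # collected text, one entry per top-level div
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == 'div':
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.divs.append(''.join(self._buf).strip())
                self._buf = []

    def handle_data(self, data):
        if self.depth:
            self._buf.append(data)

parser = DivTextParser()
parser.feed('<html><body><div>Hello <b>world</b></div></body></html>')
print(parser.divs[0])  # Hello world
```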

To see how the scraper fares, here is a real world example on github, part of a bigger project. But hey, it is open source: just fork it and use it, at this link.

So go forth and extract some data, and promise to be nice: don't hammer their servers.