Converting PDF to Text

I was recently involved in a project to extract data from PDFs, which is actually evil, but that is not important now.

On Linux, a set of utilities comes with the xpdf program. It should be part of the default package installation; if not, just install it with apt-get or yum.

On Windows you can go to the GnuWin32 page. I just downloaded the zip so that I would not have to remove it with an uninstaller later.

I don’t really need the layout information in it, so I just use pdftotext.

On Windows:

program_location/pdftotext.exe -layout pdf_file.pdf

On Linux, just:

pdftotext -layout pdf_file.pdf

The -layout option maintains the layout of the text as it appears in the PDF. Without it, the positioning of certain text will be inconsistent.
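If there are many PDFs to convert, the same command can be driven from a script. Below is a minimal sketch in Python, assuming pdftotext is on the PATH (on Windows, pass the full path to pdftotext.exe as the executable). The function names build_pdftotext_command and convert_folder are my own, made up for illustration. It relies on pdftotext writing pdf_file.txt next to pdf_file.pdf when no output name is given.

```python
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, layout=True, executable="pdftotext"):
    """Build the argument list for one pdftotext call (hypothetical helper)."""
    command = [executable]
    if layout:
        # Same -layout flag as in the commands above.
        command.append("-layout")
    command.append(str(pdf_path))
    return command


def convert_folder(folder, layout=True, executable="pdftotext"):
    """Run pdftotext on every .pdf file directly inside `folder`."""
    converted = []
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        # No output name is passed, so pdftotext writes a .txt file
        # next to the input file.
        command = build_pdftotext_command(pdf_path, layout, executable)
        subprocess.run(command, check=True)
        converted.append(pdf_path.with_suffix(".txt"))
    return converted
```

Calling convert_folder("reports") would then convert every PDF in a folder named reports, and raise an error if pdftotext fails on any of them.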


Accessing Server from Android

Recently I have been helping to maintain some servers, and I tend to move around, so I decided to make my phone useful.

Android actually has a couple of apps that are useful for remotely accessing a machine, and some of them are free.
For SSH, I found that ConnectBot works extremely well. It only does SSH and telnet, and that’s about it, so it is pretty straightforward to use. For accessing Windows servers, I use 2X Client, which is another straightforward RDP client. Both ConnectBot and 2X Client are free, and that is awesome.

The only issue with using an Android phone to access a server remotely is the size of the device. I have a Desire HD. While the screen is pretty large for a phone, typing commands over SSH or navigating around a Windows server via RDP can still be a pain, because the screen is still smaller than most desktop screens. I also don’t have a full-size keyboard on the phone, which is another pain, especially since I access Linux servers most of the time.

So it can be a pain to use at times, but for a quick fix or a quick check on a server, this works pretty well.

I have attached some links for the apps below

Setting up a Python Environment on Windows

One advantage of working at my current company is that each of us gets a laptop. But like most IT companies in Malaysia, it uses Windows. On the job I SSH to a Linux dev server for work, but my personal Python projects are done locally, and setting up Python on Windows is a pain.

The common way to install Python on Windows is:

  1. download python from
  2. install
  3. setup the path
Then repeat the process for setuptools and virtualenv. It is a pain because you not only have to set up the path, but also have to go to a separate page to install each one.
Another way is to use ActivePython, which is free, and its community license is pretty acceptable. Best of all, it comes bundled with easy_install, pip and virtualenv, and the path is set up properly for us. You can download it here
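Whichever way you install, it is handy to check that the path really is set up. Here is a small sketch using shutil.which from the Python standard library (Python 3.3 or later). The function name find_on_path is my own, and the tool names are simply the ones mentioned in this post.

```python
import shutil


def find_on_path(tool_names):
    """Map each tool name to its full path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in tool_names}


if __name__ == "__main__":
    # Tool names taken from the post; change the list to suit your setup.
    tools = ["python", "easy_install", "pip", "virtualenv"]
    for name, location in find_on_path(tools).items():
        print(f"{name}: {location or 'NOT FOUND on PATH'}")
```

Any tool reported as not found either is not installed or its folder still needs to be added to the PATH.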

From the darkside, scanning FAIL!!!!!!!

I needed to make a photocopy of my certs for office use, which is supposed to be easy, right? At least I expected it to work seamlessly on Vista. NOT QUITE!!!!!!!

First, Paint did not detect the scanner, even though Vista did. Then I had to rely on Picasa to get the scan. I mean, you have to rely on third-party software just to scan something. Maybe I am just used to having preinstalled software that is able to use my hardware.

I mean, even Linux has GIMP installed by default, and chances are it has XSane, the Linux scanning software, installed by default too. So I am able to scan something directly, or just use GIMP to acquire an image (via XSane). Yet on Windows I suddenly have to rely on Picasa, which I downloaded myself, to scan the image. It is really crazy.

Windows is user friendly? Really, I have my doubts. I am able to do the same task with just a few clicks on Linux.

P.S. Now if Canon would just release drivers, or even better, open interfaces to my scanner, the quality would be really nice on Linux.

from the darkside, redux: powershell part 3

So I played with PowerShell again, and realized that my idea that programs cannot be opened was quite wrong.
It turns out that most do work; I tested the command-line programs, anyway. So netstat, ping, etc. do work.
So let’s say I want to extract some data from such a program. I use netstat here.

netstat | select-string '8080'

Let’s say I want to see all the lines containing 8080. It works like on Linux or Unix, except that we don’t use grep; we use select-string instead.
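For comparison, here is roughly what that pipeline does, sketched in Python. This is only an illustration of the idea, not how PowerShell implements it; the function names select_lines and netstat_lines are made up. One difference from grep worth knowing: select-string treats the pattern as a regular expression and ignores case by default, which the sketch copies.

```python
import re
import subprocess


def select_lines(text, pattern):
    """Return the lines of `text` that match `pattern`, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [line for line in text.splitlines() if regex.search(line)]


def netstat_lines(pattern):
    """Run netstat and keep only the lines that match `pattern`."""
    # capture_output needs Python 3.7 or later.
    result = subprocess.run(
        ["netstat"], capture_output=True, text=True, check=True
    )
    return select_lines(result.stdout, pattern)
```

So netstat_lines("8080") would return the same kind of lines as the command above.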

It is just day 3; I still have more to compare.

From the darkside, aka laptop died……………….

So my lappy finally died. Since I am still pissed at HP, I will probably get another one, and that would be around Christmas.
So for now, I will be in the darkside, and the darkside is more pleasant than I thought. Erm, I mean, I am on Windows Vista now.

Except that several things happened.
1) I have to resort to XChat on Windows for IRC, and VLC too. And maybe later Cygwin……….
2) I tried C# to refresh my knowledge of .NET, so I got Visual Studio Express. What happened is that I could not start a project, because it could not access the registry. From the forums, it turns out that I need to run the program as administrator. Which is crazy; I have never heard of needing admin access to use a development tool. Not on Linux, anyway.
3) I like to play with alternative OSes, only to find out that Virtual PC doesn’t support OSes other than Windows. So I have to resort to VirtualBox.

Conclusion: eventually, I will probably use more open source programs than proprietary ones on Windows, because it didn’t work as I thought. It is probably just another adventure on the dark side.