Category Archives: linux

IPC with named pipe

Linux users will know what a pipe is, with “|” as the pipe character. A named pipe is actually similar. The difference is that an ordinary pipe lasts only as long as the processes that use it, while a named pipe persists even after those processes have exited.

Why would you need a named pipe? It is actually a nice way to provide IPC between 2 processes. The coolest thing of all is that it shares file semantics, i.e. you can use open() and close().

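Here is how you create and use a named pipe: create it with mkfifo, then read from it with cat in one terminal and write to it with echo in another.

Of course you can replace cat and echo with your own processes. One way this is really useful is by opening the pipe in your own program.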

This is how you write a writer
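A minimal sketch of a writer, assuming Python and a Unix-like system. The reader runs in a thread here only so that the example is self-contained; normally it would be a separate process. The names write_messages, read_messages, demo and demo_fifo are made up for illustration.

```python
import os
import tempfile
import threading


def write_messages(fifo_path, messages):
    # open() for writing blocks here until something opens the pipe for reading.
    with open(fifo_path, "w") as fifo:
        for message in messages:
            fifo.write(message + "\n")
    # Leaving the with-block closes the pipe, so the reader sees end-of-file.


def read_messages(fifo_path):
    # open() for reading blocks until a writer opens the other end.
    with open(fifo_path) as fifo:
        # read() returns once the writer has closed its end.
        return fifo.read().splitlines()


def demo():
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "demo_fifo")  # hypothetical pipe name
    os.mkfifo(fifo_path)  # same job as the mkfifo command
    received = []
    reader = threading.Thread(
        target=lambda: received.extend(read_messages(fifo_path))
    )
    reader.start()
    write_messages(fifo_path, ["hello", "world"])
    reader.join()
    os.remove(fifo_path)  # the pipe stays on disk until it is removed
    os.rmdir(workdir)
    return received


if __name__ == "__main__":
    print(demo())
```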

As you can see, it shares semantics with a file object, with open(), read() and write(), but with some caveats.

  • open() can block if it is called on only one side, until the other side opens the pipe too
  • read() can block while it waits for data, and only sees end-of-file once close() is called on the writer

There you have it, a simple way to do IPC on Linux using file semantics.




Copy and Paste on GNU Screen

So recently I finally found out how to copy and paste in GNU Screen. Previously I just dragged with the mouse.

So, to copy and paste in GNU Screen:
– press ctrl-a [ to enter copy mode
– move to the first character of the string to copy, press enter
– move to the last character of the string to copy, press enter
– now to paste, press ctrl-a ]
p.s. kinda ashamed that it took me so long to figure this out =.=

Converting PDF to Text with pdftohtml

Previously I tried to extract information from PDFs by converting PDF to text, as described here.

The problem is, a big wall of text is very hard to process.
Here comes pdftohtml, which is part of the poppler package on Linux. But GnuWin32 does not have it for Windows, which is one reason I used pdftotext.

pdftohtml converts PDF to HTML. Simple usage is:

pdftohtml yourpdffile.pdf

You will get your HTML file. But it is a bit plain, as it just extracts the text. If there are images inside the PDF, or your PDF is pretty complicated, like the Malaysian Hansard, you can use the -c option:

pdftohtml -c yourpdffile.pdf

Here is the catch: it will generate 1 HTML file per page of the PDF, with images, but the layout is maintained. For a document like the Malaysian Hansard, that would be hundreds of pages.

Then there is a way to produce XML:

pdftohtml -xml yourpdffile.pdf

You will get an XML file with the position information.
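A rough sketch of how that position information could be read back in Python. The element and attribute names (page, text, top, left) are an assumption about what the XML looks like, and SAMPLE_XML is made up for illustration, so check them against a file that pdftohtml actually produced before relying on this.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape the -xml output is assumed to have.
SAMPLE_XML = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<text top="120" left="50" width="180" height="16" font="0"><b>Second</b> line</text>
<text top="100" left="50" width="200" height="16" font="0">First line</text>
</page>
</pdf2xml>"""


def extract_text_positions(xml_string):
    """Return (page, top, left, text) tuples sorted in reading order."""
    root = ET.fromstring(xml_string)
    rows = []
    for page in root.iter("page"):
        page_number = int(page.get("number"))
        for text in page.iter("text"):
            # itertext() also picks up text inside nested tags such as <b>.
            content = "".join(text.itertext())
            rows.append(
                (page_number, int(text.get("top")), int(text.get("left")), content)
            )
    # Sort by page, then top to bottom, then left to right.
    rows.sort(key=lambda row: row[:3])
    return rows


if __name__ == "__main__":
    for row in extract_text_positions(SAMPLE_XML):
        print(row)
```

Sorting by top and left is one simple way to put the fragments back in reading order. Multi-column pages would need something smarter.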

p.s I’m using this for Whether there will be result today. 



Converting PDF to Text

So I have recently been involved with a project to extract data from PDFs. Which is actually evil, but that is not important now.

On Linux there is a set of utilities that comes with the xpdf program. It should be part of the default package installation. If not, you can just install it with apt-get or yum.

On Windows you can go to the GnuWin32 page. I just downloaded the zip so that I would not have to remove it with an uninstaller.

I don’t really need detailed layout information from it, so I just use pdftotext.

On Windows:

program_location/pdftotext.exe -layout pdf_file.pdf

On Linux, just:

pdftotext -layout pdf_file.pdf

The -layout option maintains the layout of the text as it is in the PDF. Otherwise, the positioning of certain text will be inconsistent.
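If the text is going to be processed by a program anyway, the same command can be called from a script. This is only a sketch, assuming Python and that pdftotext is on the PATH. The function names are made up, and passing “-” as the output file name asks pdftotext to write to standard output instead of a .txt file.

```python
import subprocess


def build_pdftotext_command(pdf_path, keep_layout=True, program="pdftotext"):
    """Build the argument list for a pdftotext call."""
    command = [program]
    if keep_layout:
        command.append("-layout")
    # "-" as the output name sends the text to standard output.
    command.extend([pdf_path, "-"])
    return command


def pdf_to_text(pdf_path, keep_layout=True, program="pdftotext"):
    """Run pdftotext and return the extracted text as a string."""
    result = subprocess.run(
        build_pdftotext_command(pdf_path, keep_layout, program),
        capture_output=True,
        text=True,
        check=True,  # raise an error if pdftotext fails
    )
    return result.stdout


if __name__ == "__main__":
    print(build_pdftotext_command("pdf_file.pdf"))
```

On Windows, the program argument would be the full path to pdftotext.exe.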


Accessing Server from Android

Recently I have been helping to maintain some servers, and I tend to move around. So I decided to make my phone useful.

Android actually has a couple of apps that are useful for remotely accessing a machine. Some of them are free.
For connecting over SSH, I found that ConnectBot works extremely well. It only does SSH and telnet, and that’s about it. It is pretty straightforward to use. For accessing Windows servers, I use 2X Client, which is another straightforward RDP client. Both ConnectBot and 2X Client are free, and that is awesome.

The only issue with using an Android phone to access a server remotely is the size. I have a Desire HD. While the screen is pretty large for a phone, typing commands over SSH or navigating around a Windows server via RDP can still be a pain. It is still smaller than most desktop screens. And I don’t have a full-size keyboard on the phone, which is another pain, especially since I access Linux servers most of the time.

So it can be a pain to use at times. But for a quick fix or checking on a server, this works pretty well.

I have attached some links for the apps below

Many Ways To Grep File Content

So not too long ago I posted on Twitter

This spun off a few other ways to do grep.
A few people suggested this one on IRC and Facebook. The i parameter makes the keyword case-insensitive.

grep -iR keyword directory 

Another suggestion on IRC. 

grep -iR --exclude=file-to-ignore keyword directory

Another tweet I received is,
Then the last one, which I discovered on Google, is ack-grep

ack-grep keyword directory

and again, -i makes the search case-insensitive.

ack-grep -i keyword directory

The ack-grep output is nicer, and it automatically ignores binary files. It is slightly different from grep, but both get the job done, to me anyway.
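To show what the -i, -R and --exclude options are doing, here is a small Python sketch of the same idea. It is not how grep is implemented. It does a plain substring match where grep takes a regular expression, and the names grep_recursive and demo are made up.

```python
import fnmatch
import os
import tempfile


def grep_recursive(keyword, directory, ignore_case=True, exclude=None):
    """Return (path, line_number, line) for every line containing keyword."""
    needle = keyword.lower() if ignore_case else keyword
    matches = []
    # os.walk visits sub-directories too, like grep -R.
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            # Skip file names matching the exclude pattern, like --exclude.
            if exclude and fnmatch.fnmatch(name, exclude):
                continue
            path = os.path.join(root, name)
            with open(path, errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    haystack = line.lower() if ignore_case else line
                    if needle in haystack:
                        matches.append((path, number, line.rstrip("\n")))
    return matches


def demo():
    with tempfile.TemporaryDirectory() as directory:
        with open(os.path.join(directory, "notes.txt"), "w") as handle:
            handle.write("Hello Keyword\nnothing here\n")
        with open(os.path.join(directory, "skip.log"), "w") as handle:
            handle.write("keyword in an excluded file\n")
        found = grep_recursive("keyword", directory, exclude="skip.log")
        return [(os.path.basename(p), n, line) for p, n, line in found]


if __name__ == "__main__":
    print(demo())
```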

Diff’ing and Patching

So I have a few folders with different versions of source code, erm, text files, which I am comparing at work. Either way, thanks to Unix tools like diff and patch, things are somewhat easier. Though it would be better to use version control, which I was stupid enough not to.

To compare files with diff, it is just a matter of

diff original new

Then my team is creating the files on Windows, which uses \r\n line endings, and I am on Linux, which uses \n. So to avoid this showing up in the diff

diff -w original new

This should ignore differences in whitespace characters such as \r\n. It may not be a wise choice for Python, because of the indentation sensitivity, but I trust my team to be sane enough not to mix tabs and spaces (yes, non-Python programmers, you can all laugh).

Since I have a directory, I would run

diff -rw original-folder new-folder

Pipe it to less just to make it readable.

If I’m lucky and it is just adding stuff to the text file, I would generate a patch for that file by

diff -u original new > original.patch

and apply it with

patch original < original.patch 

in the folder. I really should use the -p1 option in the patch command, but a few folders have the same name, so I just put the patch in that folder and run it without -p1.

Of course sometimes things are not as easy, so it is nice to see all the differences with highlighting. For that I use vimdiff:
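For comparison, a unified diff like the one from diff -u can also be produced from a script. This is a sketch using Python’s difflib, not a replacement for diff and patch. The function name is made up, and it only hides line ending differences, which is narrower than what diff -w ignores.

```python
import difflib


def unified_diff_ignoring_line_endings(
    original_text, new_text, original_name="original", new_name="new"
):
    """Return unified diff lines, treating \\r\\n and \\n endings as equal."""
    # splitlines() drops \n and \r\n alike, so only the content is compared.
    original_lines = original_text.splitlines()
    new_lines = new_text.splitlines()
    return list(
        difflib.unified_diff(
            original_lines,
            new_lines,
            fromfile=original_name,
            tofile=new_name,
            lineterm="",
        )
    )


if __name__ == "__main__":
    windows_version = "a\r\nb\r\n"
    linux_version = "a\nb\nc\n"
    for line in unified_diff_ignoring_line_endings(windows_version, linux_version):
        print(line)
```

Two texts that differ only in line endings give an empty diff here, while the indentation is still compared, which is the part that matters for Python.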

vimdiff original new

Then I realized that doing all the diffing and patching can be a pain in the back. So it is better to keep the code under version control, whatever the reason, and however rushed it is.

Fun with Linux: Tasque

So I have a Remember the Milk account. What’s cool is that it is a nice web-based todo list. What really sucks is that there was no desktop client on Linux. Until now.

Tasque is a todo list manager. It has a local file backend and, most importantly, it can use Remember the Milk as a backend.

It is pretty easy to use: just enter a task, and it is done. One interesting thing is that you can enter “do something today” and it will automatically fill in today’s date.

What really sucks is that it doesn’t have notifications. But it is useful enough to use.

To install on Ubuntu,
sudo apt-get install tasque

But I can’t seem to get it to work on Fedora. Enlighten me please

Is loving screen

So I got a job which allows me to develop on Linux…(w00t!!!).
Along the way, I found screen….

screen is a full-screen window manager that multiplexes a physical terminal between several processes, typically interactive shells. As quoted from the man page.

The thing is, it is useful because I can run several shells without actually having to open several instances of GNOME Terminal. That matters because I use vim a lot nowadays, and having to open several windows just to have several terminals tends to be distracting.

screen is not hard to use,
Ctrl-a + c        : open a new window
Ctrl-a + n        : switch to the next window
Ctrl-a + p        : switch to the previous window
Ctrl-a + Ctrl-a   : toggle between 2 windows
Ctrl-a + ?        : the help (if you don’t remember the rest, just remember this one)

BTW I found that this is useful on my Eee PC as well, considering that console apps tend to save a lot of screen space…And too many open windows tend to be hard to navigate.

Learned how to use rsync

So I need to sync a folder between 2 computers, my laptop and my PC.
One way is to use scp, but I don’t quite want to replace the whole folder, just the files I created. I just want to make sure the folder is in sync.

So I played with rsync, which works out which files have changed and transfers only the changed files. From what I understand from Wikipedia anyway.

And what is so cool is that the command is similar to scp, or cp. So the command for rsync is:

rsync source dest #for files
rsync user@hostname:source-file dest #from remote host to local file
rsync src user@hostname:dest-file #from local file to remote host
rsync -r src dest #this is to transfer folder

So just replace source with your files, and hostname with the hostname or IP.

So that is another tool I learned as a newbie……
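To make the “transfer the changed files only” idea concrete, here is a local-only Python sketch. It is loosely modelled on the size and modification time check that rsync uses by default to decide whether a file needs transferring. It is not rsync’s algorithm, which can also send only the changed parts of a file and works over the network. The names needs_copy, sync_folder and demo are made up.

```python
import os
import shutil
import tempfile


def needs_copy(src_path, dest_path):
    """Decide whether a file should be copied, by size and modification time."""
    if not os.path.exists(dest_path):
        return True
    src, dest = os.stat(src_path), os.stat(dest_path)
    return src.st_size != dest.st_size or int(src.st_mtime) != int(dest.st_mtime)


def sync_folder(src_dir, dest_dir):
    """Copy new or changed files from src_dir to dest_dir; return what was copied."""
    copied = []
    for root, _dirs, files in os.walk(src_dir):
        relative = os.path.relpath(root, src_dir)
        target_root = os.path.normpath(os.path.join(dest_dir, relative))
        os.makedirs(target_root, exist_ok=True)
        for name in sorted(files):
            src_path = os.path.join(root, name)
            dest_path = os.path.join(target_root, name)
            if needs_copy(src_path, dest_path):
                # copy2 keeps the modification time, so the next run sees no change.
                shutil.copy2(src_path, dest_path)
                copied.append(os.path.relpath(dest_path, dest_dir))
    return copied


def demo():
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dest:
        with open(os.path.join(src, "a.txt"), "w") as handle:
            handle.write("first file\n")
        first = sync_folder(src, dest)   # copies a.txt
        second = sync_folder(src, dest)  # nothing changed, nothing copied
        with open(os.path.join(src, "b.txt"), "w") as handle:
            handle.write("second file\n")
        third = sync_folder(src, dest)   # copies only the new file
        return first, second, third


if __name__ == "__main__":
    print(demo())
```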