Monday, August 13, 2007

Why learn Regular Expressions

Scenario: A friend has been given the task, by his office, of finding the number of eligible voters on each of the islands. The data is in PDF files, and different files have different structures: there are headings, numbers, notes between lines, and so on. His idea was to import the files into Excel and count the rows, but the headings and notes in between meant a lot of manual deleting. Imagine doing that for an island like Hithadhoo, with over 9000 records. Instead of wasting time on repetitive work, we can use regular expressions.
Here's a quick script that converts each PDF to text with pdftotext and then uses sed to delete the unwanted lines:

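for i in *.pdf
do
    pdftotext -layout -nopgbrk "$i" out     # PDF -> plain text, keep layout, no page breaks
    sed -r -f sedfile out > "${i}out"       # drop unwanted lines using the rules in sedfile
done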
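
where sedfile contains:

# delete lines of exactly 7 spaces followed by one word (e.g. a heading)
/^[ ]{7}[A-Za-z]+$/d
# delete blank lines, including completely empty ones ([ ]* rather than [ ]+)
/^[ ]*$/d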
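The last step is counting what is left. Here's a minimal sketch of that, assuming each remaining line is one voter record. The function name count_records and the sample lines are made up for illustration. It applies the same two rules inline and uses -E, which GNU and BSD sed both accept as the extended-regex option (GNU sed also accepts -r).

```shell
#!/bin/sh
# Hypothetical helper: print how many lines survive the two sedfile rules.
count_records() {
    n=$(sed -E -e '/^[ ]{7}[A-Za-z]+$/d' -e '/^[ ]*$/d' "$1" | wc -l)
    # $(( )) strips the leading spaces some wc implementations print
    echo $((n))
}

# Made-up sample: one 7-space heading, one empty line,
# one spaces-only line and three record lines.
sample=$(mktemp)
printf '       Heading\n\n   1  record one\n   \n   2  record two\n   3  record three\n' > "$sample"

count_records "$sample"   # prints 3
rm -f "$sample"
```

Calling count_records on each "${i}out" file inside the loop would give one total per island.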

