15 Jul 2012

My Firefox confuzzled my Nokogiri

I’ve been working on a project to import a large dataset from HTML pages where the content isn’t very clearly divided with CSS classes. The tool that normally works for me is Ruby’s Nokogiri library.

I’d had some trouble in the past where Nokogiri insists that a certain XPath doesn’t exist, while it can clearly find the same element through the CSS selector. This time, instead of going with the CSS, I investigated why it wasn’t finding the ‘correct’ XPath. After playing with it in a Pry session, I figured it out.

Here’s the XPath that Firefox/Firebug/FirePath all report:

    //html/body/p[4]/table/tbody/tr[3]

Here’s the true XPath as parsed by Nokogiri:

    //html/body/table/tr[3]

Notice that the ‘p[4]’ step disappears along with the ‘tbody’ element. After reading about the issue on StackOverflow.com, it sounds like Firefox ‘corrects’ improperly formed HTML by adding tags. Hence the disparity between what Firefox sees (the orderly HTML standards) and what Nokogiri sees (the cold hard real-life web).
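One way to see what is truly in the markup, without a browser's repairs, is to log tag paths with a parser that does not restructure the document. The following is only a minimal sketch using Python's standard library `html.parser` rather than Nokogiri. The class name `TagPathLogger` and the sample page are made up for illustration, and the stack handling assumes reasonably nested markup.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TagPathLogger(HTMLParser):
    """Record the tag path to every <tr> exactly as written in the source."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags
        self.tr_paths = []   # one path string per <tr> encountered

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append(tag)
        if tag == "tr":
            self.tr_paths.append("/" + "/".join(self.stack))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


# Hypothetical page: a table written without <tbody>, as on many real sites.
raw_html = (
    "<html><body><table>"
    "<tr><td>a</td></tr><tr><td>b</td></tr>"
    "</table></body></html>"
)

logger = TagPathLogger()
logger.feed(raw_html)
print(logger.tr_paths)
```

If the browser's XPath contains a step such as ‘tbody’ that never appears in the logged paths, the browser most likely added it.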

Now I want a browser plugin that shows me what is truly there so I don’t have to interpret and modify the information that Firefox gives me. Please speak up in the comments if you know a better way to avoid this annoyance.

06 Jul 2012

File under 'Commands I should Know'

    # shutdown -h now

Pretty important, but I didn’t know about the ‘-h’ flag until today. When run as root, this shuts the system down immediately (Linux specific) and sends it to runlevel 0, i.e. a full halt. Good to know =D. As pointed out in the comments, ‘shutdown -p now’ is the BSD equivalent.

05 Jul 2012

Awesome tool for recovering ecryptfs from Ubuntu/Canonical

During the process of recovering an encrypted filesystem from an old server of mine, I found that I couldn’t access any of the data. It appears that I encrypted my home folder on the server using Ubuntu’s default ecryptfs.

I ‘Googled’ my way through locating the encrypted ‘.Private’ folder and tried unsuccessfully to mount the folder. I then used ecryptfs-unwrap-passphrase to unwrap my unique passphrase using the old login password.

Armed with that and a tool from Dustin Kirkland over at Canonical (ecryptfs-recover-private), I succeeded in recovering my data with the following command:

    sudo ecryptfs-recover-private .Private

(Note: replace .Private with wherever the folder is actually located.) Then type in the login password, which yields the passphrase, which leads to the data being mounted as a read-only directory in /tmp.

Thanks Canonical!

05 Jul 2012

Linux Command of the Day

No fancy commands today, just good ol’ functional admin action. I was setting up a couple of new Virtual Private Servers (à la EC2 or DamnVPS) and came across the following helpful command:

    ssh-copy-id -i ~/.ssh/id_rsa.pub username@remote_server_ip_or_domain

It copies (appends) your public SSH key to the list of permissible keys on the remote server.

Need to make an SSH key for this purpose?

    ssh-keygen -t rsa -b 4096

Need to install ssh-copy-id on OS X?

    brew install ssh-copy-id

04 Jul 2012

Installing JungleDisk on Mint 13 (12.04)

I’m setting up a home server (which will be a blog post for a later date).

But in the process of setting up Amazon S3 backups, I was installing JungleDisk. This allows mounting of a remote S3 bucket as if it were a local drive.

After installing, I received an error stating that it could not find ‘libnotify.so.1’. I used the locate command to find libnotify (currently libnotify.so.4) and symlinked it into place:

    sudo ln -s /usr/lib/x86_64-linux-gnu/libnotify.so.4 /usr/lib/x86_64-linux-gnu/libnotify.so.1

After the symlink everything started up correctly :).

Btw, server is a quad core AMD (Phenom), 12 GB RAM, plenty of storage, SSD for boot drive, spinner platters for storage… and ZFS =D.

25 Jun 2012

Parallels - Your Ads belong in Trial Versions

Dear Parallels Team,

It’s unacceptable to put ads in a product for which I paid full price (ie Parallels Desktop 7).

Thanks to Google and someone in the Parallels forum for mentioning that the answer is here: NetizenSmith.org.uk. This is a much more palatable place to learn about it compared to needing a Private Message on the Parallels Forum.

Credit to Netizen for finding out that you can type the following in the terminal to remove their ads:

    defaults write "com.parallels.Parallels Desktop" ProductPromo.ForcePromoOff -bool YES

29 May 2012

Tmux, a reluctant love story

Tmux…a complicated love story.

In the beginning it was like any other romance: flirt a little bit, share brief moments of interaction, maybe compile from scratch and smell the roses on distant servers.

But it never really stuck. GNU Screen was quite the workhorse and I only needed it on remote servers because Terminator terminal emulator worked very well with multi-splits and session state maintenance. I had many “Oh that’s what I’ve been doing” type moments. At the time I didn’t see any great benefit to it on a local box.

I gave it another try on my recent Arch laptop because all the cool kids seemed to have a revived interest in tmux. The controls were nice, the look was nice, but it had more funky scrolling and mouse selecting issues than Terminator… so back I went. (Note: I remember having to choose my configuration between either a functional copy and paste w/ mouse or functional copy and paste with keyboard…but this is loosely based on my distant memories).

Fast forward to last night when I received a reminder about tmux. I thought, there’s probably something I’m missing. Lo and behold, the thing I’m missing is the awesome power of Slime.vim, Pry, and Terminal or Gvim (MacVIM).

Slime.vim allows you to program in your editor, then highlight a block of code to execute, mash C-c C-c (very Emacs), and have the code run in another split window (powered by Tmux). For me the code composition happens in VIM and is piped into a Pry session for execution and exploration. I foresee many more Slime/Pry/VIM sessions in my future and am excited about the fluidity of this setup.

The long and short of how to get it working: install tmux, add a line to your vimrc telling Slime to use tmux rather than screen, then highlight a block and press C-c twice to execute it.

Seems like a perfect workflow and I’m happy that I explored more about tmux.

Next is xmonad…. here I come (on linux boxes).

28 May 2012

An Exercise: Contact Scraping or How to Get Hired with ZeroMail

I’m trying out ZeroMail and saw their job listing at zeromail.

Though I’m happily and gainfully employed, it looked like an interesting exercise. The challenge is (verbatim from the website):

(1) Download http://zeromail.com/static/download/emails.txt
(2) Write a python program that extracts contact information from signatures in the emails
(3) Send your solution to bart@zeromail.com or skype it to bartjellema

After an hour of tinkering with it on my own terms (my Ruby > my Python), I’m realizing it’s more complex than I initially expected.

I see it as a multipart process:

(1) Strip junk characters and lines out of the data
(2) Determine and locate signature blocks
(3) Extract pertinent data from signature blocks (might resort to wordlists of given names)

I tried matching easily identifiable information with regexes (i.e. looking for Australian and US format phone numbers). This yielded decent results on the numbers themselves, and I’m considering using the line numbers of phone numbers (or email addresses) as ‘hotspots’ that I can extract and further process.
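The hotspot idea could be sketched roughly as follows, in Python since that is what the challenge asks for. This is not the solution I ended up with, just an illustration: the regexes are deliberately loose guesses at US and Australian phone formats, and the names `find_hotspots` and `candidate_blocks`, the `radius` parameter, and the sample email are all hypothetical.

```python
import re

# Assumed, deliberately loose patterns; real data will need tuning.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
US_PHONE = re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
AU_PHONE = re.compile(r"(?:\+61[\s-]?|0)[2-478](?:[\s-]?\d){8}")


def find_hotspots(lines):
    """Return 0-based indices of lines holding a phone number or email."""
    return [
        i for i, line in enumerate(lines)
        if EMAIL.search(line) or US_PHONE.search(line) or AU_PHONE.search(line)
    ]


def candidate_blocks(lines, radius=2):
    """Grow each hotspot by `radius` lines and merge overlapping ranges.

    Returns (start, end) pairs usable as lines[start:end].
    """
    blocks = []
    for i in find_hotspots(lines):
        start = max(0, i - radius)
        end = min(len(lines), i + radius + 1)
        if blocks and start <= blocks[-1][1]:
            blocks[-1] = (blocks[-1][0], end)  # overlaps the previous block
        else:
            blocks.append((start, end))
    return blocks


# Hypothetical email body ending in a signature.
sample = [
    "Hi Bob,",
    "See you Tuesday.",
    "Regards,",
    "Jane Doe",
    "Phone: (415) 555-0123",
    "jane@example.com",
    "",
    "Mobile +61 412 345 678",
]

for start, end in candidate_blocks(sample):
    print(sample[start:end])
```

Each block returned is a candidate signature that still needs the extraction step (names, titles, and so on) applied to it.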

If I spend more time on the process, I’ll post my results here.

27 May 2012

Todo.txt Count in RPrompt ZSH

I’ve been using Todo.txt for at least a year, both from the commandline and the Android app.

It’s great keeping a todo list in text files, synchronized across multiple devices via Dropbox, but there’s one nagging thing in my system.
That nagging issue is that Todo.txt doesn’t nag me enough about the issues in my todo list! So, I’m taking steps to resolve this by adding a function to my zshrc configuration.

I mapped out the basic command in the shell by combining todo’s ls, grep (to keep only lines starting with numbers), and then a wordcount by line.
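The grep step matters because the listing contains more than just the tasks. Here is a minimal sketch of the same filter-and-count logic in Python. The sample text is an assumption about what `todo.sh ls` prints (numbered task lines followed by a summary footer), and `count_todos` is a hypothetical helper name.

```python
import re


def count_todos(ls_output):
    """Count lines that start with a digit, like grep "^[0-9]" | wc -l."""
    return sum(
        1 for line in ls_output.splitlines() if re.match(r"[0-9]", line)
    )


# Assumed shape of `todo.sh ls` output: numbered tasks, then a footer.
sample = """1 Call the bank +errands
2 (A) Renew domain name
3 Write blog post about the home server
--
TODO: 3 of 3 tasks shown
"""

print(count_todos(sample))
```

Without the filter, a plain line count would include the separator and the summary line and overstate the number of open items.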

The following function was taken from Wynnnetherland.com and modified since I installed todo.txt through Homebrew, and since I prefer a different command to count the items.

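    # Print the number of open todo items, or nothing if there are none
    todo_count(){
      if $(which todo.sh &> /dev/null)
      then
        # Keep only the numbered task lines, then count them
        num=$(echo $(todo.sh ls | grep "^[0-9]" | wc -l))
        let todos=num
        if [ $todos != 0 ]
        then
          echo "$todos"
        else
          echo ""
        fi
      else
        echo ""
      fi
    }

    # Rebuild the right-hand prompt whenever the line editor starts or the keymap changes
    function zle-line-init zle-keymap-select {
      RPS1="${${KEYMAP/vicmd/>>> ○[$(todo_count)]}/(main|viins)/± <<<[$(todo_count)]}"
      RPS2=$RPS1
      zle reset-prompt
    }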

The following is a diff showing the updated function: [gist id=2817309]

Now my right hand prompt shows a nice count of incomplete todo items inside square brackets…like so: [gist id=2817339]

Update: My syntax highlighting may be escaping some of the characters above, so here’s a gist of it: [gist id=2817304]

24 May 2012

VIM Movement command of the Day

The movement letter of the day for VIM is ‘G’. Using ‘G’ moves to the end of the file (or, more specifically, to the final line of the file).

The complementary command is ‘gg’, to jump to the top of the file!

VIM: The more I learn, the better it gets.