Scrape the full link set from the site:
wget -r -l4 --spider -D blog.xargs.io http://blog.xargs.io
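If the spidered pages are not kept on disk, the crawled URLs can also be recovered from wget's own log, which prints each fetch on a line beginning with a timestamp. A hedged aside, not part of the original workflow (spidered_urls.log is a hypothetical filename):
wget -r -l4 --spider -D blog.xargs.io http://blog.xargs.io 2>&1 | \
grep '^--' | awk '{print $3}' | sort -u > spidered_urls.log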
Analyze the link set from the site:
tree -J -f blog.xargs.io | grep file | grep -o 'name.*' | \
awk -F":" '{print $2}' | tr -d '",}' | sort -u
Curl each link to see which ones currently work:
# Deal with multiple saved copies of same entry from wget
cat current_links.log | grep -v "\.[[:digit:]]*$" | \
sed 's/blog.xargs.io/http:\/\/blog.xargs.io/g' | \
parallel -- \
"curl -o /dev/null --silent --head --write-out '%{http_code} %{url_effective}\n' {}" | \
sort -u | tail -r > current_links_master.log
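Given the --write-out format, each line of current_links_master.log pairs a status code with the final URL, e.g. 200 http://blog.xargs.io/... (tail -r reverses the sorted list on BSD/macOS; tac is the GNU equivalent). A quick, optional way to eyeball the spread of status codes:
awk '{print $1}' current_links_master.log | sort | uniq -c | sort -rn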
Then run current_links_master.log through a processor to compare your staging site to your production site. Watch for those 404s and make sure your 302s look good.
# Strip the status-code column from the master log, point the URLs at the
# staging copy, and flag anything that is not a 200 or 302
cat current_links_master.log | awk '{print $2}' | \
sed 's/blog.xargs.io/localhost:5000/g' | \
parallel -- \
"curl -o /dev/null --silent --head --write-out '%{http_code} %{url_effective}\n' {}" | \
sort -u | tail -r | grep -Ev "^(200|302)"
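For a closer production-versus-staging comparison than a simple filter, a small processor can re-check each URL against staging and flag any status mismatch. A minimal bash sketch, assuming the master log format above (status code, then URL) and a staging server on localhost:5000:
while read -r prod_code url; do
  stage_code=$(curl -o /dev/null --silent --head --write-out '%{http_code}' \
    "${url/blog.xargs.io/localhost:5000}")
  # Flag anything where staging answers differently than production
  [ "$prod_code" != "$stage_code" ] && echo "MISMATCH prod=$prod_code stage=$stage_code $url"
done < current_links_master.log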
Credit for these scripts: