Data in Motion, Data at Rest

Maintaining Your Day-to-Day Data Flows While Traveling

What's your data pattern? Do you download the same things every week? Do you visit the same sites every day?

Traveling interrupts how you consume data for various reasons. Maybe your home Internet connection is over 300 Mbps, but you're stuck using a high-latency airplane satellite link for 12 hours at a time.

Script for large web downloads over an unstable connection

Just call this script with the URL you want to download as its only argument. It strips any query parameters to build a local filename, then downloads the URL until the download completes. If the download is interrupted, the script attempts to resume where it left off.

#!/usr/bin/env perl

use strict;
use warnings;
use File::Basename;

my $url = $ARGV[0] or die "Usage: $0 <url>\n";

# Transform $url into a local filename without any query parameters.
my $filename = $url;
$filename =~ s/\?.*//;

# Cheap and dirty way of getting only a filename out of the URL
$filename = basename($filename);

print "Saving $url as $filename\n";

# Download URL $url to local name $filename
do {
    # Continue attempting download if it fails or is
    # interrupted/cut off. With -C -, curl resumes from
    # its last saved position each time (if the server
    # supports range requests).
    # system() with a list avoids shell-quoting problems in $url.
    system("curl", "-L", "-o", $filename, "-C", "-", $url);
} while (($? >> 8) != 0);

Script to download TV episodes automatically

Thanks to some public APIs, we can even download episodes of shows soon after they air. See my same-day auto-downloader at https://github.com/mattsta/getTV

You can run getTV locally or on a remote server.

If you're traveling with unpredictable hours and connectivity, you'll want to run getTV on a remote server with transmission-daemon. Once you've settled during your travels, you can run getTV locally again.

Since the Transmission web interface doesn't support encryption, you can make transmission-daemon listen only on localhost, then use ssh port forwarding to access the web interface securely.

Example of remote management

By default, transmission-daemon's RPC interface listens on port 9091, so you can forward local port 8888 to port 9091 on the remote host and browse the interface locally.

autossh -M0 -L 8888:localhost:9091 some.remote-ssh-server-name.com

Usage: Open your browser to http://localhost:8888/, enter your transmission-daemon username/password, and manage previously added torrents over a secure connection (which transmission-daemon doesn't support natively).

Since you're on a remote (and presumably flaky) connection, use autossh so the tunnel restarts automatically if ssh terminates due to any error.
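The same tunnel also exposes transmission-daemon's JSON-RPC endpoint, so you can script management tasks instead of clicking through the web interface. Here's a minimal sketch assuming the default RPC path and no HTTP basic auth (add an auth handler if you've enabled a username/password); Transmission rejects a request lacking a valid session id with HTTP 409 and hands back the id to retry with:

```python
import json
import urllib.request

RPC_URL = "http://localhost:8888/transmission/rpc"  # through the ssh tunnel

def build_torrent_get(url, session_id=""):
    """Prepare a Transmission torrent-get request (JSON-RPC over HTTP POST)."""
    payload = json.dumps(
        {"method": "torrent-get",
         "arguments": {"fields": ["name", "percentDone"]}}
    ).encode()
    req = urllib.request.Request(url, data=payload)
    # Transmission answers requests without a valid session id with
    # HTTP 409 plus the id to use in the X-Transmission-Session-Id
    # response header.
    req.add_header("X-Transmission-Session-Id", session_id)
    return req

def list_torrents(url):
    """Return the torrent list, handling the 409 session-id handshake."""
    try:
        resp = urllib.request.urlopen(build_torrent_get(url))
    except urllib.error.HTTPError as err:
        if err.code != 409:
            raise
        resp = urllib.request.urlopen(
            build_torrent_get(url, err.headers["X-Transmission-Session-Id"])
        )
    return json.load(resp)["arguments"]["torrents"]

# Example (with the tunnel up):
#   for torrent in list_torrents(RPC_URL):
#       print(torrent["name"], torrent["percentDone"])
```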

With your episodes now on a remote server, how do you watch them? You can rsync them back to your local machine, or you can set up a web server exposing your episode download directory over HTTP (ideally with user/password auth or IP restrictions) and use the HTTP download script above for resumable downloading. As a bonus, since you're downloading an already completed file, you can start watching an episode while it downloads (as long as the remaining download time is less than the remaining runtime to watch).
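For the curious, curl's -C - resume works by sending an HTTP Range header asking for only the bytes you're missing. The same idea as a Python sketch (a simplified illustration of the mechanism, not a replacement for the script above):

```python
import os
import urllib.request

def resume_download(url, filename, chunk_size=64 * 1024):
    """Fetch url into filename, resuming from any existing partial file."""
    have = os.path.getsize(filename) if os.path.exists(filename) else 0
    req = urllib.request.Request(url)
    if have:
        # Ask the server for only the bytes we don't have yet.
        req.add_header("Range", "bytes=%d-" % have)
    with urllib.request.urlopen(req) as resp, open(filename, "ab") as out:
        # Status 206 means the server honored the Range request; a plain
        # 200 means it ignored it, so start over from byte zero.
        if resp.status == 200 and have:
            out.seek(0)
            out.truncate()
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)

# Example: resume_download("http://example.com/big-file.iso", "big-file.iso")
```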

SOCKS5 Proxy

If you have a remote server, you can run a free proxy service for your travel web browsing. Just launch ssh in -D mode to create an automatic SOCKS5 tunnel.

autossh -M0 -D 3128 some.remote-ssh-server-name.com

After the tunnel is established, modify your system preferences to use localhost:3128 as your SOCKS5 server. Verify the proxy is working with a "What's my IP" lookup service: it should report your server's IP, not your local wifi NAT IP.

Again, we use autossh so the session restarts itself after failures instead of requiring manual intervention.

Script for Instagram picture/video downloads

Maybe you want a higher quality version of Instagram content than their crappy web interface provides. Maybe your connection isn't stable enough to buffer even small Instagram media. Just give the Instagram URL as a parameter to this Python script and it'll capture either the image or the video from the page.

#!/usr/bin/env python

import os
import re
import sys
import urllib.request

from bs4 import BeautifulSoup

page = urllib.request.urlopen(sys.argv[1]).read()
soup = BeautifulSoup(page, "html.parser")

# Instagram publishes direct media URLs in its Open Graph meta tags.
for prop in ["og:image", "og:video"]:
    meta = soup.find("meta", {"property": prop})
    if meta is None or not meta.get("content"):
        continue

    mediaURL = meta["content"]

    # Strip query parameters, then keep only the filename portion.
    filename = os.path.basename(re.sub(r"\?.*$", "", mediaURL))

    print("Fetching", mediaURL, "as", filename)

    urllib.request.urlretrieve(mediaURL, filename)