Container of the Week – clue/polipo

An HTTP proxy is an essential component if you have a slow Internet link, or if you simply run a lot of builds that download a lot of data. I like the Polipo caching HTTP proxy because it’s simple and single-threaded. I was a bit sad to discover that Polipo is no longer maintained, as of August 2016. Hopefully it will remain useful for some time yet, or maybe even be picked up by a new maintainer.

Overview

Polipo is very simple to configure and run, which is one reason why I like it. The following command starts an HTTP proxy on port 8123:

$ docker run -d -p 8123:8123 clue/polipo \
    proxyAddress=0.0.0.0

Security note: the default behaviour when publishing a network port is to listen on all IP addresses, which means that any host on your network can connect to the proxy. If you don’t want this, specify a bind address in the port forward option, for example 127.0.0.1.

$ docker run -d -p 127.0.0.1:8123:8123 clue/polipo \
    proxyAddress=0.0.0.0
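
Once the container is running, clients still need to be pointed at it. Here is a minimal sketch, assuming the proxy is published on 127.0.0.1:8123 as in the command above and that your tools honour the conventional http_proxy and https_proxy environment variables (curl and wget do, but not every program does). As far as I know, HTTPS requests are tunnelled through the proxy rather than cached.

```shell
# Assumption: Polipo is reachable on the loopback address, port 8123.
PROXY_URL="http://127.0.0.1:8123"

# Export the conventional variables so child processes pick up the proxy.
export http_proxy="$PROXY_URL"
export https_proxy="$PROXY_URL"

# Or name the proxy explicitly for a single request:
#   curl -x "$PROXY_URL" -I http://example.com/
```

Putting the exports in your shell profile or build script keeps every download going through the cache.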

One nice feature of Polipo is that it’s configured to auto-expire cached data. By default

I’ve used the Polipo container for a long time now, and here is my collection of tips and tricks.

Persisting cached data

A caching HTTP proxy is useful, but it’s nice to persist the cached data between Docker restarts, reboots, etc. I connect a host volume to ensure cached data is persisted.

$ docker run -d -p 8123:8123 \
    -v ~/.cache/polipo:/var/cache/polipo \
    clue/polipo proxyAddress=0.0.0.0

Polipo stores cached data by virtual host name, so you can look in the ~/.cache/polipo directory to get an idea of what the cache is doing. Unfortunately, Polipo does not generate any per-request logs, so you need to use du(1) and netstat(1) or equivalent tools for troubleshooting.
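
As a rough illustration of the du(1) approach, the sketch below lists how much space each top-level entry in the cache directory uses. The function name polipo_cache_usage is my own invention, and the default path assumes the host volume from the command above.

```shell
# Print cache usage in KiB per top-level entry, largest first.
# $1 is the cache directory (default: the host volume used above).
polipo_cache_usage() {
    cache_dir="${1:-$HOME/.cache/polipo}"
    # du -sk reports sizes in KiB; sort -rn puts the biggest entries first.
    du -sk "$cache_dir"/* 2>/dev/null | sort -rn
}

# Usage:
#   polipo_cache_usage | head
#
# Connections currently involving the proxy port (Linux netstat syntax):
#   netstat -tn | grep ':8123'
```

Running it before and after a build gives a crude picture of which hosts the build is pulling data from.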

Another cache-related trick with Polipo is to read the contents of the cache files. Every successfully cached request is stored in a file with a base64 name. The file starts with the HTTP headers, which are followed by the response data. It’s possible to poke around and search the HTTP headers. The Location and X-Polipo-* headers can be particularly useful.
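
Here is a small sketch of that kind of poking around. It assumes the header section ends at the first blank line, as in an ordinary HTTP message. The function names are hypothetical helpers of mine, not part of Polipo.

```shell
# Print the header section of one cache file: everything up to the first
# blank line. Trailing carriage returns are stripped for readability.
polipo_cache_headers() {
    awk '{ sub(/\r$/, ""); if ($0 == "") exit; print }' "$1"
}

# List cache files that mention a pattern, for example part of a URL.
# -r recurses, -l prints file names only, -a treats binary data as text.
# $2 is the cache directory (default: the host volume used above).
polipo_cache_grep() {
    grep -rla -- "$1" "${2:-$HOME/.cache/polipo}"
}

# Usage:
#   polipo_cache_grep 'example.com' | head
#   polipo_cache_headers "$(polipo_cache_grep 'example.com' | head -n 1)"
```

Stopping at the blank line matters because the response data that follows may be binary and is rarely worth printing to a terminal.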

Connecting to a Parent Proxy

If you are behind a corporate proxy and all direct connections to ports 80 and 443 are rejected, you will need to configure Polipo to connect to a parent proxy. Pass an additional command-line argument to the docker run command.

$ docker run -d -p 8123:8123 \
    clue/polipo proxyAddress=0.0.0.0 \
    parentProxy=web-proxy.corp.com:8123

Adjusting Restart Policy

Usually an HTTP proxy is something that you want Docker to start and keep running all the time. To achieve this, the restart policy of the container needs to be set to something other than the default, which is to perform no restarts if the container exits or is stopped by hand.

My favourite restart policy setting is unless-stopped, in which Docker restarts the container if it crashes but still allows you to run docker stop by hand if you wish to stop Polipo for some reason.

$ docker run -d -p 8123:8123 \
    --restart unless-stopped clue/polipo \
    proxyAddress=0.0.0.0
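
If the container is already running with the default policy, it should not need to be recreated. The sketch below wraps docker inspect and docker update in two helper functions. The function names and the container name polipo are assumptions of mine; substitute whatever name or ID docker ps reports.

```shell
# Show the restart policy of an existing container.
# $1 is a container name or ID.
restart_policy() {
    docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' "$1"
}

# Change the restart policy in place, without recreating the container.
# $1 is a container name or ID, $2 is the new policy.
set_restart_policy() {
    docker update --restart "$2" "$1"
}

# Usage, assuming the container was started with --name polipo:
#   restart_policy polipo
#   set_restart_policy polipo unless-stopped
```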

Resources

The Polipo manual is long and very complete and should be the first port of call for questions about Polipo. The Polipo FAQ is also useful. Finally, the Arch Linux Wiki Polipo page has some good information on integrating Polipo with other services.
