Containers Save the Day!

In my previous post, I talked about an SSL/TLS problem I was running into with OpenSSL. This entry describes the solution I came up with. I covered an early version of the solution in the previous post, but here I'll walk through the end result.

The Proxy

The proxy code appeared in the previous post, but it has changed a little. The solution has three files:

  • anchore-proxy.py - a Python script that implements the proxy
  • Dockerfile - a Docker file that specifies how the image is built
  • requirements.txt - a file that specifies the libraries used

The Dockerfile

The Dockerfile is fairly straightforward:


FROM centos

EXPOSE 5000

RUN yum -y update && yum -y install epel-release
# fix minor https issue with epel
RUN sed -i "s/mirrorlist=https/mirrorlist=http/" /etc/yum.repos.d/epel.repo
RUN yum clean all -y
RUN yum -y install python34-devel python34-pip openssl-devel && yum clean all -y
RUN pip3 install --upgrade pip && pip3 install --upgrade setuptools

COPY . /app
WORKDIR /app
RUN pip3 install --upgrade -r /app/requirements.txt

CMD python3 /app/anchore-proxy.py

If you aren't familiar with Dockerfiles, here is a brief explanation. You are basically describing the image that will be created through a series of instructions:

  • FROM - always the first instruction; it names the base upon which the image will be built. It can be just a bare operating system like it is here (centos), or a more sophisticated image that already has some other stuff in it. Later on, I'll be extending the base anchore/anchore-engine image in order to install my client that talks to the proxy.
  • EXPOSE - indicates that this container listens on port 5000. Note that this only exposes the port to other containers; if you want to reach this port from the outside world, you need to map the container port to an actual port on your machine.
  • RUN - runs commands while the image is being built. In this case I'm installing some packages, plus a minor tweak to the epel repo because https seems to break.
  • COPY - copies files from the specified local directory to the specified image directory (in this case copying the script, Dockerfile, and requirements.txt to /app in the image).
  • WORKDIR - sets the working directory, just like doing a cd command.
  • CMD - indicates what process should be started when the container starts (in this case my proxy script).
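
If you want to try the proxy image on its own before wiring it into docker-compose, a build-and-run sequence along these lines should work (the anchore-proxy tag is the name the compose file later refers to, and -p maps container port 5000 to a host port):

docker build -t anchore-proxy .
docker run --rm -p 5000:5000 anchore-proxy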

The requirements.txt file

This file is really minimal:


requests
urllib3
Flask

The requirements are pretty simple - just the requests library for making HTTP/HTTPS requests, the urllib3 utility library, and the Flask library that powers the web server. You can install the requirements just by running this command (as we did in the Dockerfile):


pip install --upgrade -r requirements.txt

The anchore-proxy.py script

I will skip the imports and the init_logger function, which really just initializes the logger to write to stdout and sets the logging format. The first item of interest in the script is the ProxyRequest class. This class gathers the required information from the Flask request object in its constructor:


class ProxyRequest:
    def __init__(self, is_post=False):
        self.is_post = is_post
        # the URL we are proxying to, passed URL-encoded in the 'target' query param
        self.target_url = unquote(request.args.get('target'))
        try:
            # headers come in as URL-encoded JSON in the 'headers' query param
            if request.args.get('headers'):
                self.headers = json.loads(unquote(request.args.get('headers')))
            else:
                self.headers = {}
        except Exception:
            raise Exception("Invalid headers param")
        if is_post:
            self.post_data = self.get_post_data()

This class handles both GET and POST requests. The constructor grabs the target_url, which is the URL that we are proxying, along with any headers that have been passed along. I originally tried forwarding all of the headers that came in with the request, but that appears to confuse the site you are talking to, and I didn't take the time to figure out which ones to filter out. Instead, I have the client explicitly pass the headers it wants, encoded as JSON (and URL-encoded). If it is a POST, I also grab the post data.
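
To make the parameter format concrete, here is a hypothetical call to the proxy's GET endpoint (the target URL and header are made up for the example, and this assumes port 5000 has been mapped to the host; from another container it would be http://anchore-proxy:5000 instead):

curl "http://localhost:5000/get?target=https%3A//example.com/api&headers=%7B%22Accept%22%3A%20%22application/json%22%7D"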

Another method of the class makes the actual request:


def make_request(self):
    if self.is_post:
        response = requests.post(url=self.target_url, headers=self.headers, data=self.post_data)
    else:
        response = requests.get(url=self.target_url, headers=self.headers)

    if response.status_code == 200:
        content_type = response.headers.get('Content-Type')
        return Response(response.content, content_type=content_type, mimetype=content_type, status=200)
    else:
        return Response("ERROR: request returned status of {}".format(response.status_code), status=400)

The code here is pretty straightforward: it channels the gathered data into the requests library to make the actual request. We check the response status code and return an error response if it isn't 200 (technically I could just return the actual response no matter what, and may change it to that in the future). The get_post_data method is also straightforward, just pulling the post data from the Flask request object:


@staticmethod
def get_post_data():
    if request.form:
        return request.form
    elif request.data:
        return request.data
    else:
        return ""

The last part of the script is the handlers for the GET and POST services and the code that starts Flask running. The service endpoints basically create a ProxyRequest object and then call its make_request method to return the result:


@app.route('/get')
def proxy_get():
    try:
        req = ProxyRequest()
        return req.make_request()
    except Exception:
        return Response("ERROR: badly formatted request", status=400)

@app.route('/post', methods=['POST'])
def proxy_post():
    try:
        req = ProxyRequest(is_post=True)
        return req.make_request()
    except Exception:
        return Response("ERROR: badly formatted request", status=400)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

The Proxy Client

The proxy client is code that needs to be added to the anchore/anchore-engine image so that we can proxy certain requests. The solution in this case has a number of files:

  • requests_proxy.py - the script that implements part of the requests library interface to proxy the requests
  • feeds.py.patch - a patch file that updates the feeds.py script, replacing certain calls to requests with calls to requests_proxy
  • Dockerfile - a Docker file that specifies how the image will be built
  • docker-compose.yaml - a Docker compose file that will start all the required images to run the anchore engine
  • config - a directory containing a sample anchore configuration and other related files

The Dockerfile

The Dockerfile is as follows:


FROM anchore/anchore-engine

# I seem to have an issue with the epel repo,
# and I don't really need it, so disable it for this install
RUN yum-config-manager --disable epel
RUN yum -y update && yum -y install patch && yum -y install libffi-devel

COPY . /root/anchore-engine-with-proxy
# apply simple patch to proxy the requests to https://ancho.re
RUN cd /root/anchore-engine-with-proxy && patch -d/ -p0 < feeds.py.patch
# add the proxy script into the feed service, where it is actually used
COPY ./requests_proxy.py /root/anchore-engine/anchore_engine/clients/feeds/feed_service/
# do the normal install which will update the feeds.py
RUN cd /root/anchore-engine/ && pip install --upgrade .

# normal anchore-engine entry point
CMD /usr/bin/anchore-engine

As you can see, this image extends the base anchore/anchore-engine image. This means that it will essentially act as if all the instructions in the Dockerfile for that image were in this Dockerfile. Building the image, however, won't run all those commands because the image for anchore/anchore-engine already exists. We are just building upon it, so the build spins up a container with the anchore/anchore-engine image in it (but doesn't run the anchore-engine command) and begins to run the instructions in this Dockerfile.

I start by installing patch and libffi-devel (the latter was needed because a later step would fail without it). I then copy the project files into the image and run patch with the feeds.py.patch file to update the existing feeds.py script (more on that later). Next I copy the requests_proxy.py script to where the anchore-engine scripts live and install everything with a pip install. Finally, I just kick off the normal anchore-engine script.
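
Building this image is just a normal docker build; the only thing to watch is that the tag matches the image name the compose file expects (anchore-engine-with-proxy). Something like this, run from the project directory, should do it:

docker build -t anchore-engine-with-proxy .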

The feeds.py.patch file

I'm not going to go into the specifics of how patch works, but basically the diff command compares two files and produces output that patch can use to change one file so that it matches the other. My patch file adds an import statement for requests_proxy and then replaces certain calls to the requests library with calls to requests_proxy. The file essentially looks like this:

        
--- /root/anchore-engine/anchore_engine/clients/feeds/feed_service/feeds.py    (revision )
+++ /root/anchore-engine/anchore_engine/clients/feeds/feed_service/feeds.py    (revision )
@@ -1,4 +1,5 @@
 import requests
+import requests_proxy
 import requests.exceptions
 import base64
 import json
@@ -84,7 +85,7 @@
                 else:
                     auth = (self.user, self.password)
                     logger.debug("making authenticated request (user="+str(self.user)+") to url: " + str(url))
-                    r = requests.get(url, auth=auth, timeout=(conn_timeout, read_timeout))
+                    r = requests_proxy.get(url, auth=auth, timeout=(conn_timeout, read_timeout))
                     logger.debug("\tresponse status_code: " + str(r.status_code))
                     if r.status_code == 401:
                         logger.debug("Got HTTP 401 on authenticated GET, response body: " + str(r.text))
@@ -161,7 +162,7 @@
         user_url = self.anchore_auth['client_info_url'] + '/' + self.anchore_auth['username']
         user_timeout = 60
         retries = 3
-        result = requests.get(user_url, headers={'x-anchore-password': self.anchore_auth['password']})
+        result = requests_proxy.get(user_url, headers={'x-anchore-password': self.anchore_auth['password']})
         if result.status_code == 200:
             user_data = json.loads(result.content)
         else:
@@ -206,7 +207,7 @@
...

You can see that the changes add lines that begin with a + and remove lines that begin with a -. In some cases, we are replacing lines, so we remove the old line and then add the new.
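
If you ever need to regenerate a patch like this, the usual recipe is to keep a copy of the original file, edit the real one, and diff the two; the paths below are just illustrative:

cp feeds.py feeds.py.orig
# ... edit feeds.py ...
diff -u feeds.py.orig feeds.py > feeds.py.patch
# apply it later with patch (the Dockerfile above uses -d/ -p0 because the patch contains absolute paths)
patch /path/to/feeds.py < feeds.py.patch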

The requests_proxy.py file

The actual proxy client is pretty simple:


import json
import logging
from urllib.parse import quote  # python 3; on python 2 this would be "from urllib import quote"

import requests

proxy_root_url = "http://anchore-proxy:5000"
proxy_get_url = proxy_root_url + "/get?target={}&headers={}"
proxy_post_url = proxy_root_url + "/post?target={}&headers={}"
logger = logging.getLogger(__name__)


def get(url, **kwargs):
    # only requests to ancho.re go through the proxy
    if str(url).startswith("https://ancho.re"):
        headers = kwargs.get('headers') or {}
        headers_str = quote(json.dumps(headers))
        return requests.get(proxy_get_url.format(quote(url), headers_str))
    else:
        return requests.get(url, **kwargs)


def post(url, **kwargs):
    # only requests to ancho.re go through the proxy
    if str(url).startswith("https://ancho.re"):
        headers = kwargs.get('headers') or {}
        post_data = kwargs.get('data') or {}
        headers_str = quote(json.dumps(headers))
        return requests.post(proxy_post_url.format(quote(url), headers_str), data=post_data)
    else:
        return requests.post(url, **kwargs)

The get and post functions check whether the URL starts with https://ancho.re, and only requests that match are proxied. For a proxied request, I gather the headers, turn them into JSON, and URL-encode them. If it is a POST, I also gather the post data. I then do a get or post to the proxy, setting the target query param (the URL-encoded target URL) and the headers query param (the URL-encoded header JSON), and return the result.
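
As a quick sanity check of the encoding, here is a small standalone Python 3 snippet (the target path and header value are made up for the example) that builds the same proxy URL the get function would request:

import json
from urllib.parse import quote

# made-up target and header, just to show the encoding
target = "https://ancho.re/v1/service/feeds"
headers = {"x-anchore-password": "secret"}

proxy_get_url = "http://anchore-proxy:5000/get?target={}&headers={}"
print(proxy_get_url.format(quote(target), quote(json.dumps(headers))))
# http://anchore-proxy:5000/get?target=https%3A//ancho.re/v1/service/feeds&headers=%7B%22x-anchore-password%22%3A%20%22secret%22%7D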

The docker-compose.yaml file

If you aren't familiar with Docker compose files, don't stress too much. They essentially encode the docker commands you would otherwise run by hand to build up a network of containers. My compose file runs three containers:

  • anchore-engine: the anchore engine with my proxy fix applied
  • anchore-db: the PostgreSQL database
  • anchore-proxy: the proxy server

The anchore-engine service is defined like this:


version: '2'
services:
  anchore-engine:
    # this image extends the standard anchore-engine image
    image: anchore-engine-with-proxy
    depends_on:
     - anchore-db
     - anchore-proxy
    ports:
     - "8228:8228"
     - "8338:8338"
    volumes:
     - ./config/:/config/:Z
    logging:
     driver: "json-file"
     options:
      max-size: 100m

The service uses my image, which I have tagged as anchore-engine-with-proxy. It depends on the anchore-db and anchore-proxy containers, so those get started first. The container exposes two ports, 8228 and 8338 (the image opens other ports internally, but they aren't exposed outside the container). The config directory that sits alongside the docker-compose.yaml file is mounted into the container at /config, and the anchore engine reads its configuration from there when it starts up. Logging uses the json-file driver and is capped at 100MB.

The anchore-db service is similar:


anchore-db:
  image: "postgres:9"
  volumes:
    - /tmp/db/:/var/lib/postgresql/data/pgdata/:Z
  environment:
    - POSTGRES_PASSWORD=mysecretpassword
    - PGDATA=/var/lib/postgresql/data/pgdata/
  logging:
    driver: "json-file"
    options:
      max-size: 100m
#uncomment to expose a port to allow direct/external access to the DB, for debugging
#  ports:
#    - "2345:5432"

This container uses a regular postgres image and mounts a /tmp/db folder on the host to /var/lib/postgresql/data/pgdata inside the container so that data is persisted if I restart the container. It sets a couple of environment variables that tell postgres the root password for the database and where the data is stored, and it uses the same logging setup as the other services. The database port is not exposed externally (the anchore-engine container can reach it because docker-compose puts the containers on a shared network), but you can uncomment the ports entry to expose it for debugging.
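
If you do uncomment that ports entry, you can poke at the database directly from the host with the standard postgres client (the password is the POSTGRES_PASSWORD value from the compose file):

psql -h localhost -p 2345 -U postgres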

Finally, the anchore-proxy service is only a few lines:


anchore-proxy:
  container_name: anchore-proxy
  image: anchore-proxy:latest
#uncomment to expose the proxy to the world
#  ports:
#    - "5000:5000"

The container listens on port 5000 internally, but you can expose it if you would like (although that is less secure).
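
With all three services defined, starting the whole stack (and tailing the engine's logs) is just:

docker-compose up -d
docker-compose logs -f anchore-engine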

Conclusion

I hope this post has shown how easily you can solve a minor issue just by using containers. I never have to touch the anchore-engine image itself; I simply extend it and make the small changes I need. With help from docker-compose, the proxy integrates cleanly into the project and the whole thing is easy to start up.

Source

The source for this project is available on GitHub at https://github.com/openshiftninja/anchore-engine-with-proxy, and there are instructions there on building and using the images.
