  • Autostarting Docker Compose as a Linux Service

    November 25 2018

    In my previous post I talked about how to build some Docker containers to run a deep learning-focused1 JupyterHub instance.

    It’s nice for several reasons if the JupyterHub server brings itself up when the system starts. Here’s how to do that—an adaptation of this Stack Overflow post.

    Create a file at /etc/systemd/system/docker-jupyterhub.service and put the following text into the file.

    [Unit]
    Description=JupyterHub container
    Requires=docker.service
    After=docker.service
    
    [Service]
    Restart=always
    ExecStart=/usr/local/bin/docker-compose -f /path/to/GPU-Jupyterhub/docker-compose.yml --project-directory /path/to/GPU-Jupyterhub/ start
    ExecStop=/usr/local/bin/docker-compose -f /path/to/GPU-Jupyterhub/docker-compose.yml --project-directory /path/to/GPU-Jupyterhub/ stop
    
    [Install]
    WantedBy=default.target
    

    Once that file has been created, issue the following command:

    sudo systemctl enable docker-jupyterhub.service
    

    Now the JupyterHub server should start when the host computer boots and you won’t have to log in to start it manually.
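
    If you don’t want to wait for a reboot, sudo systemctl start docker-jupyterhub.service brings everything up right away, and journalctl -u docker-jupyterhub.service is the place to look if it doesn’t.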

    It was a little tricky to find that --project-directory flag, and it turns out it’s important to put the start and stop after all the other arguments.

    ¯\_(ツ)_/¯


    1. ugh [return]


  • Docker and JupyterHub for Deep Learning

    November 11 2018

    If you’ve gone down the road of building your own machine for “deep learning”,1 you may have some sort of Jupyter Notebook server running on it. You may also be able to access that server when you’re not on your local network. The Jupyter Foundation has done a good job of documenting how to do this in a fairly secure way. But standalone Jupyter Notebook servers aren’t ideal if you want more than one person to be able to access your machine.

    In my case, a couple users need access to a deep learning server and I like the idea of being able to quickly deploy elsewhere if needed. I don’t really need to go full K8s, so I decided to set up a JupyterHub server using Docker.

    In order to be able to use JupyterHub in a Docker container and access your NVIDIA GPU, there are three high-level steps to complete:

    1. Install NVIDIA display drivers and CUDA onto your system
    2. Install Docker and NVIDIA Docker
    3. Build a Deep Learning Container and Tell JupyterHub to Use It

    I’ll quickly go through the first two before addressing some pain points in the last one.

    Installing Display Drivers and CUDA

    My machine is running Ubuntu 18.04, and unfortunately that means some compatibility issues with CUDA 9.2.2 CUDA 10.0 installs just fine and it requires NVIDIA driver release 410. To install the NVIDIA display drivers:

    sudo add-apt-repository ppa:graphics-drivers/ppa
    sudo apt update
    sudo apt install nvidia-396
    

    Wait a second, why didn’t we install 410 just then? Well, at time of writing, 410 isn’t…accessible via the command line (?!) due to some logical reason on NVIDIA’s part or my ignorance of the apt package manager. However, you can install 410 by loading the Software & Updates program, clicking on Additional Drivers and then selecting the 410 metapackage driver.

    After that, it’s easy to follow the CUDA Installation Guide. Might as well install cuDNN while you’re at it.

    Install Docker and NVIDIA Docker

    This part is straightforward. Follow the official Docker install directions and then follow the directions to install NVIDIA Docker. You might consider adding yourself to the docker group as well.
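
    On most systems that’s sudo usermod -aG docker $USER, followed by logging out and back in so the new group membership takes effect.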

    Build a Deep Learning Container and Tell JupyterHub to Use It

    Here’s where it gets complicated. At this point, we are able to access our GPU from within a Docker container (cool!). Using JupyterHub, it’s easy to specify which container you’d like it to launch when a user logs in, but that container has a few competing requirements:

    • It must have Jupyter/JupyterHub installed, and must have a suitable entrypoint in place
    • The container itself must have the same version of CUDA installed as what’s on the host
    • The container must also have any libraries/software you require

    That doesn’t sound that difficult, but it rules out cool projects like Deepo and Darknet and complicates the usage of the NVIDIA Cloud containers.3

    In the middle of trying to solve this problem, I stumbled on a project called GPU-Jupyterhub. It appears to be a fork of jupyterhub-deploy-docker with a user container that’s been modified to inherit from the NVIDIA CUDA container images on Docker Hub. Basically the author of GPU-Jupyterhub4 did the work of building a container with Jupyter/JupyterHub and lots of other goodies (TensorFlow, TensorBoard, Keras, PyTorch, others) installed.

    Great! I forked that repository, removed some medical imaging libraries which I won’t use, tried to clean up dependencies, made sure user containers were spawned in the NVIDIA Docker runtime, added a place for users to access common data, and tried to make the install process a little easier. Also, due to my specific configuration, using GitHub as an authenticator is not an option, so I’m using the PAM authenticator. Here is how to set up JupyterHub:

    1. Clone my GPU JupyterHub repo
    2. Make a userlist file
    3. Update c.Authenticator.whitelist and c.Authenticator.admin_users in jupyterhub_config.py as needed (see the sketch after this list)
    4. Add desired users and passwords to Dockerfile.jupyterhub, example here—this is necessary to use PAM user/pass authentication
    5. Generate a value for CONFIGPROXY_AUTH_TOKEN and POSTGRES_PASSWORD in the file .env
    6. Generate an SSL certificate (through Let’s Encrypt if you’ve got some DNS A records pointing to your deep learning machine or by signing your own) and place it in a directory called .ssl at the root level of the repository
    7. Create Docker networks and volumes for JupyterHub—examples in docker-compose.yml
    8. Edit the last volume in the hub service in docker-compose.yml file to point to a directory on the host with data you’d like users to share
    9. Build the base notebook while in the base-notebook directory: docker build -t base-notebook-gpu .
    10. Build the Tensorflow notebook while in the tensorflow-notebook-gpu directory: docker build -t hub-deep-learning-notebook-gpu .
    11. Build the JupyterHub system from the root of the repo: docker-compose up --build
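
    For reference, here’s roughly what step 3 and the NVIDIA runtime setting look like in jupyterhub_config.py. This is a hand-written sketch rather than a copy of the repo’s config; the usernames are placeholders, and the last two lines assume DockerSpawner with the nvidia-docker2 runtime:

    # jupyterhub_config.py (sketch -- adapt to the repo's actual config)
    c.Authenticator.whitelist = {'andrew', 'colleague'}  # users allowed to log in
    c.Authenticator.admin_users = {'andrew'}             # users who can administer the hub

    # spawn user notebooks from the image built in step 10, inside the NVIDIA runtime
    c.DockerSpawner.image = 'hub-deep-learning-notebook-gpu'
    c.DockerSpawner.extra_host_config = {'runtime': 'nvidia'}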

    If everything went well (hahaha) you should be able to access your JupyterHub server at https://localhost.

    If there’s an install step I forgot (very likely), please let me know. If you’d like to take on cleaning up that main container, you’re my hero.

    There are definitely lots of improvements yet to be made and I know there are some weird things happening in the Dockerfiles, the conda environment, and the basic configuration. That said, this will get you up and running.


    1. That term still sounds goofy to me, but I suppose I’m just showing my age. [return]
    2. I ended up solving those issues. And then I had issues uninstalling CUDA 9.2 but there’s no need to reopen those wounds. [return]
    3. The NVIDIA Cloud PyTorch container image runs CUDA 10 and has a preview of features from PyTorch 1.0. But it’s based on Ubuntu 16.04. 🤷‍♂️ [return]
    4. Samir Jabari from Friedrich-Alexander Universitat—unfortunately the blog posts associated with this repo are now returning 404. [return]


  • It's Been Real, Octopress

    July 09 2018

    Just finished moving ye old blog from Octopress to Hugo under the premise that I’d write and publish a little more if I could use a tool that is a little more modern.1 You know, more modern than a vintage 2012 Ruby.

    I took a look at other options before I took the leap—Nikola and a couple other Python-based systems looked promising. I started hacking around with Nikola before deciding it didn’t feel quite right.

    Hugo has been a joy to use so far. I don’t know if it’s just that I’ve got an extra six years of programming under my belt, but the system seems to be more intuitive than I remember Octopress/Jekyll being. It’s got great documentation and forums to boot. I was able to convert my entire blog over, including manually editing some blog posts and hacking away at a new design/layout over the course of a weekend. I even managed to make it segfault!

    I’m pretty jazzed that Hugo can publish Jupyter Notebooks. It also seems like a good way forward for hosting a microblog and moving away from Twitter a little.

    Thanks to Lingyi Hu for developing the Er theme that I hacked at to make this update. There are still a few things to clean up, but overall I’m happy with the fresh look.


    1. At time of writing, the first item on the Octopress website is an article titled Octopress 3.0 Is Coming that’s dated in January of 2015. [return]


  • Building AVG DAY Again

    January 03 2018

    A few weeks back I went to a meetup where people, uh, meet, to talk about artistic projects they’re working on and critique each other’s work. I took some prints of my AVG DAY project from last year and got some really good feedback.

    Here is the question that stood out most to me: what do multiple exposures look like as images are added to the exposure? I heard that same question in some form throughout the year as I showed people the project, along with this one: how are the puzzling elements of these images formed?

    One easy way to answer this question would be to build a multiple exposure for each day that had multiple photos associated with it.

    Script Mechanics

    The code I described last year served as a foundation for this new goal. I just needed to do a little metadata reading and bookkeeping to organize the photos from an entire month into groups of photos for each day. Once that was done it was trivial to call the averaging function on each group of images individually. Cool.

    Another bit of feedback I received concerned whether the color of the multiple exposure images changed as the year went on. The way I processed the images last year made it impossible to answer that question. By performing averaging/histogram equalization on each color channel individually I completely destroyed the color balance present in each individual image. Rookie mistake!

    I tried several different ways of doing the histogram equalization (including converting to a different colorspace and performing equalization only on the luminance or value channel, and using a contrast-limited adaptive histogram equalization function) but in general those methods created images with too much contrast and displeasing color.

    I settled on a much simpler technique: what scikit-image calls “contrast stretching.” Essentially, calculate where the vast majority of the dynamic range in the multiple exposure lies and then force it to span the dynamic range of its storage data type. I know I’ll anger some pedants by making the following analogy, but it’s kind of like applying a compressor to a vocal track.
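
    Not my exact code, but the core of it looks something like this (the percentile cutoffs are a matter of taste):

    import numpy as np
    from skimage import exposure

    def stretch_contrast(img, low=2, high=98):
        # img is a NumPy image array
        # find the interval where most of the dynamic range lives...
        p_low, p_high = np.percentile(img, (low, high))
        # ...and stretch it to span the full range of the storage dtype
        return exposure.rescale_intensity(img, in_range=(p_low, p_high))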

    This seemed to make the multiple exposures with fewer than eight component images look the way I expected, some with stunning results. Using this technique to generate a multiple exposure for an entire month resulted in browns and other muted tones that might be expected if you mix a bunch of colors together. While this result is arguably “more correct”, I am undecided whether it is an improvement over the method employed last year.

    One thing I didn’t do (but really should) is convert from sRGB or AdobeRGB to linear RGB before performing any image-altering calculations and then convert back afterwards.

    Finishing Touches

    After I settled on the image processing approach I took a beat and thought about what I should do with all the images that would be generated. The natural place for them is my image portfolio and the way to get them there is to upload them to Flickr with carefully crafted metadata.

    I have a few cusswords about dealing with image metadata in Python. In order to group the images by day I needed to get at the machine tags I added to the source images, and those tags are stored in IPTC metadata. It seems there are no Python3-ready libraries that will compile on a Mac and reliably read IPTC data. The approach I ended up taking was to install exiv2 (using Homebrew), make calls to exiv2 from the shell and parse standard out. To write metadata I generated a text file that contained exiv2 commands and then called exiv2 from the shell. Painful, but reliable for my use case.
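
    The shelling-out itself is nothing fancy; a rough sketch, assuming exiv2’s -pi print mode for IPTC fields (the exact parsing depends on which fields you’re after):

    import subprocess

    def read_iptc(path):
        # ask the exiv2 binary for the IPTC records and parse its stdout
        result = subprocess.run(['exiv2', '-pi', path],
                                stdout=subprocess.PIPE, universal_newlines=True)
        tags = {}
        for line in result.stdout.splitlines():
            parts = line.split(None, 3)  # key, type, size, value
            if len(parts) == 4:
                tags.setdefault(parts[0], []).append(parts[3])
        return tags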

    Putting this project together was fun and the results are gratifying. A survey of the output goes a long way towards answering the general question posed above. It’s also instructive about what photographic elements combine to make compelling multiple exposures. This has already influenced the daily photos I’ve taken since.



  • 2017 in Review

    December 31 2017

    Nothing like a bulleted list to make you feel good about your year. But seriously—I wanted to get one more item on the blog before the calendar turns and reviewing my year seems like a good way to close out 2017.

    In the past I was required to write out a list of accomplishments twice a year for performance evaluations. Every time I did that I ended up writing thousands of words and I was surprised by the number of large accomplishments that I somehow completely forgot about.

    This was a big year; lots of important events I don’t want to forget. I accomplished a ton, I had a huge life change in the form of a new job, and my best friends got married. Some specifics:

    • Took a class on algorithms and data structures
    • Took a class on statistical learning techniques
    • Took a class on modern machine learning tools
    • Open-sourced some image processing code
    • Celebrated my dad’s 60th birthday in Snowbird
    • Served as best man for Steven’s and Dustin’s wedding
    • Met Christina’s friends and family in Ohio
    • Watched my good friend Bri get married
    • Bought a new car
    • Sold the car I’ve had for 15 years
    • Before I moved on from my job at NTIA, I:
      • Led a project to develop a signal processing engine
      • Facilitated a massive crowdsourcing test, possibly the largest MRT to date (150,000 trials!)
      • Planned a long-term research project on IoT radio-frequency interference issues
      • Contributed to a five year research plan in the areas of audio and video quality
      • Sysadmin on like, so many machines
      • Served as the main point of contact for a Colorado School of Mines capstone project
      • Contributed to a survey document on the state of encryption technologies (!)
      • Contributed to two NTIA technical documents
    • Attended delicious popup dinners at MCA Denver
    • Saw the moon darken the sun
    • Celebrated my 34th birthday a couple times
    • Settled into a new job with the title “Computer Vision and Machine Learning Scientist”
      • Wrote code to analyze fields of crops and extract useful information
      • Designed and built a GIS and image processing pipeline
      • Sysadmin on like, one machine
      • Other stuff too
    • Saw some epic concerts
    • Hiked a fair amount (and apparently grew my hair really long)
    • Posted more than 320 dailies (a new record!)

    I know there are things that I left out—and maybe I’ll update this later—but it was nice to spend some time reflecting on the year and reaping some returns on the investment I’ve made in recording what happens most days for the eighth year running.



  • When Neither of Your Hasselblads are Just Right

    September 10 2017

    Ming Thein, prolific writer and photographer (with cameras that are like, way too nice, I guess):

    …there are increasingly times when I do things and go places that are not 100% dedicated to the creation of images; at these times there may be some opportunities for photography and you’d like to not compromise too much…

    …so where are the proper enthusiast compacts? In other words: either the non-photographer’s “serious camera”, or the photographer’s “un-camera”.

    He ends up picking the Panasonic GX85 (after nearly 2,000 words), but I find the Fuji x100T perfect for this use case. Beginners can make really great photos when running in a fully-automatic mode and those with more experience have all the control they need.



  • You Must Have a Good Camera

    August 19 2017

    Photorealism in video games has improved by leaps and bounds in the last several years. Alan Butler has taken advantage of this by creating beautiful compositions in the virtual world that exists in Grand Theft Auto V.

    Down and Out in Los Santos is a series of photographs that are created by exploiting a smartphone camera feature within the video game Grand Theft Auto V. Players of GTAV can put away the guns and knives, and instead take photos within the game environment. This operates in basically the same way as ‘real’ cameras do. I walk around a three-dimensional space, find a subject, point the camera, compose the shot, focus, and click the shutter. I have taken a photograph.

    A year ago I would have said a real smartphone wouldn’t be able to make photos with the shallow depth of field shown in Butler’s images. I would have been proven wrong a month later.

    The concept of creating street photographs without leaving your house is fascinating. The amount of detail and thought invested into the art assets in this game is astounding. Fascinating how many video games can be whatever you make of them.

    Cool stuff, man.



  • Precious Goods

    August 15 2017

    I don’t always agree with ol’ Brooks Jensen but I do enjoy listening to his podcast. This specific…release? episode?…struck a chord with me though. He reads from the introduction of James T. Farrell’s Studs Lonigan:

    There is the self-imposed loneliness. There is the endless struggle to perceive freshly and clearly—to realize and recreate a sense of life on paper. […] the writer feels frequently that he is competing with time and life itself. His hopes will sometimes ride high; his ambitions will soar until they have become so grandiose that they cannot be realized within the space of a single lifetime.

    The world opens up before the young writer as a grand and glorious adventure in feeling and in understanding. Nothing human is unimportant to him. Everything that he sees is germane to his purpose. Every word that he hears uttered is of potential use to him. Every mood, every passing fancy, every trivial thought can have its meaning and its place in the store of experience which he accumulates. The opportunities for assimilation are enormous—endless—and there is only one single short life of struggle in which to assimilate.

    A melancholy sense of time becomes a torment. One’s whole spirit rebels against a truism which all men must realize because it applies to all men. One seethes in rebellion against the realization that the human being must accept limitations—that he can develop in one line of effort only at the cost of making many sacrifices in other lines. Time becomes for the writer the most precious good in all the world and how often will he not feel that he is squandering this precious good.

    I suppose there are many versions of this sentiment written by a diverse group of people—there’s nothing new under the sun. It was interesting, however, to hear it spoken out loud while riding my bike to work through headphones that are connected by magic to a phone that the author could have used to his great advantage by storing notes, recordings, and photographs.



  • NASA's Voyager Mission

    August 05 2017

    You’ve probably seen many other people pointing to this New York Times article this week, but I couldn’t help myself. Though the article as a whole was bittersweet, I love to learn about cool efficiency tweaks like this one:

    Turning the heaters off for a while is the safest way to get enough power to run the instruments, but the lower the overall wattage drops, the faster parts will freeze. One of the team’s most valuable insights so far: Spinning the wheels of an eight-track tape recorder — the spacecrafts’ only data-storage option — generates a bit of additional heat.

    The flight crew’s sense of duty is inspiring.

    Over decades, the crew members who have remained have forgone promotions, the lure of nearby Silicon Valley and, more recently, retirement, to stay with the spacecraft.

    It’s sad to me that engineers working on such a long-term mission weren’t recognized the same way people who moved to other missions apparently were. I strongly believe in the value of long-term research efforts and this is yet another sign that our society and government don’t share this sentiment. Difficulty in attracting new talent to the team is hard to come to terms with as well.

    NASA funding, which peaked during the Apollo program in the 1960s, has dwindled, making it next to impossible to recruit young computer-science majors away from the likes of Google and Facebook.

    There are more factors to this than the relative level of NASA’s funding. For example, pure computer-science majors maybe wouldn’t be the best fit for a team like this. I wonder, though, how many young people consider working in the tech industry a higher priority than working for NASA. It’s hard to imagine another government agency with such broad appeal—so what hope is there for other agencies that so badly need an influx of fresh, diverse and dedicated workers? I have a feeling that the efforts of the current administration to reduce the size of the federal workforce aren’t helping.



  • Building AVG DAY

    February 13 2017

    In order to make simulated multiple exposures I wrote some Python code to average several images. Here’s a little more about how I made the images.

    Concept

    I thought about how to build the program so that it most accurately simulated a multiple exposure created in a film camera. When making a double exposure, a photographer adjusts all available camera settings so that two exposures will result in an image that properly exposes the film. Say that for a given camera and scene the proper exposure time would be one fiftieth of a second. One way to make a double exposure would be to take one image with an exposure time of one hundredth of a second, rewind the film one frame, and take another image with the same exposure time.

    That’s one way to do it. But a photographer needn’t split the exposure time among images equally. Besides that, part of the fun of making multiple exposures is that one doesn’t necessarily know how it’s going to turn out. It could even be that the photographer has so much experience making multiple exposures that they know exactly how to adjust the camera to create a specific image.

    With all that in mind I decided that any weighting applied to individual images is purely an artistic choice—this gave me some freedom about how to implement multiple exposures in code.

    Script Mechanics

    In order to specify which images should be made into a multiple exposure, I put the desired group of images in a folder on my computer’s filesystem. Then I had the script get a list of all the images in the folder. The images aren’t guaranteed to have the same dimensions and I didn’t want to hard-code output image dimensions, so the script loops through all the files in the folder and gets each file’s dimensions in pixels and exposure time. I made a function that, based on an input argument, finds either the smallest or largest dimension in the group of photos. Before processing, each photo is either cropped to a square with the smallest dimension or padded to a square with the largest dimension. This ensures there are no matrix dimension errors when it comes time to add everything together.
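
    The squaring step itself is just a centered crop or a paste onto a larger canvas. A rough sketch (the target size comes from scanning the whole group of images, as described above):

    from PIL import Image

    def to_square(img, size, mode='crop'):
        if mode == 'crop':
            # cut a centered size-by-size square out of the image
            left = (img.width - size) // 2
            top = (img.height - size) // 2
            return img.crop((left, top, left + size, top + size))
        # otherwise pad: paste the image onto a black size-by-size canvas
        canvas = Image.new('RGB', (size, size))
        canvas.paste(img, ((size - img.width) // 2, (size - img.height) // 2))
        return canvas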

    I used the Python Imaging Library (PIL) to crop, pad, and do math on the images. When PIL loads an image, each pixel has a red, green and blue component that are stored as integers. I learned that the ImageMath object only operates on single-channel data, so the image’s split() method is needed to store each color channel separately. It’s also necessary to convert each channel’s pixel data to floats before multiplying by a fraction.

    In a loop, the script loads each image one-at-a-time, splits the current image into its three color channels, multiplies each channel by some fraction and then adds the result to three variables containing the red, green and blue channel for the new composite image. Finally, all three color channels are converted back to the integer domain, the channels are merged into one RGB image, and the image is written to disk.
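
    Stripped of the file handling, the heart of that loop looks roughly like this; it assumes the images have already been squared to the same size and weights every image equally:

    from PIL import Image, ImageMath

    def average_images(paths):
        weight = 1.0 / len(paths)
        acc = None
        for path in paths:
            r, g, b = Image.open(path).convert('RGB').split()
            # convert each channel to floats before scaling by the weight
            scaled = [ImageMath.eval('convert(c, "F") * w', c=ch, w=weight)
                      for ch in (r, g, b)]
            acc = scaled if acc is None else [ImageMath.eval('a + b', a=a, b=s)
                                              for a, s in zip(acc, scaled)]
        # back to 8-bit integers, then merge the channels into one RGB image
        return Image.merge('RGB', [ImageMath.eval('convert(c, "L")', c=ch) for ch in acc])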

    Finishing Touches

    I tried a couple different ways to combine the images including scaling each image by its actual exposure time and doing a naive average. I also thought about compensating for ISO but like, nah. I settled for the naive average this time around.

    When processed as described above the resulting image is often very dark. The result is stark and stunning, but there are lots of interesting details muddied by the lack of dynamic range. I tried a few different things to bring out details and balance the image. When I generated the images for my avg day gallery I applied a simple histogram equalization just before the three color channels were combined into one image. This means that the equalization was performed in the integer domain and this seemed to really exaggerate the nuance in the composite images. Additionally, by definition, the histogram equalization algorithm spans the available dynamic range. The effect is interesting, but certainly not subtle.
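
    In code, that amounts to an ImageOps.equalize call on each 8-bit channel right before they’re merged; a minimal sketch:

    from PIL import Image, ImageOps

    def equalize_and_merge(bands):
        # bands: the three 8-bit ('L' mode) channels, equalized independently
        return Image.merge('RGB', [ImageOps.equalize(b) for b in bands])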

    Having finished the image processing code I needed to get all the photos published to my 2016 dailies Flickr set sorted into twelve folders. I wrote another script to query the Flickr API, download all the photos in the set, and sort them into folders by the month they were taken. Then the downloading script called the image combination script one time for each of the twelve folders it had just created.

    I had a ton of fun working on this project. If you’d like to take a look at the code it’s posted on GitHub. I’m looking forward to working on different ways to bring out all the interesting details—and replace that simple histogram equalization—a little further down the road.



copyright © 2011-2020 Andrew Catellier thisisreal.net