Writings

Technology, open source, personal essays, and everything that isn't climate.

Project Chapel

Project Chapel on PBS

PBS just put up a new documentary on installing the IBM Quantum Computer at RPI. I was very involved with this effort and helped make it a success. Many of the people in the documentary, including much of the RPI team, are folks I meet with regularly, so it is extremely meaningful to me.

I also think it provides a pretty reasonable overview of Quantum Computers for the non-expert. So give it a spin at your local PBS station, on their app, or live on PBS.org.

Python functions on OpenWhisk

Part of the wonderful time I had at North Bay Python was getting to represent IBM on stage for a few minutes as part of our sponsorship of the conference. What I showed during those few minutes was some Python functions running on OpenWhisk via IBM's Cloud Functions service.

A little bit about OpenWhisk

OpenWhisk is an Apache Foundation open source project to build a serverless / function-as-a-service environment. It uses Docker containers as the foundation, spinning up either predefined or custom-named containers, running them to completion, then exiting. It was started before Kubernetes, so it has its own Docker orchestration built in. In addition to the runtime, it has pretty solid logging and interactive editing through the web UI. This becomes critical when you do anything more than trivial with cloud functions, because the execution environment looks very different from your laptop.

What are Cloud Functions good for?

Cloud Functions are really good when you have code that you want to run after some event has occurred, and you don't want to maintain a daemon sitting around polling or waiting for that event. A good concrete instance of this is GitHub webhooks. If you have a repository where you'd like to do some things automatically on a new issue or PR, doing it with Cloud Functions means you don't need to maintain a full system just to run a small bit of code on these events. They can also be used kind of like a web cron, so that you don't need a full VM running if there is just something you want to fire off once a week to do 30 seconds of work. The programming model is simple, as the sketch below shows.
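Here is about the smallest possible OpenWhisk Python action, just for illustration (the greeting logic is made up, not part of my actual helpers): main receives the invocation parameters as a dict and must return a dict that serializes to JSON.

def main(params):
    # params carries whatever the trigger or invocation passed in
    name = params.get("name", "world")
    # the returned dict becomes the JSON result of the activation
    return {"message": "Hello, %s" % name}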

Github Helpers

I wrote a few example uses of this for my open source work. Because my default mode for writing source code is open source, I have quite a few open source repositories on GitHub. They are all under very low levels of maintenance. That's a thing I know, but others don't. So instead of having pull requests just sit in the void for a month, I thought it would be nice to auto-respond to folks (especially new folks) with the state of the world.

#
# main() will be invoked when you Run This Action
#
# @param Cloud Functions actions accept a single parameter, which must be a JSON object.
#
# @return The output of this action, which must be a JSON object.
#

import github
from openwhisk import openwhisk as ow


def thank_you(params):
    # pull the github credentials out of the bound package parameters
    p = ow.params_from_pkg(params["github_creds"])
    g = github.Github(p["accessToken"], per_page=100)

    repo = g.get_repo(params["repository"]["full_name"])
    name = params["sender"]["login"]

    # how many issues has this person filed against this repo before?
    user_issues = repo.get_issues(creator=name)
    num_issues = len(list(user_issues))

    issue = repo.get_issue(params["issue"]["number"])

    if num_issues < 3:
        comment = """
I really appreciate finding out how people are using this software in
the wide world, and people taking the time to report issues when they
find them.
I only get a chance to work on this project on the weekends, so please
be patient as it takes time to get around to looking into the issues
in depth.
"""
    else:
        comment = """
Thanks very much for reporting an issue. Always excited to see
returning contributors with %d issues created. This is a spare time
project so I only tend to get around to things on the weekends. Please
be patient for me getting a chance to look into this.
""" % num_issues

    issue.create_comment(comment)


def main(params):
    action = params["action"]
    if action == "opened":
        thank_you(params)
        return {'message': 'Success'}
    return {'message': 'Skipped invocation for %s' % action}

Pretty basic: it responds within a second or two of folks opening an issue, telling them what's up. While you can do a lightweight version of this with GitHub's native templates, using a cloud functions platform lets you be more specific to individuals based on their previous contribution rates. You can also see how you might extend it to do different things based on the content of the PR itself.
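For instance, here is a hypothetical sketch of that extension using pygithub, reusing the imports and webhook payload shape from the action above; the helper name and the docs/ convention are made up for illustration:

def pr_touches_docs(g, params):
    # look up the PR referenced in the webhook payload
    repo = g.get_repo(params["repository"]["full_name"])
    pr = repo.get_pull(params["pull_request"]["number"])
    # get_files() pages through the files changed by the PR
    return any(f.filename.startswith("docs/") for f in pr.get_files())

A main() handling "pull_request" events could use something like this to, say, thank documentation contributors differently.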

Using a Custom Docker Image

IBM's Cloud Functions provides a set of Docker images for different programming languages (Javascript, Java, Go, Python 2, Python 3). In my case I needed more content than was available in the Python 3 base image. The entire system runs on Docker images, so extending them is straightforward. Here is the Dockerfile I used to do that:

# Dockerfile for example whisk docker action
FROM openwhisk/python3action

# add package build dependencies
RUN apk add --no-cache git

RUN pip install pygithub

RUN pip install git+git://github.com/sdague/python-openwhisk.git

This builds on the base image and installs 2 additional Python libraries: pygithub, to make GitHub API access (especially paging) easier, and a utility library I put up on GitHub to keep from repeating code to interact with the OpenWhisk environment. When you create your actions in Cloud Functions, you just have to specify the Docker image instead of a language environment.

Weekly Emails

My spare time open source work mostly ends up falling between the hours of 6 and 8am on Saturdays and Sundays, when I'm awake before the rest of the family. One of the biggest problems is figuring out what I should look at then, because if I spend an hour figuring that out, there isn't much time left to do anything that requires code. So I set up 2 weekly emails to myself using Cloud Functions.

The first email looks at all the projects I own and provides a list of all the open issues & PRs for them. These are issues coming in from other folks that I should probably respond to, or make some progress on. Even just tackling one a week would get me to a zero issue space by the middle of spring. That's one of my 2018 goals.

The second does a keyword search on Home Assistant's issue tracker for components I wrote, or that I run in my house and am pretty familiar with. Those are issues that I can probably meaningfully contribute to. Home Assistant is a big enough project now that, as a part time contributor, finding a narrower slice is important to getting anything done.

These show up at 5am in my Inbox on Saturday, so they are at the top of my email when I wake up, and a good reminder to have a look.
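The report-building half of the first email looks roughly like this sketch, again using pygithub; the email delivery and the cron-style trigger are wired up separately in Cloud Functions, and the function name here is just illustrative.

import github

def open_items_report(token, username):
    g = github.Github(token, per_page=100)
    lines = []
    for repo in g.get_user(username).get_repos():
        # GitHub models PRs as issues, so get_issues returns both
        for issue in repo.get_issues(state="open"):
            lines.append("%s#%d: %s" % (repo.name, issue.number, issue.title))
    return "\n".join(lines)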

The Unknown Unknowns

This was my first dive down the function-as-a-service rabbit hole, and it was a very educational one. The biggest challenge I had was getting into a workflow of iterative development. The execution environment here is pretty specialized, including a bunch of environmental setup. I did not realize how truly valuable a robust web IDE and a detailed log server are in these environments. If you would typically just run a VM and put some code under cron, or run a daemon, you get to keep all your normal tools; but the trade-off of getting rid of a server that you need to keep patched is sometimes worth it. I think that as we see a lot of new entrants into the function-as-a-service space, tooling is going to be what makes or breaks them: how good their support is for interactive debug and iterative development.

Replicate and Extend

I've got a pretty detailed write up in the README of how all this works and how you can replicate it yourself. Pull requests are welcomed, as are discussions of related things you might be doing. This is code that I'll continue to run to make my GitHub experience better. The pricing on IBM's Cloud Functions means that this kind of basic usage fits fine in the free tier.

Slow AI

Charlie Stross's keynote at the 34th Chaos Communications Congress in Leipzig is entitled "Dude, you broke the Future!" and it's an excellent, Strossian look at the future we're barreling towards, best understood by a critical examination of the past we've just gone through. Stross is very interested in what it means that today's tech billionaires are terrified of being slaughtered by psychotic runaway AIs. Like Ted Chiang and me, Stross thinks that corporations are "slow AIs" that show what happens when we build "machines" designed to optimize for one kind of growth above all moral or ethical considerations, and that these captains of industry are projecting their fears of the businesses they nominally command onto the computers around them.

The talk is an hour long, and really worth watching the whole thing. I especially loved the setup explaining the process of writing believable near-term science fiction. Until recently, 90% of everything that would exist in 10 years already did exist, the next 9% you could extrapolate from physical laws, and only really 1% was stuff you couldn't imagine. (Stross makes the point that the current ratios are more like 80 / 15 / 5, as evidenced by Brexit and related upheavals, which makes his work harder.) It matches well with Clay Shirky's premise in Here Comes Everybody, that the first goal of a formal organization is future existence, even if its stated first goal is something else.

Syncing Sieve Rules in Fastmail, the hard way

I've been hosting my email over at Fastmail for years, and for the most part the service is great. The company understands privacy, contributes back to open source, and is incredibly reliable. One of the main reasons I moved off of Gmail was that its mail filtering system was not fine-grained enough to deal with my email stream (especially open source project emails). Fastmail supports Sieve, which lets you write quite complex filtering rules. There was only one problem: syncing those rules. My sieve rules are currently just north of 700 lines. Anything that complex is something I like to manage in git, so that if I mess something up, it's easy to revert to a known good state.

No API for Sieve

Fastmail does not support any kind of API for syncing Sieve rules. There is an official standard for this, called MANAGESIEVE, but the technology stack Fastmail uses doesn't support it. I've filed tickets over the years that mostly got filed away as future features. When I first joined Fastmail, their website was entirely classic HTML forms. Being no slouch, I had a python mechanize script that would log in as me, navigate to the upload form, and submit it. This worked well for years. I had a workflow where I'd make a sieve change, sync via script, see that it generated no errors, then commit. I have 77 commits to my sieve rules repository going back to 2013. But a couple of years ago the Fastmail team refreshed their user interface to a Javascript-based UI (called Overture). It's a much nicer UI, but it means it only works in a javascript-enabled browser, and getting to the form box where I can upload my sieve rules is about 6 clicks. I stopped really tweaking the rules regularly because of the friction of updating them through clear / copy / paste.

Using Selenium for unintended purposes

Selenium is a pretty amazing web test tool. It gives you an API to drive a web browser remotely. With recent versions of Chrome, there is even a headless chrome driver, so you can do this without popping up a graphics window. You can drive this all from python (or your language of choice). An offhand comment by Nibz about using Selenium for something no one intended got me thinking: could I manage to get this to do my synchronization? Answer: yes. Also, this is one of the goofiest bits of code that I've ever written.

#!/usr/bin/env python3

import configparser
import os
import sys

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

config = configparser.ConfigParser()
config.read("config.ini")

chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(executable_path=os.path.abspath("/usr/local/bin/chromedriver"),
                          chrome_options=chrome_options)

driver.get("https://fastmail.fm")

timeout = 120
try:
    element_present = EC.presence_of_element_located((By.NAME, 'username'))
    WebDriverWait(driver, timeout).until(element_present)

    # Send login information

    user = driver.find_element_by_name("username")
    passwd = driver.find_element_by_name("password")
    user.send_keys(config["default"]["user"])
    passwd.send_keys(config["default"]["pass"])
    driver.find_element_by_class_name("v-Button").click()

    print("Logged in")

    # wait for login to complete
    element_present = EC.presence_of_element_located((By.CLASS_NAME, 'v-MainNavToolbar'))
    WebDriverWait(driver, timeout).until(element_present)

    # click settings menu to make elements visible
    driver.find_element_by_class_name("v-MainNavToolbar").click()

    # And follow to settings page
    driver.find_element_by_link_text("Settings").click()

    # Wait for settings page to render, oh Javascript
    element_present = EC.presence_of_element_located((By.LINK_TEXT, 'Rules'))
    WebDriverWait(driver, timeout).until(element_present)

    # Click on Rules link
    driver.find_element_by_link_text("Rules").click()

    # Click on edit custom sieve code
    element_present = EC.presence_of_element_located((By.LINK_TEXT, 'Edit custom sieve code'))
    WebDriverWait(driver, timeout).until(element_present)
    driver.find_element_by_link_text("Edit custom sieve code").click()

    print("Editing")

    # This is super unstable, I hate that we have to navigate by layout class names
    element_present = EC.presence_of_element_located((By.CLASS_NAME, 'v-EditSieve-rules'))
    WebDriverWait(driver, timeout).until(element_present)

    print("Find form")
    elements = driver.find_elements_by_css_selector("textarea.v-Text-input")
    element = elements[-1]

    # Find the Save button by inspecting every button on the page
    submit = None
    elements = driver.find_elements_by_css_selector("button")
    for e in elements:
        if "Save" in e.text:
            submit = e

    print("Found form")
    # And replace the contents
    element.clear()

    with open("rules.txt") as f:
        element.send_keys(f.read())

    # Click the Save button
    submit.click()
    print("Submitted!")

except TimeoutException as e:
    print(e)
    print("Timed out waiting for page to load")
    sys.exit(1)

print("Done!")

Basic Flow

I won't do a line by line explanation, but there are a few concepts that make the whole thing fall in line.

The first is the use of WebDriverWait. This is an OvertureJS application, which means that clicking parts of the screen triggers an ajax interaction, and it may be some time before the screen "repaints". That could be a new page, a change to the existing page, or an element becoming visible. Find a thing, click a thing, wait for the next thing. There is a 5 click interaction before I get to the sieve edit form, then a save button click to finish it off.

Finding things is important, and sometimes hard. Being an OvertureJS application, div ids are pretty much useless, so I stared a lot in the Chrome inspector at what looked like stable classes to find the right things to click on. All of those could change with new versions of the UI, so this is fragile at best. Sometimes you just have to count, like finding the last textarea on the Rules page. Sometimes you have to inspect elements, like looking through all the buttons on a page to find the one that says "Save".

Filling out forms is done with send_keys, which approximates typing by sending 1 character every few milliseconds. If you run non-headless it makes for an amusing animation. My sieve file is close to 20,000 characters, so it takes more than a full minute to put that content in one character at a time. But at least it's a machine, so no typos.
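That find / wait / click dance repeats enough that it's worth distilling into a helper. A quick sketch using the same old-style selenium API (and imports) as the script above; the helper name is my own:

def wait_and_click(driver, locator, timeout=120):
    # block until the element shows up in the DOM, then click it
    WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located(locator))
    driver.find_element(*locator).click()

With that, the settings navigation collapses to calls like wait_and_click(driver, (By.CLASS_NAME, "v-MainNavToolbar")) followed by wait_and_click(driver, (By.LINK_TEXT, "Settings")).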

The Good and the Bad

The good news is this all seems to work pretty reliably. I've been running it for the last week and all my changes are getting saved correctly. The bad news is you can't have two-factor auth enabled and use this, because unlike IMAP, where you can provision an app password for Fastmail, this is really logging in and pretending to be you clicking through the website and typing; there are no limited users for that. It's also slow, and it's definitely fragile: I'm sure an update to their site is going to break it, and then I'll be in the Chrome inspector again figuring out how to make it work. But on the upside, this let me learn a more general purpose set of tools for crawling and automating the modern web (which requires javascript). I've used this technique for a few sites now, and it's a good one to add to your bag of tricks.

The Future

Right now this script is in the same repo as my rules. It also requires setting up the selenium environment and headless chrome, which I've not really documented. I will take some time to split this out on github so others could use it. I would love it if Fastmail would support MANAGESIEVE, or have an HTTP API to fetch / store sieve rules; anything where I could use a limited app user instead of my full user. I really want to delete this code and never speak of it again, but a couple of years and closed support tickets later, this is the best I've got. If you know someone in Fastmail engineering and can ask them about having a supported path to programmatically update sieve rules, that would be wonderful. I know a number of software developers that have considered the switch to Fastmail, but stopped when they discovered that updating sieve can only be done in the webui.

Updated (12/15/2017): via Twitter the Fastmail team corrected me that it's not Angular, but their own JS toolkit called OvertureJS. The article has been corrected to reflect that.

Getting Chevy Bolt Charge Data with Python

Filed under: kind of insane code, be careful about doing this at home. Recently we went electric, and got a Chevy Bolt to replace our 12 year old Toyota Prius (which has been, and continues to be, a workhorse). I had a spot in line for a Tesla Model 3, but due to many factors, we decided to go test drive and ultimately purchase the Bolt. It's a week in and so far so good. One of the things GM does far worse than Tesla is make its data available to owners. There is quite a lot of telemetry captured by the Bolt, through OnStar, which you can see by logging into their website or app. But there is no API (or at least no clear path to get access to the API). However, it's the 21st century. That means we can do ridiculous things with software, like use python to start a full web browser, log into their web application, and scrape out data... so I did that.

The Code

#!/usr/bin/env python

import configparser
import os

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

config = configparser.ConfigParser()
config.read("config.ini")

chrome_options = Options()
# chrome_options.add_argument("--headless")
driver = webdriver.Chrome(executable_path=os.path.abspath("/usr/local/bin/chromedriver"),
                          chrome_options=chrome_options)

driver.get("https://my.chevrolet.com/login")

user = driver.find_element_by_id("Login_Username")
passwd = driver.find_element_by_id("Login_Password")
user.send_keys(config["default"]["user"])
passwd.send_keys(config["default"]["passwd"])
driver.find_element_by_id("Login_Button").click()

timeout = 120
try:
    element_present = EC.presence_of_element_located((By.CLASS_NAME, 'status-box'))
    WebDriverWait(driver, timeout).until(element_present)
    print(driver.find_element_by_class_name("status-box").text)
    print(driver.find_element_by_class_name("status-right").text)
except TimeoutException:
    print("Timed out waiting for page to load")

print("Done!")

This uses selenium, which is a tool used to test websites automatically. To get started you have to install the selenium python drivers, as well as the chrome web driver. I'll leave those as an exercise to the reader.

After that, the process looks a little like one might expect. Start with the login screen, find the fields for user/password, send_keys (which literally acts like typing), and submit. The My Chevrolet site is an Angular JS site, which seems to have no stateful caching of the telemetry data for the car. Instead, once you log in you are presented with an overview of your car, and it makes an async call through the OnStar network back to your car to get its data: charge level, charge state, estimated range. The OnStar network is a CDMA network with a proprietary protocol, and it takes at least 60 seconds to return that call. This means you can't just pull data out of the page once you've logged in, because the data isn't there; there is a spinner instead. Selenium provides a WebDriverWait class for that, which will wait until an element shows up in the DOM. We can just wait for the status-box to arrive, then dump its text. The output from this script looks like this:

Current
Charge:
100%
Plugged in(120V)
Your battery is fully charged.
Estimated Electric Range:
203 Miles
Estimated Total Range:
203 Miles
Charge Mode:
Immediate
Change Mode
Done!

Which was enough for what I was hoping to get.
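If you wanted more than eyeballing, that label/value text dump is regular enough to turn into a dict. A small sketch based on the exact output shown above (the function name is mine):

def parse_status(text):
    data = {}
    lines = [l for l in text.splitlines() if l]
    # labels end with ":", and the value sits on the following line
    for label, value in zip(lines, lines[1:]):
        if label.endswith(":"):
            data[label.rstrip(":")] = value
    return data

For example, parse_status(status_text)["Estimated Electric Range"] would come back as "203 Miles".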

The Future

Honestly, I really didn't want to write any of this code. I would much rather get access to the GM API and do this the right way. Ideally I'd like to make the Chevy Bolt in Home Assistant as easy as using a Tesla. With the chrome inspector, I can see that the inner call is actually returning a very nice json structure back to the angular app. I've sent an email to the GM developer program to try to get real access; thus far, black hole.

Lots of caveats on this code. That OnStar link and the My Chevrolet site are sometimes flakey, I don't know why, so running something like this in a busy loop is probably not a thing you want to do. For about 2 hours last night I just got "there is no OnStar account associated with this vehicle", which then magically went away. I'd honestly probably not run it more than hourly. I make no claims about the integrity of things like this. Once you see the thing working, it can be run headless by uncommenting the --headless line near the top; then it could be run on any Linux system, even one without graphics.

Again, this is one of the more ridiculous pieces of code I've ever written. It is definitely in a "currently seems to work for me" state; don't expect it to be robust. I make no claims about whether or not it might damage anything in the process, though if logging into a website damages your car, GM has bigger issues.

Comparing Speech Recognition for Transcripts

I listen to a lot of podcasts. Often, months later, something about one I listened to really strikes a chord, enough that I want to share it with others through Facebook or my blog. I'd like to quote the relevant section, but also link to roughly where it is in the audio. Listening back through one or more hours of podcast just to find the right 60 seconds and transcribe them is enough extra work that I often just don't share. But now that I've got access to the Watson Speech to Text service, I decided to find out how effectively I could use software to solve this, and, just to get a sense of the world, compare the Watson engine with Google and CMU Sphinx.

Input Data

The input in question was a lecture from the Commonwealth Club of California: Zip Code, not Genetic Code: The California Endowment's 10 year, $1 Billion Initiative. There was a really interesting bit in there about spending and outcome comparisons between different countries that I wanted to quote. The Commonwealth Club makes all these files available as mp3, which none of the speech engines handle. Watson and Google both can do FLAC, and Sphinx needs a wav file. It also appears that all the speech models are trained around the assumption of 16kHz sampling, so I needed to downsample the mp3 file and convert it. Fortunately, ffmpeg to the rescue.

ffmpeg -i cc_20170323_Zip_Code_Not_Genetic_Code_Podcast.mp3 -ar 16000 podcast.wav
ffmpeg -i cc_20170323_Zip_Code_Not_Genetic_Code_Podcast.mp3 -ar 16000 podcast.flac

Watson

The Watson Speech to Text API can work either over websocket streaming or with bulk HTTP. While I had some python code to use the websocket streaming for live transcription, I was consistently getting SSL errors after 30 - 90 seconds. A bit of googling hints that these might actually be bugs on the python side, so I reverted to the bulk HTTP upload interface, using example code from the watson-developer-cloud python package. The script I used to do it is up on github. The first 1000 minutes of transcription are free, so this is something you could reasonably do pretty regularly. After that it is $0.02 / minute for transcription. When doing this over the bulk interface things are just going to seem to have "hung" for about 30 minutes, but it will eventually return data; Watson seems to be operating no faster than 2x real time on audio data. The bulk processing time surprised me, but then I realized that with the general focus on real time processing, most speech recognition systems just need to be faster than real time, and optimizing past that has very diminishing returns, especially if there is an accuracy trade off in the process.
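The upload half of my script is roughly the following. A minimal sketch assuming the era's watson-developer-cloud package and service credentials from Bluemix; the recognize keyword arguments have shifted between SDK releases, so treat the details as approximate.

import json

from watson_developer_cloud import SpeechToTextV1

# assumption: credentials provisioned from the Bluemix service page
stt = SpeechToTextV1(username="YOUR_USERNAME", password="YOUR_PASSWORD")

with open("podcast.flac", "rb") as audio:
    # timestamps and word_confidence are what make the results so useful
    results = stt.recognize(audio, content_type="audio/flac",
                            timestamps=True, word_confidence=True,
                            continuous=True)

with open("watson-transcript.json", "w") as out:
    json.dump(results, out, indent=2)

The returned raw data is highly verbose, and has the advantage of timestamps per word, which makes finding passages in the audio really convenient: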

          ...
          "confidence": 0.947, 
          "transcript": "and it joined the endowment in October of two thousand nine prior to his appointment at the endowment doctor right decirte since two thousand three as both the director and county health officer for the Alameda county public health department and in that role he oversaw the creation of an innovative public health practice designed to eliminate health disparities by tackling the root causes of poor health that limit quality of life and lifespan as a primary care physician for the San Francisco department of public health ", 
          "timestamps": [
            [
              "and", 
              27.26, 
              27.61
            ], 
            [
              "it", 
              27.66, 
              27.88
            ],
          ...

So 30 minutes in I had my answer.

Google

I was curious to also see what the Google experience was like, which I originally tried through their API console quite nicely. Google is clearly more focused on short bits of audio. There are 3 interfaces: sync, async, and streaming; only async allows for more than 60 seconds of audio. In the async model you have to upload your content to Google Storage first, then reference it as a gs:// url. That's all fine, and the Google storage interface is stable and well documented, but it is an extra step in the process, especially for content I'm only going to care about once.

Things did get a little tricky translating my console experience to python: 3 different examples listed in the official documentation (and code comments) were wrong, the official SDK no longer seems to implement long_running_recognize on anything except the grpc interface, and the google auth system doesn't play great with python virtualenvs, because it's python code that needs a custom path but isn't packaged on pypi. So you need to venv, then manually add more paths to your env, then gauth login. It's all doable, but it definitely felt clunky.

I did eventually work through all of these, and have a working example up on github.
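The heart of that example is roughly the following, a sketch against the 2017-era google-cloud-speech package (module paths have moved around between releases, so treat the details as approximate):

from google.cloud import speech

client = speech.SpeechClient()

# the flac file has to already live in a Google Storage bucket
audio = speech.types.RecognitionAudio(uri="gs://YOUR_BUCKET/podcast.flac")
config = speech.types.RecognitionConfig(
    encoding=speech.enums.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=16000,
    language_code="en-US")

# async interface: kick off the job, then block on the operation
operation = client.long_running_recognize(config, audio)
response = operation.result(timeout=3600)

for result in response.results:
    print(result.alternatives[0].transcript)

The returned format looks pretty similar to the Watson structure (there are only so many ways to skin this cat), though a lot more compact, as there are no per word confidence levels or per word timings: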

    {
      "alternatives": [
        {
          "confidence": 0.9615234732627869, 
          "transcript": "greetings and welcome to today's meeting of the Commonwealth Club of California I'm Patty James vice-chair of the club's health and Medicine member that form and chair of this program and now it's my pleasure to introduce dr. Anthony iton MD JD and MPH which is a masters of Public Health I have to admit I had to look it up senior vice president of Healthy Communities joined the endowment in October of 2009 prior to his appointment at the endowment dr. right this Earth since 2003 as both the director and County Health officer for the Alameda County Public Health Department and in that role he oversaw the creation of an Innovative Public Health practice designed to eliminate Health disparities by tackling the root causes a poor health that limit quality of life and life span as a primary care physician for the San Francisco Department of Public Health dr. writing career includes past Service as a staff attorney"
        }
      ]
    },

For my particular problem that makes Google less useful, because the best I can do is dump all the text to the file, search for my phrase, see that it's 44% of the way through the file, and jump to around there in the audio. It's all doable, just not quite as nice.

CMU Sphinx

Being on Linux, it made sense to try out CMU Sphinx as well, which took some googling to figure out.

sudo apt install pocketsphinx pocketsphinx-en-us

Then run it with the following:

pocketsphinx_continuous -dict /usr/share/pocketsphinx/model/en-us/cmudict-en-us.dict -lm /usr/share/pocketsphinx/model/en-us/en-us.lm.bin -infile podcast.wav 2> voice.log | tee sphinx-transcript.log

Sphinx prints a ton of debug output on stderr, which you want to get out of the way, while the transcription is sent to a file. Like Watson, it only runs a bit faster than real time, so this is going to take a while.

Converting JSON to snippets

To try to compare results I needed to start with comparable formats. I had 2 JSON blobs, and one giant text dump. A little jq magic can extract all the text:

cat watson-transcript.json | jq '.["results"][]["alternatives"][0]["transcript"]' | sed 's/"//g'
cat google-transcript.json | jq '.["results"][]["alternatives"][0]["transcript"]' | sed 's/"//g'

Comparison: Watson vs. Google

For the purpose of comparison, I dug out the chunk that I was expecting to quote, which shows up about half way through the podcast, at second 1494.98 (24:54.98) according to Watson. The best way I could think to compare all of these was to start / end at the same place, word wrap the texts, and then use wdiff to compare them. Here is watson (-) vs. google (+) for this passage:

one of the things that they [-it you've-] probably all [-seen all-] {+seem you'll+} know that [-we're the big spenders-] {+where The Big Spenders+} on [-health care-] {+Healthcare+} so this is per capita spending of [-so called OECD-] {+so-called oecd+} countries developed countries around the world and whenever you put [-U. S.-] {+us+} on the graphic with everybody else you have to change the [-axis-] {+access+} to fit the [-U. S.-] {+US+} on with everybody else [-because-] {+cuz+} we spend twice as much as {+he always see+} the [-OECD-] average [-and-] {+on+} the basis on [-health care-] {+Healthcare+} the result of all that spending we don't get a lot of bang for our [-Buck-] {+buck+} we should be up here [-we're-] {+or+} down there [-%HESITATION-] so we don't get a lot [-health-] {+of Health+} for all the money that we're spending we all know that that's most of us know that [-I'm-] it's fairly well [-known-] {+know+} what's not as [-well known-] {+well-known+} is this these are two women [-when Cologne take-] {+one killoran+} the other one Elizabeth Bradley at Yale and Harvard respectively who actually [-our health services-] {+are Health Services+} researchers who did an analysis [-it-] {+that+} took the per capita spending on health care which is in the blue look at [-all OECD-] {+Alloa CD+} countries but then added to that per capita spending on social services and social benefits and what they found is that when you do that [-the U. S.-] {+to us+} is no longer the big [-Spender were-] {+spender or+} actually kind of smack dab in the middle of the pack what they also found is that spending on social services and benefits [-gets you better health-] {+Gets You Better Health+} so we literally have the accent on the wrong syllable and that red spending is our social [-country-] {+contract+} so they found that in [-OECD-] {+OCD+} countries every [-two dollars-] {+$2+} spent on [-social services-] {+Social Services+} as [-opposed to dollars-] {+a post $2+} to [-one-] {+1+} ratio [-in social service-] {+and Social Service+} spending to [-health-] {+help+} spending is the recipe for [-better health-] {+Better Health+} outcomes [-US-] {+us+} ratio [-is fifty five cents-] {+was $0.55+} for every dollar [-it helps me-] {+of houseman+} so this is we know this if you want better health don't spend it on [-healthcare-] {+Healthcare+} spend it on prevention spend it on those things that anticipate people's needs and provide them the platform that they need to be able to pursue [-opportunities-] {+opportunity+} the whole world is telling us that [-yet-] {+yeah+} we're having the current debate that we're having right at this moment in this country about [-healthcare-] {+Healthcare there's+} something wrong with our critical thinking [-so-] {+skills+}

Both are pretty good. Watson feels a little more on target, getting axis/access right and being more consistent about understanding when U.S. is supposed to be a proper noun. When Google decides to capitalize things seems pretty random, though that's really minor. From a content perspective both were good enough. But as I said previously, the per word timestamps on Watson still made it the winner for me.

Comparison: Watson vs Sphinx

When I first tried to read the Sphinx transcript it felt so scrambled that I wasn't even going to bother with it. However, using wdiff was a bit enlightening:

one of the things that they [-it you've-] {+found that you+} probably all seen [-all-] {+don't+} know that [-we're the-] {+with a+} big spenders on health care [-so this is-] {+services+} per capita spending of so called [-OECD countries-] {+all we see the country's+} developed countries {+were+} around the world and whenever you put [-U. S.-] {+us+} on the graphic with everybody else [-you have-] {+get back+} to change the [-axis-] {+access+} to fit the [-U. S.-] {+u. s.+} on [-with everybody else because-] {+the third best as+} we spend twice as much as {+you would see+} the [-OECD-] average [-and-] the basis on health care the result of all [-that spending-] {+let spinning+} we don't [-get-] {+have+} a lot of bang for [-our Buck-] {+but+} we should be up here [-we're-] {+were+} down [-there %HESITATION-] {+and+} so we don't [-get a lot-] {+allow+} health [-for all the-] {+problem+} money that we're spending we all know that that's {+the+} most [-of us know that I'm-] {+was the bum+} it's fairly well known what's not as well known is this these [-are-] {+were+} two women [-when Cologne take-] {+one call wanted+} the other one [-Elizabeth Bradley-] {+was with that way+} at [-Yale-] {+yale+} and [-Harvard respectively who actually our health-] {+harvard perspective we whack sheer hell+} services researchers who did an analysis it took the per capita spending on health care which is in the blue look at all [-OECD-] {+always see the+} countries [-but then-] {+that it+} added to that [-per capita-] {+for capital+} spending on social services [-and-] {+as+} social benefits and what they found is that when you do that the [-U. S.-] {+u. s.+} is no longer the big [-Spender-] {+spender+} were actually kind of smack dab in the middle [-of-] the [-pack-] {+pact+} what they also found is that spending on social services and benefits [-gets-] {+did+} you better health so we literally [-have the-] {+heavy+} accent on the wrong [-syllable-] {+so wobble+} and that red spending is our social [-country-] {+contract+} so they found that [-in OECD countries-] {+can only see the country's+} every two dollars spent on social services as opposed to [-dollars to one ratio in-] {+know someone shone+} social service [-spending to-] {+bennington+} health spending is the recipe for better health outcomes [-US ratio is-] {+u. s. ray shows+} fifty five cents for every dollar [-it helps me-] {+houseman+} so this is we know this if you want better health don't spend [-it-] on [-healthcare spend it-] {+health care spending+} on prevention [-spend it-] {+expanded+} on those things that anticipate people's needs and provide them the platform that they need to be able to pursue [-opportunities-] {+opportunity+} the whole world is [-telling us that-] {+telecast and+} yet we're having [-the current debate that-] {+a good they did+} we're having right at this moment in this country [-about healthcare-] {+but doctor there's+} something wrong with our critical thinking [-so-] {+skills+}

There was a pretty interesting blog post a few months back comparing similar speech to text services. That analysis used raw misses to judge accuracy. While that's a very objective measure, language isn't binary. Language is the lossy compression of a set of thoughts/words/shapes/smells/pictures in our minds, sent over a shared audio channel and reconstructed in real time in another mind. As such, language, and especially conversation, has checksums and redundancies. The effort required to understand something isn't just about how many words are wrong, but which words they were, and what the alternative was. Axis vs. access you could probably have figured out. "Spending to" vs. "bennington" takes a lot more mental energy to work out; maybe you can reverse it. "Harvard respectively who actually our health" (which isn't even quite right) vs. "harvard perspective we whack sheer hell" is so far off the deep end you aren't ever getting back. So while Sphinx's mathematical accuracy might not be much worse, the rabbit holes it takes you down pretty much scramble things beyond the point of no return. Which is unfortunate, as it would be great if there were an open solution in this space. But it does get to the point that for good speech to text you not only need good algorithms, but tons of training data.

Playing with this more

I encapsulated all the code I used for this in a github project, some of it nicer than the rest. When it gets to signing up for accounts and setting up auth I'm pretty hand wavy, because there is enough documentation on those sites to do it. Given the word level confidence and timestamps, I'm probably going to build something that makes an HTML transcript that's marked up reasonably with those. I do wonder if it would be easier to read if you knew which words it was mumbling through. I was actually a little surprised that Google doesn't expose that part of their API, as I remember the Google Voice UI exposing per word confidence levels graphically in the past. I'd also love to know if there are ways to get Sphinx working a little better. As an open source guy, I'd love for there to be a good offline and open solution to this problem. This is an ongoing exploration, so if you have any follow on thoughts or questions, please leave a comment. I would love to know better ways to do any of this.
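As a sketch of what that HTML transcript might look like: the field names below match the Watson excerpt earlier (timestamps as [word, start, end] pairs, word_confidence as [word, confidence]), but the markup choices are just me thinking out loud.

import json

with open("watson-transcript.json") as f:
    data = json.load(f)

spans = []
for result in data["results"]:
    best = result["alternatives"][0]
    for (word, start, _end), (_w, conf) in zip(
            best["timestamps"], best["word_confidence"]):
        # fade out the words Watson was unsure about, keep the timestamp
        spans.append('<span title="%.2fs" style="opacity: %.2f">%s</span>'
                     % (start, max(conf, 0.3), word))

with open("transcript.html", "w") as out:
    out.write("<p>%s</p>" % " ".join(spans))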

The real Trolley Problem in tech

With all the talk about autonomous cars in general media this year, we all got a refresher in Ethics 101 and the Trolley Problem. The Trolley Problem is where you, as an onlooker, see a trolley barreling towards 5 people. There is a switch you can throw which will kill 1 previously safe person instead of 5. What do you do? Do you take part in an act which kills 1 person while saving 5? Do you refuse to take part in an act of violence, but then willingly let 5 people die because of it? No right answers, just a theoretical problem to think through and see all the trade offs.

But as fun as this all is, autonomous cars are not the big trolley problem in Tech. Organizing and promoting information is. Right now, if you put the phrase "Was the holocaust real?" into Google, you'll get back 10 results. 8 will be various websites and articles that make the case that it was not real, but a giant hoax. The second hit (not the first) is the Wikipedia article on Holocaust Denial, and a link further down from the United States Holocaust Memorial talks about all the evidence presented at Nuremberg. 8 out of 10.

The argument we get in Tech a lot is that because results are generated by an algorithm, they are neutral. But an algorithm is just a set of instructions a human built once upon a time. When it was being built or refined, some human looked at a small number of inputs and what it output, and made a judgement call that it was good. Then they fed it a lot more input, far more than any human could digest, and let it loose on the world, under the assumption that the testing input was representative enough that it would produce valid results for all input. 8 out of 10.

Why are those the results? Because Google came up with an insight years ago that webpages have links, people produce webpages, and important sites with authoritative information get linked to quite often. In the world before Google, this was hugely true, because once you found a gem on the internet, if you didn't write it down somewhere, finding it again later was often impossible. In the 20 years since Google, and in the growth of the internet, that's less true. It's also less true about basic understood facts. There aren't thousands of people writing essays about the holocaust anymore. There is, however, a fringe of folks trying to actively erase that part of history. Why? I honestly have no idea. I really can't even get into that head space. But it's not unique to this event. There are people who write that the Sandy Hook shooting was a hoax too, and harass the families who lost children during that event. 8 out of 10.

Why does this matter? Who looks these things up? Maybe it's just the overly radicalized who already believe it? Or maybe it's the 12 year old kid who is told something on the playground and comes home to ask Google the answer. And finds 8 out of 10 results say it's a hoax.

What could be done? Without Google intervening, people could start writing more content on the internet saying the Holocaust was real, and eventually Google might interpret that and shift results. Maybe only 6 out of 10 for the hoax. Could we get enough popular sites so the truth could even be a majority, and the hoax would only get 4 out of 10? How many person hours, websites, twitter posts do we need to restore this truth? As we're sitting on Godwin's front lawn already, let's talk about the problem with that. 6 million voices (and all the ones that came after them) that would have stood up here are gone. The side of truth literally lost a generation to this terrible event.

So the other answer is that Google really should fix this. They control the black box. They already down rank sites for malware, for less accessible content, and soon for popup windows. The algorithm isn't static, it keeps being trained. And the answer you get is: "that's a slippery slope. When humans start interfering with search results that could be used by the powerful to suppress ideas they don't like." It is a slippery slope. But it assumes you aren't already on that slope. 8 out of 10. The act of taking billions of documents and choosing 10 to display is an act of amplification. What gets amplified is the Trolley problem.

Do we just amplify the loudest voices? Or do we realize that the loudest voices can use that platform to silence others? We've seen this already. We've seen important voices online that were expressing nothing more than "women are equal, ok?" get brutally silenced through doxing and other means. Some people that are really invested in keeping truth off the table don't stop at making their case; they actively attack and harass and send death threats to their opponents. So now that field of discourse keeps tilting towards the ideologues. 8 out of 10.

This is our real Trolley problem in Tech. Do you look at the playing field, realize that it's not level, that the ideologues are willing to do far more and actually knock their opponents off the internet entirely, and do something about it? Or do you, through inaction, just continue to amplify those loud voices, while the playing field tips further? Do we realize that this is a contributing factor to why our companies are so much less diverse than society at large? Do we also realize that the lack of diversity is why this doesn't look like a problem internally? At least not to 8 out of 10 folks. 8 out of 10.

I don't have any illusions this is going to change soon. The Google engine is complicated, and hard problems are hard. But we live in this world, both real and digital. Every action we take has an impact. In an increasingly digital world, the digital impact matters as much as, or even more than, the real world one. Those of us who have a hand in shaping what that digital world looks like need to realize how great a responsibility that has become. Like the Trolley Problem, there is no right answer. But 8 out of 10 seems like the wrong answer.

A record player in a car, what could go wrong

What’s the connection between the Beatles’ George Harrison, boxing legend Muhammad Ali, and Chrysler cars? The Highway Hi-Fi: a vinyl record player that just happened to be the world’s first in-car music system. It appeared 60 years ago this spring, in 1956, and should have been a smash hit. It was innovatory, a major talking point, arrived as the car market was booming as never before, and it came with much press hype. It also had the backing of a leading motor manufacturer. What could possibly go wrong?

Source: Forgotten audio formats: The Highway Hi-Fi | Ars Technica

It's a fascinating story, made even more so because proprietary formats and copyright tangles basically killed it so quickly.

Python Design Patterns

https://www.youtube.com/watch?v=Er5K_nR5lDQ&feature=youtu.be&t=17m25s

A friend pointed me to this talk by Brandon Rhodes on python design patterns from PyOhio a couple of years ago. The talk asks an interesting question: why aren't design patterns seen and talked about in the Python community? He walks through the patterns in Design Patterns: Elements of Reusable Object-Oriented Software one by one, and points out some that are features of the language, some that are used in the standard library, and some that are really applicable, all with nice small code examples.

The thing that got me thinking, though, was a comment he makes both at the beginning and end of the talk: the reason you don't see these patterns in Python is that Python developers tend not to write the kind of software where they are needed. They focus on small tools that connect other components, or live within a framework.

I'm a newcomer to the community, having done Python full time for only a few years on OpenStack, so I can't be sure whether or not that's true. However, I know there are times when I'm surprised by things that I would have expected to be solved already in the language, or incompatibilities that didn't need to be there in the python 2 to 3 transition, and I wonder if these come from the community not having a ton of experience with software at large code base sizes, as well as long duration code bases, and the kinds of deprecation and upgrade guarantees needed there.