Writings

November 21, 2017 · Technology · astronomy

It came from beyond the stars

Meet 1I/2017 U1 (‘Oumuamua), but make it fast, because it’s already leaving.

Source: First-known interstellar visitor is a bizarre, cigar-shaped asteroid

This is one of the most exciting news stories of the year. We've now actually seen an object that had to have come from another solar system pass through ours. Data collection by the Hubble telescope continues into December, after which point it will be too faint for anything to see again.

November 5, 2017 · Technology · science · history

Migration by Sea

For decades, students were taught that the first people in the Americas were a group called the Clovis who walked over the Bering land bridge about 13,500 years ago. They arrived (so the narrative goes) via an ice-free corridor between glaciers in North America. But evidence has been piling up since the 1980s of human campsites in North and South America that date back much earlier than 13,500 years. At sites ranging from Oregon in the US to Monte Verde in Chile, evidence of human habitation goes back as far as 18,000 years.

Source: Most scientists now reject the idea that the first Americans came by land | Ars Technica

Pretty huge change in what we were taught growing up, and a great story about how extraordinary claims require extraordinary evidence. And when that evidence is compelling, scientific consensus moves. This also provides more overlap between the megafauna die-off and human habitation, which makes sense.

October 16, 2017 · Personal · personal · hudson-valley

Catskills Conf 2017

Yawp! YAWP! Yawp! YAWP! Don't fall in the creek. Hudson Valley Tech Ashokan Community. Yawp! YAWP! Yawp! YAWP! Don't fall in the creek. Be as open and present as you can be.

That was the chorus of the theme song for Catskills Conf this weekend. Yes, there was a theme song. Every day started with a musical riff on the talks of the day before by Jonathan Mann, who has been posting a song-a-day, every day, to YouTube for a decade. You can go watch them for Friday, Saturday, and Sunday (soon). This is one of the many wonderful ways that this event was unlike any tech event I've been to, and why it just became one of my favorite tech events ever.

At a typical tech event the focus is on getting a bunch of speakers, so much content, splitting folks into tracks on the topics they would be interested in, then packing that in from 9 - 5 (or later). People are exhausted by the end of the day. They also largely attended different conferences. At a 10 track conference, the only shared experience, if it exists, is keynotes. Which for larger conferences are purchased slots.

Catskills Conf was a single speaking track. There were only 10 speakers, plus a lightning talk session with 7 lightning talks. The talks were a shared experience for everyone. They were all about technology, the tech industry, and/or the intersection of tech with other aspects of our lives. And they were all incredible. I considered it quite an honor to be a part of the speaking lineup.

And when it came to speakers, the Catskills Conf team was extremely serious about having a diverse speaker list. Of the 10 speakers, the gender split was 3 men, 6 women, and 1 non-binary. 4 of 10 speakers were people of color. The lightning talks were equally diverse. It was such a stark contrast to what you typically see at a tech event that it was in-your-face refreshing.

10 talks doesn't seem like a lot for a 3 day conference, but in between them there were structured activity times. Saturday afternoon there was a 2 hour activity block after lunch with options including blacksmithing, letterpress, self defense, foraging, hiking (with historic interpretation of the Ashokan site), and bread making. It being a 75 degree sunny fall afternoon, I opted for the 2 hour hike, wandering through woods and along streams. I came straight back from that into my talk more energized than I've ever been for one.

These kinds of breaks from sitting and listening to talks, to do something with your hands or feet, were wonderful for processing what you were hearing. It also meant that by the end of the day, instead of feeling like your brain was jelly, you had had enough processing time that you were excited to talk about what you heard, or get to know the person sitting next to you at dinner and find out the fascinating things they were doing. Every night ended with a campfire and beer under the stars, which was another place to talk and get to know more folks. You weren't so overloaded during the day that you wanted to go off and hide and decompress afterwards, even those of us that are more on the introvert side.

A couple of folks even gathered for a 6am sunrise hike on Sunday morning. I joined 3 others as we hiked under flashlights, losing the trail a couple of times, to a sugar orchard and sugar shack, while discussing the work one of the hikers was doing around infectious disease modeling and experiments with mosquitos, the biomimicry work that another was doing trying to take cues from nature and work them into built materials, bird migrations, tech meetups, and just generally exploring a beautiful area.

This was the 3rd year of Catskills Conf, but the first time I could make it. I'm going to be processing the event for weeks to come. There were so many moments that I really loved that aren't here, it just doesn't all fit. But one thing is for sure. I'm extremely excited about attending and participating in the years to come.

I'll leave you with this really cool Catskills Conf 2017 wrap up video. It's not like being there, but it gives you a flavor.

August 2, 2017 · Technology · home-automation · open-source

OpenWest 2017 Roundup

When I first discovered the OpenWest conference, I was told it was the biggest US open source event that I'd never heard of, which is a pretty apt description. OpenWest brings together technologists interested in Open Technology in Sandy, Utah, just south of Salt Lake City. This is a regional community Open Source event, run by volunteers, which means the program is much more varied than what you'd see at an event focused on a particular open source technology stack.

With up to 13 tracks happening simultaneously, there were lots of great moments for me over the course of the week. I'm just going to capture a few of them.

OpenCV Trials and Tribulations

There was a great talk by John Harrison at Lucid Charts about trying to do something interesting with OpenCV, and failing. He was giving the talk in the spirit of the Journal of Negative Results: reporting a hard problem they tried and failed at, and the dead ends they ran into.

It started as a hackfest project: could they take a picture of a flow chart with a camera, and use OpenCV to turn that into a symbolic flow chart in their tool? Turns out if you draw all connecting lines in red, and all shapes in black, it's not a very hard problem. Also turns out, even in controlled user experiments, you can't get anyone to do that. It fails UX. And while they did build a system that worked with black lines everywhere in controlled lab environments, it worked with 0% of customer-taken images, and the path to improvement wasn't clear, so after a 2 month experiment they stopped.

While they are primarily a Java shop, they did this entire project in Python, because while "there are OpenCV bindings for every language you can imagine, all the interesting examples are only in python." Which goes to show how important an open and vibrant ecosystem of consuming tools is to the success of a project.

Writing Ethical Software

This was an interesting talk by James Prestwich on writing ethical software that started with a brief history of schools of thought on ethics over the last 3000 years. The primer was just straight up informative, and the presenter did quite a good job of staying neutral through all of that.

Then we were posed with an interesting question. Software is now mostly about mediating complex interactions between people. If you look at other fields like Medicine and Law there are oaths and codes of conduct that their practitioners take because of how much their work affects people's lives.

We have collectively decided that certain things, like land mines, should not exist in the world. We have treaties on that. But as software eats the world, we're not having the conversation about what software should not exist, for any reason.

There weren't answers during this talk, it was mostly questions and attempting to start a conversation. But for anyone who works in software it's a good thought exercise to have. What are your personal ethical boundaries about software you would create or contribute to? It's also a much better conversation to have well in advance of any actual ethical conflict, because things are rarely bright lines, but long slippery slopes.

Hardware Track

There was a dedicated hardware track on the main stage for the whole conference, at least a third of the talks were related to home automation in some way, and 80% of them centered around a project that used a Raspberry Pi.

Raspberry Pi has managed to go across the entire hype curve and is now climbing away on the plateau of productivity. We went from neat idea, to unobtainium, to toy projects, to boxes full of Pis in basements, to real productivity over the last 5 years. Yes, there are lots of other cheaper, neater, more powerful platforms, but the ecosystem around the Pi just makes it the no-brainer workhorse.

I was actually a little surprised how many home grown Home Automation systems people talked about there. I did have pieces of something like that before discovering Home Assistant, but now it's hard to imagine doing all the work that the community is doing for me.

One of the projects I thought was most interesting was air quality monitoring with the ESP8266. For about $30 they can build each monitoring unit, then find places throughout the community where they can plug them in (they need power and wifi). They are collecting it all in a central MQTT broker and doing reports on it to try to get a better baseline on the air quality in the Salt Lake City area.

Patching People

The stand out keynote of the event was Deb Nicholson on patching people.

Any group of humans, and the ways they interact, has bugs, just like software has bugs. A people bug is like a software bug: an unintended negative side effect of things that are happening. The point is, patching people is actually not all that different from patching software.

Filing "bugs" against people is a little harder than software, because no one likes to accept criticism. So as such, she put forward the idea of "calling in" vs. "calling out". Take the person aside, privately, and say "I think you were trying to do X, but the way it was said excluded a bunch of these people. Maybe saying it this other way would be more effective?".

The other thing to realize is none of us is above this. We all make mistakes, and need some patching from time to time.

After this talk I'm going to try to be better about calling in when I think it will help. Open source projects live or die by the longevity of their community, so patching the community to be more inclusive and welcoming is key.

So many more good moments...

Honestly, there were so many other good moments as well: chatting with folks about Home Assistant after my talk; seeing the state of the world on different AI cloud platforms; thinking about localization and culture in software; getting my head around the OAuth model; JSON Web Tokens.

This is definitely a conference I'd love to get to again, and a great community event they've built there. Thanks to the OpenWest organizing team for such a great show.

July 19, 2017 · Technology · openstack · open-source

Triple Bottom Line in Open Source

One of the more thought provoking things that came out of the OpenStack leadership training at Zingerman's last year, was the idea of the Triple Bottom Line. It's something I continue to ponder regularly.

The Zingerman's family of businesses definitely exists to make money; there are no apologies for that. However, that's not the only bottom line they measure against. The full bottom line they've defined for themselves is "Great Food, Great Service, Great Finance." In practice this means you have to ensure that all three are being met, and not sacrifice the food and service just to make a buck.

If you look at Open Source through this kind of lens, a lot of the trade-offs that successful projects make start to make a lot more sense. The TBL for OpenStack would probably be something like: Code, Community, Contributors. Yes, this is about building great code, to make a great cloud, but it's also really critical to grow the community, and mentor and grow individual contributors as well. Those contributors might stay in OpenStack, or they might go on to use their skills to help other Open Source projects be better in the future. All of these are measures of success.

This was one of the reasons we recently switched the development tooling in OpenStack (DevStack) to using systemd more natively. Not only did it solve a bunch of long standing technical issues that had really ugly workarounds, but it also meant enhancing our contributors. Systemd and the journal are default in every new Linux environment now, so skills that our contributors gained working with DevStack would now directly transfer to any Linux environment. It would make them better Linux users in any context, not just OpenStack. It also makes the environment easier for people coming from the outside to understand, because it looks more like what they are used to.

While I don't have enough data to back it up, it feels like this central question is really important to success in Open Source: "In order to be successful in this project you must learn X, which will be useful in these other contexts outside of the project." X has to be small enough to be learnable, but also has to be useful in other contexts, so time invested has larger payoffs. That's what growing a contributor looks like, they don't just become better at your project, they become a better developer for everything they touch in the future.

July 11, 2017 · Technology · science

Lumosity boosts brain function by 0%

In the new controlled, randomized trial involving 128 healthy young adults, researchers found that playing Lumosity brain-training games for 30-minute sessions, five times a week for 10 weeks resulted in participants getting better at playing the games. But researchers saw no changes in participants’ neural activity and no improvements in their cognitive performance beyond those seen in controls. The same went for participants who played video games not designed with cognitive benefits in mind.

Source: Lumosity boosts brain function by 0%, the same as normal video games—study | Ars Technica UK

As best as any study out there can tell, brain training games are all snake oil. There is no such thing as a general intelligence booster; there is just getting better at specific skills because you do them more.

July 8, 2017 · Technology · home-automation

IoT & Home Assistant at OpenWest

I'm thrilled to be talking about the Internet of Things and Home Assistant at the OpenWest conference next week. The talk for it has come together quite nicely, and I'll hopefully be giving it a few more places over the coming year as well. The goal of the talk is to explain some of the complexity of the space, and see why it is so complex, and why the only real path forward in the short / medium term is an open source hub at the heart of everything.

For those that can't make it all the way to Utah, there is a trimmed down Article version of it up at opensource.com. The article seems to be doing well, and was #2 for this week on the site.

I will also be forever indebted to Benjamin Walker and his complete throw away line "this is why we can't have the internet of nice things" during his New York After Rent series (which is really incredible, and completely unrelated to any of this), which stuck in my brain for months afterwards, and became the seed of inspiration for this talk.

June 13, 2017 · Technology · ibm

Visualizing Watson Speech Transcripts

After comparing various speech to text engines, and staring at transcripts, I got intrigued about how much more metadata I was getting back from Watson about the speech. With both timings and confidence levels I built a little visualizer for the transcript that colors things based on confidence, and attempts to insert some punctuation: This is a talk by Neil Gaiman about how stories last at the Long Now Foundation.

Things are more red -> yellow based on how uncertain they are.

A few things I learned along the way with this. Working punctuation back into transcriptions of speech is hard. Originally I was trying to figure out if there was some speech delay that I could guess for a comma vs. a period, and very quickly that just turned into mush. The rule I came up with, which wasn't terrible, is to put a comma in for 0.1 - 0.3s delays, and put one period of an ellipsis in for every 0.1s delay in speech for longer pauses. That gives a sense of the dramatic pauses, and does mentally make it easier to read along.
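As a rough illustration of that rule, here is a minimal Python sketch. It assumes the per-word timings arrive as `[word, start, end]` entries like the Watson output, and the function name `punctuate` and the constants are made up for this example rather than taken from the visualizer itself. The first two sample entries come from a real transcript; the last two are invented to show a longer pause.

```python
# Thresholds in hundredths of a second, so comparisons use whole numbers.
COMMA_MIN_CS = 10   # 0.1s
COMMA_MAX_CS = 30   # 0.3s
DOT_EVERY_CS = 10   # one period per 0.1s of a longer pause


def punctuate(timestamps):
    """Join [word, start, end] entries, adding pause-based punctuation."""
    pieces = []
    for index, (word, _start, end) in enumerate(timestamps):
        if index + 1 < len(timestamps):
            next_start = timestamps[index + 1][1]
            # Round to hundredths of a second to avoid float noise.
            gap_cs = round((next_start - end) * 100)
            if gap_cs > COMMA_MAX_CS:
                # Longer pause: one period for every 0.1s of silence.
                word += "." * (gap_cs // DOT_EVERY_CS)
            elif gap_cs >= COMMA_MIN_CS:
                word += ","
        pieces.append(word)
    return " ".join(pieces)


sample = [
    ["and", 27.26, 27.61],
    ["it", 27.66, 27.88],
    ["joined", 28.08, 28.5],   # invented entry: 0.2s gap before it
    ["the", 29.0, 29.1],       # invented entry: 0.5s gap before it
]
print(punctuate(sample))
```

Working in hundredths of a second is a deliberate choice here: subtracting two floats like 28.08 and 27.88 rarely gives exactly 0.2, and rounding first keeps the thresholds from flickering at the boundaries.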

It definitely shows how the metadata around speech to text can make human understanding of the content a lot easier. It's nice that you can get that out of Watson, and it would be great if more environments supported that.

June 12, 2017 · Technology · ibm · software · longform

Comparing Speech Recognition for Transcripts

I listen to a lot of podcasts. Often months later something about one I listened to really strikes a chord, enough that I want to share it with others through Facebook or my blog. I'd like to quote the relevant section, but also link to about where it was in the audio.

Listening back through one or more hours of podcast just to find the right 60 seconds and transcribe them is enough extra work that I often just don't share. But now that I've got access to the Watson Speech to Text service I decided to try to find out how effectively I could use software to solve this. And, just to get a sense of the world, compare the Watson engine with Google and CMU Sphinx.

Input Data

The input in question was a lecture from the Commonwealth Club of California - Zip Code, not Genetic Code: The California Endowment's 10 year, $1 Billion Initiative. There was a really interesting bit in there about spending and outcome comparisons between different countries that I wanted to quote. The Commonwealth Club makes all these files available as mp3, which none of the speech engines handle. Watson and Google both can do FLAC, and Sphinx needs a wav file. Also it appears that all speech models are trained around the assumption of a 16kHz sampling, so I needed to down sample the mp3 file and convert it. Fortunately, ffmpeg to the rescue.

ffmpeg -i cc_20170323_Zip_Code_Not_Genetic_Code_Podcast.mp3 -ar 16000 podcast.wav
ffmpeg -i cc_20170323_Zip_Code_Not_Genetic_Code_Podcast.mp3 -ar 16000 podcast.flac

Watson

The Watson Speech to Text API can either work over websocket streaming or with bulk HTTP. While I had some Python code to use the websocket streaming for live transcription, I was consistently getting SSL errors after 30 - 90 seconds. A bit of googling hints that this might actually be bugs on the Python side. So I reverted back to the bulk HTTP upload interface using example code from the watson-developer-cloud Python package. The script I used to do it is up on GitHub.

The first 1000 minutes of transcription are free, so this is something you could reasonably do pretty regularly. After that it is $0.02 / minute for transcription.

When doing this over the bulk interface things are just going to seem to have "hung" for about 30 minutes, but it will eventually return data. Watson seems like it's operating no faster than 2x real time for processing audio data. The bulk processing time surprised me, but then I realized that with the general focus on real time processing most speech recognition systems just need to be faster than real time, and optimizing past that has very diminishing returns, especially if there is an accuracy trade off in the process.

The returned raw data is highly verbose, and has the advantage of having timestamps per word, which makes finding passages in the audio really convenient.

          ...
          "confidence": 0.947, 
          "transcript": "and it joined the endowment in October of two thousand nine prior to his appointment at the endowment doctor right decirte since two thousand three as both the director and county health officer for the Alameda county public health department and in that role he oversaw the creation of an innovative public health practice designed to eliminate health disparities by tackling the root causes of poor health that limit quality of life and lifespan as a primary care physician for the San Francisco department of public health ", 
          "timestamps": [
            [
              "and", 
              27.26, 
              27.61
            ], 
            [
              "it", 
              27.66, 
              27.88
            ],
          ...

So 30 minutes in I had my answer.
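To show why the per-word timestamps are so convenient, here is a small Python sketch that looks up where a phrase starts in the audio. It assumes the response has the `results` / `alternatives` / `timestamps` nesting seen in the excerpt above; `find_phrase` and `as_clock` are hypothetical helper names, not part of any Watson SDK.

```python
def find_phrase(response, phrase):
    """Return the start time (seconds) of the first match, or None."""
    words = []
    for result in response["results"]:
        # Each timestamp entry is [word, start_seconds, end_seconds].
        words.extend(result["alternatives"][0]["timestamps"])

    target = phrase.lower().split()
    if not target:
        return None

    spoken = [entry[0].lower() for entry in words]
    for i in range(len(spoken) - len(target) + 1):
        if spoken[i:i + len(target)] == target:
            return words[i][1]
    return None


def as_clock(seconds):
    """Format seconds as M:SS.ss for seeking in an audio player."""
    minutes, rest = divmod(seconds, 60)
    return f"{int(minutes)}:{rest:05.2f}"


# Tiny response built from the two entries shown in the excerpt.
response = {
    "results": [
        {"alternatives": [{"timestamps": [["and", 27.26, 27.61],
                                          ["it", 27.66, 27.88]]}]}
    ]
}
print(find_phrase(response, "and it"))
print(as_clock(1494.98))
```

The match is exact on whole words, so it only works if you search for the phrase as the engine heard it, not necessarily as it was spoken.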

Google

I was curious to also see what the Google experience was like, which I originally did through their API console quite nicely. Google is clearly more focused on short bits of audio. There are 3 interfaces: sync, async, and streaming. Only async allows for greater than 60 seconds of audio.

In the async model you have to upload your content to Google Storage first, then reference it as a gs:// url. That's all fine, and the Google storage interface is stable and well documented, but it is an extra step in the process. Especially for content I'm only going to have to care about once.

Things did get a little tricky translating my console experience to python... 3 different examples listed in the official documentation (and code comments) were wrong. The official SDK no longer seems to implement long_running_recognize on anything except the grpc interface. And the google auth system doesn't play great with python virtualenvs, because it's python code that needs a custom path, but it's not packaged on pypi. So you need to venv, then manually add more paths to your env, then gauth login. It's all doable, but it definitely felt clunky.

I did eventually work through all of these, and have a working example up on github.

The returned format looks pretty similar to the Watson structure (there are only so many ways to skin this cat), though a lot more compact, as there aren't per word confidence levels or per word timings.

    {
      "alternatives": [
        {
          "confidence": 0.9615234732627869, 
          "transcript": "greetings and welcome to today's meeting of the Commonwealth Club of California I'm Patty James vice-chair of the club's health and Medicine member that form and chair of this program and now it's my pleasure to introduce dr. Anthony iton MD JD and MPH which is a masters of Public Health I have to admit I had to look it up senior vice president of Healthy Communities joined the endowment in October of 2009 prior to his appointment at the endowment dr. right this Earth since 2003 as both the director and County Health officer for the Alameda County Public Health Department and in that role he oversaw the creation of an Innovative Public Health practice designed to eliminate Health disparities by tackling the root causes a poor health that limit quality of life and life span as a primary care physician for the San Francisco Department of Public Health dr. writing career includes past Service as a staff attorney"
        }
      ]
    },

For my particular problem that makes Google less useful, because the best I can do is dump all the text to the file, search for my phrase, see that it's 44% of the way through the file, and jump to around there in the audio. It's all doable, just not quite as nice.
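That "percent of the way through the file" trick can be sketched in a few lines of Python. This is only an approximation that assumes speech is spread evenly through the recording; `estimate_offset` is a made-up helper, and the duration is something you would have to supply yourself.

```python
def estimate_offset(transcript, phrase, duration_seconds):
    """Guess where a phrase falls in the audio from its text position."""
    position = transcript.find(phrase)
    if position < 0:
        return None
    # Fraction of the way through the text, scaled to the audio length.
    return duration_seconds * position / len(transcript)


# Hypothetical data: a phrase 44% of the way into a 100 character
# transcript of a one hour (3600 second) recording.
transcript = "a" * 44 + "needle" + "b" * 50
print(estimate_offset(transcript, "needle", 3600))
```

Long pauses, applause, or a fast talker will all throw the estimate off, which is why it only gets you to "around there" in the audio.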

CMU Sphinx

Being on Linux it made sense to try out CMU Sphinx as well, which took some googling on how to do it.

sudo apt install pocketsphinx pocketsphinx-en-us

Then run it with the following:

pocketsphinx_continuous -dict /usr/share/pocketsphinx/model/en-us/cmudict-en-us.dict -lm /usr/share/pocketsphinx/model/en-us/en-us.lm.bin -infile podcast.wav 2> voice.log | tee sphinx-transcript.log

Sphinx prints out a ton of debug output on stderr, which you want to get out of the way, then the transcription should be sent to a file. Like with Watson, it's really going only a bit faster than real time, so this is going to take a minute.

Converting JSON to snippets

To try to compare results I needed to start with comparable formats. I had 2 JSON blobs, and one giant text dump. A little jq magic can extract all the text:

cat watson-transcript.json | jq '.["results"][]["alternatives"][0]["transcript"]' | sed 's/"//g'
cat google-transcript.json | jq '.["results"][]["alternatives"][0]["transcript"]' | sed 's/"//g'

Comparison: Watson vs. Google

For the purpose of comparisons, I dug out the chunk that I was expecting to quote, which shows up about half way through the podcast, at second 1494.98 (24:54.98) according to Watson.

The best way I could think to compare all of these is start / end at the same place, word wrap the texts, and then use wdiff to compare them. Here is watson (-) vs. google (+) for this passage:

one of the things that they [-it you've-] probably all [-seen all-] {+seem you'll+} know that [-we're the big spenders-] {+where The Big Spenders+} on [-health care-] {+Healthcare+} so this is per capita spending of [-so called OECD-] {+so-called oecd+} countries developed countries around the world and whenever you put [-U. S.-] {+us+} on the graphic with everybody else you have to change the [-axis-] {+access+} to fit the [-U. S.-] {+US+} on with everybody else [-because-] {+cuz+} we spend twice as much as {+he always see+} the [-OECD-] average [-and-] {+on+} the basis on [-health care-] {+Healthcare+} the result of all that spending we don't get a lot of bang for our [-Buck-] {+buck+} we should be up here [-we're-] {+or+} down there [-%HESITATION-] so we don't get a lot [-health-] {+of Health+} for all the money that we're spending we all know that that's most of us know that [-I'm-] it's fairly well [-known-] {+know+} what's not as [-well known-] {+well-known+} is this these are two women [-when Cologne take-] {+one killoran+} the other one Elizabeth Bradley at Yale and Harvard respectively who actually [-our health services-] {+are Health Services+} researchers who did an analysis [-it-] {+that+} took the per capita spending on health care which is in the blue look at [-all OECD-] {+Alloa CD+} countries but then added to that per capita spending on social services and social benefits and what they found is that when you do that [-the U. 
S.-] {+to us+} is no longer the big [-Spender were-] {+spender or+} actually kind of smack dab in the middle of the pack what they also found is that spending on social services and benefits [-gets you better health-] {+Gets You Better Health+} so we literally have the accent on the wrong syllable and that red spending is our social [-country-] {+contract+} so they found that in [-OECD-] {+OCD+} countries every [-two dollars-] {+$2+} spent on [-social services-] {+Social Services+} as [-opposed to dollars-] {+a post $2+} to [-one-] {+1+} ratio [-in social service-] {+and Social Service+} spending to [-health-] {+help+} spending is the recipe for [-better health-] {+Better Health+} outcomes [-US-] {+us+} ratio [-is fifty five cents-] {+was $0.55+} for every dollar [-it helps me-] {+of houseman+} so this is we know this if you want better health don't spend it on [-healthcare-] {+Healthcare+} spend it on prevention spend it on those things that anticipate people's needs and provide them the platform that they need to be able to pursue [-opportunities-] {+opportunity+} the whole world is telling us that [-yet-] {+yeah+} we're having the current debate that we're having right at this moment in this country about [-healthcare-] {+Healthcare there's+} something wrong with our critical thinking [-so-] {+skills+}

Both are pretty good. Watson feels a little more on target, with getting axis/access right, and being more consistent on understanding when U.S. is supposed to be a proper noun. When Google decides to capitalize things seems pretty random, though that's really minor. From a content perspective both were good enough. But as I said previously, the per word timestamps on Watson still made it the winner for me.
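If wdiff isn't handy, a similar word level comparison can be approximated with Python's standard `difflib`. This is a sketch, not a reimplementation of wdiff: `word_diff` is a hypothetical helper that borrows the `[-old-]` / `{+new+}` markers, and its grouping of changes may not match wdiff's output exactly.

```python
import difflib


def word_diff(old, new):
    """Compare two texts word by word, using wdiff-style markers."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(" ".join(a[i1:i2]))
            continue
        if i2 > i1:  # words only in the old text
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if j2 > j1:  # words only in the new text
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


print(word_diff("change the axis to fit", "change the access to fit"))
```

Splitting on whitespace before comparing makes the earlier word wrapping step unnecessary, since line breaks no longer matter.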

Comparison: Watson vs Sphinx

When I first tried to read the Sphinx transcript it felt so scrambled that I wasn't even going to bother with it. However, using wdiff was a bit enlightening:

one of the things that they [-it you've-] {+found that you+} probably all seen [-all-] {+don't+} know that [-we're the-] {+with a+} big spenders on health care [-so this is-] {+services+} per capita spending of so called [-OECD countries-] {+all we see the country's+} developed countries {+were+} around the world and whenever you put [-U. S.-] {+us+} on the graphic with everybody else [-you have-] {+get back+} to change the [-axis-] {+access+} to fit the [-U. S.-] {+u. s.+} on [-with everybody else because-] {+the third best as+} we spend twice as much as {+you would see+} the [-OECD-] average [-and-] the basis on health care the result of all [-that spending-] {+let spinning+} we don't [-get-] {+have+} a lot of bang for [-our Buck-] {+but+} we should be up here [-we're-] {+were+} down [-there %HESITATION-] {+and+} so we don't [-get a lot-] {+allow+} health [-for all the-] {+problem+} money that we're spending we all know that that's {+the+} most [-of us know that I'm-] {+was the bum+} it's fairly well known what's not as well known is this these [-are-] {+were+} two women [-when Cologne take-] {+one call wanted+} the other one [-Elizabeth Bradley-] {+was with that way+} at [-Yale-] {+yale+} and [-Harvard respectively who actually our health-] {+harvard perspective we whack sheer hell+} services researchers who did an analysis it took the per capita spending on health care which is in the blue look at all [-OECD-] {+always see the+} countries [-but then-] {+that it+} added to that [-per capita-] {+for capital+} spending on social services [-and-] {+as+} social benefits and what they found is that when you do that the [-U. S.-] {+u. 
s.+} is no longer the big [-Spender-] {+spender+} were actually kind of smack dab in the middle [-of-] the [-pack-] {+pact+} what they also found is that spending on social services and benefits [-gets-] {+did+} you better health so we literally [-have the-] {+heavy+} accent on the wrong [-syllable-] {+so wobble+} and that red spending is our social [-country-] {+contract+} so they found that [-in OECD countries-] {+can only see the country's+} every two dollars spent on social services as opposed to [-dollars to one ratio in-] {+know someone shone+} social service [-spending to-] {+bennington+} health spending is the recipe for better health outcomes [-US ratio is-] {+u. s. ray shows+} fifty five cents for every dollar [-it helps me-] {+houseman+} so this is we know this if you want better health don't spend [-it-] on [-healthcare spend it-] {+health care spending+} on prevention [-spend it-] {+expanded+} on those things that anticipate people's needs and provide them the platform that they need to be able to pursue [-opportunities-] {+opportunity+} the whole world is [-telling us that-] {+telecast and+} yet we're having [-the current debate that-] {+a good they did+} we're having right at this moment in this country [-about healthcare-] {+but doctor there's+} something wrong with our critical thinking [-so-] {+skills+}

There was a pretty interesting blog post a few months back comparing similar Speech to Text services. His analysis used raw misses to judge accuracy. While that's a very objective measure, language isn't binary. Language is the lossy compression of a set of thoughts/words/shapes/smells/pictures in our mind, sent over a shared audio channel, with an attempt to reconstruct it in real time in another mind. As such language, and especially conversation, has checksums and redundancies.

The effort required to understand something isn't just about how many words are wrong, but what words they were, and what the alternative was. Axis vs. access, you could probably have figured out. "Spending to" vs. "bennington", takes a lot more mental energy to work out, maybe you can reverse it. "Harvard respectively who actually our health" (which isn't even quite right) vs. "harvard perspective we whack sheer hell" is so far off the deep end you aren't ever getting back.

So while its mathematical accuracy might not be much worse, the rabbit holes it takes you down pretty much scramble things beyond the point of no return. Which is unfortunate, as it would be great if there was an open solution in this space. But it does get to the point that for good speech to text you not only need good algorithms, but tons of training data.
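One common way to count raw misses is word error rate: the word level edit distance divided by the length of the reference. The Python sketch below is an assumption about the kind of metric involved, not the method from the post mentioned above. It treats the Watson wording as the reference, which is itself a guess since there is no hand checked transcript here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Classic dynamic programming edit distance, one row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1] / len(ref)


easy = word_error_rate("change the axis to fit",
                       "change the access to fit")
hard = word_error_rate("service spending to health spending",
                       "service bennington health spending")
print(easy, hard)
```

Both snippets score as a small number of word errors, yet the second is far harder for a reader to recover, which is the gap between the count and the effort described above.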

Playing with this more

I encapsulated all the code I used for this in a GitHub project, some of it nicer than others. When it gets to signing up for accounts and setting up auth I'm pretty hand wavy, because there is enough documentation on those sites to do it.

Given the word level confidence and timestamps, I'm probably going to build something that makes an HTML transcript that's marked up reasonably with those. I do wonder if it would be easier to read if you knew which words it was mumbling through. I was actually a little surprised that Google doesn't expose that part of their API, as I remember the Google Voice UI exposing per word confidence levels graphically in the past.
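A minimal Python sketch of what that markup might look like follows. The input shape (a list of `(word, confidence)` pairs), the 0.9 threshold, and the red to yellow color scale are all assumptions made for illustration; the real response layout and a sensible cutoff would need checking against actual data.

```python
import html


def confidence_span(word, confidence, threshold=0.9):
    """Wrap a low-confidence word in a colored span; leave others plain."""
    text = html.escape(word)
    if confidence >= threshold:
        return text
    # Map 0..threshold onto hue 0 (red) .. 60 (yellow).
    hue = round(60 * max(confidence, 0.0) / threshold)
    return f'<span style="background: hsl({hue}, 100%, 80%)">{text}</span>'


def render_transcript(words):
    """Render (word, confidence) pairs as one line of HTML."""
    return " ".join(confidence_span(w, c) for w, c in words)


# Hypothetical confidence values, just to show the output shape.
print(render_transcript([("well", 0.97), ("known", 0.45)]))
```

Leaving confident words unstyled keeps the page readable, so the color only draws the eye to the words the engine was mumbling through.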

I'd also love to know if there were ways to get Sphinx working a little better. As an open source guy, I'd love for there to be a good offline and open solution to this problem as well.

This is an ongoing exploration, so if you have any follow on thoughts or questions, please leave a comment. I would love to know better ways to do any of this.

May 30, 2017 · OpenStack · systems

James Bessen: "Learning by Doing: The Real Connection between Innovation, Wages, and Wealth"

Interesting video by the author of "[Learning by Doing: The Real Connection between Innovation, Wages, and Wealth](http://amzn.to/2ocnDIX)", which largely comes down to "it's complicated". Sometimes automation replaces jobs, but sometimes it increases jobs, especially when there was pent up demand.

ATMs actually increased the number of bank teller jobs, because they meant fewer people were needed per branch, and banks opened up new branches to meet pent up demand. It's also why manufacturing jobs are never coming back: we've met the demand on consumption, and most industries making goods are in the optimizing phase.

What's also really interesting is the idea that new skills are always undervalued, because there is no reliable basis to understand how valuable they are. The transition from typesetting to digital publishing was a huge skill shift, but was pretty stagnant on wages.