Archive | crowdsourcing

A Digital Globe

“Energy Flux,” data source: National Geospatial-Intelligence Agency, September 2000.

Crowdsourcing, as a term, has been around for something like 12 years according to Wikipedia. OpenStreetMap is a little older, and the idea itself can be traced back almost arbitrarily far. Wikipedia thinks it goes back to the 1714 Longitude Prize competition. That seems like a stretch too far, but in any case, it’s been around a while.

The ability to use many distributed people to solve a problem has had some obvious recent wins like Wikipedia itself, OpenStreetMap and others. Yet to a large degree these projects require skill. You need to know how to edit the text or the map. In the case of Linux, you need to be able to write and debug software.

Where crowdsourcing is in some ways more interesting is where that barrier to entry is much lower. The simplest way you can contribute to a project is by answering a binary question – something with a ‘yes’ or ‘no’ answer. If we could ask every one of the ~7 billion people in the world if they were in an urban area right this second, we’d end up with a fair representation of a map of the (urban) world. In fact, just the locations of all 7 billion people would mimic the same map.

Tomnod is DigitalGlobe’s crowdsourcing platform and today it’s running a yes/no campaign to find all the Weddell seals in their parts of the Antarctic.

The premise is simple and effective: repeatedly look for seals in a box. If there are seals, press 1. If not, press 2. After processing tens of thousands of boxes you get a map of seals, parallelizing the problem across many volunteers.
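As a rough sketch of how those yes/no answers might be turned into a map: show each box to several volunteers and keep it only if most of them saw a seal. The function name, the vote threshold and the sample votes below are all assumptions for illustration, not Tomnod's actual pipeline.

```python
from collections import defaultdict

def boxes_with_seals(votes, min_votes=3, threshold=0.5):
    """Return the ids of boxes where most volunteers answered 'yes'.

    votes: iterable of (box_id, answer) pairs, where answer is True for
    'seals present' (key 1) and False for 'no seals' (key 2).
    min_votes and threshold are illustrative assumptions.
    """
    tally = defaultdict(lambda: [0, 0])  # box_id -> [yes_count, total_count]
    for box_id, answer in votes:
        tally[box_id][0] += 1 if answer else 0
        tally[box_id][1] += 1
    return sorted(
        box_id
        for box_id, (yes, total) in tally.items()
        if total >= min_votes and yes / total > threshold
    )

# Made-up example votes from different volunteers
votes = [
    ("box-1", True), ("box-1", True), ("box-1", False),
    ("box-2", False), ("box-2", False), ("box-2", True),
    ("box-3", True),  # too few votes to trust yet
]
print(boxes_with_seals(votes))  # ['box-1']
```

Requiring a minimum number of votes per box is one simple way to stop a single mistaken key press from putting a seal on the map.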

Of course, it helps if you have a lot of data to analyze, with more coming in the door every day. There aren’t that many places in the world where that’s the case and DigitalGlobe is one of them, which is why I’m excited to be joining them to work on crowdsourcing.

Crowdsourcing today is pretty effective yet there are major challenges to be solved. For example:

  • How can we use machine learning to help users focus on the most important crowd tasks?
  • How can crowds more effectively give feedback to shape how machine learning works?
  • Why do crowds sometimes fail, and can we fix it? OpenStreetMap is a beautiful display map yet still lacks basic data like addresses. How can we counter that?

These feedback loops between tools, crowds and machine learning to produce actionable information are still in their infancy. Today, the way crowds help ML algorithms is still relatively stilted, as is how ML makes tools better and so on.

Today, much of this is kind of like batch processing of computer data in the 1960s. You’d build some code and data on punch cards, ship them off to the “priests” who ran the computer and get some results back in a few days. Crowdsourcing in most contexts isn’t dissimilar. We make a simple campaign, ship it to a Mechanical Turk-like service and then get our data back.

I think one of the things that really separates us from the high primates is that we’re tool builders. I read a study that measured the efficiency of locomotion for various species on the planet. The condor used the least energy to move a kilometer. And, humans came in with a rather unimpressive showing, about a third of the way down the list. It was not too proud a showing for the crown of creation. So, that didn’t look so good. But, then somebody at Scientific American had the insight to test the efficiency of locomotion for a man on a bicycle. And, a man on a bicycle, a human on a bicycle, blew the condor away, completely off the top of the charts.
And that’s what a computer is to me. What a computer is to me is it’s the most remarkable tool that we’ve ever come up with, and it’s the equivalent of a bicycle for our minds. ~ Steve Jobs

In the future, the one I’m interested in helping build, the links between all these things are going to be a lot more fluid. Computers should serve us, like a bicycle for the mind, to enhance and extend our cognition. To do that, the tools have to learn from the people using them and the tools have to help make the users more efficient.

This is above and beyond the use of a hammer to efficiently hit nails into a piece of wood. It’s about the tool itself learning, and you can’t do it without a lot of data.

This is all sounding a lot like Clippy, a tool to help people use computers better. But Clippy was a child of the internet before it was the internet it is today. Clippy wasn’t broken because of a lack of trying, or a lack of ideas. It was broken from a lack of feedback. What’s the difference between Clippy and Siri or “OK Google”? It’s feedback. Siri gets feedback from billions of internet-connected uses every day, where Clippy had almost no feedback to improve on at all.

Siri’s feedback is predicated upon text. Lots and lots of input and output of text. What’s interesting about DigitalGlobe’s primary asset for crowdsourcing is all the imagery of a planet that’s changing every day. Crowdsourcing across imagery is already helping in disasters and scientific research and 1,001 other fields with some simple tools on websites.

What happens when we add mobile, machine learning and feedback? It’ll be fun to find out.

Kickstarter almost funded

It’s very humbling to look at this graph of funding over the last few days for the OpenStreetMap Stats Kickstarter:

I had expected the whole thing to fail, but now it looks like it’ll succeed. I was asked once in a job interview about how much failure I’ve recently had. The idea was that if you’re not failing you’re not really trying – if everything is a success then you can’t be pushing the envelope.

I figured asking for $1k for a statistics site that’s relevant to a minority of a minority in the world was going to be too much to ask for. In the grand scheme of things it’s not a whole lot of cash, but still. And yet, here we are.

Speaking of failure, “failure” itself is the wrong way to model how these things work. Scott Adams has called it “having a system” instead of “goals”. Other people have called it “failing forward”. Either way – the basic idea is that whatever happens you want to win. Adams wrote a whole book about this, How to Fail at Almost Everything and Still Win Big.

In this case, if the Kickstarter fails then I can shut the project down. This for me is a clear win. I get more time and one less distraction. I don’t have to pay for the hosting any more. I also learn that tiny kickstarters aren’t going to work and not to bother trying them again in a similar context.

On the other hand, if it succeeds that’s great too. I can dedicate the time to fix the site, the hosting is paid for and it proves that there are people out there who care about it.

Setting up situations like this can be enormously beneficial – where you win either way. But, it’s still hard since my lizard brain wants to avoid anything that looks like failure and being judged by those who see it in that way.

There are plenty of smart, educated people out there who think Amazon’s lack of profit is a “failure” for example. I think it’s beautiful. For a start, the definition of “profit” is “we have no idea what to do with the money so we’ll give it to you”. Amazon isn’t running out of ideas worth funding. Second, if they spend all the notional profit then they don’t have to pay tax on it and get some percentage advantage via that. Reinvesting in this way for a few decades leads to some spectacular growth.

This all leads to an idea that’s almost too tantalizing to verbalize: Maybe it’s possible to live by doing Kickstarter after Kickstarter? The idea is insanely fun and the implications profound. If it’s possible to raise $1k in a week then that would lead to $52k/year in revenue, supposing you had 52 great ideas. Perhaps more likely are $10k kickstarters every 2-4 weeks, or $100k kickstarters every month or two. With some number of them failing, plus costs, it should still be possible to live using this method.
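To make the back-of-envelope numbers concrete, here is a small sketch. The success rate and cost share in the second call are made-up assumptions, not measured figures; only the $1k-a-week case comes straight from the paragraph above.

```python
def yearly_revenue(goal, weeks_between, success_rate=1.0, cost_share=0.0):
    """Expected yearly take from running campaigns back to back.

    goal:          dollars raised by each successful campaign
    weeks_between: weeks from the start of one campaign to the next
    success_rate:  assumed fraction of campaigns that fund
    cost_share:    assumed fraction of the money lost to fees and costs
    """
    campaigns_per_year = 52 / weeks_between
    return goal * campaigns_per_year * success_rate * (1 - cost_share)

# $1k every week, every campaign funds, no costs
print(yearly_revenue(1_000, 1))

# $10k every 4 weeks, assuming half of them fail and 10% goes to costs
print(yearly_revenue(10_000, 4, success_rate=0.5, cost_share=0.1))
```

Even with half the campaigns failing in the second scenario, the yearly figure stays in the same range as the $52k case, which is the point of the argument.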

OpenStreetMap Stats Kickstarter

I’m attempting to raise $1k in a week via Kickstarter to fix the OpenStreetMap Stats site.

The site lets you explore OSM data by country, time and data type:

Sadly it’s suffered bit rot and some countries are broken and not updating. The $1k goes toward fixing, open sourcing and hosting it for a year or two. Else, it gets canned.

So far it’s raised $163 with 6 days to go.

OpenGeoCodes iOS and Android Apps – Collect Open Address Data

Open Address data from OpenGeoCodes in Durango, CO. Green pins are manually verified, red are awaiting verification.


OpenGeoCodes now has iOS and Android apps to optimize the hand collection of addresses.


Addresses are the primary limiting factor of OpenStreetMap – there just isn’t much out there that’s easily licensed and OSM itself for a variety of reasons lacks address data. OSM looks pretty – it’s a great display map. It’s also routable with a lot of work. But, you can’t find addresses on it.

OpenGeoCodes has data in the US and some starter data in Canada and the UK to try to fix this.

So what do the apps do?

The apps let you walk around and collect data. Say you’re standing outside 100 Main Street – just tap it, the app records the location and you’re done. Normally the app tries to guess which address you’re at based on your location.

But wait, there’s more! As you walk along, the app will optimize which addresses to show you. For example if you’re walking on the even side of a street going north, the app will figure this out and present you with ascending even numbers. So if you enter 100 and 102, and the app knows 104 is nearby, it will focus on this.
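Here is a minimal sketch of that idea, assuming the only inputs are the house numbers tapped so far and a list of numbers known to be nearby. The function name and ranking rule are hypothetical and not the apps' real code.

```python
def suggest_next(entered, nearby):
    """Order nearby house numbers so the most likely next one comes first.

    entered: house numbers tapped so far, in order (e.g. [100, 102]).
    nearby:  house numbers known to be close to the user.
    """
    if len(entered) < 2:
        return sorted(nearby)  # not enough history to infer a pattern
    last, prev = entered[-1], entered[-2]
    ascending = last > prev
    same_side = last % 2 == prev % 2  # both even or both odd

    def rank(number):
        ahead = number > last if ascending else number < last
        on_side = (number % 2 == last % 2) if same_side else True
        # Candidates ahead of the user on the same side of the street sort
        # first, then by distance in house numbers from the last entry.
        return (not (ahead and on_side), abs(number - last))

    return sorted((n for n in nearby if n not in entered), key=rank)

print(suggest_next([100, 102], [98, 101, 103, 104, 106]))
# [104, 106, 101, 103, 98]
```

After 100 and 102 the sketch puts 104 first, then 106, and only then the odd numbers across the street and the even number behind you.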

This makes it easy to walk along and just tap, tap, tap to collect data. We collect this data together and then make it freely downloadable. There’s also a mailing list if you want to get involved.

Where to from here? The feature list includes a more human design, notifications when you’re near places with no data, OSM upload and fixing, and more. Drop me an email if you run into any issues.


Why you should be using BrowserLocation to get your user’s location


When a web app asks the browser for location, the browser asks the user whether they want to share it.

At this point it becomes binary. Either they click ‘yes’ and you get (usually) pretty good latitude and longitude or they hit ‘no’ and you get nothing at all.

The second problem is you just get latitude and longitude. You have to do some work to turn that into something meaningful like a city or country.

Enter BrowserLocation.

BrowserLocation is a JavaScript wrapper that asks the browser for location. If the user clicks ‘yes’, then we return the data to you just as before but we add city, state and country information. If they click ‘no’ then we fall back to IP address location. This IP-based information is less accurate than GPS or wifi triangulation, but better than nothing at all!
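The fallback logic can be sketched in a few lines. This is written in Python purely to show the decision; the real wrapper is JavaScript. The function names, the dictionary keys and the two lookup callables are hypothetical stand-ins, not the BrowserLocation API.

```python
def locate(browser_fix, ip_address, reverse_geocode, ip_lookup):
    """Return a location dict, preferring the browser's own fix.

    browser_fix:     (lat, lon) if the user clicked 'yes', else None.
    reverse_geocode: hypothetical callable (lat, lon) -> dict with
                     'city', 'state' and 'country'.
    ip_lookup:       hypothetical callable ip -> dict with 'lat', 'lon',
                     'city', 'state' and 'country'.
    """
    if browser_fix is not None:
        lat, lon = browser_fix
        place = reverse_geocode(lat, lon)
        return {"lat": lat, "lon": lon, "source": "browser", **place}
    # The user said 'no': fall back to the coarser IP-based estimate.
    return {"source": "ip", **ip_lookup(ip_address)}

# Stand-in lookups returning made-up values, purely for the demo.
def fake_reverse_geocode(lat, lon):
    return {"city": "Durango", "state": "CO", "country": "US"}

def fake_ip_lookup(ip):
    return {"lat": 37.3, "lon": -107.9,
            "city": "Durango", "state": "CO", "country": "US"}

said_yes = locate((37.2753, -107.8801), "203.0.113.7",
                  fake_reverse_geocode, fake_ip_lookup)
said_no = locate(None, "203.0.113.7", fake_reverse_geocode, fake_ip_lookup)
print(said_yes["source"], said_no["source"])  # browser ip
```

Tagging each result with its source lets the caller decide how much to trust the coordinates.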

And to make it even better, every use of the API is used to improve the IP address fallback data for other users. And you can download the data. And, you can even query it directly like this.

So think of it as an open IP to geo project, with a wonderful API and a virtuous circle where everyone using it also makes the data better. The data is initially released CC BY-NC and will leak into the public domain over time, which creates a space to monetize the data and pay to improve it. It’s been seeded by paying people all around the world to collect some data.

Enjoy. Please email if you have any questions.

The algorithm and the failing kickstarter

I launched a kickstarter yesterday and it’s not doing well.

Here’s my basic algorithm:

  1. Try random things at zero cost
  2. Find the ones that work
  3. Scale those

We really can’t predict what will work or not which is why speed is so important – the more things you try the better since you’ll hopefully find something that will work. Boyd talks about this in his OODA loop. You observe your situation, orient yourself, decide what to do and then act upon it. Then go back to the start. He posits that if you can do this quicker than your opponent then you’ll win.

So let’s observe the situation. This kickstarter raised about $600 in day one, with a fairly huge amount of publicity amongst map people.

Let’s orient given prior knowledge. The last two kickstarters did $1,600 in day one. They raised just under $15k and $10k total. It’s not super likely this one will reach $5k given the curve and what little we’ve had today (day two).
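A quick way to sanity-check that hunch is to scale the earlier totals by the ratio of day-one pledges. This sketch assumes each of the two earlier campaigns did about $1,600 on day one and that final totals scale linearly with day one; both are simplifying assumptions, and the earlier totals are rounded up from "just under".

```python
def project_total(day_one_now, day_one_then, total_then):
    """Scale a past campaign's total by the ratio of day-one pledges."""
    return day_one_now * total_then / day_one_then

low = project_total(600, 1_600, 10_000)
high = project_total(600, 1_600, 15_000)
print(low, high)  # 3750.0 5625.0
```

The projection lands somewhere between roughly $3.7k and $5.6k, so $5k is possible only if this campaign tracks the better of the two previous ones.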

So it’s decision and action time. I’m pretty sure that:

  1. The prices on the kickstarter are too high
  2. The print images aren’t compelling enough

The prices are easy to drop and simplify. I’m thinking of just having one print at $40 or so since that’s the median price for this kickstarter and the last poster one.

As for the images, I’m working on continent-wide instead of city images. I’ve fixed some of the drawing issues. The thickness of the lines drops as log10() and I’ve changed that to log() which is nicer. I’m also working on aliasing and changing the color from “just black” amongst other things. Here’s an image of all the roads going to London:


There’s a bunch of work to be done here, but it gets the point across. My guess is that continent images like this will be more compelling.

The interesting question is how to get feedback. Asking the existing backers makes partial sense since they committed money but on the other hand, we need to figure out why people who didn’t back it didn’t back it. Feedback welcome of course.

Part of the reason for this whole thing is that the printer I bought for the last project is dead and needs to be replaced. This isn’t compelling in and of itself. Remember the “try random things” part of the algorithm? Well in a sense, yes, random things need to be tried since we can’t predict very well the chance of success. But, there are a couple of things to consider.

If we have two ideas A and B we may as well go for the bigger one. The reason for that is that it has more ways to succeed. A bigger idea may contain some element of a successful idea. A smaller idea has a lower chance of success and a lower overall level of dollars to attract. The cost remains the same: zero. I keep it at zero because spending nothing allows the maximum number of ideas to be tried. Anything above zero restricts the number of ideas.

Second is opportunity cost. Picking the smaller idea costs the potential gain of a bigger idea. In expected-value terms, doing a $5k kickstarter is the same as doing a $50k kickstarter with a 10% chance of success. But the $50k idea has a higher potential payoff and the same cost (zero) with a higher number of sub-ideas that might spark some following.
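The comparison is just expected value. A tiny sketch, assuming the $5k campaign is certain to fund (which is what the comparison implicitly does) and that both cost nothing to try:

```python
def expected_value(goal, chance_of_success, cost=0.0):
    """Expected dollars raised, minus whatever it cost to try."""
    return goal * chance_of_success - cost

small = expected_value(5_000, 1.0)   # assumes the small idea always funds
big = expected_value(50_000, 0.10)   # one in ten big ideas funds
print(small, big)
```

Both come out at about $5k, so the tie is broken by everything the number leaves out: the bigger upside and the extra sub-ideas.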

There’s also just less competition. Doing anything commercial with OSM right now is hard because there are irrationally funded startups doing everything for free and owning the whole space. Competing with free is hard. At the other end of the spectrum I really love Thing Explainer simply because out of the billion books published this past year, it’s so unique. It’s not another tween vampire romance. Doing unique and big things is the way to go.

Is the cost really zero to do a kickstarter? No. It costs my time and so on, but it’s about as low as you can go.

Back to failure. The typical valley thing is to embrace and love failure. But that’s really just a way of avoiding it the same as treating failure as bad. The secret is to know failure sucks and push through it as a process, not to pretend it’s good or bad. It just is.

I tested a bunch of ideas last year and most of them failed. Nobody remembers any of them. Anyone remember Fake Mayor? That wasn’t even a failure, that sold for actual money. Anyway. I have a bunch of data on the ideas that succeeded and really I should have done one of those as my next kickstarter, or one of the other really big ideas I have lying around. Next time. (And, next time might mean next week at this rate).

(As an aside, I want to do a book about how to test and build ideas for super cheap using the internet, I think it’d be interesting).

So. The plan is to either pivot this kickstarter, kill it or restart it with simplified rewards in the next 24 or 48 hours. What do you think?

(It should be noted that some semi-pivoting by putting the above image on the kickstarter and so on is simple and free so I’ll do that in any case, but it’s not really a full pivot).

New Kickstarter: Every Road


I have a new kickstarter live now: Every Road! Every Road is a unique poster print of every road leading away from your house (or any other point). Above you see the Bay Area, driving away from the Ferry Building in the northeast of San Francisco. Here’s the same thing driving away from a point in the Sunset:


The roads get thinner as you go and lead to a tree-like structure. Each print is totally unique to you. Here’s driving away from Buckingham Palace in London:


Here’s walking everywhere from Wall Street in Manhattan. Notice most routes in Manhattan end up heading north and branch out to each side:


Here’s the same thing, but driving:


Notice how it leads to a totally different map, because the quickest driving routes run along the edge of Manhattan and then head inward to each point. When walking, by contrast, your maximum speed doesn’t change depending on what road you’re on.
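The effect is easy to reproduce on a toy network. This is a minimal sketch using a standard shortest-path search (Dijkstra's algorithm), which is an assumption on my part here rather than a description of the Every Road renderer. The three-road network, the speeds and the node names are made up.

```python
import heapq

def route_tree(edges, start, mode):
    """Dijkstra's algorithm; returns {node: previous node on fastest route}.

    edges: list of (a, b, km, drive_kmh) two-way road segments.
    mode:  'walk' uses one constant speed, 'drive' uses each road's speed.
    """
    WALK_KMH = 5  # assumed constant walking speed
    graph = {}
    for a, b, km, drive_kmh in edges:
        hours = km / (WALK_KMH if mode == "walk" else drive_kmh)
        graph.setdefault(a, []).append((b, hours))
        graph.setdefault(b, []).append((a, hours))

    best = {start: 0.0}
    parent = {}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a faster route was already found
        for neighbour, hours in graph.get(node, []):
            new_cost = cost + hours
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                parent[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    return parent

# Made-up toy network: a slow direct street and a longer, faster edge road.
edges = [
    ("start", "midtown", 2, 20),    # direct but slow to drive
    ("start", "highway", 2, 80),    # edge of the island, fast to drive
    ("highway", "midtown", 2, 80),
]
print(route_tree(edges, "start", "walk")["midtown"])   # start
print(route_tree(edges, "start", "drive")["midtown"])  # highway
```

On foot the direct street wins because every road is walked at the same speed. By car the longer route along the fast road wins, so the tree of routes branches differently.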

The data of course comes from OpenStreetMap. More details are at the Every Road kickstarter!

What is it?


What is it? is a free iPhone app for recognizing objects. You point it at things and the app will recognize the stillness of the scene, vibrate, take a photo and tell you what the object is. It’s “always on” and watching on purpose: just point your phone at something and it will attempt to recognize it.

And if it’s wrong, you can fix it by tapping the flashing text and typing what it is.

The app relies heavily on ImageIdentify[] from Wolfram Research. You can play with their website here on a phone, tablet or desktop device. The backend of the app is powered by the rather wonderful Wolfram Cloud which makes building something like this very easy.

Using Wolfram Cloud isn’t far from using Mathematica, and you can deploy APIs trivially from both:

CloudDeploy[APIFunction[{"image" -> "Image"}, ImageIdentify[#image] &]]

That’s about all it takes. The system returns a URL and you can then use that to make requests against. Mathematica and the Wolfram Cloud go way, way beyond this basic example of course. I can’t recommend enough that you play with this stuff. In a couple of weeks there’s a new book coming out on it too!

Today, this app has some decent if limited capabilities. The goal is something better than a Star Trek tricorder – something that will tell you the species of a leaf or the model year and trim of a car.

Lending Club & OpenStocks

Lending Club

I’ve been researching a few stocks recently including LendingClub which I now own. It got me thinking – why isn’t there a wiki of all this information?

For example, one of the interesting things for me to follow is the SEC Form 4 filings of a firm. These are the filings in which people with a major position in a public company must disclose when they buy or sell shares. For example, if you learnt that the CEO was selling or buying shares that would be useful to know. It’s an indication of whether they’re personally invested or not. Similarly, if the whole leadership team is buying or selling then that tells you more and so on.

I just read through all of LendingClub’s Form 4s going back to when they went public in December 2014. I’ve summarized them in the OpenStocks wiki here. Each Form 4 is pretty dull. It contains who’s selling or buying, what it is they’re selling or buying (stocks, options etc), when and for how much. There can be footnotes to explain transfers and other things like that.

Aren’t there things that automatically parse these forms and spit this stuff out? Not really. Yes, they exist, but they tend to be terrible at interpreting the information. For example when someone in the leadership team of a company gets some shares they will often put them in a bunch of trusts. This can make the automated software misreport their holdings and lead you to think they have less at stake than they do.
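To show the kind of mistake I mean, here is a minimal sketch with made-up holdings. The insiders, trusts and share counts are invented for illustration and are not taken from any real filing. A parser that only counts shares held directly reports a fraction of what the insider actually has at stake.

```python
from collections import defaultdict

# Made-up rows for illustration: (insider, holder of record, shares)
rows = [
    ("Insider A", "direct", 10_000),
    ("Insider A", "Family Trust 1", 400_000),
    ("Insider A", "Family Trust 2", 250_000),
    ("Insider B", "direct", 75_000),
]

def holdings(rows, include_trusts):
    """Sum shares per insider, optionally counting shares held via trusts."""
    totals = defaultdict(int)
    for insider, holder, shares in rows:
        if include_trusts or holder == "direct":
            totals[insider] += shares
    return dict(totals)

naive = holdings(rows, include_trusts=False)
full = holdings(rows, include_trusts=True)
print(naive["Insider A"], full["Insider A"])  # 10000 660000
```

In this invented case the naive reading makes Insider A look like a smaller holder than Insider B, when the opposite is true once the trusts are counted.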

What we’re doing is compressing information and time. It took about 4 hours to read the Form 4s for the last year, wikify them, do a bit of research on the people and so on. We need to compress that time and energy into a buy/sell. The first intermediate step is to tell a story using the Form 4s like the recovered DNA in Jurassic Park, and then fill in the holes. And hope no dinosaurs eat you.

Thus. LendingClub went public after giving hundreds of millions of shares to their VCs who acquired rights to them in the A, B and C rounds. Some of the VCs also bought some at discount. The IPO price was $15. Six months or so later they gave a bunch of shares to their board. Then the VCs started selling them in lots of 2 or 10 million shares here or there. All this selling probably depressed the price, but the VCs have to do it to return capital to their investors. It’s likely this selling will continue.

In the last couple of months a few insiders have been selling shares for “new Tesla” to “new apartment” levels of cash ($100k to $500k or so). But those sales are dwarfed by their options and holdings across their trusts and so on. They’re sitting on tens to hundreds of millions of dollars. Incidentally, all the leadership team plus their board have excellent careers and lots of credibility to lose. This kind of selling looks acceptable. Maybe they just want a new Tesla or to send a child to college or whatever.

The quarterly earnings were a few weeks ago. They turned a small profit of about $1MM on revenue of $110MM or so on $2.something billion in loans for the quarter. The decimal places don’t matter to me much. The graph with all the numbers screaming upward does.

The costs are all flat as a percentage of revenue if you go look at their filings. But, the revenue has been going up. A lot. So they’re hiring like crazy. If you look at Glassdoor, the reviews are all pretty good modulo complaints about the rate of growth. At some point they’ll amortize the staff and other costs over growing revenues (e.g. they won’t need to keep hiring).

The earnings call laid lots of heavy hints about a new product in 2016. My bet is that will be mortgages. Eventually LendingClub will offer every aspect of finance and they want to ship 2 products a year. So far they have personal loans and business loans. There’s a lot more out there from credit cards to kickstarter. Mortgages just feel kind of big and obvious and leverage the existing client base really well. Plus, they have so much (p2p) money they need to find places to put it.

LC aren’t at war with the banks, which is very nice. Instead they’re partnering all over the place to help banks find uses for their capital and help their customers find loans. All very win-win.

There’s negative stuff to find too which I leave as an exercise for the reader.


So – why not put all this stuff in a wiki? I can’t find anything like it so I built one at OpenStocks. It’s very early.

It’s interesting to think what an open source community would look like, blended into the investment space. Well, it would have a wiki and a mailing list, right? And it would have some sort of chat area and a GitHub repository. And it would have code and tools.

I view the wiki as the first step, informing the next things to be built. It’s fairly obvious that the public lack the tooling to understand investments, and open source code would fix that. If you’re a huge investment bank then you can pay people to read all those forms or write code to summarize them. It would be interesting to see what happens when you do that in a community.

Part research tool, part opinion, part software, part community. And Google ads or something to pay for it. Mainly it’s just the things I want available when figuring out whether to buy or sell.


Via blogging about this I found Rank and Filed, SumZero and Value Investors Club, which are all already running and way more advanced than the wiki idea.

The Book of OSM, now available!

The Book of OSM is now shipping on Amazon Kindle and in paperback (uk, de)!

This book contains 15 interviews conducted by OSM founder Steve Coast with the people who were there as the project began and grew. Starting in 2004, the interviews trace how a ragtag collection of volunteers was able to produce a map which compares in quality to maps produced by multi-billion-dollar corporations. Learn how such an ambitious project got started and then succeeded at mapping the world, for free!

The book was the result of a kickstarter that raised just under $10k.
