Great contra-article on VC

A nice article by Melanie:

A little anecdote: I have a friend of mine who runs a relatively well-known startup in NYC. He literally LIVES at the office. I’m serious, he moved in. And before that, he slept on the couch most nights. And, after working this hard for almost 2 years, guess how much revenue this startup is generating? Zero. Not a fucking penny. After 2 years of work! Now, I understand that they are trying to build a massive user base with network effects, blah blah blah, but, I’m sorry, that is absolutely fucking insane. I could never see myself living my life that way. I am just not built for it. To put in that many years of your life, and thousands of hours of work, for what will most likely turn out to be an unsuccessful startup, is just crazy to me. But, from reading the tech press, you would think this is one of the hottest startups in New York!

Surface Launch

Microsoft Store in Colorado

I drove through the cold sun of a Colorado morning to the Microsoft Store at Lone Tree to catch the tail end of the Surface launch. Even a couple of hours after opening, there were still over 100 people queuing to enter. It was a bit confusing since the store was relatively accessible, and then I found out the queue was just for the people waiting to buy one; everyone else was already inside buying one. The store made people wait so that each buyer had someone to unbox the device with them and answer any questions, which makes sense.

The energy wandering around was quite something: loud beat music and smiles everywhere. The wraparound video wall was used to great effect with various Win8 promotional material flying around.

Surface on display

The device itself was impressive. It felt like an iPad with a bunch of differentiators. The Windows button was touch sensitive, unlike my x86 tablet with its normal solid button. The Touch Cover was deeply awesome and the thing I was worried wouldn’t work; the difference between resting your fingers on it and pressing was noticeable. Basically it felt like the apprehension of using an on-screen keyboard for the first few times: would it work? Yes it does, very well.

The engineering behind the cover, the device, the power cable and the stand is all way above par. This isn’t a piece of plastic. The stand is actually really nice but I’m not sure how it’ll work out on the tray table of an aircraft seat. Multi-touch was great, including things like rotating bird’s-eye aerial maps in the Bing Maps app.

Asus Windows 8 RT tablet

But wait! There’s more! There were tons of other Win8 devices around the store. The one above was an ASUS tablet with a Microsoft keyboard and mouse. The keyboard was sweet but the mouse didn’t fit me well. The tablet itself is great. It’s more expensive than a Surface for some reason that I didn’t bother to find out, probably more memory or something. The construction was great compared to other more plastic-derived devices. Actually, the keyboard was fantastic, even compared to Apple’s mini Bluetooth keyboards.

There were a bunch of desktop flatscreen all-in-one computers also running Win8 and I can’t say there was a fault with any of them other than how it feels slightly strange touching the screen. After all, no computer I’ve used in the last 20 years or so has had a touchscreen.

Overall? I know I work at Microsoft but even so, this is great execution. The store was approachable, the hardware was clearly very well thought out and the fit between the place, the people and the sale was put together excellently. It’s not far off from when I first went to an Apple Store, just evolved.

eBay Intellectual Property Enforcement

I’m selling a lot of junk on eBay right now, something I do every year or so. This time around I’ve been caught by surprise at all the rejected listings.

I have a piece of software that shipped with a computer (so-called OEM or Original Equipment Manufacturer software) about 20 years ago. eBay blocked it after a couple of days, saying I have to ship the original computer with it. Fair enough, that was the license I guess, but I have no idea where the original computer is.

I have a number of stickers for household security which were blocked for trademark infringement. Given that they are genuine stickers and I’m not passing off fakes as real, this is apparently complete bullshit. eBay is being used as a tool by the manufacturer to make these spurious claims and kill the market. eBay advises you that you basically can’t do anything about it apart from contacting the company making the complaint, who of course have no interest in talking to you.

I have two language learning sets with, I don’t know, 25 DVDs each or something. Both have just been blocked for copyright infringement after a complaint from the manufacturer. I’m squinting to see that one. I would guess it might be a license infringement but copyright sounds like a stretch. But what do I know.

So, four items for sale blocked for apparent license, trademark and copyright violations that I was essentially unaware of and surprised to see.

The key points are that eBay isn’t really on the customer’s side here; they won’t even talk to you. IP enforcement is much bigger than it used to be on eBay and, whatever the rights and wrongs, it is holding back the secondary market. And all these items are now destined for landfill instead of being reused by people who either can’t buy them new (literally they’re not for sale any more) or can’t afford the first sale price. Which is a shame.

WiFi is everywhere so drop your cell phone plan

It’s been about a week since I dumped my cell phone service and it hasn’t really changed anything.

The turning point was dropping a phone and cracking the screen. I discovered that the $5/month AT&T insurance plan doesn’t actually insure you for anything I can find. Fast forward through painful phone calls with AT&T, visits to stores which aren’t allowed to repair the phone… and wondering why am I paying these guys?

Free WiFi is now everywhere.

So the fact is that I spend only a tiny part of each day not around wifi. Think about everywhere you go and it’s likely the majority have free wifi. Every pub I went to in London recently had free wifi.

How do I get calls? Skype and Google Voice. Google Voice lets me send and receive text messages for free. Skype lets me make and receive calls. Both offer me voicemail. To the outside world it all looks like the same phone number. Both run nicely in the background on an old iPhone 4 I have.

Data rate has not been an issue so far. In fact at Vancouver airport with 50 people at a gate waiting for a flight it was good enough for video calls over Skype. The main downside is those access points which require you to click accept, but they’re actually fairly rare when you spend most of your time at home or at work.

Offline maps are solved with OffMaps 2.

I’ve yet to run into a situation where I needed cell phone coverage. Emergency calls will still work from the phone even without a SIM card, and it so happens that everyone around you has a phone on them too if you’re really that desperate.

A nice side bonus is saving a ton of cash and disconnecting a little bit. It’s good to be unreachable occasionally, even if that means just when I’m driving.


How will you measure your life?

Clayton Christensen has been a hero of mine since I read The Innovator’s Dilemma some number of years ago.

This short book offers a 20/20 hindsight tour of how to allocate your time to maximize some function of happiness. There’s a particularly interesting section offering a model for why youth unemployment is so high. He characterizes success as having a combination of resources, priorities and processes. He posits that kids have a lot of resources (money, education, moms shuttling kids to piano lessons and soccer) but no clue how to take those resources and process them into building something for themselves. Priorities help guide which process to use with which resource when.

I’ve seen that enough to believe it.

The unfortunate forays in to God and Faith subtract from an overall excellent, and concise, book.

Hole for a Handle

This handle-shaped hole in a wall, caused by repeatedly slamming a door (handle) into it, got me thinking. Wouldn’t it be nice if, instead of having those little door stops near the floor, we just built walls with holes in them?

Small Data

Big Data is, as Wikipedia puts it, “data sets that grow so large that they become awkward to work with using on-hand database management tools.” Think every Twitter post ever made or 10,000 people’s DNA sequences.

Big data, I think, has a big brother which I’ll call Small Data.

Small Data consists of data sets which are manageable using a spreadsheet yet hard to obtain, hard to process into meaningful results or non-obvious in how to visualize and share. I’ll get to an example soon, but let’s put this in a table where I’m looking at it from the point of view that Big Data is used by a data scientist and Small Data should be usable by someone with basic spreadsheet skills.

| | Big Data | Small Data |
|---|---|---|
| Data size | Billions of rows | Hundreds / Thousands of rows |
| Ease to obtain | Hard (cost to host) | Hard (cost to find) |
| Ease to process | Hard (hardware costs) | Hard (time costs) |
| Ease to visualize | Easy (data scientist thinks about this all day) | Hard (Joe User knows how to plot a graph and that’s it) |
| Ease to analyze | Hard (hardware costs) | Hard (don’t know which questions to ask) |

I’m thinking out loud here but maybe the only difference is really dataset size and everything is still “hard” in some way.

Long example

Let’s jump to my example. What books are on my list to read that I can find cheaply or for free at the library as a book, audio CD or library ebook? My library will let me read books on a nook or kindle. In fact they will even let me listen on an MP3 device to many books but I excluded that.

I want to be very clear – this should be an easy question to answer.

If you don’t value your time it is in fact an easy question to answer: go through your list and search for every book at your local library. But of course your list could be in one of 100 formats and your library has its own formatting.

Luckily my wish list is on Amazon. It’s too many for me to do by hand in a reasonable amount of time (350 odd books). My library is the excellent KCLS and the most trafficked library in the US, or something.

We’ve excluded the time-consuming manual route. As someone with programming skills I should be able to write a script to connect the two and spit things out, right? Not really. First, there is no lists API on Amazon, supposedly because it saw little use, though if I had to guess it’s because of the potential for revenue loss if I import my list to the competition. Searching around, there are solutions that scrape the data by putting your list in a simplified view and pulling that into a Google spreadsheet.

Sadly none of them worked. Even if they did, my wish list is multi-page and I’d have to mess around scraping each page. What I learnt though is that you can put your list in to a simplified view by adding ?layout=compact on the end of it (example).
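As a rough sketch of the URL trick, here is one way to add layout=compact to a list address without clobbering any parameters already on it. The function name and the example.com address are placeholders of mine, not anything Amazon documents.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def compact_view(url: str) -> str:
    """Return the same URL with layout=compact set in its query string."""
    parts = urlsplit(url)
    # Keep every existing parameter except an older layout value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "layout"
    ]
    query.append(("layout", "compact"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical address, purely for illustration.
print(compact_view("https://example.com/wishlist/ABC123"))
```

This only builds the address; it does not fetch or scrape anything.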

Just copying and pasting the table in to Excel resulted in data. I did that four times, once for each page of my list and voila I have some data!

Crappy data, but data. Out of the table all I got was the book title and the price. The price was missing in cases where the book was only available second hand. I could probably spend an infinite amount of time playing with the copy & paste to extract the URL of the book or write a script to do that. To be clear though I value my time and my judgement was that I could get lost in the scripting route and spend hours iterating over data and fixing tiny bugs but I wanted the most efficient way to answer my question. I also wanted to assume the role of Joe User with some mild spreadsheet skills.

There’s a great way to use the crowd to answer questions about data: Mechanical Turk. With it I could spend some small amount of money to ask people to do things with my spreadsheet. I could upload the sheet and then spend a penny or so per row and the Turk system would allow any of its users to go do something like find out whether book X was available in format Y at my library.

Again I could go scrape my library website with a script or hope that they had an API. Again my guess was Turking it would be cheaper and quicker.

Off to Mechanical Turk we go. The first thing you have to do is design your task. Mine looked like this:

Thus an individual turker would see one or many of these tasks. Each would have the book name and ask for the price.

Uploading the sheet was non-trivial. First you need to put a heading on each column, so I did ‘name’ and ‘price’. Then you need to export it to CSV. So I did, and sent it up to the cloud. MTurk failed saying there were invalid characters, so I went into the CSV with a text editor and replaced a stray Unicode umlaut with its ASCII cousin. This is all a bunch of gobbledygook which roughly translates as “you need 3-8 years of computer science skills to do all this easily”.

I paid one cent per task. That adds up to $3.50 for all the books, then some Amazon fees bump it to $5 or so. Now I have a list of books, prices and the “best price”, which I took to mean the 2nd hand price.
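For the curious, the cleanup could be scripted along these lines. This is a sketch rather than what I did (I used a text editor): it strips accents by Unicode decomposition, so ü becomes u, and it silently drops any character that has no ASCII base letter. The function names are my own.

```python
import csv
import io
import unicodedata


def to_ascii(text: str) -> str:
    """Strip accents by decomposing characters and dropping non-ASCII parts."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def write_task_csv(rows, out) -> None:
    """Write (name, price) pairs as CSV with the two column headings."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "price"])
    for name, price in rows:
        writer.writerow([to_ascii(name), to_ascii(price)])


buffer = io.StringIO()
write_task_csv([("Gödel, Escher, Bach", "12.99")], buffer)
print(buffer.getvalue())
```

The csv module quotes the title because it contains a comma, which is another thing that is easy to get wrong by hand.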

I’m format agnostic so I don’t care if I have a new book, 2nd hand, audio book, kindle, library, ebook from the library or what. It’s all the same content just in different packaging. Of course some are more accessible than others so I can spend $10 to get a book instantly on kindle or I can wait some number of hours or days to get it from the library. Thing is, my list is big enough that assuming the books are equally interesting (and they have to be since I haven’t read them to have data on which are more interesting than others) I can always get something interesting in any format I want.

This is important since right now choosing a book to read at random implies the format. So I pick a book from my list and maybe I have to pay for it, wait for it at the library or whatever. Instead, I want to be able to say “let’s get the book on my list in CD Audio that’s been on the list for the longest time” or something like that when I no longer have any CDs in the car for the commute.

I have that data now (title, price, best price) only in theory. I have the first two in one spreadsheet. The third is in another sheet that MTurk will export to me as a CSV. Connecting the two requires some spreadsheet skills. So I put the two in different sheets in the same workbook. Then I use the title in my original sheet to look up the best price in the second sheet using the VLOOKUP function. The second sheet that Amazon exports adds all kinds of data like the time the task was done, the ID of the Turker and stuff I don’t care about.

So I go locate the columns I do care about, which are 28 columns over. I spend a bunch of time trying to use the LOOKUP function in Excel to find the data I want before figuring out that LOOKUP does some kind of odd interpolation thing and expects the data to be sorted. Amazon Mechanical Turk doesn’t return the data in the same order I sent it so I spend time playing around with VLOOKUP which has a different syntax to LOOKUP. Finally I get some data which looks like 350 or so rows of (title, price, best price).
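The same exact-match lookup can be sketched outside a spreadsheet with a dictionary keyed on title, which does not care what order the results come back in. The column names here (name, price, best_price) are assumptions for illustration; the real export uses its own headings and has many more columns.

```python
import csv
import io


def join_best_price(wishlist_csv, results_csv, key="name", value="best_price"):
    """Return (title, price, best price) rows, matching the two files on title."""
    lookup = {}
    for row in csv.DictReader(results_csv):
        lookup[row[key].strip()] = row[value]

    joined = []
    for row in csv.DictReader(wishlist_csv):
        title = row[key].strip()
        # An empty string marks a title with no result, like a failed lookup.
        joined.append((title, row["price"], lookup.get(title, "")))
    return joined


wishlist = io.StringIO("name,price\nBook A,10.00\nBook B,\n")
results = io.StringIO("worker,name,best_price\nw1,Book B,3.50\nw2,Book A,4.00\n")
for line in join_best_price(wishlist, results):
    print(line)
```

Extra columns such as the worker ID are simply ignored, so there is no counting 28 columns over.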


We’re not done; there is still quite some way to go. I have the pricing now, but I don’t know the availability at the library. So I go back and modify my Turk task: instead of asking people to find each book on Amazon, I ask them to go to my library and find out whether it’s available in paper, CD or ebook. I do these as three separate tasks and each link is slightly different. You can ask the KCLS catalog to search by type of asset and also by availability (since I don’t care if they list a book but don’t have copies any more). So I do some magic with the URL parameters so that the Turker doesn’t have to do that.

At each step I’m trying to make it a simple Turk task without spending an infinite amount of time making it too simple. For example I could encode the name of the book using urlencode() or something and then they wouldn’t have to copy/paste each book title. Economics becomes useful. Since the floor, the lowest I can pay per book title, is one penny all I have to do is not make it difficult enough that the Turker wants 2 pennies in compensation. So, I encode the book type (cd, ebook, paper) in the URL so they don’t have to interact with the search form. Then I add #available to the start of every book title in the turk task. This is a magic string which tells the KCLS catalog to only show available books. I could have put it in a URL parameter but I was bored and elegance was not the goal.
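Here is roughly what the fully encoded version would look like, the one I decided not to bother with. The catalog address and the parameter names q and format are invented for the sketch; I have not checked what the real KCLS catalog expects.

```python
from urllib.parse import urlencode

# Hypothetical search endpoint and parameter names, for illustration only.
SEARCH_URL = "https://catalog.example.org/search"


def search_link(title: str, book_format: str) -> str:
    """Build a search link with the format and the #available filter baked in."""
    query = "#available " + title
    return SEARCH_URL + "?" + urlencode({"q": query, "format": book_format})


for book_format in ("cd", "ebook", "paper"):
    print(search_link("The Innovator's Dilemma", book_format))
```

urlencode takes care of the # and the spaces, which would otherwise break the link.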

Here’s the resulting Turk task example:

Note that I also changed the text box input to a yes/no option. I didn’t do that to make it easier for the turker, it makes it easier for me. If I allowed them to type “yes” then I would end up with lots of variations like “Yes”, ” yes”, “YES” and so on that I am uninterested in processing.
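Had I kept the free-text box, the cleanup I was avoiding would look something like this sketch. The accepted spellings are my own guess at what people might type.

```python
from typing import Optional


def normalize_answer(raw: str) -> Optional[bool]:
    """Map free-text yes/no variants to True/False, or None if unrecognized."""
    cleaned = raw.strip().lower()
    if cleaned in {"yes", "y"}:
        return True
    if cleaned in {"no", "n"}:
        return False
    return None


for answer in ["Yes", " yes", "YES", "no", "maybe"]:
    print(repr(answer), normalize_answer(answer))
```

Anything unrecognized still needs a human to look at it, which is exactly the work a yes/no option removes.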

Now I have an additional three spreadsheets with the title plus one of paper availability, CD availability or ebook availability.

Except I don’t. In the hour after I submitted each task about 98% of the results were in so I have partial results. This was still quicker than I expected but it also included some bad data. I double checked a few of the results and found one primary turker who had entered “yes” to 77 books without bothering to actually check the KCLS website. I banned him, didn’t find any more mass bad data and left it there.

I have a bunch of options here. I could ask multiple independent turkers to check the results. I could batch them up so one task required checking 3 formats not 3 tasks each checking one format each. I could pay more and attract more reliable turkers. But for now I’m happy with the results.
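Two of those checks are easy to sketch: flagging a worker who gave the identical answer to a large number of tasks, and taking a majority vote when several workers check the same book. The tuple layout and the threshold of 20 tasks are assumptions of mine, and a flagged worker may still be honest, so this only tells you where to spot check.

```python
from collections import Counter, defaultdict


def suspicious_workers(results, min_tasks=20):
    """Return workers who gave one identical answer across many tasks."""
    by_worker = defaultdict(list)
    for worker_id, _title, answer in results:
        by_worker[worker_id].append(answer)
    return sorted(
        worker
        for worker, answers in by_worker.items()
        if len(answers) >= min_tasks and len(set(answers)) == 1
    )


def majority_vote(results):
    """Pick the most common answer per title; ties go to the answer seen first."""
    votes = defaultdict(Counter)
    for _worker_id, title, answer in results:
        votes[title][answer] += 1
    return {title: counts.most_common(1)[0][0] for title, counts in votes.items()}


# Made-up results: (worker id, title, answer).
sample = [("A", f"Book {i}", "yes") for i in range(25)]
sample += [("B", "Book 0", "no"), ("B", "Book 1", "yes"), ("C", "Book 0", "no")]
print(suspicious_workers(sample))
print(majority_vote(sample)["Book 0"])
```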

Long story short here is my final result:

I’ve used some conditional formatting to color the availability and price of the books. Plus I have one derived column looking for which books have the best 2nd hand to brand new price ratio.
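The derived column is just the 2nd hand price divided by the new price, so a lower ratio means a better bargain. A sketch of the same calculation, assuming prices arrive as text like "$12.99" and may be blank:

```python
def parse_price(text: str):
    """Turn text like '$1,234.50' into a float, or None if blank or malformed."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    if not cleaned:
        return None
    try:
        return float(cleaned)
    except ValueError:
        return None


def used_to_new_ratio(new_price: str, used_price: str):
    """Return used/new, or None when either price is missing or new is zero."""
    new = parse_price(new_price)
    used = parse_price(used_price)
    if new is None or used is None or new == 0:
        return None
    return used / new


def best_bargains(rows):
    """Return titles sorted from lowest to highest used/new ratio."""
    scored = []
    for title, new_price, used_price in rows:
        ratio = used_to_new_ratio(new_price, used_price)
        if ratio is not None:
            scored.append((ratio, title))
    return [title for _ratio, title in sorted(scored)]


# Made-up rows: (title, new price, 2nd hand price).
print(best_bargains([("A", "$10.00", "$8.00"), ("B", "$20.00", "$5.00"), ("C", "", "$1.00")]))
```

Books with a missing price are left out of the ranking rather than treated as free.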

Now, a few hours and $20 later I can actually ask some questions about the data. The problem is that Joe User wouldn’t get this far and it’s 2012. We should be able to do this kind of thing. For the curious, here is the complete data.

Back to theory

Big Data is great but clearly we can’t even tackle simple Small Data problems. The data collection is hard, the analysis is hard and the skill sets required are far beyond where they need to be.

There are a number of approaches happening today to try and help solve some of these problems and they go down approximately two routes. On one side there are those who believe “if only all the data in the world was all in some universal format” then things would magically be better. On the other “if only we had strong AI” then things would magically be better.

The latter may happen and clearly would be able to solve my questions. The question is at what cost? Optimistic singularity predictions are still decades out.

I’m pretty skeptical about the semantic web / linked data model whereby merely linking everything together or putting it all in one schema, or some combination, will help anyone. One reason is that it’s been done. Freebase is still ahead of its time: it embodies the “huge graph of data in the sky” and it plugs some gaps. “But what if everybody put their data in there!?” I hear you cry. Well, what if everybody stopped smoking, had their vitamins and didn’t read tabloids? It’s not going to happen.

There is some argument to be made that if we merely all used the same format that would help. And it would a little. But remember we tried that with XML. JSON is cute but again the value proposition for us all to move to JSON is not there yet. Either way it wouldn’t help me with my library book questions. My guess is that even if Amazon spit out JSON for my wish list and my library had a JSON API then I’m still waiting on the logic to tie them together. Maybe it will solve my problem, maybe not.

The halfway houses abound. Siri and Wolfram Alpha leap out as combining a sprinkling of data with a soupçon of machine intelligence. Look how brilliantly they do! The domains they service may be tight but they offer us a peek at how things will work.

My guess is that the future looks like a munging together of Small Data, Big Data, automatic processing and human intelligence used as and when appropriate. Today we have some wild stabs in the dark at each of these but nothing like the coherent platforms of the future we could wave our hands to describe. It’s going to be fun to see it happen.
