Q: What’s so difficult about locating a city by querying, say, “Austin”?
A: [Austin] can mean a lot of things. When you say to a computer, “I want pizza in Austin” or “Show me a page about Austin,” there are a couple things you might need to do. One is just center a map. “Austin” doesn’t tell you on a map where Austin is. You need to, effectively, look in the back of an atlas, scan, find Austin, find the latitude and longitude and find the map cell. That’s effectively what the computer is doing as well. You need that index. You need it in a format that a computer can read and that lets you get a rough idea of where Austin is and draw it on a map. I can tell you what Austin means to my computer, or what Austin means to the U.S. government. Understanding what Austin is is a combination of knowing roughly where it is on a map, knowing the administrative boundaries and then trying to gain [Foursquare] data about the human intuition of the Austin metro area.
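The "back of an atlas" lookup described above can be sketched in a few lines of Python. This is only an illustration, not how any particular service does it: the `GAZETTEER` table, the `lookup` and `tile_for` names, and the approximate coordinates for Austin are assumptions made for the example, and the "map cell" is taken to be a tile in the widely used Web Mercator tiling scheme.

```python
import math

# Hypothetical gazetteer: place name -> approximate (latitude, longitude).
# A real index would hold millions of entries and many places named "Austin".
GAZETTEER = {
    "austin": (30.2672, -97.7431),
}

def lookup(name):
    """Return (lat, lon) for a place name, or None if it is not indexed."""
    return GAZETTEER.get(name.strip().lower())

def tile_for(lat, lon, zoom):
    """Convert a latitude/longitude to Web Mercator (x, y) tile indices."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

coords = lookup("Austin")
if coords is not None:
    print(coords, tile_for(coords[0], coords[1], zoom=10))
```

The lookup only centers the map. It says nothing about how far "Austin" extends, which is the gap the boundary data is meant to fill.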
Q: A big part of how you define a given locale is by using polygon shapes, as opposed to just drawing one big box around New York City, for example. Why not just put the world on one giant grid?
A: If I gave you the center of Austin, you would have no idea how far out to search to get things that are logically in Austin that people might be willing to go to. So as a start you want bounding boxes, but even better than that, we talked about polygons. Why are polygons useful? If you draw the bounding box for Brooklyn, no matter how I drew it, I’m going to get a ton of lower Manhattan, a ton of Jersey City and a ton of Queens. And there happens to be a river here, which you don’t really want to cross. Once I have the polygon for Brooklyn, I can now start doing searches where—I can still do a bigger search, but I really really try and prefer the results [in Brooklyn]. You run into this issue where as soon as you cover Manhattan, you start bleeding into Jersey City. And the behavior of people in Manhattan and Jersey City is different. So if we just grid the world, no matter how I grid the world I’m not going to be able to catch this difference. So I think about the polygons as being a way of creating humane big data. So, say we want to know what people are likely to search for near them. We might want to pull up a different list in Manhattan vs. in Jersey City. And the only way we’re going to do that is, rather than going back to arbitrary cells, we’re going back to these humane boundaries.
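The difference between a bounding box and a polygon can be shown with a small Python sketch. Everything here is an assumption made for illustration: the L-shaped `NEIGHBORHOOD` polygon is a toy shape in abstract coordinates, not a real boundary, and `rank_results` is a hypothetical ranking that simply puts in-polygon results first and sorts by distance within each group. The containment check uses the standard ray-casting test.

```python
import math

def point_in_polygon(x, y, polygon):
    """Ray casting: count polygon edges crossed by a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def in_bounding_box(x, y, polygon):
    """True if (x, y) falls inside the polygon's axis-aligned bounding box."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return min(xs) <= x <= max(xs) and min(ys) <= y <= max(ys)

def rank_results(center, candidates, polygon):
    """Search widely, but list in-polygon results first, then by distance."""
    return sorted(
        candidates,
        key=lambda c: (not point_in_polygon(c[1], c[2], polygon),
                       math.dist(center, (c[1], c[2]))),
    )

# Toy L-shaped area; its bounding box also covers the corner near (3, 3).
NEIGHBORHOOD = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
# Hypothetical venues as (name, x, y).
VENUES = [("A", 3, 3), ("B", 1, 1), ("C", 1, 3)]

ranked = rank_results((2.5, 2.5), VENUES, NEIGHBORHOOD)
print([name for name, _, _ in ranked])
```

Venue A is the closest to the search center and sits inside the bounding box, but it lies outside the polygon, so it is ranked after the two venues that are actually within the boundary. A fixed grid cell has the same blind spot as the bounding box.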