Having fun with geometry data in MariaDB and ElasticSearch


At Qandidate.com we get an insane amount of job applications each day. Because we're an awesome company to work for? Well I hope so, but that's not the reason why. It's because we make a free ATS or Applicant Tracking System. Companies worldwide manage their vacancies and candidates with our software. We geocode all the location data and use MariaDB and ElasticSearch to store and analyze them.

In this blog post I will explain how to store geocodes in MariaDB and ElasticSearch and how to calculate the distances between them.

Collecting geocodes

Google has an awesome API for geocoding addresses. When I look up the address of my favorite football club we get a great amount of location data:

curl -XGET https://maps.googleapis.com/maps/api/geocode/json?address=Stadion+Feijenoord

see https://maps.googleapis.com/maps/api/geocode/json?address=Stadion+Feijenoord

What we're particularly interested in is the geometry data:

"geometry" : {
    "location" : {
        "lat" : 51.8939035,
        "lng" : 4.5231352
    },
}

The lat (latitude) and lng (longitude) are coordinates indicating a position on a sphere, in this case our home planet.

With these data points you can do cool stuff like calculating the distance between two points or querying which points are located within a certain radius of another point.

Storing geocodes in MariaDB

MySQL 5.6 introduced spatial data extensions allowing you to store geometry data in a POINT data type.

Note: if you want to play along, I advice you to create a sandbox virtual machine as described at the end of this post.

Let's create a column of the POINT type:

CREATE DATABASE demo;
CREATE TABLE demo.important_locations (location POINT NULL DEFAULT NULL);

You can now insert longitudes and latitudes:

INSERT INTO demo.important_locations(location) VALUES(GeomFromText('POINT(51.8939035 4.5231352)',0));

Now let's calculate the distance from the stadium to the city hall where we celebrate our successes!

Calculate distances in MariaDB

You would expect there would be a simple function to calculate the distance between two geocodes. Actually, there is. It's called st_distance(g1, g2) which is available from MySQL 5.6.

But there is a caveat: these distances are calculated using planar coordinates instead of spherical coordinates. Read about it in this Google Maps API article.

Long story short: this is the statement to calculate the distance of the stadium to city hall at (51.9228644,4.4792299):

SELECT (
  6371 * acos(
    cos(radians(51.9228644)) * cos(radians(x(location))) * cos(radians(y(location)) - radians(4.4792299))
    +
    sin(radians(51.9228644)) * sin(radians(x(location)))
  )
) AS distance
FROM demo.important_locations
ORDER BY distance;

And the distance is: 4.409 kilometers!

+--------------------+
| distance           |
+--------------------+
| 4.4092536956929855 |
+--------------------+
1 row in set (0.00 sec)

Storing geocodes in ElasticSearch

ElasticSearch also has a data type for geometry data called geo point type.

Let's create a new index with a geo_point field:

curl -XPUT http://localhost:9200/important_locations -d '
{
  "mappings": {
    "location": {
      "properties": {
        "name": {"type": "string"},
        "location": {"type": "geo_point"}
      }
    }
  }
}'

Now check the result with:

curl -XGET 'http://localhost:9200/important_locations/_mapping'
{
  "important_locations":{
    "location":{
      "properties":{
        "location":{
          "type":"geo_point"
        },
        "name":{
          "type":"string"
        }
      }
    }
  }
}

Now we can add some location data to our index:

curl -XPOST http://localhost:9200/important_locations/location/ -d '{"name": "Fanshop Centraal Station Rotterdam", "location": {"lat": "51.924285", "lon": "4.469892"}}'
curl -XPOST http://localhost:9200/important_locations/location/ -d '{"name": "Fanshop Stadion", "location": {"lat": "51.893423", "lon": "4.525188"}}'
curl -XPOST http://localhost:9200/important_locations/location/ -d '{"name": "Fanshop Station de Kuip", "location": {"lat": "51.891288", "lon": "4.513916"}}'
curl -XPOST http://localhost:9200/important_locations/location/ -d '{"name": "Fanshop Coolsingel", "location": {"lat": "51.91862", "lon": "4.480092"}}'

I would like to know which Fanshop is located closest to city hall by sorting the result set by distance:

curl -XGET 'http://localhost:9200/important_locations/_search?pretty=true' -d '
{
    "sort" : [
        {
            "_geo_distance" : {
                "location" : {
                    "lat" : 51.92286439999999,
                    "lon" : 4.479229999999999
                },
                "order" : "asc",
                "unit" : "km"
            }
        }
    ],
    "query": {
        "filtered" : {
            "query" : {
                "match_all" : {}
            }
        }
    }
}'

Check the sort field for the distances:

{
  "took" : 173,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : null,
    "hits" : [ {
      "_index" : "important_locations",
      "_type" : "location",
      "_id" : "OpuBlM6nQFOZ5lCEcBVA8w",
      "_score" : null, "_source" : {"name": "Fanshop Coolsingel", "location": {"lat": "51.91862", "lon": "4.480092"}},
      "sort" : [ 0.47564430077142694 ]
    }, {
      "_index" : "important_locations",
      "_type" : "location",
      "_id" : "xL0Oy5XqRs-DgZQqjMgTiQ",
      "_score" : null, "_source" : {"name": "Fanshop Centraal Station Rotterdam", "location": {"lat": "51.924285", "lon": "4.469892"}},
      "sort" : [ 0.6595521711295553 ]
    }, {
      "_index" : "important_locations",
      "_type" : "location",
      "_id" : "RwhR9pXuRP2GBse3JmjdGA",
      "_score" : null, "_source" : {"name": "Fanshop Station de Kuip", "location": {"lat": "51.891288", "lon": "4.513916"}},
      "sort" : [ 4.241464778143902 ]
    }, {
      "_index" : "important_locations",
      "_type" : "location",
      "_id" : "5TV1jQm1ROqvbD3oFR0k7Q",
      "_score" : null, "_source" : {"name": "Fanshop Stadion", "location": {"lat": "51.893423", "lon": "4.525188"}},
      "sort" : [ 4.5449626829339085 ]
    } ]
  }
}

Less than half a kilometer! Awesome, right?

Even better, let's filter fanshops within a 1km radius of the stadium:

curl -XGET 'http://localhost:9200/important_locations/_search?pretty=true' -d '
{
  "query": {
    "filtered" : {
        "query" : {
            "match_all" : {}
        },
        "filter" : {
            "geo_distance" : {
                "distance" : "1km",
                "location" : {
                    "lat" : 51.8939035,
                    "lon" : 4.5231352
                }
            }
        }
    }
  }
}'
{
  "took" : 15,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "important_locations",
      "_type" : "location",
      "_id" : "RwhR9pXuRP2GBse3JmjdGA",
      "_score" : 1.0, "_source" : {"name": "Fanshop Station de Kuip", "location": {"lat": "51.891288", "lon": "4.513916"}}
    }, {
      "_index" : "important_locations",
      "_type" : "location",
      "_id" : "5TV1jQm1ROqvbD3oFR0k7Q",
      "_score" : 1.0, "_source" : {"name": "Fanshop Stadion", "location": {"lat": "51.893423", "lon": "4.525188"}}
    } ]
  }
}

So two out of four fanshops are within 1 km. I should plot these as markers on a map someday!

Pro tip: custom types in Doctrine.

I demonstrated inserting POINTs into MariaDB using the GeomFromText function. In practise we use Doctrine ORM to manage our entities. You can create a custom mapping in Doctrine too.

Luckily someone already did that for us. I tried djlambert's creof/doctrine2-spatial package. It allows you to annotate the location field with the POINT type:

<?php

use CrEOF\Spatial\PHP\Types\Geometry\Point;
use Doctrine\ORM\Mapping as ORM;

/**
 * @ORM\Table(name="important_locations")
 * @ORM\Entity
 */
class ImportantLocation
{
    /**
     * @var integer
     *
     * @ORM\Column(name="id", type="integer")
     * @ORM\Id
     * @ORM\GeneratedValue(strategy="IDENTITY")
     */
    private $id;

    /**
     * @var Point
     *
     * @ORM\Column(name="location", type="point", nullable=true)
     */
    private $location;

    /**
     * @param Point $location
     */
    public function setLocation(Point $location)
    {
        $this->location = $location;
    }
}

This makes inserting a location as easy as:

$importantLocation = new ImportantLocation()
$importantLocation->setLocation(new Point(51.8939035, 4.5231352));

Testing setup

In this post I used a sandbox Vagrant environment with Ubuntu 14.04. I set it up with the following Vagrantfile:

Vagrant.configure(2) do |config|

  config.vm.box = "ubuntu/trusty64"

end

Then I installed MariaDB and ElasticSearch:

sudo apt-get install software-properties-common -y
sudo apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xcbcb082a1bb943db
sudo add-apt-repository 'deb http://ams2.mirrors.digitalocean.com/mariadb/repo/10.0/ubuntu trusty main'
sudo apt-get update
sudo apt-get install mariadb-server -y

sudo apt-get install openjdk-7-jre-headless -y
wget -qO - http://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb http://packages.elasticsearch.org/elasticsearch/1.3/debian stable main" | sudo tee -a /etc/apt/sources.list
sudo apt-get update
sudo apt-get install elasticsearch
sudo service elasticsearch start