The Shapefile format is a widely used file format for storing the boundaries of geographical areas, which are represented as polygons.
The Ruby gem rgeo-shapefile enables Ruby applications to read data directly from Shapefiles. However, if the boundaries are very precise, with a large number of vertices:
- The Shapefile can grow very large. For example, the SHP file from the TIGER/Line data for postcode areas in the USA is over 800 MB.
- Reading data from the Shapefile can take a long time.
- Sending the data to the UI can also take a long time. We sometimes need to do this to highlight geographical areas in a mapping library such as Google Maps.
- The browser and the mapping library need more resources to render the boundaries on the map.
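As a rough estimate of how many vertices that is: the SHP format stores each vertex of a 2D polygon as two 8-byte doubles (x and y). Assuming plain 2D polygons and ignoring record headers, bounding boxes, and part indices, an 800 MB file holds at most about

$$
\frac{800 \times 10^{6}\ \text{bytes}}{16\ \text{bytes per vertex}} = 5 \times 10^{7}\ \text{vertices.}
$$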
Usually, we do not need the geographical boundaries to be very precise, so compressing the polygon data is acceptable.
I was able to compress the TIGER/Line SHP file sufficiently using the free QGIS software. It turns out that not all polygon simplification algorithms work well with geographical data, but the Douglas-Peucker algorithm gave good results.
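The idea behind Douglas-Peucker is simple enough to sketch in plain Ruby. This is a minimal, illustrative version and not QGIS's implementation. The method names `douglas_peucker` and `perpendicular_distance` are hypothetical, and the sketch treats longitude/latitude as planar x/y coordinates, which is an approximation. It keeps the two endpoints of a line and finds the interior vertex farthest from the straight segment between them. If that distance exceeds the tolerance, it keeps the vertex and recurses on both halves. Otherwise, it drops all interior vertices.

```ruby
# Minimal Douglas-Peucker sketch for a line given as an array of [x, y] pairs.
# Coordinates are treated as planar, which only approximates geographic data.

# Perpendicular distance from point p to the infinite line through a and b.
def perpendicular_distance(p, a, b)
  dx = b[0] - a[0]
  dy = b[1] - a[1]
  len = Math.hypot(dx, dy)
  # Degenerate segment (a == b), e.g. a closed ring whose first and last
  # vertices coincide: fall back to point-to-point distance.
  return Math.hypot(p[0] - a[0], p[1] - a[1]) if len.zero?

  (dy * p[0] - dx * p[1] + b[0] * a[1] - b[1] * a[0]).abs / len
end

def douglas_peucker(points, tolerance)
  return points.dup if points.length < 3

  first = points.first
  last = points.last

  # Find the interior vertex farthest from the segment first-last.
  max_dist = 0.0
  index = 0
  (1...points.length - 1).each do |i|
    d = perpendicular_distance(points[i], first, last)
    if d > max_dist
      max_dist = d
      index = i
    end
  end

  if max_dist > tolerance
    # Keep the farthest vertex and simplify both halves recursively.
    left = douglas_peucker(points[0..index], tolerance)
    right = douglas_peucker(points[index..-1], tolerance)
    left[0...-1] + right # drop the split vertex duplicated in both halves
  else
    # Every interior vertex is within tolerance: keep only the endpoints.
    [first, last]
  end
end
```

For example, `douglas_peucker([[0.0, 0.0], [1.0, 0.1], [2.0, 0.0], [3.0, 2.0], [4.0, 0.0]], 0.5)` returns `[[0.0, 0.0], [2.0, 0.0], [3.0, 2.0], [4.0, 0.0]]`, dropping the small wiggle at `[1.0, 0.1]`. Because the first and last vertices are always kept, a closed ring stays closed. However, a large tolerance can shrink a ring below the four vertices a valid polygon ring needs.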
To compress polygon data in a Shapefile using QGIS:
- Go to Vector > Geometry Tools and use Simplify Geometries. I used a tolerance of 0.0004.
- Go to Vector > Geometry Tools and use Polygons to Lines.
- Go to Vector > Geometry Tools and use Lines to Polygons.
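For a sense of scale, the simplification tolerance is expressed in the layer's coordinate units. TIGER/Line files use NAD83 longitude/latitude in degrees, so a tolerance of 0.0004 corresponds roughly to the following (using about 111,320 m per degree of latitude):

$$
0.0004^\circ \times 111{,}320\ \text{m}/^\circ \approx 44.5\ \text{m (north-south)}, \qquad
0.0004^\circ \times 111{,}320\ \text{m}/^\circ \times \cos 40^\circ \approx 34.1\ \text{m (east-west at } 40^\circ\text{N)}.
$$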
I believe the last two steps above help ensure that the compressed polygons are closed.
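Whatever QGIS does internally in those two steps, the resulting rings can also be checked before use, for example after reading them with rgeo-shapefile. Below is a minimal sketch in plain Ruby. It assumes each ring is available as an array of [x, y] pairs, and the helper names `valid_closed_ring?` and `close_ring` are hypothetical, not part of rgeo-shapefile. It relies on the Shapefile specification's rule that a polygon ring has at least four points, with the last point equal to the first.

```ruby
# Hypothetical helpers for checking polygon rings after simplification.
# A ring is an array of [x, y] pairs.

# True if the ring is closed (last vertex equals first) and has at least
# four vertices, as the Shapefile specification requires for a ring.
def valid_closed_ring?(ring)
  ring.length >= 4 && ring.first == ring.last
end

# Returns a copy of the ring with the first vertex appended if it is open.
def close_ring(ring)
  return ring.dup if ring.empty? || ring.first == ring.last

  ring + [ring.first]
end

square = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
puts valid_closed_ring?(square)             # prints false
puts valid_closed_ring?(close_ring(square)) # prints true
```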