This week I came across a r/GIS reddit post on a unique problem that I wanted to expand on further and show how easy it is to use open source GIS libraries to solve any spatial problem.
The poster's question was the following...
I have several thousand polygons and several thousand points that need to be snapped to certain places on those polygons. There's 12 different locations on each polygon where a point should be. I was wondering if someone knew a way to specifically have points snap to a specific location on a polygon en masse.
The poster also provided the following graphic to better explain the problem he was trying to solve.
One thing that I enjoy most about the combination of Python with GIS is how quick and easy it is to produce a solution to a unique problem that in this case was going to take hours of manual work to produce. Even if the proposed solution doesn't solve his problem perfectly, hopefully it would provide a significant time savings.
When the problem is relatively simple my two favoratie tools these days for GIS analysis are PyShp and Shapely. PyShp can read and write shapefiles, including the geometry and tabular data, and Shapely can do the heavy lifting with spatial analysis. Originally I had started with the idea of using Shapely, but the solution was simple enough that just the PyShp library with the standard math library were sufficient for this exercise. PyShp can be installed easily with pip.
One other thing to note is that there is no magic ArcGIS solution to this problem. While the geoprocessing tools are legion in ArcGIS, there is no magic single button you can press that would spit out a solution to the original problem. It will require a programming solution with whatever toolset you choose.
As we break down the problem a couple of things stand out:
1) Our output will be a single point for each input polygon.
2) The desired point location must be on the original polygon itself.
3) Based on the graphic, the poster wants the northeast coordinate point.
When the PyShp library reads a feature, it also provides the bounding box with that feature. In this case, a bounding box gets us exactly what we need. If we select the northeast coordinate of the bounding box, that gives us a target point to compare to the points in the polygon. Whichever point is closest to the northeast bounding box coordinate will be our selected coordinate.
In this script the distances calculated for shapefiles in geographic coordinates are not true distances (vs projected coordinate systems like state plane or UTM), but it is close enough to provide a valid solution for this problem.
In order to test this approach, I am going to use a shapefile from the US Census Bureau of all US counties. The shapes are similar to the original poster's graphic, and there are several thousand shapes.
Let's get started...
# Shapefile downloaded from https://www.census.gov/geo/maps-data/data/cbf/cbf_counties.html
f = r'C:\path\to\yourshapefile\cb_2017_us_county_500k\cb_2017_us_county_500k.shp'
r = shapefile.Reader(f)
# The output file for our analysis
out = r'C:\path\to\yourshapefile\cb_2017_us_county_500k\cb_2017_us_county_500k_pnt_test.shp'
# Shapefile writer for our analysis output (a point feature type)
w = shapefile.Writer(shapeType=shapefile.POINT)
# Will use the feature count as the output id.
Now that we have read the input shapefile and created the output shapefile, we can start to iterate through the input shapefile. You can iterate through just the shapes themselves or what PyShp calls ShapeRecords (basically feature geometry with feature records). For this exercise we only need the shapes.
As we iterate through each feature, for each feature we will also iterate through every point within that shape. If the current coordinate in the feature shape is the closest coordinate to the northeast bounding box coordinate, it then becomes the closest point. At the end of each feature iteration, the closest point is then saved to the output shapefile.
See below for the code in detail.
cnt = 0
# Iterating through each feature shape in shapefile
for shp in r.shapes():
# Northeast corner coordinate of bounding box
ne_box = (shp.bbox,shp.bbox)
# Arbitrarily large number as starting distance
short_dist = 100000000.0
# For each coordinate in individual shape, process
for pnt in shp.points:
# Simple pythagorean theorem to get distance between two coordinates
meas_dist = math.sqrt(math.pow((pnt-ne_box),2)+math.pow((pnt-ne_box),2))
# If the measured distance for this coordinate is less than the previous
# shortest distance, make the current point the closest point
if meas_dist < short_dist:
short_dist = meas_dist
close_pnt = pnt
# Write the record and geometry to the shapefile
# Saving the shapefile output with PyShp
It was surprising how fast this algorithm ran (less than 3 seconds) over a fairly large dataset (detailed polygons of all US counties). PyShp is a pure python library, which makes it easy to install and very pythonic in library design (as opposed to the GDAL library python bindings). But that also tends to make it slower than a library like GDAL. In this case, performance was not an issue.
See below for the output produced. As with any kind of script like this, it's always important to review the results and test multiple datasets. There might be areas where the answer given is not what the original reddit poster was looking for... some US county shapes seem to produce a point that is not necessarily the northeast corner of the shape.
After posting a response on reddit, I did receive feedback that this simple Python script was able to save the original reddit poster from hours of drudgery, clicking on thousands of shapes.
As a reader, if you have any questions or feedback about using open source GIS libraries, please respond in the comments!
And remember that use of scripts on here is always user beware. There is no warranty or guarantee that comes with these scripts. And always be kind... Nobody likes having free things thrown at them...
There are comments.