Wednesday, May 14, 2014

2D Convex hull in C#: 40 lines of code

Note: this blog has moved here.

If you want a convex hull and you want it now, you could go get a library like MIConvexHull. That library claims to be high-performance compared to a comparable C++ library, but that claim is implausible, especially for the 2D case, since the algorithm relies heavily on heap memory and dynamic dispatch, accessing all coordinates through an IVertex interface that returns coordinates as double[], and it uses LINQ rather liberally. (Update: Ouellet's Algorithm is a good choice.)

Loyc.Utilities has a much simpler convex hull algorithm in the PointMath class that you might find easier to adapt to your own codebase, and although I haven't benchmarked it, you can plainly see that scanning for the convex hull takes O(n) time and needs only simple math, so that the overall running time will be dominated by the initial sorting step (which takes O(n log n) time). Because of the simple math used in this algorithm, it performs well both on powerful desktop machines and (given integer or fixed-point workloads) on lower-power machines with no FPU.
/// <summary>Computes the convex hull of a polygon, in clockwise order in a Y-up 
/// coordinate system (counterclockwise in a Y-down coordinate system).</summary>
/// <remarks>Uses the Monotone Chain algorithm, a.k.a. Andrew's Algorithm.</remarks>
public static IListSource<Point> ComputeConvexHull(IEnumerable<Point> points)
  var list = new List<Point>(points);
  return ComputeConvexHull(list, true);
public static IListSource<Point> ComputeConvexHull(List<Point> points, bool sortInPlace = false)
  if (!sortInPlace)
    points = new List<Point>(points);
  points.Sort((a, b) => 
    a.X == b.X ? a.Y.CompareTo(b.Y) : (a.X > b.X ? 1 : -1));

  // Importantly, DList provides O(1) insertion at beginning and end
  DList<Point> hull = new DList<Point>();
  int L = 0, U = 0; // size of lower and upper hulls

  // Builds a hull such that the output polygon starts at the leftmost point.
  for (int i = points.Count - 1; i >= 0 ; i--)
    Point p = points[i], p1;

    // build lower hull (at end of output list)
    while (L >= 2 && (p1 = hull.Last).Sub(hull[hull.Count-2]).Cross(p.Sub(p1)) >= 0) {

    // build upper hull (at beginning of output list)
    while (U >= 2 && (p1 = hull.First).Sub(hull[1]).Cross(p.Sub(p1)) <= 0)
    if (U != 0) // when U=0, share the point added above
    Debug.Assert(U + L == hull.Count + 1);
  hull.RemoveAt(hull.Count - 1);
  return hull;
This code relies on Loyc's DList class which is like List<T> except that it can efficiently add and remove items from the beginning OR end of the list, and it has some extra stuff, such as First and Last which return the first and last item in the list, repectively. This code assumes the existence of a Point struct with any kind of coordinates (int, float, double, etc.) and methods such as Sub to subtract, Cross to compute the cross-product (a.X * b.Y - a.Y * b.X), etc. In Loyc.Utilities, the code is located within a T4 template and is actually replicated for int, float and double coordinates... although currently the code breaks for large integer coordinates since Cross() returns int, which can easily overflow. IListSource was explained in an earlier post and can be replaced with IEnumerable if you prefer.

Edit: There is now a LoycCore package in NuGet that includes Loyc.Essentials.dll which contains data types (e.g. DList) used by ComputeConvexHull(), as well as Loyc.Utilities.dll which contains ComputeConvexHull() itself.

Edit: original version was buggy.

Edit: I found that the algorithm took 11.3% less time for 10 million points if I changed the sorting step as follows:
  points.Sort((a, b) => 
    a.X == b.X ? a.Y.CompareTo(b.Y) : a.X.CompareTo(b.X));   // BEFORE
    a.X == b.X ? a.Y.CompareTo(b.Y) : (a.X > b.X ? 1 : -1)); // AFTER
I wrote this little benchmark:
public static void TestConvexHull()
  int[] testSizes = new int[] { 12345, 100, 316, 1000, 3160, 10000, 
                 31600, 100000, 316000, 1000000, 3160000, 10000000 };
  for (int iter = 0; iter < testSizes.Length; iter++) {
    Random r = new Random();
    List<PointD> points = new List<PointD>(testSizes[iter]);
    for (int i = 0; i < points.Capacity; i++) {
      double size = r.NextDouble();
      double ang = r.NextDouble() * (Math.PI * 2);
      points.Add(new PointD(size * Math.Cos(ang), size * Math.Sin(ang)));
    // Bonus: test sorting time to learn how much of the time is spent sorting
    var points2 = new List<PointD>(points);
    EzStopwatch timer = new EzStopwatch(true);
    points2.Sort((a, b) => a.X == b.X ? a.Y.CompareTo(b.Y) : (a.X < b.X ? -1 : 1));
    int sortTime = timer.Restart();
    var output = PointMath.ComputeConvexHull(points, true);
    int hullTime = timer.Millisec;
    if (iter == 0) continue; // first iteration primes the JIT/caches
    Console.WriteLine("Convex hull of {0,8} points took {1} ms ({2} ms for sorting step). Output has {3} points.", 
      testSizes[iter], hullTime, sortTime, output.Count);
Which produces these results on my laptop:
Convex hull of 100 points took 0 ms (0 ms for sorting step). Output has 12 points.
Convex hull of 316 points took 0 ms (0 ms for sorting step). Output has 18 points.
Convex hull of 1000 points took 0 ms (0 ms for sorting step). Output has 28 points.
Convex hull of 3160 points took 2 ms (1 ms for sorting step). Output has 36 points.
Convex hull of 10000 points took 5 ms (3 ms for sorting step). Output has 59 points.
Convex hull of 31600 points took 17 ms (10 ms for sorting step). Output has 85 points.
Convex hull of 100000 points took 55 ms (37 ms for sorting step). Output has 127 points.
Convex hull of 316000 points took 197 ms (122 ms for sorting step). Output has 184 points.
Convex hull of 1000000 points took 636 ms (411 ms for sorting step). Output has 286 points.
Convex hull of 3160000 points took 2045 ms (1368 ms for sorting step). Output has 379 points.
Convex hull of 10000000 points took 6949 ms (4953 ms for sorting step). Output has 563 points.
Notice that the sorting step takes the majority of the time, as you would expect since sorting is O(N log N) while the rest of the code is O(N). (Why 316? Logarithmically, it's half way between 100 and 1000.)

Copyright: me (David Piepgrass). Do with the code as you wish (originally LGPL).

1 comment:

Ozay Gulercin said...
This comment has been removed by the author.