Fast sorting with limit in Elixir

9th May 2023 from jc's blog

If you have a large list of things in Elixir and then want to sort them in order to get the top n for something, chances are that your existing sort using Enum.sort may be relatively slow, as it rebuilds the entire list. We can do it more efficiently by simply accumulating the top n in an optimized data structure.

Erlang provides this optimized data structure for us, namely :gb_trees. We roughly need to do the following:

Let’s look at the function in detail:

  defp sort_by_limit(stream, key_fn, limit) do
    |> Enum.reduce({:gb_trees.empty(), 0, 0}, fn item, {tree, smallest, size} ->
      case key_fn.(item) do
        key when key > smallest and size >= limit ->
          {_smallest, _value, new_tree} = :gb_trees.take_smallest(tree)
          {:gb_trees.insert(key, item, new_tree), key, size}

        key when size < limit ->
          {:gb_trees.insert(key, item, tree), key, size + 1}

        _key ->
          {tree, smallest, size}
    |> elem(0)
    |> :gb_trees.to_list()
    |> Enum.reverse()
    |>, 1))

Carrying the smallest and size parameters myself is a bit of an optimization: we could call :gb_trees.smallest and :gb_trees.size ourselves but we can track this relatively simply. The following pipes are mainly used to extract the tree, reverse the order from ascending to descending sort, and then map to the actual items we cared about.

If this proved useful to you, or you found an issue with it, please let me know :-)

  1. Since I needed it this way, the function is hardcoded to do this. I would be happy for any suggestions to make it more generic, although I assume in that case we would need to change the function a bit due to the guards in case having to be known at compile time. ↩︎

reply via email