bpo-28685: Optimize sorted() list.sort() with type-specialized comparisons #582
Merged
rhettinger merged 50 commits into python:master on Jan 29, 2018
Description of the optimization (see also this poster)
The idea is simple: in practice, it's very uncommon to sort type-heterogeneous lists. This is because lists in general tend to be used in a homogeneous way (if you're iterating and the type is changing, your code may break, depending on what you're doing), and because comparison is often not defined in the heterogeneous context ("apples and oranges").
So, instead of checking types during every single compare in the sort (dynamic dispatch), we can simply iterate once in a pre-sort check and see if the list is type-homogeneous. If it is, we can replace PyObject_RichCompareBool with whatever compare function would have ended up being dispatched for that type. Since this check is cheap and very unlikely to fail, and checking types every time we compare is expensive, this is a reasonable optimization to consider.
This is, however, only the beginning of what's possible. Namely, there are many safety checks that have to be performed during every compare in the common cases (string, int, float, tuple) that one encounters in practice. For example, character width has to be checked for both strings every time two strings are compared. Since these checks almost never fail in practice (because, e.g., non-latin strings are uncommon), we can move them out of the comparison function and into the pre-sort check as well. We then write special-case compare functions (I implemented one for each of the four types mentioned above) that are selected if and only if the assumptions necessary to use them are verified for each list element.
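To make the idea concrete, here is a rough Python sketch of such a pre-sort check. It is only an illustration of the control flow, not the C code in the patch: the function name presort_check, the string labels, and the value of INT_BOUND are all assumptions made for this example.

```python
import operator

# Assumed bound, for illustration only; the patch's real criterion for
# "bounded" ints lives in the C code and is not reproduced here.
INT_BOUND = 2**30


def presort_check(keys):
    """Scan the keys once and return (label, less_than) for the whole sort."""
    if not keys:
        return "generic", operator.lt

    key_type = type(keys[0])
    ints_bounded = True
    strings_latin = True

    for k in keys:
        if type(k) is not key_type:
            # Heterogeneous list: stop early and use the fully generic compare.
            return "generic", operator.lt
        if key_type is int and abs(k) >= INT_BOUND:
            ints_bounded = False
        elif key_type is str and any(ord(c) > 255 for c in k):
            strings_latin = False

    # Homogeneous list: pick the most specialized compare whose
    # assumptions were verified for every element.
    if key_type is int and ints_bounded:
        return "bounded_int", int.__lt__
    if key_type is float:
        return "float", float.__lt__
    if key_type is str and strings_latin:
        return "latin_str", str.__lt__
    # Same type everywhere, but no special case applies: call the
    # type's own comparison directly, skipping the generic dispatch.
    return "same_type", key_type.__lt__


label, less_than = presort_check([3, 1, 2])
print(label, less_than(1, 3))
```

The point of the sketch is that all type inspection happens in one linear pass, so the function returned at the end can be called during the sort without re-checking anything.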
Benchmarks
I considered two sets of benchmarks: one organized by type (random lists of that type), and one organized by structure. Full benchmark scripts can be found here. The results are below (standard deviations were less than 0.3% of the mean for all measurements):
By type
By structure
These are just the benchmarks described in Objects/listsort.txt. The first table is the loss we experience if we sort structured heterogeneous lists (worst case: list is already sorted, we go all the way through doing n type-checks, and then we only end up doing n comparisons. Tragic, but extremely unlikely in practice; in practice, we would usually find the first heterogeneous element early on, and break out of the check, but here, the single, lonely float is hiding all the way at the end of the list of int, so we don't find it until we've done all n checks):
The second table is the same benchmark, but on homogeneous lists (int):
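For readers who want to reproduce the shape of this worst case, the input can be built roughly as follows. This is a minimal sketch and not the benchmark script used for the tables; the helper names worst_case_list, homogeneous_list and time_sort are made up for this example, and the list size and repeat count are arbitrary.

```python
import timeit


def worst_case_list(n):
    """Already-sorted ints with a single float hiding at the very end."""
    return list(range(n)) + [float(n)]


def homogeneous_list(n):
    """Same length and order as the worst case, but ints only."""
    return list(range(n + 1))


def time_sort(make_list, n=100_000, repeat=5):
    """Best-of-`repeat` wall time, in seconds, for sorting one list."""
    data = make_list(n)
    # sorted() copies the list, so every repetition sees the same input.
    return min(timeit.repeat(lambda: sorted(data), number=1, repeat=repeat))


if __name__ == "__main__":
    print("heterogeneous:", time_sort(worst_case_list))
    print("homogeneous:  ", time_sort(homogeneous_list))
```

Absolute timings depend on the machine and the interpreter build, so only the ratio between the two cases is meaningful.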
Patch summary
Here we describe at a high level what each section of the patch does:
Objects/listobject.c
... PyObject_RichCompareBool. To be selected if all of our pre-checks fail.
... ob_type->tp_richcompare, which is stored by the pre-sort check at compare_funcs.key_richcompare. This yields modest optimization (neighbourhood of 10%), but we generally hope we can do better.
... memcmps them.
... PyFloat_Type->tp_richcompare does a lot of typechecking that we want to move out of the sort loop, it pays to have this optimized compare available.
... compare_funcs.key_compare in the pre-sort check, we run the pre-sort check again on the list T = [x[0] for x in L] (we don't actually run the check twice, but we do something functionally equivalent to this). If T is type-homogeneous, or even better, satisfies the requirements for one of our special-case compares, we can replace the call to PyObject_RichCompareBool for the first tuple element with a call to compare_funcs.tuple_elem_compare. This allows us to bypass two levels of wasteful safety checks. If the first elements of the two tuples are equal, of course, we have to call PyObject_RichCompareBool on subsequent elements; the idea is that this is uncommon in practice.
... key_type, keys_are_all_same_type, ints_are_bounded, strings_are_latin, and keys_are_in_tuples (which is 1 if and only if every list element is a non-empty tuple, in which case all the other variables refer to the list [x[0] for x in L]).
... keys_are_in_tuples and key_type != &PyTuple_Type, then use the other variables to select compare_funcs.tuple_elem_compare, and set compare_funcs.key_compare = unsafe_tuple_compare.
Selected quotes from the python-ideas thread
Terry Reedy:
Tim Peters:
Later in that message, Tim also pointed out a bug, which has been fixed in this version of the patch.
https://bugs.python.org/issue28685
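As an illustration of the tuple special case described in the patch summary, the following Python sketch wraps a specialized first-element compare and falls back to the generic comparison only when the first elements tie. It is a simplification under stated assumptions, not a transcription of unsafe_tuple_compare: the name make_tuple_less_than is hypothetical, and treating "neither element is less than the other" as a tie assumes the first elements are totally ordered (so it ignores cases such as NaN).

```python
def make_tuple_less_than(elem_less_than):
    """Build a tuple compare that uses a specialized compare on element 0."""

    def tuple_less_than(a, b):
        # Common case: the first elements differ, so the specialized
        # compare decides the result without any generic dispatch.
        if elem_less_than(a[0], b[0]):
            return True
        if elem_less_than(b[0], a[0]):
            return False
        # Rare case: the first elements tie, so the remaining elements
        # go through the ordinary, fully generic comparison.
        return a[1:] < b[1:]

    return tuple_less_than


# Example: every first element is an int, so int.__lt__ is a valid
# specialized compare for position 0.
less_than = make_tuple_less_than(int.__lt__)
print(less_than((1, "z"), (2, "a")))
print(less_than((2, "b"), (2, "a")))
```

The design bet is the one stated in the summary: ties on the first element are assumed to be uncommon, so the generic path is rarely taken.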