Quantcast
Channel: .NET Framework Class Libraries forum
Viewing all articles
Browse latest Browse all 8156

Array.Sort performance issue

$
0
0
 

Using Array.Sort with a Comparison(of T) delegate, I noticed some impossible equality outcomes. In other words, I knew the input array contained no pair of elements that compare equally. So, I made an array of distinct integers a() with a(i)=i for i ranging 0..ubound(a). I randomly shuffled the array and called Array.Sort with the array and my comparer. Upon completion, I verified that the array was back in its original unshuffled form with a(i)=i. Next, in the comparer, I debug.asserted that input arguments were different, and found that Array.Sort often called the comparer passing the same value in arg 1 and arg 2. Since all values in a() were distinct integers, Array.Sort must be comparing the ith to the ith element. Looking further, I found that the same pair of distinct elements were often compared multiple times.

When sorting 1000 elements, 10% of all compares were comparing an item to itself, and 20% were comparing the same distinct pair multiple times. So approximately 30% of all compares called by Array.Sort are suspicious. For smaller array sizes, the percentages go up, and vice versa.

To me, this appears inefficient at best and buggy at worst. I don't think buggy because Array.Sort has never failed for me. I can understand some of the 30% if Array.Sort implements QuickSort using so-called guard elements to eliminate bounds checking during an iterative operation. But 30%? I wrote a naive median-1 QuickSort, used the same comparer and saw no such anomalies. An Item was never compared itself, and different items were compared at most once.

In addition, when sorting n distinct elements initially in random order, the theoretical minimum number of comparisons required is log2(n!). The total number of comparisons actually made divided by log2(n!) is thus an efficiency measure, lower and approaching 1 being better. A median-5 QuickSort scores about 1.1, and a naive median-1 QuickSort scores 1.3 to 1.4. But Array.Sort scores 1.5 to 1.7.

So, what's up with Array.Sort? Can anyone shed any light?

My VB test class ArraySortIssue is enclosed. Instantiate z as a New ArraySortIssue object and look at the string returned by z.Test. Comment in the assert statement in the comparer to see an issue as it occurs (in this case, life is easier in the debugger with a very small array).

Public Class ArraySortIssue ' Array.Sort performance test
  ' prove Array.Sort compares elements to themselves
  ' prove Array.Sort compares the same pair of elements too many times
  Public a() As Integer ' array to be sorted, distinct integers 0..ubound(a) in random order
  Public t(,) As UShort ' tallies of values compared
  Public Function CompareInt(ByVal x As Integer, ByVal y As Integer) As Integer
    ' for overload 'Public Shared Sub Sort(Of T)(array As T(), comparison As Comparison(Of T))'
    t(x, y) = CUShort(t(x, y) + 1) ' passing this means 0<=i<=ubound and 0<=j<=ubound
    'Debug.Assert((x <> y) And (t(x, y) <= 1)) ' comment this in to see each compare oddity
    Return x - y
  End Function
  Public Sub Swap(ByVal i As Integer, ByVal j As Integer)
    ' swap a(i) with a(j)
    Dim k As Integer = a(i) : a(i) = a(j) : a(j) = k
  End Sub
  Public Sub NaiveQuickSort(ByVal f As Integer, ByVal l As Integer)
    ' Naive Median-1 Quicksort a(f..l), yields no anomalous comparisons
    If f < l Then
      Dim i As Integer = f + 1
      Dim j As Integer = l
      While i <= j
        If CompareInt(a(f), a(i)) < 0 Then Swap(i, j) : j -= 1 Else i += 1
      End While
      Swap(f, j)
      NaiveQuickSort(f, j - 1)
      NaiveQuickSort(j + 1, l)
    End If
  End Sub
  Public Function Test() As String
    ' test driver
    Dim MyRandom As New System.Random ' random numbers to generate a random permutation
    Dim n As Integer = MyRandom.Next(2, 9)
    n = 100
    Dim u As Integer = n - 1
    ReDim a(u)
    ReDim t(u, u) ' don't make n too big
    Dim i, j, k As Integer
    For i = 0 To u : a(i) = i : Next ' integers 0..u
    For i = u To 1 Step -1 ' shuffle
      j = MyRandom.Next(0, i + 1) : k = a(i) : a(i) = a(j) : a(j) = k
    Next
    If True Then ' do one or the other only
      Array.Sort(Of Integer)(a, AddressOf CompareInt) ' sort using Comparison(of Integer)
    Else
      NaiveQuickSort(0, u) ' sort using home grown with same comparer
    End If
    For i = 0 To u : Debug.Assert(a(i) = i) : Next ' verify correctness
    ' reduce t(,) to three scalars
    Dim nGood As Integer = 0 ' i<>j compares counting each at most once
    Dim nEqual As Integer = 0 ' all i=j compares
    Dim nRepeat As Integer = 0 ' excess i<>j compares
    For i = 0 To u
      nEqual += t(i, i) ' all on the diagonal are suspicious
      For j = 0 To i - 1
        k = t(i, j) + t(j, i) ' take i vs j and j vs i together
        If k > 0 Then nGood += 1 : k -= 1 ' one alone or one of many is ok,
        nRepeat += k ' but the remainder are suspicious
      Next
    Next
    ' output string
    Dim nTotal As Integer = nGood + nEqual + nRepeat
    Dim s As String = "Sort Test" & _", n=" & n & _", nTotal=" & nTotal & _", nGood=" & nGood & " " & CInt(nGood * 100 / nTotal) & "%" & _", nEqual=" & nEqual & " " & CInt(nEqual * 100 / nTotal) & "%" & _", nRepeat=" & nRepeat & " " & CInt(nRepeat * 100 / nTotal) & "%"
#If False Then
    ' depends on the author's Dec() and Log2Factorial() functions
    s &= ", ceff=" & Dec(nTotal / CInt(Log2Factorial(n)), , 6)
#End If
    Return s
  End Function
End Class


Viewing all articles
Browse latest Browse all 8156

Latest Images

Trending Articles



Latest Images

<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>