arrays - Performance Considerations when using VBA Filter Function -


i can't figure out how filter function works fast. have used filter on sorts of data , regardless of data-type, filter obliterates alternative method employ. regularly use binary search algorithm , quickarraysort algorithm written stephen bullen (found in professional excel development). binary search lightning fast (as fast filter function, given array sorted) , quick sort algorithm 1 of fastest sorting algorithms known.

i have written test code below comparing speeds of finding random element in large array (size = 2,000,000). intentionally populate array in un-ordered fashion (it should noted have tried various un-ordered assigning methods, , results similar regardless of assigning method).

sub searchtest()  dim long, strmyarray() string, lngsize long, strtest string dim timebinarysearch long, timefiltersearch long dim lngresultbinary long, lngresultfilter long  dim starthour long, startminute long, startsecond long dim startmilisecond long, starttime long  dim endhour long, endminute  long, endsecond long dim endmilisecond long, endtime long      lngsize = 2000000      strtest = cstr(1735674 * 987)      redim strmyarray(lngsize)      = 1 ubound(strmyarray)         if mod 2 = 0             strmyarray(i) = cstr((i - 1) * 987)         else             strmyarray(i) = cstr((i + 1) * 987)         end if     next  ''filter test '*******************************************************************     starthour = hour(now()) * 60 * 60 * 1000     startminute = minute(now()) * 60 * 1000     startsecond = second(now()) * 1000     startmilisecond = format(now(), "ms")      starttime = starthour + startminute + startsecond + startmilisecond      lngresultfilter = clng(filter(strmyarray, strtest)(0))      endhour = hour(now()) * 60 * 60 * 1000     endminute = minute(now()) * 60 * 1000     endsecond = second(now()) * 1000     endmilisecond = format(now(), "ms")      endtime = endhour + endminute + endsecond + endmilisecond      timefiltersearch = endtime - starttime '*******************************************************************  ''binary test '*******************************************************************     starthour = hour(now()) * 60 * 60 * 1000     startminute = minute(now()) * 60 * 1000     startsecond = second(now()) * 1000     startmilisecond = format(now(), "ms")      starttime = starthour + startminute + startsecond + startmilisecond      quicksortstring1d strmyarray      lngresultbinary = strmyarray(clng(binarysearchstring(strtest, strmyarray)))      endhour = hour(now()) * 60 * 60 * 1000     endminute = minute(now()) * 60 * 1000     endsecond = second(now()) * 1000     endmilisecond = format(now(), "ms")      endtime = endhour + endminute + endsecond + endmilisecond      timebinarysearch = endtime - starttime '*******************************************************************      msgbox lngresultfilter & vbcr & vbcr & lngresultbinary       msgbox timefiltersearch & vbcr & vbcr & timebinarysearch  end sub 

both methods return same result, filter method's return time 0 ms , quicksort/binarysearch method's return time 20 seconds. huge difference!! mentioned earlier, if array sorted binary search method returns 0 ms (as know, binary search requires array sorted begin with)

so, how can filter function through 2,000,000 un-sorted entries , find correct result immediately? can't loop through every entry , compare filtervalue (this far slowest method), based off of these preliminary test, can't utilizing binary search either, have sort array first. if there awesome sorting algorithm compiled, find hard believe sort array of size greater million instantaneously.

by way, below quicksort algorithm , binary search algorithm.

    sub quicksortstring1d(byref saarray() string, _                 optional byval bsortascending boolean = true, _                 optional byval llow1 variant, _                 optional byval lhigh1 variant)      'dimension variables     dim llow2 long     dim lhigh2 long     dim skey string     dim sswap string          on error goto errorexit         'if not provided, sort entire array         if ismissing(llow1) llow1 = lbound(saarray)         if ismissing(lhigh1) lhigh1 = ubound(saarray)          'set new extremes old extremes         llow2 = llow1         lhigh2 = lhigh1          'get value of array item in middle of new extremes         skey = saarray((llow1 + lhigh1) \ 2)          'loop items in array between extremes         while llow2 < lhigh2              if bsortascending                 'find first item greater mid-point item                 while saarray(llow2) < skey , llow2 < lhigh1                     llow2 = llow2 + 1                 loop                  'find last item less mid-point item                 while saarray(lhigh2) > skey , lhigh2 > llow1                     lhigh2 = lhigh2 - 1                 loop             else                 'find first item less mid-point item                 while saarray(llow2) > skey , llow2 < lhigh1                     llow2 = llow2 + 1                 loop                  'find last item greater mid-point item                 while saarray(lhigh2) < skey , lhigh2 > llow1                     lhigh2 = lhigh2 - 1                 loop              end if              'if 2 items in wrong order, swap rows             if llow2 < lhigh2                 sswap = saarray(llow2)                 saarray(llow2) = saarray(lhigh2)                 saarray(lhigh2) = sswap             end if              'if pointers not together, advance next item             if llow2 <= lhigh2                 llow2 = llow2 + 1                 lhigh2 = lhigh2 - 1             end if         loop          'recurse sort lower half of extremes         if lhigh2 > llow1             quicksortstring1d saarray, bsortascending, llow1, lhigh2         end if          'recurse sort upper half of extremes         if llow2 < lhigh1             quicksortstring1d saarray, bsortascending, llow2, lhigh1         end if      errorexit:      end sub      '***********************************************************     ' comments: uses binary search algorithm locate     ' string within sorted array of strings     '     ' arguments: slookfor string search in array     ' saarray array of strings, sorted ascending     ' lmethod either vbbinarycompare or vbtextcompare     ' defaults vbtextcompare     ' lnotfound value return if text isn’t     ' found. defaults -1     '     ' returns: long located position in array,     ' or lnotfound if not found     '     ' date developer action     ' ———————————————————————————————-     ' 02 jun 04 stephen bullen created     '     function binarysearchstring(byref slookfor string, _                 byref saarray() string, _                 optional byval lmethod vbcomparemethod = vbtextcompare, _                 optional byval lnotfound long = -1) long      dim llow long     dim lmid long     dim lhigh long     dim lcomp long          on error goto errorexit          'assume didn’t find         binarysearchstring = lnotfound          'get starting positions         llow = lbound(saarray)         lhigh = ubound(saarray)                      'find midpoint of array             lmid = (llow + lhigh) \ 2              'compare mid-point element string being searched             lcomp = strcomp(saarray(lmid), slookfor, lmethod)              if lcomp = 0                 'we found it, return location , quit                 binarysearchstring = lmid                 exit             elseif lcomp = 1                 'the midpoint item bigger - throw away top half                 lhigh = lmid - 1             else                 'the midpoint item smaller - throw away bottom half                 llow = lmid + 1             end if              'continue until our pointers cross         loop until llow > lhigh      errorexit:      end function 

edit: seems should have done "brute" force tests first. looping through array in linear fashion john coleman suggests filter function performs, return time same scenario above 0 ms. see below:

sub test3()  dim long, strmyarray() string, lngsize long, strtest string dim lngresultbrute long, timebrutesearch long      lngsize = 2000000     strtest = cstr(936740 * 97)     redim strmyarray(lngsize)      = 1 ubound(strmyarray)         if mod 2 = 0             strmyarray(i) = cstr((i - 1) * 97)         else             strmyarray(i) = cstr((i + 1) * 97)         end if     next      starttime = timer      ' brute force search     = 1 ubound(strmyarray)         if strmyarray(i) = strtest             lngresultbrute = clng(strtest)             exit         end if     next      endtime = timer      timebrutesearch = endtime - starttime     msgbox timebrutesearch  end sub 

filter use linear search -- executes lightening quick because implemented in highly optimized c/c++ code. see this, run following code:

function randstring(n long) string     'returns random string in b-z     dim long     dim s string     = 1 n         s = s & chr(66 + int(25 * rnd()))     next     randstring = s end function  sub test()     dim times(1 20) double     dim long, n long     dim a() string     dim start double     dim s string     randomize     s = randstring(99)     redim a(1 2000000)     = 1 2000000         a(i) = s + randstring(1)     next     s = s & "a"     = 20 1 step -1         n = * 100000         redim preserve a(1 n)         start = timer         debug.print ubound(filter(a, s)) 'should -1         times(i) = timer - start     next     = 1 20         cells(i, 1) =         cells(i, 2) = times(i)     next end sub 

this code creates array of 2,000,000 random strings of length 100, each of differs target string in last position. feeds subarrays sizes multiples of 100,000 filter, timing time takes. output looks this:

enter image description here

the clear linear trend doesn't prove strong evidence vba's filter executing straightforward linear search.


Comments

Popular posts from this blog

javascript - Chart.js (Radar Chart) different scaleLineColor for each scaleLine -

apache - Error with PHP mail(): Multiple or malformed newlines found in additional_header -

java - Android – MapFragment overlay button shadow, just like MyLocation button -