arrays - Performance Considerations when using VBA Filter Function -
i can't figure out how filter function works fast. have used filter on sorts of data , regardless of data-type, filter obliterates alternative method employ. regularly use binary search algorithm , quickarraysort algorithm written stephen bullen (found in professional excel development). binary search lightning fast (as fast filter function, given array sorted) , quick sort algorithm 1 of fastest sorting algorithms known.
i have written test code below comparing speeds of finding random element in large array (size = 2,000,000). intentionally populate array in un-ordered fashion (it should noted have tried various un-ordered assigning methods, , results similar regardless of assigning method).
sub searchtest() dim long, strmyarray() string, lngsize long, strtest string dim timebinarysearch long, timefiltersearch long dim lngresultbinary long, lngresultfilter long dim starthour long, startminute long, startsecond long dim startmilisecond long, starttime long dim endhour long, endminute long, endsecond long dim endmilisecond long, endtime long lngsize = 2000000 strtest = cstr(1735674 * 987) redim strmyarray(lngsize) = 1 ubound(strmyarray) if mod 2 = 0 strmyarray(i) = cstr((i - 1) * 987) else strmyarray(i) = cstr((i + 1) * 987) end if next ''filter test '******************************************************************* starthour = hour(now()) * 60 * 60 * 1000 startminute = minute(now()) * 60 * 1000 startsecond = second(now()) * 1000 startmilisecond = format(now(), "ms") starttime = starthour + startminute + startsecond + startmilisecond lngresultfilter = clng(filter(strmyarray, strtest)(0)) endhour = hour(now()) * 60 * 60 * 1000 endminute = minute(now()) * 60 * 1000 endsecond = second(now()) * 1000 endmilisecond = format(now(), "ms") endtime = endhour + endminute + endsecond + endmilisecond timefiltersearch = endtime - starttime '******************************************************************* ''binary test '******************************************************************* starthour = hour(now()) * 60 * 60 * 1000 startminute = minute(now()) * 60 * 1000 startsecond = second(now()) * 1000 startmilisecond = format(now(), "ms") starttime = starthour + startminute + startsecond + startmilisecond quicksortstring1d strmyarray lngresultbinary = strmyarray(clng(binarysearchstring(strtest, strmyarray))) endhour = hour(now()) * 60 * 60 * 1000 endminute = minute(now()) * 60 * 1000 endsecond = second(now()) * 1000 endmilisecond = format(now(), "ms") endtime = endhour + endminute + endsecond + endmilisecond timebinarysearch = endtime - starttime '******************************************************************* msgbox lngresultfilter & vbcr & vbcr & lngresultbinary msgbox timefiltersearch & vbcr & vbcr & timebinarysearch end sub
both methods return same result, filter method's return time 0 ms , quicksort/binarysearch method's return time 20 seconds. huge difference!! mentioned earlier, if array sorted binary search method returns 0 ms (as know, binary search requires array sorted begin with)
so, how can filter function through 2,000,000 un-sorted entries , find correct result immediately? can't loop through every entry , compare filtervalue (this far slowest method), based off of these preliminary test, can't utilizing binary search either, have sort array first. if there awesome sorting algorithm compiled, find hard believe sort array of size greater million instantaneously.
by way, below quicksort algorithm , binary search algorithm.
sub quicksortstring1d(byref saarray() string, _ optional byval bsortascending boolean = true, _ optional byval llow1 variant, _ optional byval lhigh1 variant) 'dimension variables dim llow2 long dim lhigh2 long dim skey string dim sswap string on error goto errorexit 'if not provided, sort entire array if ismissing(llow1) llow1 = lbound(saarray) if ismissing(lhigh1) lhigh1 = ubound(saarray) 'set new extremes old extremes llow2 = llow1 lhigh2 = lhigh1 'get value of array item in middle of new extremes skey = saarray((llow1 + lhigh1) \ 2) 'loop items in array between extremes while llow2 < lhigh2 if bsortascending 'find first item greater mid-point item while saarray(llow2) < skey , llow2 < lhigh1 llow2 = llow2 + 1 loop 'find last item less mid-point item while saarray(lhigh2) > skey , lhigh2 > llow1 lhigh2 = lhigh2 - 1 loop else 'find first item less mid-point item while saarray(llow2) > skey , llow2 < lhigh1 llow2 = llow2 + 1 loop 'find last item greater mid-point item while saarray(lhigh2) < skey , lhigh2 > llow1 lhigh2 = lhigh2 - 1 loop end if 'if 2 items in wrong order, swap rows if llow2 < lhigh2 sswap = saarray(llow2) saarray(llow2) = saarray(lhigh2) saarray(lhigh2) = sswap end if 'if pointers not together, advance next item if llow2 <= lhigh2 llow2 = llow2 + 1 lhigh2 = lhigh2 - 1 end if loop 'recurse sort lower half of extremes if lhigh2 > llow1 quicksortstring1d saarray, bsortascending, llow1, lhigh2 end if 'recurse sort upper half of extremes if llow2 < lhigh1 quicksortstring1d saarray, bsortascending, llow2, lhigh1 end if errorexit: end sub '*********************************************************** ' comments: uses binary search algorithm locate ' string within sorted array of strings ' ' arguments: slookfor string search in array ' saarray array of strings, sorted ascending ' lmethod either vbbinarycompare or vbtextcompare ' defaults vbtextcompare ' lnotfound value return if text isn’t ' found. defaults -1 ' ' returns: long located position in array, ' or lnotfound if not found ' ' date developer action ' ———————————————————————————————- ' 02 jun 04 stephen bullen created ' function binarysearchstring(byref slookfor string, _ byref saarray() string, _ optional byval lmethod vbcomparemethod = vbtextcompare, _ optional byval lnotfound long = -1) long dim llow long dim lmid long dim lhigh long dim lcomp long on error goto errorexit 'assume didn’t find binarysearchstring = lnotfound 'get starting positions llow = lbound(saarray) lhigh = ubound(saarray) 'find midpoint of array lmid = (llow + lhigh) \ 2 'compare mid-point element string being searched lcomp = strcomp(saarray(lmid), slookfor, lmethod) if lcomp = 0 'we found it, return location , quit binarysearchstring = lmid exit elseif lcomp = 1 'the midpoint item bigger - throw away top half lhigh = lmid - 1 else 'the midpoint item smaller - throw away bottom half llow = lmid + 1 end if 'continue until our pointers cross loop until llow > lhigh errorexit: end function
edit: seems should have done "brute" force tests first. looping through array in linear fashion john coleman suggests filter function performs, return time same scenario above 0 ms. see below:
sub test3() dim long, strmyarray() string, lngsize long, strtest string dim lngresultbrute long, timebrutesearch long lngsize = 2000000 strtest = cstr(936740 * 97) redim strmyarray(lngsize) = 1 ubound(strmyarray) if mod 2 = 0 strmyarray(i) = cstr((i - 1) * 97) else strmyarray(i) = cstr((i + 1) * 97) end if next starttime = timer ' brute force search = 1 ubound(strmyarray) if strmyarray(i) = strtest lngresultbrute = clng(strtest) exit end if next endtime = timer timebrutesearch = endtime - starttime msgbox timebrutesearch end sub
filter
use linear search -- executes lightening quick because implemented in highly optimized c/c++ code. see this, run following code:
function randstring(n long) string 'returns random string in b-z dim long dim s string = 1 n s = s & chr(66 + int(25 * rnd())) next randstring = s end function sub test() dim times(1 20) double dim long, n long dim a() string dim start double dim s string randomize s = randstring(99) redim a(1 2000000) = 1 2000000 a(i) = s + randstring(1) next s = s & "a" = 20 1 step -1 n = * 100000 redim preserve a(1 n) start = timer debug.print ubound(filter(a, s)) 'should -1 times(i) = timer - start next = 1 20 cells(i, 1) = cells(i, 2) = times(i) next end sub
this code creates array of 2,000,000 random strings of length 100, each of differs target string in last position. feeds subarrays sizes multiples of 100,000 filter
, timing time takes. output looks this:
the clear linear trend doesn't prove strong evidence vba's filter
executing straightforward linear search.
Comments
Post a Comment