Excel User Group
Microsoft Excel blogs, forums, files. Read, ask questions, provide answers.

Array transfer fails on large sheet

Latest post Thu, Jan 24 2008 11:14 AM by zfraile. 6 replies.
  • Mon, Jan 21 2008 3:10 PM

    • zfraile
    • Top 25 Contributor
    • Joined on Wed, Jan 9 2008
    • Posts 37
    • Points 777

    Array transfer fails on large sheet

    I have this macro I use to find duplicate rows in a data table based on user selected criteria.   It basically concatenates the items that would form a unique identifier, loads those into memory, and does an item-by-item comparison, marking and numbering any items that happen to match.  The code seems to have survived the transition to 2007 well enough, but I just found that I can't make it run on large datasets, above the old 65,536 limit.  For some reason it seems skip the line that puts the results back on the worksheet at the end.  I've attached all the code below, but I think it's only the one line in bold that is giving me a problem.  Is there any way around this other than doing a slow cell-by-cell tranfer to the worksheet.

    Sub DuplicatesMultiCriteria(control As IRibbonControl)
    Dim rngList As Range
    Dim rngList1 As Range
    Dim rngCriteria() As Range
    Dim rng As Range
    Dim boolDone As Boolean
    Dim boolValid As Boolean
    Dim strFormula As String
    Dim strMatchValue As String
    Dim l As Long
    Dim l2 As Long
    Dim lMatchNum As Long
    Dim lMatchIndex As Long
    Dim strMatchList() As String
    Dim dblCalc As Double
       
       
        'get list
        On Error Resume Next
        Set rngList = Application.InputBox("Select the list, including the header row (you must have a header)", "List Selection", Selection.Address, Type:=8)
        If Err <> 0 Then Exit Sub
        On Error GoTo 0
       
        'check for room on worksheet
        If ActiveSheet.UsedRange.Cells(ActiveSheet.UsedRange.Cells.Count).Column > Columns(Columns.Count).Column - 2 Then
            MsgBox "In order to find duplicates, two columns must be inserted into the worksheet, but there is not enough room on this worksheet for those columns to be inserted.  Please reorganize or delete columns on this worksheet so that there are at least two blank columns at the end", vbCritical, "Error"
            Exit Sub
        End If
        
        dblCalc = Application.Calculation
        Application.Calculation = xlCalculationManual
        rngList.Select
       
        'get criteria
        Do Until boolDone
            ReDim Preserve rngCriteria(l)
            Do
                boolValid = True
                On Error Resume Next
                If UBound(rngCriteria) = 0 Then
                    Set rngCriteria(l) = Application.InputBox("Choose a criterion or click cancel to quit.", "Criterion Selection", Selection.Address, Type:=8)
                    If Err <> 0 Then Exit Sub
                Else
                    Set rngCriteria(l) = Application.InputBox("Choose another criterion or click cancel to begin duplicate search.", "Criterion Selection", Selection.Address, Type:=8)
                End If
                If Err <> 0 Then
                    boolDone = True
                    ReDim Preserve rngCriteria(l - 1)
                    Exit Do
                End If
                On Error GoTo 0
                'check criterion for validity
                If rngCriteria(l).Cells.Count <> 1 Then
                    MsgBox "You must select only one cell as the criterion.  Please choose again", vbCritical, "Invalid Criterion"
                    boolValid = False
                End If
                If Intersect(rngList, rngCriteria(l)) Is Nothing Then
                    MsgBox "You may only select a criterion that is part of the list you selected.  Please choose again", vbCritical, "Invalid Criterion"
                    boolValid = False
                End If
            Loop Until boolValid
            l = l + 1
        Loop
           
        Application.ScreenUpdating = False
        'add match columns
        rngList.Cells(rngList.Cells.Count).Offset(0, 1).EntireColumn.Insert
        rngList.Cells(rngList.Cells.Count).Offset(0, 1).EntireColumn.Insert
       
        'define ranges
        Set rngList1 = Range(Cells(rngList.Cells(1).Row, rngList.Cells(rngList.Cells.Count).Column + 1), Cells(rngList.Cells(rngList.Cells.Count).Row, rngList.Cells(rngList.Cells.Count).Column + 2))
        Set rngList = Union(rngList, rngList1)
       
        'add headers
        rngList1.Cells(1).Value = "Dup Number"
        rngList1.Cells(1).Offset(0, 1).Value = "Dup Index"
       
        'add match formula
        For Each rng In rngList1.Columns(1).Cells
            If rng.Address <> rngList1.Cells(1).Address Then
                strFormula = "="
                For l = LBound(rngCriteria) To UBound(rngCriteria)
                    strFormula = strFormula & Cells(rng.Row, rngCriteria(l).Column).Address
                    If l < UBound(rngCriteria) Then strFormula = strFormula & "&"
                Next
                rng.Formula = strFormula
            End If
        Next
       
        'sort by match formula
        rngList.Sort key1:=rngList1.Cells(1), order1:=xlAscending, key2:=rngList1.Columns(2).Cells(1), order1:=xlAscending, header:=xlYes
       
        'find and number matches
        'transfer to array first for faster matching
        ReDim strMatchList(2, rngList1.Columns(1).Cells.Count - 1)
        l = 0
        For Each rng In rngList1.Columns(1).Cells
            If rng.Address <> rngList1.Columns(1).Cells(1).Address Then
                strMatchList(0, l) = rng.Value
                l = l + 1
            End If
        Next
       
        'find matches
        lMatchNum = 0
        For l = LBound(strMatchList, 2) To UBound(strMatchList, 2)
            lMatchIndex = 1
            boolDone = False
            For l2 = l + 1 To UBound(strMatchList, 2)
                If strMatchList(2, l2) = "" Then
                    If strMatchList(0, l) = strMatchList(0, l2) Then
                        boolDone = True
                        If lMatchIndex = 1 Then
                            lMatchNum = lMatchNum + 1
                            strMatchList(1, l) = lMatchNum
                            strMatchList(2, l) = lMatchIndex
                        End If
                        lMatchIndex = lMatchIndex + 1
                        strMatchList(1, l2) = lMatchNum
                        strMatchList(2, l2) = lMatchIndex
                    Else
                        Exit For
                    End If
                End If
            Next
        Next
       
        'transfer array back to worksheet
        rngList1.Columns(1).Insert
        Union(rngList1.Offset(1, 0), rngList1.Offset(1, -1)) = Application.WorksheetFunction.Transpose(strMatchList)
        rngList1.Columns(0).Delete
       
       
        'sort and format, alert if zero results found
        rngList.Sort key1:=rngList1.Cells(1), order1:=xlAscending, key2:=rngList1.Columns(2).Cells(1), order1:=xlAscending, header:=xlYes
        If Application.WorksheetFunction.CountA(rngList1) = 2 Then
            Application.ScreenUpdating = True
            MsgBox "No duplicates found", vbInformation, "No duplicates"
            rngList1.EntireColumn.Delete
        Else
            rngList1.Columns.AutoFit
            rngList1.NumberFormat = "#,##0"
            Application.ScreenUpdating = True
            rngList1.Cells(1).Activate
        End If
        Application.Calculation = dblCalc
    End Sub
     

    Filed under: ,
    • Post Points: 37
  • Mon, Jan 21 2008 6:09 PM In reply to

    • hrlngrv
    • Top 10 Contributor
    • Joined on Thu, Jan 17 2008
    • somewhere in the western US
    • Posts 67
    • Points 1,055

    Re: Array transfer fails on large sheet

    I don't have Excel 2007, but I think the first thing to test would be whether the problem could be in Transpose. You could see whether the following macro fails.

    Sub test()
      Dim v(1 To 65540) As Long, k As Long
      For k =LBound(v, 1) To UBound(v, 1)
        v(k) = k
      Next k
      Range("A1:A65540").Value = Application.WorksheetFunction.Transpose(v)
    End Sub

    Test whether it worked using the array formula =DEVSQ(A1:A65539-A2:A65540), which will return 0 if it worked, nonzero if it doesn't. If it doesn't work, there wouldn't seem to be any practical alternative to rewriting the problem line to write subranges of no more cells than Transpose can handle.

    • Post Points: 21
  • Tue, Jan 22 2008 9:16 AM In reply to

    • zfraile
    • Top 25 Contributor
    • Joined on Wed, Jan 9 2008
    • Posts 37
    • Points 777

    Re: Array transfer fails on large sheet

    You're right, it fails at the transpose statement with a type mismatch error.  Looking back, I'm sure it would give the same error with my code were it not for the on error resume next statement.  Using your example, I can't even get the array returned to the worksheet unless I size it to 65,536 items or less.

    So is this a bug, an overlooked update to the object model, or just plain indifference?

    Now I'm going to have to find some sort of hack to do this.  Transferring cell by cell is just too slow, and this is far from the only code I have that uses the transpose trick.

    • Post Points: 21
  • Tue, Jan 22 2008 11:03 AM In reply to

    Re: Array transfer fails on large sheet

    zfraile:

    . . . I can't even get the array returned to the worksheet unless I size it to 65,536 items or less.

    ...

    Now I'm going to have to find some sort of hack to do this.  Transferring cell by cell is just too slow, and this is far from the only code I have that uses the transpose trick.

    Don't transfer cell by cell, transfer in 65536-cell chunks. You'll have to load those chunks using nested For loops, but VBA is fairly fast running its own loops. 

    • Post Points: 21
  • Tue, Jan 22 2008 2:17 PM In reply to

    • zfraile
    • Top 25 Contributor
    • Joined on Wed, Jan 9 2008
    • Posts 37
    • Points 777

    Re: Array transfer fails on large sheet

    Good point, I will give that a try.  I'm sure a max of 16 transpose functions can't be that much slower than one.

    • Post Points: 5
  • Thu, Jan 24 2008 4:39 AM In reply to

    Betreft: Array transfer fails on large sheet

    Just a side step here. Couldn't you use advanced filter (unique items) on the column with the concats, copy the visible part of the entire table and paste that below your data? (I'm not sure what you do here)

    Regards, Jan Karel Pieterse www.jkp-ads.com
    • Post Points: 21
  • Thu, Jan 24 2008 11:14 AM In reply to

    • zfraile
    • Top 25 Contributor
    • Joined on Wed, Jan 9 2008
    • Posts 37
    • Points 777

    Re: Betreft: Array transfer fails on large sheet

    That would work if all dups were equal, but that is not my typical scenario.  If it were, I could get by with "remove duplicates" function (though it did take a while to figure out what that confusing dialog box is really asking for).

    An example of the duplicates I need to remove would be something like this:

    Patient Name, Patient ID, Admit Date, Charge Amount, Balance, Last Tranaction Date

    John Doe, 12345, 1/1/07, 1000, 10, 10/1/07

    John Doe, 12345, 1/1/07, 1000, 5, 11/5/07

     

    I would want to flag this as a duplicate based on PatientID + Admit Date + Charge Amount being equal.  However, I can't use the advanced filter or remove duplicates because it would remove the first item in the list, which may or may not be correct.

    In this case, I want to keep the account with the most recent transaction date.  The only way I've been able to handle this is to identify the duplicates and then manually figure out which ones to take out.  Also, I need to keep track of which ones I removed so those go onto another worksheet.  It's not painfully manual since I can add a series of formulas to figure it out from within the subset of duplicate items (which would differ depending on the data set), but it does require some user intervention.

    Also, sometimes I just want to have that big numbered list so I can send the whole thing back to the source and let them figure it out.

    • Post Points: 5
Page 1 of 1 (7 items) | RSS
Copyright Excel User Group and the relevant contributors, 2008. All Trade Marks acknowledged. This site is a peer-to-peer site and NOT affiliated in any way to Microsoft Corp. All rights reserved.