Excel User Group
Microsoft Excel blogs, forums, files. Read, ask questions, provide answers.

Playing with OpenAsTextStream

We produce individual text files for new items to go into our catalogue. These are sent to the creative people for laying out on the page. One file contains the Item#, Description, Text and Bullets for one item and is named with that item's number. Of course, for page layout it is good for them to copy and paste the text this way. It looks as below:

12345E

2500PSI 15LPM Petrol-Powered Pressure Washer

This pressure washer, powered by a Honda GX160 engine, will blast away even the most stubborn dirt with ease. Supplied with reinforced rubber hose and lance with four interchangeable nozzles.
* 2500PSI
* 15 LPM
* Comet Axial pump
* Downstream Chemical Injector
* Ship Wt: 68kg

Of course, when it comes to the website, we need a much more formal record/field (row/column) structure. The challenge today was to parse the files (iterating over a folder containing the files), which also have random blank lines into a sensible structure in Excel for uploading onto the web. Whilst not in my job description, it was a challenge I wanted to take as it was a long time since I worked with text files and I don't think I've ever worked with the FileSystem object from the Microsoft Scripting Runtime Library.

Of course, with any parsing job, the data is never straightforward. I've already mentioned the blank lines, but they also wanted the * changed to <li> to create an html bullet and the Ship Wt: line stripped off as that has it's own field. The split was across three rows, containing Item#, Item Title and Web Text.

When coding I always prefer to use early binding, where you manually set a reference via the VBE to the library you want to use, in this case the Microsoft Scripting Runtime. I do this as it gives you the advantage of intellisense if you are not sure of the object, methods, etc. So my code at development time would look like this

Sub ParseFile()
Dim fsys As FileSystemObject, fsfile As File, stream As TextStream

    'Set FileSystemObject variables (Early binding)
    Set fsys = New FileSystemObject
    Set fsfile = fsys.GetFile(sFolderName & sParseFile)
    Set stream = fsfile.OpenAsTextStream(1, 0)

End Sub

This way, just typing fsys. in the future code will fire up the properties, methods, etc of the FileSystemObject, very handy when you don't know them. Generally though I find deploying early binding is a pain, as you have to remember to set the reference on each machine before running the code, so at runtime I change the variable types to generic 'Objects' and then use the CreatObject function to create the FileSytemObject as below

Sub ParseFile()
Dim fsys As Object, fsfile As Object, stream As Object

    'Set FileSystemObject variables (Late binding)
    Set fsys = CreateObject("Scripting.FileSystemObject")
    Set fsfile = fsys.GetFile(sFolderName & sParseFile)
    Set stream = fsfile.OpenAsTextStream(1, 0)

End Sub

The OpenAsTextStream method of the File object is very nice to use, you can either read or write the file, you can check each character, complete lines or all the text at once. You get AtEndOfLine and AtEndOfStream boolean properties of the stream object so you can process until one and/or both of these properties are true.

Now I'm making no claims for quality of coding, but the below is the full, deployed code that works for the files we produce. In testing too, it parsed the data from 150 txt files into the correct columns and rows in just a few seconds, much faster than the old copy and paste!

Option Explicit

Sub ParseFile()
'**************************************************************************
' Code written by Nick Hodge 12th March 2008
'V1.0 into production same day. versions below.
'V1.1 added error handling and check for file name ending in 'E' (13/3/2007)
'V1.2 added checking for * as first character of line and change it to <li>
'for an html bullet, also strip out line starting Ship Wt. (13/3/2007)
'**************************************************************************

'Navigate to a folder and the code will iterate over any text files taking the first
'line with text in it and assigning it to the strLine1 variable. It does the same for
'second and third lines and then assigns them to columns 1, 2, 3 in a new workbook
'laying them out in proper field/record format for importing to web
'File format should have item number as first line (no matter if blank lines above
'Product title on second line, no matter if blanks above and rest of copy after that

'File and folder name variables
Dim sFullPathFilename As String, sFolderName As String, sParseFile As String
'New workbook and worksheet variables
Dim wb As Workbook, wks As Worksheet
'Counter to count iterations, used in row placement on sheet and counter for postion
'of slash in path name (before filename)
Dim lCounter As Long, lSlashPosition As Long, iwksCount As Integer
'Filesytem object variables to hold file, textstream, etc.
Dim fsys As FileSystemObject, fsfile As File, stream As TextStream
'Variable to hold lines of text for later assignment to worksheet
Dim strLine1 As String, strLine2 As String, strLine3 As String, strTemp As String
'Booleans to stop loop after line 1 found, etc.
Dim bLine1Fnd As Boolean, bLine2Fnd As Boolean

On Error GoTo errHandler

'Get folder to use by clicking on file
sFullPathFilename = Application.GetOpenFilename("Text Files, *.txt", , "Select any file in the folder")

'Check to see selection made
If sFullPathFilename = "False" Then
    MsgBox "You must click on any file in the folder", vbOKOnly, "No Folder Selected"
    Exit Sub
End If

'Get position of last slash and parse folder from this
lSlashPosition = InStrRev(sFullPathFilename, "\", -1)
If lSlashPosition = 0 Then
    MsgBox "We cannot recognise the folder name, exiting procedure", vbOKOnly + vbExclamation, "Error"
    Exit Sub
End If

'Open workbook and set variables
Set wb = Workbooks.Add
Set wks = wb.Worksheets("Sheet1")

'Change sheet name
wks.Name = "Data"

'Switch off application events
With Application
    .ScreenUpdating = False
    .DisplayAlerts = False
   
'Delete surplus worksheets
With wb
    For iwksCount = .Worksheets.Count To 1 Step -1
        If .Worksheets(iwksCount).Name <> "Data" Then
        .Worksheets(iwksCount).Delete
        End If
    Next iwksCount
End With

'Set headings and bold
With wks
    .Range("A1").Value = "ItemNo"
    .Range("B1").Value = "Title"
    .Range("C1").Value = "WebText"
    .Rows("1:1").Font.Bold = True
End With

sFolderName = Left(sFullPathFilename, lSlashPosition)

'Start iteration of folder using folder name and files ending in E.txt
sParseFile = Dir(sFolderName & "*E.txt")

'Intialise counter to 2 which starts it on second row, below titles
lCounter = 2

'Do for each file Dir function is called recursively and returns "" when no more files match
Do While sParseFile <> ""

'Initialise booleans
bLine1Fnd = False
bLine2Fnd = False

    'Set FileSystemObject variables (Late binding)
    Set fsys = CreateObject("Scripting.FileSystemObject")
    Set fsfile = fsys.GetFile(sFolderName & sParseFile)
    Set stream = fsfile.OpenAsTextStream(1, 0)

    'Do so long as the endoffile marker is false, sets first line of text
    'Accounts for blank lines
    Do While stream.AtEndOfStream = False
            Do While bLine1Fnd = False
            strTemp = stream.ReadLine
                If strTemp <> "" Then
                    strLine1 = strTemp
                    bLine1Fnd = True
                    strTemp = ""
                End If
            Loop
            'sets second line of text, accounts for blank lines
            Do While bLine2Fnd = False
            strTemp = stream.ReadLine
                If strTemp <> "" Then
                    strLine2 = strTemp
                    bLine2Fnd = True
                    strTemp = ""
                End If
            Loop
            'Once 1st and 2nd lines are full then take averything else, line by line
            strTemp = stream.ReadLine
            'Check if line is ship wt., if so discard
                If Left(UCase(strTemp), 6) = "* SHIP" Then strTemp = ""
                'Build 3rd lines changing * to <li> for html bullet, else just add the next line to the copy
                If strTemp <> "" Then
                    If Left(strTemp, 1) = "*" Then
                        strTemp = "<li>" & Mid(strTemp, 2, Len(strTemp) - 1)
                        strLine3 = strLine3 & strTemp & Chr(10)
                    Else
                        strLine3 = strLine3 & strTemp & Chr(10)
                    End If
                End If
                
    Loop
    
    'Close stream and release object variables
    stream.Close

    Set stream = Nothing
    Set fsfile = Nothing
    Set fsys = Nothing
    
    'Assign the strings to worksheet
    With wks
        .Cells(lCounter, 1).Value = strLine1
        .Cells(lCounter, 2).Value = strLine2
        .Cells(lCounter, 3).Value = Left(strLine3, Len(strLine3) - 1)
        .Columns.AutoFit
        .Columns(3).ColumnWidth = 200
        .Rows.AutoFit
    End With
    
    'Clear strings for next iteration
    strLine1 = ""
    strLine2 = ""
    strLine3 = ""
    
    'Increment counter so text is placed on next row
    lCounter = lCounter + 1

    'Start next iteration of files in folder
    sParseFile = Dir()

Loop
    
    'Turn application events back on
    .ScreenUpdating = True
    .DisplayAlerts = True
End With

MsgBox " Successfully parsed " & lCounter - 2 & " *.txt files" & Chr(13) & "From " & sFolderName, _
    vbInformation + vbOKOnly, "Parsed Successfully"

Exit Sub

'Error handling routine to exit gracefully
errHandler:

MsgBox "There has been an error, please advise add-in creator" & Chr(13) & _
    "The routine failed on: " & sParseFile, vbOKOnly + vbCritical

If Not wb Is Nothing Then
    wb.Close SaveChanges:=False
End If

With Application
    .DisplayAlerts = True
    .ScreenUpdating = True
End With

Exit Sub

End Sub

If you want to download and try the file (Excel 2007 Only) I have attached the working xlsm file to this post (It is actually deployed as an xlam add-in) and 20 demo text files. Just place the text files in a folder and then go to the Marketing tab (Yes, I even wrote a ribbon customisation for it!) and press the button. It will ask you to navigate to any file in the folder and the will take each file in turn using the Dir() function and parse the text into a new Excel workbook.


Posted Mar 13 2008, 08:35 PM by Nick Hodge
Attachment: ParseTextFiles.zip

Comments

kaak wrote re: Playing with OpenAsTextStream
on Fri, Mar 14 2008 8:31 AM

What is the avantage of  FileSystem object.

I normally use something like this:

Sub ParseFile

   Dim strLine as String

Open sFolderName & sParseFile For Input As #1

   Do Until EOF(1)

      Line Input #1, strLine

   Loop

   Close #1

End Sub

Rob Bruce wrote re: Playing with OpenAsTextStream
on Fri, Mar 14 2008 12:01 PM

Or even simpler:

'Load a text file in one operation

Function FileText(ByVal filename As String) As String

   Dim handle As Integer

   ' open in binary mode

   handle = FreeFile

   Open filename$ For Binary As #handle

   ' read the string and close the file

   FileText = Space$(LOF(handle))

   Get #handle, , FileText

   Close #handle

End Function

I'd also question whether this is a case where late binding is at all appropriate. Has MS broken binary compatibility with the scripting library? Is it likely to? Are you likely to want to run this on a PC without the Scripting runtime? These questions need to be addressed before assuming that you need always to late bind.

Good post, nevertheless. I've seen people get into all sorts of trouble trying to parse text file by opening them into worksheets.

Nick Hodge wrote re: Playing with OpenAsTextStream
on Fri, Mar 14 2008 3:05 PM

Kaak

Not sure of the advantages of the FileSystem object except it appears to be a much richer model, for example you can get drive letters, size, file system (FAT, NTFS), etc. there are other ways of course but this is very 'VBA' centric.

I agree with Rob's caution, but as I was deploying this internally, i control the scripting runtime install.

Doing my job which is totally not IT, it is sometimes good to learn stuff 'in anger'...good fun.

Copyright Excel User Group and the relevant contributors, 2008. All Trade Marks acknowledged. This site is a peer-to-peer site and NOT affiliated in any way to Microsoft Corp. All rights reserved.