We produce individual text files for new items to go into our catalogue. These are sent to the creative people for laying out on the page. One file contains the Item#, Description, Text and Bullets for one item and is named with that item's number. Of course, for page layout it is good for them to copy and paste the text this way. It looks as below:
12345E
2500PSI 15LPM Petrol-Powered Pressure Washer
This pressure washer, powered by a Honda GX160 engine, will blast away even the most stubborn dirt with ease. Supplied with reinforced rubber hose and lance with four interchangeable nozzles.
* 2500PSI
* 15 LPM
* Comet Axial pump
* Downstream Chemical Injector
* Ship Wt: 68kg
Of course, when it comes to the website, we need a much more formal record/field (row/column) structure. The challenge today was to parse the files (iterating over a folder containing the files), which also have random blank lines into a sensible structure in Excel for uploading onto the web. Whilst not in my job description, it was a challenge I wanted to take as it was a long time since I worked with text files and I don't think I've ever worked with the FileSystem object from the Microsoft Scripting Runtime Library.
Of course, with any parsing job, the data is never straightforward. I've already mentioned the blank lines, but they also wanted the * changed to <li> to create an html bullet and the Ship Wt: line stripped off as that has it's own field. The split was across three rows, containing Item#, Item Title and Web Text.
When coding I always prefer to use early binding, where you manually set a reference via the VBE to the library you want to use, in this case the Microsoft Scripting Runtime. I do this as it gives you the advantage of intellisense if you are not sure of the object, methods, etc. So my code at development time would look like this
Sub ParseFile()
Dim fsys As FileSystemObject, fsfile As File, stream As TextStream
'Set FileSystemObject variables (Early binding)
Set fsys = New FileSystemObject
Set fsfile = fsys.GetFile(sFolderName & sParseFile)
Set stream = fsfile.OpenAsTextStream(1, 0)
End Sub
This way, just typing fsys. in the future code will fire up the properties, methods, etc of the FileSystemObject, very handy when you don't know them. Generally though I find deploying early binding is a pain, as you have to remember to set the reference on each machine before running the code, so at runtime I change the variable types to generic 'Objects' and then use the CreatObject function to create the FileSytemObject as below
Sub ParseFile()
Dim fsys As Object, fsfile As Object, stream As Object
'Set FileSystemObject variables (Late binding)
Set fsys = CreateObject("Scripting.FileSystemObject")
Set fsfile = fsys.GetFile(sFolderName & sParseFile)
Set stream = fsfile.OpenAsTextStream(1, 0)
End Sub
The OpenAsTextStream method of the File object is very nice to use, you can either read or write the file, you can check each character, complete lines or all the text at once. You get AtEndOfLine and AtEndOfStream boolean properties of the stream object so you can process until one and/or both of these properties are true.
Now I'm making no claims for quality of coding, but the below is the full, deployed code that works for the files we produce. In testing too, it parsed the data from 150 txt files into the correct columns and rows in just a few seconds, much faster than the old copy and paste!
Option Explicit
Sub ParseFile()
'**************************************************************************
' Code written by Nick Hodge 12th March 2008
'V1.0 into production same day. versions below.
'V1.1 added error handling and check for file name ending in 'E' (13/3/2007)
'V1.2 added checking for * as first character of line and change it to <li>
'for an html bullet, also strip out line starting Ship Wt. (13/3/2007)
'**************************************************************************
'Navigate to a folder and the code will iterate over any text files taking the first
'line with text in it and assigning it to the strLine1 variable. It does the same for
'second and third lines and then assigns them to columns 1, 2, 3 in a new workbook
'laying them out in proper field/record format for importing to web
'File format should have item number as first line (no matter if blank lines above
'Product title on second line, no matter if blanks above and rest of copy after that
'File and folder name variables
Dim sFullPathFilename As String, sFolderName As String, sParseFile As String
'New workbook and worksheet variables
Dim wb As Workbook, wks As Worksheet
'Counter to count iterations, used in row placement on sheet and counter for postion
'of slash in path name (before filename)
Dim lCounter As Long, lSlashPosition As Long, iwksCount As Integer
'Filesytem object variables to hold file, textstream, etc.
Dim fsys As FileSystemObject, fsfile As File, stream As TextStream
'Variable to hold lines of text for later assignment to worksheet
Dim strLine1 As String, strLine2 As String, strLine3 As String, strTemp As String
'Booleans to stop loop after line 1 found, etc.
Dim bLine1Fnd As Boolean, bLine2Fnd As Boolean
On Error GoTo errHandler
'Get folder to use by clicking on file
sFullPathFilename = Application.GetOpenFilename("Text Files, *.txt", , "Select any file in the folder")
'Check to see selection made
If sFullPathFilename = "False" Then
MsgBox "You must click on any file in the folder", vbOKOnly, "No Folder Selected"
Exit Sub
End If
'Get position of last slash and parse folder from this
lSlashPosition = InStrRev(sFullPathFilename, "\", -1)
If lSlashPosition = 0 Then
MsgBox "We cannot recognise the folder name, exiting procedure", vbOKOnly + vbExclamation, "Error"
Exit Sub
End If
'Open workbook and set variables
Set wb = Workbooks.Add
Set wks = wb.Worksheets("Sheet1")
'Change sheet name
wks.Name = "Data"
'Switch off application events
With Application
.ScreenUpdating = False
.DisplayAlerts = False
'Delete surplus worksheets
With wb
For iwksCount = .Worksheets.Count To 1 Step -1
If .Worksheets(iwksCount).Name <> "Data" Then
.Worksheets(iwksCount).Delete
End If
Next iwksCount
End With
'Set headings and bold
With wks
.Range("A1").Value = "ItemNo"
.Range("B1").Value = "Title"
.Range("C1").Value = "WebText"
.Rows("1:1").Font.Bold = True
End With
sFolderName = Left(sFullPathFilename, lSlashPosition)
'Start iteration of folder using folder name and files ending in E.txt
sParseFile = Dir(sFolderName & "*E.txt")
'Intialise counter to 2 which starts it on second row, below titles
lCounter = 2
'Do for each file Dir function is called recursively and returns "" when no more files match
Do While sParseFile <> ""
'Initialise booleans
bLine1Fnd = False
bLine2Fnd = False
'Set FileSystemObject variables (Late binding)
Set fsys = CreateObject("Scripting.FileSystemObject")
Set fsfile = fsys.GetFile(sFolderName & sParseFile)
Set stream = fsfile.OpenAsTextStream(1, 0)
'Do so long as the endoffile marker is false, sets first line of text
'Accounts for blank lines
Do While stream.AtEndOfStream = False
Do While bLine1Fnd = False
strTemp = stream.ReadLine
If strTemp <> "" Then
strLine1 = strTemp
bLine1Fnd = True
strTemp = ""
End If
Loop
'sets second line of text, accounts for blank lines
Do While bLine2Fnd = False
strTemp = stream.ReadLine
If strTemp <> "" Then
strLine2 = strTemp
bLine2Fnd = True
strTemp = ""
End If
Loop
'Once 1st and 2nd lines are full then take averything else, line by line
strTemp = stream.ReadLine
'Check if line is ship wt., if so discard
If Left(UCase(strTemp), 6) = "* SHIP" Then strTemp = ""
'Build 3rd lines changing * to <li> for html bullet, else just add the next line to the copy
If strTemp <> "" Then
If Left(strTemp, 1) = "*" Then
strTemp = "<li>" & Mid(strTemp, 2, Len(strTemp) - 1)
strLine3 = strLine3 & strTemp & Chr(10)
Else
strLine3 = strLine3 & strTemp & Chr(10)
End If
End If
Loop
'Close stream and release object variables
stream.Close
Set stream = Nothing
Set fsfile = Nothing
Set fsys = Nothing
'Assign the strings to worksheet
With wks
.Cells(lCounter, 1).Value = strLine1
.Cells(lCounter, 2).Value = strLine2
.Cells(lCounter, 3).Value = Left(strLine3, Len(strLine3) - 1)
.Columns.AutoFit
.Columns(3).ColumnWidth = 200
.Rows.AutoFit
End With
'Clear strings for next iteration
strLine1 = ""
strLine2 = ""
strLine3 = ""
'Increment counter so text is placed on next row
lCounter = lCounter + 1
'Start next iteration of files in folder
sParseFile = Dir()
Loop
'Turn application events back on
.ScreenUpdating = True
.DisplayAlerts = True
End With
MsgBox " Successfully parsed " & lCounter - 2 & " *.txt files" & Chr(13) & "From " & sFolderName, _
vbInformation + vbOKOnly, "Parsed Successfully"
Exit Sub
'Error handling routine to exit gracefully
errHandler:
MsgBox "There has been an error, please advise add-in creator" & Chr(13) & _
"The routine failed on: " & sParseFile, vbOKOnly + vbCritical
If Not wb Is Nothing Then
wb.Close SaveChanges:=False
End If
With Application
.DisplayAlerts = True
.ScreenUpdating = True
End With
Exit Sub
End Sub
If you want to download and try the file (Excel 2007 Only) I have attached the working xlsm file to this post (It is actually deployed as an xlam add-in) and 20 demo text files. Just place the text files in a folder and then go to the Marketing tab (Yes, I even wrote a ribbon customisation for it!) and press the button. It will ask you to navigate to any file in the folder and the will take each file in turn using the Dir() function and parse the text into a new Excel workbook.
Posted
Mar 13 2008, 08:35 PM
by
Nick Hodge