VBS looping

From iMacros

Looping through a table that spans several pages (using a VBS script)

This tutorial describes how to extract data from lists that span more than one page. The method works well for collecting data from typical master and detail pages (e.g., a search listing where you need to click a result link, collect some more details, and then go back to the search results).

Here's what it will look like in the end:

extract.loop.screenshot12.png

(The complete source code is given below.)

The site in question

We start with a real estate site that lists properties in certain locations:

(Note that we are not affiliated in any way with any of the companies that appear in the screenshots.)

extract.loop.screenshot1.png

Step 1: Record extraction macro

When recording the extraction, we find that the cell containing the text "Date Posted: ... days" can serve as an anchor for addressing the row of results we are interested in. Experimenting with the POS value of the corresponding TAG command, we find that POS=1 gives the first row, POS=4 the second, POS=7 the third, and so on (the POS value increases by 3 for each row).
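In other words, the anchor of row n sits at POS = 1 + 3 × (n − 1). The short Python sketch below only illustrates this arithmetic outside iMacros; `anchor_pos` is a hypothetical helper, and the start value 1 and step 3 are the values observed on this particular page, not general iMacros rules.

```python
# Sketch: map a 1-based result row number to the POS value of its
# "Date Posted" anchor. FIRST_POS and POS_STEP are the values observed
# on this particular page (1, 4, 7, ...), assumed here, not iMacros rules.
FIRST_POS = 1  # POS of the anchor in the first row
POS_STEP = 3   # POS increase from one row to the next


def anchor_pos(row: int) -> int:
    """Return the TAG POS value of the anchor for the given row (1-based)."""
    if row < 1:
        raise ValueError("row numbers start at 1")
    return FIRST_POS + POS_STEP * (row - 1)


if __name__ == "__main__":
    for row in range(1, 5):
        print(row, anchor_pos(row))
```

For rows 1 to 4 this yields 1, 4, 7 and 10, matching the values found by experimenting above.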

extract.loop.screenshot2.png

Step 2: Minimal script around macro

We translate the macro into a VBS script:

extract.loop.screenshot3.png

When run, the script extracts the first entry:
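The VBS script builds the macro as one string, one command per line, and `iimSet` supplies the value of `{{counter}}` before each `iimPlayCode` call. The Python sketch below only imitates that idea offline: `build_macro` and `fill_placeholders` are hypothetical helpers, the command list is an excerpt of the macro, and in the real script the substitution is done by iMacros itself, not by code like this.

```python
import re

# Excerpt of the extraction macro; {{counter}} is a placeholder.
MACRO_LINES = [
    "VERSION BUILD=6110122",
    "TAB T=1",
    "TAB CLOSEALLOTHERS",
    "SET !TIMEOUT 6",
    "TAG POS={{counter}} TYPE=TD ATTR=TXT:*Date<SP>Posted:*days",
    "TAG POS=R1 TYPE=TD ATTR=TXT:* EXTRACT=TXT",
]


def build_macro(lines):
    """Join macro commands with newlines, like the vbNewLine concatenation."""
    return "\n".join(lines)


def fill_placeholders(macro, values):
    """Replace each {{name}} with str(values[name]); unknown names stay as-is."""
    def repl(match):
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)
    return re.sub(r"\{\{(\w+)\}\}", repl, macro)


if __name__ == "__main__":
    macro = build_macro(MACRO_LINES)
    print(fill_placeholders(macro, {"counter": 4}))
```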

extract.loop.screenshot5.png

Step 3: Looping through all results on one page

Now we put the macro into a loop that

  1. sets the anchor's POS value from the variable "counter"
  2. increments the counter (i.e., the anchor's POS value) by 3 after each extraction
  3. repeats until the extraction returns an error (return value iret < 0)

Additionally, we shorten the !TIMEOUT value (SET !TIMEOUT 6 in the script) so that the macro does not wait 60 seconds before returning the error when the end of the list is reached.
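The loop logic can be tried out without a browser. In this Python sketch, `FakePage` is a hypothetical stand-in for the `iimSet`/`iimPlayCode` pair: it succeeds only at the POS values 1, 4, 7 and so on of its rows and returns a negative code afterwards, just as the real TAG fails once the counter runs past the last entry.

```python
class FakePage:
    """Hypothetical stand-in for iimSet + iimPlayCode on one results page."""

    def __init__(self, rows):
        self.rows = rows  # extracted text per row, in page order

    def play(self, counter):
        """Return (return_code, extract) for the anchor at POS=counter."""
        index, remainder = divmod(counter - 1, 3)
        if remainder == 0 and 0 <= index < len(self.rows):
            return 1, self.rows[index]
        return -1, ""  # anchor not found: end of list


def scrape_page(page):
    """Extract every row on one page by stepping the POS counter by 3."""
    results = []
    counter = 1
    iret = 1  # pretend the connection (iimOpen) succeeded
    while iret >= 0:
        iret, extract = page.play(counter)
        if iret >= 0:
            results.append(extract)
            counter += 3
    return results


if __name__ == "__main__":
    print(scrape_page(FakePage(["row A", "row B", "row C"])))
    # ['row A', 'row B', 'row C']
```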

extract.loop.screenshot8.png

The script now extracts not only the first entry but all entries on that results page:

extract.loop.screenshot5.png

extract.loop.screenshot6.png

extract.loop.screenshot7.png

Step 4: Looping through all the pages

Finally, we want the script to scrape all pages of results.

On every page, there is a "Next" link:

extract.loop.screenshot9.png

Recording this link produces the following TAG command:

extract.loop.screenshot10.png

This command should be run when the loop reaches the end of the current page (i.e., when the extraction macro fails). So we add the following code at the end of the loop, which

  1. checks whether the end of the list has been reached
  2. tries to move to the next page
    1. if there is a next page, the loop starts again (the counter is reset and the return value is not negative)
    2. if there is no next page, the TAG fails and the loop ends
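The control flow of the final loop can be simulated in the same way. In this Python sketch, `FakeSite` is a hypothetical stand-in for the browser: `play_extract` plays the role of the extraction macro and `click_next` that of the "Next" TAG, returning a negative code when there is no further page.

```python
class FakeSite:
    """Hypothetical stand-in for the browser showing paged search results."""

    def __init__(self, pages):
        self.pages = pages  # list of pages, each a list of row extracts
        self.current = 0    # index of the page currently shown

    def play_extract(self, counter):
        """Imitate the extraction macro with the anchor at POS=counter."""
        index, remainder = divmod(counter - 1, 3)
        rows = self.pages[self.current]
        if remainder == 0 and 0 <= index < len(rows):
            return 1, rows[index]
        return -1, ""

    def click_next(self):
        """Imitate TAG POS=1 TYPE=A ATTR=TXT:Next."""
        if self.current + 1 < len(self.pages):
            self.current += 1
            return 1
        return -1  # no "Next" link: the TAG fails


def scrape_all(site):
    results = []
    counter = 1
    iret = 1  # pretend the connection (iimOpen) succeeded
    while iret >= 0:
        iret, extract = site.play_extract(counter)
        if iret < 0:
            # end of list reached -> try to move to the next page
            iret = site.click_next()
            counter = 1
        else:
            results.append(extract)
            counter += 3
    return results


if __name__ == "__main__":
    site = FakeSite([["a1", "a2"], ["b1"], ["c1", "c2", "c3"]])
    print(scrape_all(site))  # ['a1', 'a2', 'b1', 'c1', 'c2', 'c3']
```

The loop ends only when the "Next" step fails, so an empty page in the middle is simply skipped.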

extract.loop.screenshot13.png

And there we have it: the script scrapes the first page of results, then moves on to the next one, and thus fetches the data of all items on all pages.

extract.loop.screenshot11.png

The Script's Source Code

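Option Explicit

Dim iim1, iret
Set iim1 = CreateObject("iMacros")
iret = iim1.iimOpen("", False) 'connect to the open iMacros browser window

'Build the extraction macro line by line; {{counter}} is set via iimSet before each run
Dim macro
macro = macro & "VERSION BUILD=6110122" & vbNewLine
macro = macro & "TAB T=1" & vbNewLine
macro = macro & "TAB CLOSEALLOTHERS" & vbNewLine
'short timeout, so a TAG that finds nothing fails quickly instead of waiting for the default
macro = macro & "SET !TIMEOUT 6" & vbNewLine
'anchor: the cell containing "Date Posted: ... days" at position {{counter}}
macro = macro & "TAG POS={{counter}} TYPE=TD ATTR=TXT:*Date<SP>Posted:*days" & vbNewLine
'the remaining TAGs extract cells relative (R1, R2) to the previously found element
macro = macro & "TAG POS=R1 TYPE=TD ATTR=TXT:* EXTRACT=TXT" & vbNewLine
macro = macro & "TAG POS=R1 TYPE=TD ATTR=TXT:*Bath* EXTRACT=TXT" & vbNewLine
macro = macro & "TAG POS=R2 TYPE=TD ATTR=TXT:* EXTRACT=TXT" & vbNewLine
macro = macro & "TAG POS=R1 TYPE=TD ATTR=TXT:$* EXTRACT=TXT"

Dim counter
counter = 1 'POS of the first anchor; each further row is 3 positions later

Do While iret >= 0
   iim1.iimSet "counter", counter
   iret = iim1.iimPlayCode(macro)
   If iret < 0 Then
      'end of list reached -> try to move to the next page
      iret = iim1.iimPlayCode("TAG POS=1 TYPE=A ATTR=TXT:Next")
      counter = 1 'start again at the first row of the new page
   Else
      'extraction succeeded -> show this row's data
      MsgBox iim1.iimGetExtract()
      counter = counter + 3
   End If
Loop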
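In the script, MsgBox shows the raw string returned by iimGetExtract(). This Python sketch assumes that iMacros joins the results of several EXTRACT commands in one macro with the [EXTRACT] delimiter and that a cell it could not find shows up as #EANF#; under those assumptions a row can be split into fields as follows (`split_extract` is a hypothetical helper, and the sample input is made up).

```python
# Assumed iMacros conventions (not taken from this page):
DELIMITER = "[EXTRACT]"  # separator between results of several EXTRACT commands
NOT_FOUND = "#EANF#"     # marker for an extraction anchor that was not found


def split_extract(extract):
    """Return the fields of one extract string; not-found markers become None."""
    if not extract:
        return []
    return [None if field == NOT_FOUND else field.strip()
            for field in extract.split(DELIMITER)]


if __name__ == "__main__":
    # made-up sample input
    print(split_extract("3 Beds[EXTRACT]#EANF#[EXTRACT]$250,000"))
    # ['3 Beds', None, '$250,000']
```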