VBS looping

From iMacros

Looping through a table that spans several pages (using a VBS script)

This tutorial describes how to extract data from lists that span more than one page.

The source code of the finished script is given below.

The site in question

We start with a real estate site that lists certain locations.

(Note that we are not in any way affiliated with any of the companies that appear in the screenshots.)

Step 1: Record extraction macro

When recording the extraction, we find that the field containing the text "Date Posted: ... days" can be used as an anchor for addressing the line of results we are interested in. Experimenting with the POS value of the corresponding TAG command, we find that "1" gives the first line, "4" the second, "7" the third, and so on (the POS value increases by 3 for each line).
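The relation between result line and POS value can be sketched in VBS as follows. The helper names AnchorPos and AnchorTag are made up for this illustration; the TAG text is the anchor command used in the source code at the end of this page.

```vbscript
Option Explicit

' Hypothetical helper: POS value of the anchor cell for the n-th result line (1-based).
' Line 1 -> 1, line 2 -> 4, line 3 -> 7, i.e. the POS value grows by 3 per line.
Function AnchorPos(rowIndex)
    AnchorPos = 1 + 3 * (rowIndex - 1)
End Function

' Hypothetical helper: builds the anchor TAG command for a given result line.
Function AnchorTag(rowIndex)
    AnchorTag = "TAG POS=" & CStr(AnchorPos(rowIndex)) & _
                " TYPE=TD ATTR=TXT:*Date<SP>Posted:*days"
End Function

' Print the anchor commands for the first three result lines
' (run with cscript so that WScript.Echo writes to the console).
Dim row
For row = 1 To 3
    WScript.Echo AnchorTag(row)
Next
```

For the first three lines this prints the anchor command with POS=1, POS=4 and POS=7.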

Step 2: Minimal script around macro

We translate the macro into a VBS script which extracts the first entry when run.
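A minimal sketch of such a script, assembled from the commands in the source code at the end of this page. It assumes that the iMacros Scripting Interface is installed and that an iMacros browser window is already open on the results page; the failure message is an addition for illustration.

```vbscript
Option Explicit

Dim iim1, iret, macro
Set iim1 = CreateObject("imacros")
iret = iim1.iimInit("", False) ' connect to an already open iMacros browser window

' The recorded macro, passed as a string. POS=1 addresses the first result line.
macro = "CODE:"
macro = macro & "TAG POS=1 TYPE=TD ATTR=TXT:*Date<SP>Posted:*days" & vbNewLine
macro = macro & "TAG POS=R1 TYPE=TD ATTR=TXT:* EXTRACT=TXT" & vbNewLine
macro = macro & "TAG POS=R1 TYPE=TD ATTR=TXT:*Bath* EXTRACT=TXT" & vbNewLine
macro = macro & "TAG POS=R2 TYPE=TD ATTR=TXT:* EXTRACT=TXT" & vbNewLine
macro = macro & "TAG POS=R1 TYPE=TD ATTR=TXT:$* EXTRACT=TXT"

iret = iim1.iimPlay(macro)
If iret < 0 Then
    ' A negative return value means the macro failed
    MsgBox "Extraction failed, return code " & CStr(iret)
Else
    MsgBox iim1.iimGetLastExtract()
End If
```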

Step 3: Looping through all results on one page

Now we put the macro into a loop which

  1. sets the anchor's POS value from the variable "counter"
  2. increments the counter (i.e. the anchor's POS value) by 3 after each extraction
  3. loops until the extraction throws an error (return value iret < 0)

Additionally, we shorten the !TIMEOUT value so the macro does not wait 60s before returning the error when the end of the list is reached.

The script then extracts not only the first entry, but all entries on that page of results.
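The single-page loop might look like the following sketch. The function BuildExtractionMacro is a hypothetical helper introduced here to keep the loop short; as before, the sketch assumes an open iMacros browser window showing the results page.

```vbscript
Option Explicit

' Hypothetical helper: builds the extraction macro for the anchor at the given POS value.
Function BuildExtractionMacro(anchorPos)
    Dim m
    m = "CODE:"
    m = m & "SET !TIMEOUT 6" & vbNewLine ' shortened timeout, as described above
    m = m & "TAG POS=" & CStr(anchorPos) & " TYPE=TD ATTR=TXT:*Date<SP>Posted:*days" & vbNewLine
    m = m & "TAG POS=R1 TYPE=TD ATTR=TXT:* EXTRACT=TXT" & vbNewLine
    m = m & "TAG POS=R1 TYPE=TD ATTR=TXT:*Bath* EXTRACT=TXT" & vbNewLine
    m = m & "TAG POS=R2 TYPE=TD ATTR=TXT:* EXTRACT=TXT" & vbNewLine
    m = m & "TAG POS=R1 TYPE=TD ATTR=TXT:$* EXTRACT=TXT"
    BuildExtractionMacro = m
End Function

Dim iim1, iret, counter
Set iim1 = CreateObject("imacros")
iret = iim1.iimInit("", False) ' connect to an already open iMacros browser window

counter = 1 ' POS value of the anchor in the first result line
Do While Not (iret < 0)
    iret = iim1.iimPlay(BuildExtractionMacro(counter))
    If iret >= 0 Then
        MsgBox iim1.iimGetLastExtract()
        counter = counter + 3 ' the anchor of the next line is 3 positions further
    End If
Loop
' The loop ends as soon as an extraction fails, i.e. at the end of the list on this page.
```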

Step 4: Looping through all the pages

Finally, we want the script to scrape all pages of results.

On every page, there is a "Next" link. Recording a click on this link leads to the following TAG command:

TAG POS=1 TYPE=A ATTR=TXT:Next

This command must be run when the loop reaches the end of the current page (i.e. when the extraction macro fails). So we add code at the end of the loop which

  1. checks whether the end of the list has been reached
  2. tries to move to the next page
    1. in case there is a next page, the loop starts again (counter is reset, return value is not negative)
    2. in case there is no next page, the TAG fails, and the loop ends

And here we are: the script scrapes the first page of results, then moves on to the next one, and thus fetches the data from all items on all pages.

The Script's Source Code

Option Explicit

Dim iim1, iret
Set iim1 = CreateObject("imacros")
iret = iim1.iimInit("", False) 'connect to an open iMacros browser window

Dim macro

Dim counter
counter = 1 'POS value of the anchor in the first result line

Do While Not (iret < 0)
	'extraction macro for the result line addressed by counter
	macro = "CODE:"
	macro = macro & "VERSION BUILD=6110122" & vbNewLine
	macro = macro & "TAB T=1" & vbNewLine
	macro = macro & "TAB CLOSEALLOTHERS" & vbNewLine
	macro = macro & "SET !TIMEOUT 6" & vbNewLine
	macro = macro & "TAG POS=" & CStr(counter) & " TYPE=TD ATTR=TXT:*Date<SP>Posted:*days" & vbNewLine
	macro = macro & "TAG POS=R1 TYPE=TD ATTR=TXT:* EXTRACT=TXT" & vbNewLine
	macro = macro & "TAG POS=R1 TYPE=TD ATTR=TXT:*Bath* EXTRACT=TXT" & vbNewLine
	macro = macro & "TAG POS=R2 TYPE=TD ATTR=TXT:* EXTRACT=TXT" & vbNewLine
	macro = macro & "TAG POS=R1 TYPE=TD ATTR=TXT:$* EXTRACT=TXT"
	iret = iim1.iimPlay(macro)
	If iret < 0 Then
		'end of list reached -> try to move to the next page
		macro = "CODE:"
		macro = macro & "TAG POS=1 TYPE=A ATTR=TXT:Next"
		iret = iim1.iimPlay(macro) 'negative if there is no "Next" link, which ends the loop
		counter = 1 'start again at the first line of the new page
	Else
		'show the extracted data only if the extraction succeeded
		MsgBox iim1.iimGetLastExtract()
		counter = counter + 3 'the anchor of the next line is 3 positions further
	End If
Loop
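
The message box shows the raw extraction string. One possible way to split it into individual fields is sketched below, assuming that iimGetLastExtract separates the extracted values with the marker [EXTRACT]. The function name SplitExtract and the sample string are made up for illustration and are not data from the site.

```vbscript
Option Explicit

' Assumption: multiple extracted values are joined with the marker "[EXTRACT]".
Const EXTRACT_SEPARATOR = "[EXTRACT]"

' Hypothetical helper: splits a raw extraction string into an array of trimmed fields.
Function SplitExtract(rawExtract)
    Dim fields, i
    fields = Split(rawExtract, EXTRACT_SEPARATOR)
    For i = 0 To UBound(fields)
        fields(i) = Trim(fields(i))
    Next
    SplitExtract = fields
End Function

' Usage with a made-up sample string (run with cscript).
Dim parts, n
parts = SplitExtract("field A[EXTRACT] field B [EXTRACT]field C")
For n = 0 To UBound(parts)
    WScript.Echo CStr(n + 1) & ": " & parts(n)
Next
```

In the script above, the same function could be applied to the return value of iim1.iimGetLastExtract() instead of the sample string.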