The renewal maintenance has officially ended for Progress iMacros effective November 30, 2023.
This Wiki site will also no longer be moderated from the Progress side.
Thank you again for your business and support.
Sincerely, The Progress Team
Data Extraction
Extraction post Version 6.0
As of iMacros version 6.0, extraction is no longer handled by a separate command. Instead, it is specified by an additional parameter to the TAG command. Please see the updated Demo-Extract macro for some examples of this, including the following:
TAG POS=1 TYPE=SPAN ATTR=CLASS:bdytxt&&TXT:* EXTRACT=HTM
This means that the syntax of the command is now the same as for the TAG command, with the type of extraction specified by the additional EXTRACT parameter.
Extract single elements
(Related example macro: Demo-Extract, Demo-ExtractRelative)
iMacros can extract data from Web sites [iMacros Browser only]. Click on the Extract Data button while in record mode to bring up the extraction wizard that will help you create the correct EXTRACT command:
Note: Internet Explorer 6.0 or better must be installed in order to use the EXTRACT command.
The EXTRACT command, and thus the extraction, is controlled by three parameters: the extraction anchor, the position and the type of extraction. The most important parameter is the extraction anchor. It describes the HTML code surrounding the information to be extracted, and it must end with *. If the HTML code given in the anchor appears more than once on a page, the position parameter determines which of the occurrences is extracted. The type of extraction determines whether the result is plain text, HTML source code, a URL, an element's title or the alternative text of an image.
All extraction results can be accessed inside the macro through the built-in variable !EXTRACT. If this variable contains #EANF# (Extraction Anchor Not Found) the extraction was unsuccessful.
Results of multiple extractions in the same macro are separated by a [EXTRACT] tag in the !EXTRACT variable.
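As a rough illustration of how an application might handle such a combined result, the following Python sketch splits a string shaped like the content of !EXTRACT on the [EXTRACT] separator. The function name and the sample string are assumptions made for this example; they are not part of iMacros.

```python
EXTRACT_SEPARATOR = "[EXTRACT]"

def split_extract(raw):
    """Split a combined extraction string into its individual results."""
    if raw == "":
        return []
    # Whitespace inside each result is kept exactly as it was returned.
    return raw.split(EXTRACT_SEPARATOR)

sample = "Text to be extracted[EXTRACT]Salary: 33,000.00 per year"
for number, item in enumerate(split_extract(sample), start=1):
    print(number, item)
```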
During manual replay of macros including EXTRACT commands in the iMacros Browser the extraction result is displayed in a dialog window by default. This behaviour can be controlled by setting the built-in !EXTRACT_TEST_POPUP variable.
Some Background on HTML and Extraction
HTML is the language in which web sites are coded. The language consists of so-called tags, which determine how elements are formatted, displayed and aligned. Most HTML tags consist of two parts, an opening part and a closing part. All text between the opening and closing tags is affected by the directives the HTML tag implies. E.g. the following HTML snippet
This text is <B>bold</B>
will result in
This text is bold
i.e. the B tag is used to format text in bold face. When extracting text with iMacros the following procedure is applied:
- iMacros searches the HTML source of the currently active webpage for an occurrence of the extraction anchor
- If the anchor is found all text between the opening HTML tag of the anchor and its equivalent closing tag is extracted
- If the anchor is not found the result is #EANF#
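The procedure above can be modelled with a short Python sketch. This is a deliberately simplified illustration, not the iMacros implementation: it assumes the anchor is a literal opening tag followed by *, it does not handle nested elements of the same type, and the function name extract_text is made up for this example.

```python
import re

EANF = "#EANF#"  # marker for "Extraction Anchor Not Found"

def extract_text(html, anchor, pos=1):
    """Return the text inside the pos-th element that starts with the
    anchor's opening tag, or #EANF# if no such element exists."""
    if not anchor.endswith("*"):
        raise ValueError("the anchor must end with *")
    opening = anchor[:-1]  # e.g. "<B>"
    tag_match = re.match(r"<(\w+)", opening)
    if tag_match is None:
        raise ValueError("the anchor must start with an opening tag")
    closing = re.compile(r"</" + re.escape(tag_match.group(1)) + r"\s*>",
                         re.IGNORECASE)

    # Step 1: search the source for the pos-th occurrence of the anchor.
    start = -1
    for _ in range(pos):
        start = html.find(opening, start + 1)
        if start == -1:
            return EANF  # Step 3: anchor not found
    # Step 2: take everything up to the matching closing tag.
    content_start = start + len(opening)
    end = closing.search(html, content_start)
    if end is None:
        return EANF
    inner = html[content_start:end.start()]
    return re.sub(r"<[^>]+>", "", inner)  # drop any nested markup

print(extract_text("This text is <B>bold</B>", "<B>*"))  # bold
print(extract_text("This text is <B>bold</B>", "<I>*"))  # #EANF#
```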
Create Extraction Command
To define an EXTRACT command proceed as follows:
- Open the Extraction Wizard ( "Extract Data" button on the Rec tab of the control panel).
- Note: If the information you want to extract is inside a framed web site you need to click inside the frame that contains the information you want to extract before opening the Extraction Wizard. This generates the FRAME command and marks the frame as active for the extraction.
- In the browser window or frame select the text that you want to extract.
- The marked information will be displayed in the yellowish text area on the left. iMacros also creates a suggestion for the extraction anchor, which is displayed in the orange text field on the right.
- Click "Test EXTRACT Tag" to test run the extraction tag. The result of the generated extraction anchor will then be displayed in the yellow text area on the right hand side of the wizard. If the result is #EANF# (Extraction Anchor Not Found) you will need to alter the extraction anchor in order to successfully extract the data.
- If you are satisfied with the result click "Add this EXTRACT tag" to add the EXTRACT statement to the macro.
Save Extraction Result
There are two methods to retrieve extracted data.
SAVEAS
You can save extracted data directly to a file by adding a "SAVEAS TYPE=EXTRACT" command manually to the macro. All items that were extracted before the SAVEAS command are saved to the specified file in one row like
"item1", "item2", "item 3", ...
As you can see the [EXTRACT] tags are substituted by commas. The SAVEAS command erases the content of the !EXTRACT variable afterwards. With the next start of the macro or the next round of a loop a new line is added to the file.
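To make the row format concrete, here is a small Python sketch that turns a combined extraction string into one quoted, comma-separated row and appends it to a file. It only approximates the output described above (for instance, it writes no space after each comma), and the names to_csv_row and append_row are assumptions for this example rather than anything iMacros provides.

```python
import csv
import io

def to_csv_row(raw):
    """Convert '[EXTRACT]'-separated results into one quoted CSV row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="")
    writer.writerow(raw.split("[EXTRACT]"))
    return buffer.getvalue()

def append_row(path, raw):
    """Append one row per call, as happens once per macro run or loop round."""
    with open(path, "a", encoding="utf-8", newline="") as handle:
        handle.write(to_csv_row(raw) + "\r\n")

print(to_csv_row("item1[EXTRACT]item2[EXTRACT]item 3"))
```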
iimGetLastExtract()
You can also use the iimGetLastExtract() method of the Scripting Interface to access the extracted data in your application. Potential [EXTRACT] tags are included in the returned string and can be used to separate different extraction results.
Unsuccessful Extraction
As noted above, if the extraction was unsuccessful, i.e. the extraction anchor could not be found on the page, the !EXTRACT variable holds the string #EANF# (Extraction Anchor Not Found). However, the return value that tells you whether the macro executed successfully is still positive. The reason for this behaviour is that a macro can contain many EXTRACT commands, and often only one or a few of them fail to find their extraction anchor. To check whether a particular EXTRACT command was successful, check whether #EANF# is present in the returned string. This can be very useful, for example if you use EXTRACT to check whether a keyword is present on a page: a returned string containing #EANF# indicates that the keyword was not found.
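A minimal Python sketch of such a check follows. It assumes the returned string uses the [EXTRACT] separator between results; the helper names failed_positions and keyword_present are hypothetical.

```python
EANF = "#EANF#"

def failed_positions(raw):
    """Return the 1-based numbers of the results that contain #EANF#."""
    items = raw.split("[EXTRACT]")
    return [number for number, item in enumerate(items, start=1) if EANF in item]

def keyword_present(raw):
    """For a single keyword extraction: True unless the anchor was not found."""
    return EANF not in raw

print(failed_positions("1.08[EXTRACT]#EANF#[EXTRACT]0.86"))  # [2]
print(keyword_present("#EANF#"))                             # False
```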
Extraction of Dialog Text
To get the text of a dialog use
SET !EXTRACTDIALOG YES
in the macro. Now the content of a dialog is added to the extracted text, i.e. to the !EXTRACT variable.
Extracting From SELECT Elements
In HTML code drop down lists are generated by a SELECT tag. For SELECT boxes the currently active value is extracted.
Select currently active values:
TAG POS=1 TYPE=SELECT ATTR=TXT:*&&NAME:quantity&&VALUE:* EXTRACT=TXT
Extraction and the PRE Tag
Some web pages make use of a <PRE ...> tag in their HTML code. It marks the enclosed text as preformatted -- all the spaces and carriage returns are rendered exactly as you type them. The information enclosed in a <PRE> tag is extracted correctly (including the formatting!) by iMacros. Thus, if you transfer the extracted data via the Scripting Interface all formatting information is retained unchanged. The formatting is only changed on two occasions: line breaks are removed when displaying the result in the test dialog box and when saving the result using the SAVEAS command. This is necessary to ensure proper formatting of the CSV formatted text file because in the CSV format a line break would start a new line.
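If you receive preformatted text through the Scripting Interface and want to write it to a CSV file yourself, you may need a similar clean-up step. The Python sketch below is one possible approach; the function name and the choice of replacement string are assumptions, since the text above only states that line breaks are removed.

```python
import re

def remove_line_breaks(text, replacement=""):
    """Replace Windows, old Mac and Unix line breaks with `replacement`."""
    return re.sub(r"\r\n|\r|\n", replacement, text)

preformatted = "Name:  Smith\r\nCity:  Berlin\n"
print(remove_line_breaks(preformatted, replacement=" "))
```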
Troubleshooting
Sometimes iMacros cannot suggest a proper extraction anchor automatically, in which case you can create one manually. Enter it in the orange text area on the right side of the extraction wizard and test it with the "Test EXTRACT Tag" button. Please read all the information in this chapter to get a good overview of how the EXTRACT command can be tweaked manually.
Extract with relative Positioning
(Related example macro: Demo-ExtractRelative )
When extracting data from complex websites, the extraction can be made easier if you tell iMacros to start the search for the extraction anchor after a specific point on the page (as opposed to starting from the top, which is the default).
E.g., assume you want to extract data from a specific cell in a table, in this case the size of the land in the second table.
Without relative positioning you would have to count the cell from the top of the page, including cells from other tables that come before the land table. Although the extraction wizard can do this for you, you run into problems as soon as the number of rows in a table is not constant, as is the case in the above example. The Transfer table of result 1 has four rows, that of result 2 has five rows. Thus, an absolute position parameter like so
TAG POS=1 TYPE=TD ATTR=CLASS:code&&TXT:* EXTRACT=TXT
will potentially result in the extraction of an unwanted result.
With relative positioning you tell iMacros to search for the extraction anchor located after the position that is indicated by a TAG command immediately before your EXTRACT command. In our case we click on the table title "Land" before starting the extraction wizard to create a TAG command. Note that this TAG command does not click on any link, rather it only marks an element to indicate a position for the following EXTRACT command. Relative positions are indicated with an R before the position number.
TAG POS=1 TYPE=B ATTR=TXT:Land
TAG POS=R1 TYPE=TD ATTR=CLASS:code&&TXT:* EXTRACT=TXT
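The difference between absolute and relative positions can be sketched in a few lines of Python. The page content, the helper names and the simple string search are all assumptions for illustration; iMacros itself works on the parsed page rather than on raw text.

```python
CELL_OPEN = "<td class=code>"

def find_nth(text, needle, n, start=0):
    """Return the index of the n-th occurrence of needle at or after start."""
    index = start - 1
    for _ in range(n):
        index = text.find(needle, index + 1)
        if index == -1:
            return -1
    return index

def cell_text(text, cell_start):
    """Return the text between the cell's opening tag and the next </td>."""
    content_start = cell_start + len(CELL_OPEN)
    return text[content_start:text.find("</td>", content_start)]

page = (
    "<b>Transfer</b><td class=code>T-1</td><td class=code>T-2</td>"
    "<b>Land</b><td class=code>500 sqm</td>"
)

# Absolute: counted from the top of the page, so it hits the Transfer table.
absolute = find_nth(page, CELL_OPEN, 1)
# Relative: counted from the element marked by the preceding TAG command.
marker = page.find("<b>Land</b>")
relative = find_nth(page, CELL_OPEN, 1, start=marker)

print(cell_text(page, absolute))  # T-1
print(cell_text(page, relative))  # 500 sqm
```

Adding or removing rows in the Transfer table changes the absolute result but leaves the relative one untouched.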
Save extracted data
(Related example macros: Demo-Extract, Demo-Extract-Table)
There are two methods to retrieve extracted data.
SAVEAS (PRO and SCRIPTING Edition)
You can save extracted data directly to a file by adding a "SAVEAS TYPE=EXTRACT" command manually to the macro. All items that were extracted before the SAVEAS command are saved to the specified file in one row like
"item1", "item2", "item 3", ...
As you can see the [EXTRACT] tags, which are inserted to distinguish results from different EXTRACT commands, are substituted by commas. The SAVEAS command erases the content of the !EXTRACT variable afterwards. With the next start of the macro or the next round of a loop, a new line is added to the file.
iimGetLastExtract() (SCRIPTING Edition)
You can also use the iimGetLastExtract() method of the Scripting Interface to access the extracted data in your application. Potential [EXTRACT] tags are included in the returned string and can be used to separate different extraction results - see the included extract-2-database.vbs.
Extract & Scripting Interface
(Related example scripts: Extract-and-fill.vbs, Extract-2-file.vbs, Get-Exchange-Rate.vbs)
All extracted data can be sent to your code via the Scripting Interface. This gives you all the power of any programming language you choose, to process the extracted information further or simply save it to a file.
Use the "iimGetLastExtract()" command to return the extracted text if you used any EXTRACT commands within the macro.
The extracted text is returned as a string. Results from different EXTRACT commands are separated by [EXTRACT], e.g.
Text to be extracted[EXTRACT] Salary: 33,000.00 per year[EXTRACT]...
Remember: Using the "SAVEAS TYPE=EXTRACT" command will reset the contents of the !EXTRACT variable. Thus, using this command in a macro whose extraction result you wish to obtain via the Scripting Interface will result in an empty string in your application!
If you extract a complete table, the data from different columns is separated by #NEXT# and each table row ends with #NEWLINE#. You can easily use the separation tags to split the complete dataset. In Visual Basic Script, this would for example look something like
s = Replace(s, "#NEWLINE#", """" + vbCrLf + """")
s = Replace(s, "#NEXT#", """" + "," + """")
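The same separators can also be used to rebuild the table as rows and columns. The Python sketch below is an illustration under the format described above; the function name parse_table and the sample data are assumptions.

```python
def parse_table(raw):
    """Split a table extraction into a list of rows, each a list of cells."""
    lines = raw.split("#NEWLINE#")
    # Each row ends with #NEWLINE#, so the final piece is empty: drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return [line.split("#NEXT#") for line in lines]

sample = "Currency#NEXT#Rate#NEWLINE#EUR#NEXT#0.92#NEWLINE#"
for row in parse_table(sample):
    print(row)
```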
Example 1 - Split the returned string
The returned string is split to separate the results from different EXTRACT commands.
Dim data as String
Dim s as String
Dim ExchangeRate
iplay = iim1.iimPlay("wsh-extract")
If iplay = 1 Then
    data = iim1.iimGetLastExtract()
    ExchangeRate = Split(data, "[EXTRACT]")
    s = "One US$ costs " + ExchangeRate(0) + " EURO or " + ExchangeRate(1) + " British Pounds (GBP)"
    MsgBox s
End If
Example 2 - Keyword search
We want to find out if the word "iopus" exists on a web page. If yes, then print the page. To make this example work, create the following macro and save it under the filename "mysearch.iim" in your Macros directory:
VERSION BUILD=3301125
'The keyword *is* the data extraction anchor!
EXTRACT POS=1 TYPE=TXT ATTR=*iopus*
To print the web page create the following macro and save it under the filename "print_this.iim" in your Macros directory:
VERSION BUILD=3301125
PRINT
Use the following Windows Script to control the macros:
set iim1 = CreateObject("InternetMacros.iim")
iret = iim1.iimInit()
iplay = iim1.iimPlay("mysearch")
extracted_text = iim1.iimGetLastExtract()
'test if keyword appeared on website.
If iplay = 1 Then
    If InStr(extracted_text, "#EANF#") > 0 Then
        MsgBox ("Sorry, keyword not found")
    Else
        iplay = iim1.iimPlay("print_this")
    End If
End If
If iplay < 0 Then
    MsgBox "Error!"
End If
Note: You can also write directly to any Windows database. Please see the "extract-2-database.vbs" script for some example code. The script writes all results directly to a Microsoft Access database.
More Examples
iMacros comes with several example scripts that demonstrate the EXTRACT command:
- extract-2-file.vbs
- extract-and-fill.vbs
- get-exchange-rate.vbs
The scripts are found in the "Examples\Windows Scripting Host" directory of your iMacros installation. More example scripts and test pages are available at http://www.iOpus.com/iim/demo
Extract Tech Tip
Question: EXTRACT works while I am testing in the Extraction Wizard, but when I run the macro Extract only returns #EANF# (Extraction Anchor not found).
Answer: Some websites are created dynamically from databases and the exact content of the website changes every time you visit a page. The solution is to replace the changing part of a link or extraction with the wildcard symbol.
Example: Assume you searched for a product on a retailer's site and the resulting page is a table of products, each with its own description and price tag, which are enclosed by the A HTML tag, like so:
<TR>
  <TD>
    <A class=price href="/homes/homesforsale/view_details.jsp?advertID=14470882&listID=2492&index=1&">
      Product 1, Price 1
    </A>
  </TD>
</TR>
<TR>
  <TD>
    <A class=price href="/homes/homesforsale/view_details.jsp?advertID=14470882&listID=2492&index=1&">
      Product 2, Price 2
    </A>
  </TD>
</TR>
<TR>
  <TD>
    <A class=price href="/homes/homesforsale/view_details.jsp?advertID=14470882&listID=2492&index=1&">
      Product 3, Price 3
    </A>
  </TD>
</TR>
[...]
Here is an extraction anchor as suggested by the Wizard for the first product:
EXTRACT POS=1 TYPE=TXT ATTR=<A<SP>class=price<SP>href="/homes/homesforsale/view_details.jsp?advertID=14470882&listID=2492&index=1&">*
This command works fine in the Wizard, but fails during the macro execution. Why? Because the "listID" part of the URL changes every time you visit the page. You can find this out by running the Wizard twice (after refreshing the page in between) and comparing the extraction anchors. We also note that the variable "advertID" is probably the most important part of the link, since it defines the ad.
Solution: Replace the changing listID number with *:
EXTRACT POS=1 TYPE=TXT ATTR=<A<SP>class=price<SP>href="/homes/homesforsale/view_details.jsp?advertID=14470882&listID=*&index=1&">*
Actually, while you are at it you can remove most static parts of the anchor as well. The result looks like:
EXTRACT POS=1 TYPE=HREF ATTR=<A<SP>class=price<SP>href="*advertID=14470882*">*
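The effect of the wildcard can be checked with a small Python sketch that treats * as "any text" and <SP> as a space. This is only a model of the matching idea, not the iMacros matcher; the function names and the second listID value (9999) are made up for the example.

```python
import re

def anchor_to_regex(anchor):
    """Translate an extraction anchor into a regular expression."""
    literal = anchor.replace("<SP>", " ")
    parts = [re.escape(part) for part in literal.split("*")]
    return re.compile(".*".join(parts), re.DOTALL)

def anchor_matches(anchor, html):
    return anchor_to_regex(anchor).search(html) is not None

BASE = '<A class=price href="/homes/homesforsale/view_details.jsp?advertID=14470882&listID={}&index=1&">'
first_visit = BASE.format("2492") + "Product 1, Price 1</A>"
second_visit = BASE.format("9999") + "Product 1, Price 1</A>"  # hypothetical new listID

fixed = ('<A<SP>class=price<SP>href="/homes/homesforsale/view_details.jsp'
         '?advertID=14470882&listID=2492&index=1&">*')
wildcard = '<A<SP>class=price<SP>href="*advertID=14470882*">*'

print(anchor_matches(fixed, first_visit), anchor_matches(fixed, second_visit))
print(anchor_matches(wildcard, first_visit), anchor_matches(wildcard, second_visit))
```

The fixed anchor only matches the page from the first visit, while the wildcard anchor matches both.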
If you want to cycle through all the ads on the page you can do this as follows:
- Replace the advertID number by an asterisk. Now, it will always find the matching extraction anchor.
- To tell iMacros to move on to the second (third, ...) product, replace the fixed POS value with a variable:
EXTRACT POS={{!LOOP}} TYPE=HREF ATTR=<A<SP>class=price<SP>href="*advertID=*">*
At runtime {{!LOOP}} takes on the values 1, 2, 3, ... in successive loop rounds, so iMacros extracts the matching entries on the page one after another.
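To visualise what the command looks like in each loop round, the following Python sketch substitutes the loop counter into the command text. It only mimics the variable replacement for illustration; the function name expand_loop is an assumption, and iMacros performs this substitution itself.

```python
TEMPLATE = ('EXTRACT POS={{!LOOP}} TYPE=HREF '
            'ATTR=<A<SP>class=price<SP>href="*advertID=*">*')

def expand_loop(command, loop_value):
    """Return the command as it would read in the given loop round."""
    return command.replace("{{!LOOP}}", str(loop_value))

for round_number in range(1, 4):
    print(expand_loop(TEMPLATE, round_number))
```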
Asian Language Support
iMacros runs on all language versions of Windows, including the so-called "double-byte" languages like Chinese, Japanese or Korean.
Data Extraction Tip:
Western (ANSI) characters can be extracted on any language version of Windows. In order to extract Asian characters correctly, please run iMacros on a Windows system that supports the language. Example: To extract Chinese characters please run iMacros on the Chinese language version of Windows.