Cry about...
Delphi Programming with TWebBrowser
How to get the HTML displayed in a TWebBrowser
There are three ways to get the HTML displayed in a TWebBrowser:
- Obtain the HTML from the WebBrowser's DOM
- Obtain the HTML from the WebBrowser by saving its document to a stream
- Obtain the HTML from the browser cache
Each has its advantages and disadvantages.
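To retrieve the HTML directly from the WebBrowser's DOM:
uses MSHTML, SHDocVw;

function GetHtml(const webBrowser: TWebBrowser): String;
var
  document: IHTMLDocument2;
begin
  // webBrowser.Document is only valid once a page has finished loading.
  document := webBrowser.Document as IHTMLDocument2;
  // Only the rendered <body>; the <head> block is not included.
  Result := document.body.innerHTML;
end;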
This is simple and works well. Its main problem is that it returns the
HTML that the web browser has rendered, which is not necessarily the same
as the original HTML. For example, if the original HTML file included:
<script type="text/javascript">
document.write('Hello');
</script>
then the HTML returned by the above function will contain "Hello"
but not the "<script ..." element. It also omits the header block
(such as the title and keywords).
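If the missing header block is the main concern, a partial workaround that stays with the DOM is to read the outerHTML of the document's root element through IHTMLDocument3. The unit below is a minimal sketch: GetDocumentHtml is a hypothetical helper that takes webBrowser.Document directly, and it assumes your MSHTML import unit declares IHTMLDocument3 (some older Delphi installations need the MSHTML type library imported first). It still returns the parsed DOM as MSHTML holds it, so it is not guaranteed to match the original file.

```pascal
unit DocumentHtml;

interface

function GetDocumentHtml(const document: IDispatch): String;

implementation

uses
  SysUtils, MSHTML;

// Hypothetical helper: returns the outerHTML of the <html> element, i.e.
// both <head> and <body> as they currently exist in the DOM.
// Returns '' if document is nil or is not an MSHTML document.
function GetDocumentHtml(const document: IDispatch): String;
var
  doc3: IHTMLDocument3;
  root: IHTMLElement;
begin
  Result := '';
  // Supports returns False for a nil document, so no exception is raised.
  if Supports(document, IHTMLDocument3, doc3) then
  begin
    root := doc3.documentElement;
    if root <> nil then
      Result := root.outerHTML;
  end;
end;

end.
```

For example, call GetDocumentHtml(WebBrowser1.Document) once the page has finished loading.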
The following function will extract the HTML from a WebBrowser, including
the header block as well as the body of the HTML:
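uses Classes, ActiveX, SHDocVw;

function GetBrowserHtml(const webBrowser: TWebBrowser): String;
var
  strStream: TStringStream;
  adapter: IStream;
  browserStream: IPersistStreamInit;
begin
  strStream := TStringStream.Create('');
  try
    // The MSHTML document can save itself to any IStream.
    browserStream := webBrowser.Document as IPersistStreamInit;
    // Wrap the Delphi stream as an IStream; soReference means the
    // adapter does not free strStream, so this function still owns it.
    adapter := TStreamAdapter.Create(strStream, soReference);
    browserStream.Save(adapter, True);
    Result := strStream.DataString;
  finally
    strStream.Free;
  end;
end;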
The following example shows how to retrieve the HTML from the browser
cache:
uses
  Windows, Forms, SHDocVw, WinInet;

var
  h_cachedInternet: HINTERNET;

function GetRawHtml(const web_browser: TWebBrowser): String;
var
  http_handle: HINTERNET;
  buffer: array [0..4095] of AnsiChar;
  chunk, raw: AnsiString;
  url: String;
  bytes_read: DWORD;
begin
  Result := '';
  url := web_browser.LocationURL;
  http_handle := InternetOpenUrl(h_cachedInternet,
    PChar(url), nil, 0, INTERNET_FLAG_NO_UI, 0);
  if http_handle = nil then
    Exit;
  try
    //--------------------------------------------------------------
    // Retrieve the URL data. This should come straight from the
    // cache because the session was opened with
    // INTERNET_FLAG_FROM_CACHE.
    //--------------------------------------------------------------
    raw := '';
    // Stop on a read error as well as at the end of the data.
    while InternetReadFile(http_handle, @buffer, SizeOf(buffer), bytes_read)
      and (bytes_read > 0) do
    begin
      // InternetReadFile counts bytes, so copy them as AnsiChars.
      SetString(chunk, PAnsiChar(@buffer), bytes_read);
      raw := raw + chunk;
    end;
    Result := String(raw);
  finally
    InternetCloseHandle(http_handle);
  end;
end;

initialization
  //--------------------
  // Initialise WinInet.
  //--------------------
  h_cachedInternet := InternetOpen(PChar(Application.Title),
    INTERNET_OPEN_TYPE_PRECONFIG_WITH_NO_AUTOPROXY, nil, nil,
    INTERNET_FLAG_FROM_CACHE);

finalization
  // Release the WinInet session opened above.
  if h_cachedInternet <> nil then
    InternetCloseHandle(h_cachedInternet);
This has the advantage that it needs nothing from the TWebBrowser except
its URL, so it can be adapted to work without a TWebBrowser at all, which
suits some applications better.
Note:
- It uses WinInet functions and only uses the browser to obtain
the URL.
- It reads the file directly from the WinInet file cache, so it assumes
that the cached file is the same as the one used by the TWebBrowser. That
assumption is reasonable most of the time, but the file may have been
flushed from the cache, may never have been cached, or may have been
replaced by a different copy fetched by another web browser.
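Because only the URL is needed, the cache approach can be written to take a URL string instead of a TWebBrowser. The unit below is a minimal sketch under a few assumptions: GetCachedHtml is a hypothetical name, it opens and closes its own cache-only WinInet session on every call (simpler than a shared handle, at some cost in speed), and it reads raw bytes and converts them to a String once at the end, which assumes the page is in the system ANSI code page.

```pascal
unit CachedHtml;

interface

function GetCachedHtml(const url: String): String;

implementation

uses
  Windows, WinInet;

// Hypothetical helper: reads the page for url from the WinInet cache only.
// Returns '' if the URL is not cached or cannot be opened.
function GetCachedHtml(const url: String): String;
var
  session, request: HINTERNET;
  buffer: array [0..4095] of AnsiChar;
  bytesRead: DWORD;
  chunk, raw: AnsiString;
begin
  raw := '';
  // INTERNET_FLAG_FROM_CACHE: no network requests, the cache only.
  session := InternetOpen('GetCachedHtml', INTERNET_OPEN_TYPE_PRECONFIG,
    nil, nil, INTERNET_FLAG_FROM_CACHE);
  if session <> nil then
  try
    request := InternetOpenUrl(session, PChar(url), nil, 0,
      INTERNET_FLAG_NO_UI, 0);
    if request <> nil then
    try
      // Stop on a read error as well as at the end of the data.
      while InternetReadFile(request, @buffer, SizeOf(buffer), bytesRead)
        and (bytesRead > 0) do
      begin
        SetString(chunk, PAnsiChar(@buffer), bytesRead);
        raw := raw + chunk;
      end;
    finally
      InternetCloseHandle(request);
    end;
  finally
    InternetCloseHandle(session);
  end;
  Result := String(raw);
end;

end.
```

With this helper, GetRawHtml reduces to GetCachedHtml(web_browser.LocationURL), and the same call works from code that has no TWebBrowser at all.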
See also: How to navigate a frameset.
These notes are believed to be correct for Delphi 6, but
may apply to other versions as well.