[ic] Froogle.google.com anyone using this yet?
cfm@maine.com wrote:
>
> The problem you have is the HTML in the database. That makes
> it really hard to reuse. You might want to consider ways of
> getting HTML out of your raw data.
>
A quick test script for you:
----------------------------------------------------------------------
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# Sample HTML, as it might come out of a database field.
my $text = <<'EOB';
<body>
<p>
This is a test blah blah.
<a href="foobar.html">What's this, a link?</a>.
</p>
<p>
Let's have some text in <font color="#FF0000">red</font>.
</p>
<p>
Some &quot;entities&quot; will make another test case.
</p>
</body>
EOB
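# Parse the HTML into a tree; eof() signals the end of the input.
my $tree = HTML::TreeBuilder->new;
$tree->parse($text);
$tree->eof;

# Format the tree as plain text with a 4-column left margin,
# wrapping at column 74.
my $formatter = HTML::FormatText->new(
    leftmargin  => 4,
    rightmargin => 74,
);
$text = $formatter->format($tree);
$tree->delete;    # free the tree once we're done with it
print $text;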
----------------------------------------------------------------------
The output is:
    This is a test blah blah. What's this, a link?.

    Let's have some text in red.

    Some "entities" will make another test case.
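
If the HTML lives in a database column, the same approach could be wrapped in a small helper and run over each row. This is only a rough sketch: html_to_text() is a made-up name, and the rows, the "sku" and "description" fields and the zero left margin are assumptions for illustration, not anything taken from a real catalogue.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# Hypothetical helper: convert one HTML fragment to plain text.
sub html_to_text {
    my ($html) = @_;

    # new_from_content() parses the string and calls eof() for us.
    my $tree = HTML::TreeBuilder->new_from_content($html);

    my $formatter = HTML::FormatText->new(
        leftmargin  => 0,
        rightmargin => 74,
    );
    my $text = $formatter->format($tree);
    $tree->delete;    # free the tree

    # Trim the whitespace the formatter leaves at either end.
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Made-up records standing in for rows fetched from the database.
my @rows = (
    { sku => 'A100', description => '<p>A <b>sturdy</b> widget.</p>' },
    { sku => 'B200', description => '<p>Available in <font color="#FF0000">red</font>.</p>' },
);

# One tab-separated line per record: sku, then the stripped description.
for my $row (@rows) {
    printf "%s\t%s\n", $row->{sku}, html_to_text($row->{description});
}
```

Descriptions longer than the right margin will still be wrapped onto several lines, so for a strictly one-line-per-record feed you would probably want to collapse newlines as well.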
--
Kevin Walsh
kevin@cursor.biz