[libhtml-parser-perl] annotated tag upstream/3.72 created (now acbad4a)
Lucas Kanashiro
kanashiro.duarte at gmail.com
Thu Jan 21 01:27:31 UTC 2016
This is an automated email from the git hooks/post-receive script.
kanashiro-guest pushed a change to annotated tag upstream/3.72
in repository libhtml-parser-perl.
at acbad4a (tag)
tagging 295ddd70d43196c874abf14ac23f55e7c406b85a (commit)
replaces upstream/3.71
tagged by Lucas Kanashiro
on Wed Jan 20 23:04:34 2016 -0200
- Log -----------------------------------------------------------------
Upstream version 3.72
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWoC6iAAoJEPgjonKYg8l84WsP/izljcR7dp/5J6O+1aiXvtPL
5vcq/EEO2cYlNzkswfO9bQcR/iC/TCyv0GkIWnuCWvA0ZKlGCcNGT5o2QoY8nr6E
xG/KX1da3crSQ7q7jkHONU8qNcngxKfgxY0xxzJMe1zXmrrX3mfQZ18LfGjhkY1m
E6I44e0o52VpakOENKtY7ykvzAdCl6NOvDcnYi2PyZT2eVgfTKcqZjcmQBwtw0rw
QQCParbg5kA096mgUK0DUccP90ypTGHk6NsVyAcrDx4bezO3CJubXa2f3AYc8PZY
DCJXcfBBA4Pz3m/qDZQ9NdJ0J1e6jkWdJM9ATMQf2sIZECD8RzVdk/QkRi6hDHMX
SDEul3GrzOoRRWC7bDKDEZpjcCo/kJG6DBbemlq18CLx1NnzKD78q+zdNVpD5v1y
vigc+zj72qRIU2oA5iuczzLBW1bVMNGHKrSau+Gxcp3D1RcZVU69RIYIhUdGccYj
IC1TXyqZ79yV7dVpS6802eTugcTF2oH7wzIacCDwVhK813csK6JliXq8kGP79sVT
non8lyL0ACuJjyedtGvDolRJsZxk9lL1I9/yPZCvpwpNKg/7JraIsrEutFrNkAhx
imu1gEmdgy426+9+rnl8KAcU5L4c20k79p3i494VMhtsrZcK1tA3kd4DeadKEyX1
IKKMOOKWtKmJV0ZRDmlG
=QuOq
-----END PGP SIGNATURE-----
Antonio Radici (1):
Reference HTML::LinkExttor [RT#43164]
Barbie (1):
fix to TokeParser to correctly handle option configuration
Chip Salzenberg (1):
Avoid crash (referenced pend_text instead of skipped_text)
Damyan Ivanov (1):
Short description of the htextsub example
David Steinbrunner (2):
typo fix
typo fixes
François Perrad (1):
Fix for cross-compiling with Buildroot
Gisle Aas (55):
Start using GIT to track the sources.
Patch by CHORNY that provides compatibility with older perls.
Recognize the </script> and </style> end tags even if quoted.
Parse the <iframe> content in literal/CDATA mode.
Release 3.57
Recognize the Unicode BOM in utf8_mode as well [RT#27522]
Avoid ending up with '/' keys attribute in Link headers.
Suppress "Parsing of undecoded UTF-8 will give garbage" warning with attr_encoded [RT#29089]
Don't hardcode source line numbers [RT#38114]
Release 3.58
Restore perl-5.6 compatibility for HTML::HeadParser
Tell git to ignore the dist tarballs
Update for GIT and other tweaks.
More meta info
Release 3.59
Release 3.60.
Test that triggers the crash that Chip fixed
Complete documented list of literal tags
Release 3.61
Avoid "my" variable $p masks earlier declaration warning from test
Doc patch: Make it clearer what the return value from ->parse is
Update TODO list
Release 3.62
Take more care to prepare the char range for encode_entities [RT#50170]
decode_entities confused by trailing incomplete entity
Release 3.63
Convert files to UTF-8
Don't allow decode_entities() to generate illegal Unicode chars
Copyright 2009
Remove redundant (repeated) test
Make parse_file() method use 3-arg open [RT#49434]
Release 3.64
Eliminate buggy entities_decode_old
Release 3.65
Fix entity decoding in utf8_mode for the title header
Release 3.66
chmod +x [RT#58016]
Release 3.67
Declare the encoding of the POD to be utf8
Release 3.68
Documentation fix; encode_utf8 mixup [RT#71151]
Make it clearer that there are 2 (actually 3) options for handling "UTF-8 garbage"
Github is the official repo
Can't be bothered to try to fix the failures that occur on perl-5.6
Release 3.69
Comment typo fix
Release 3.70
Transform ':' in headers to '-' [RT#80524]
Release 3.71
Merge branch 'master' of github.com:gisle/html-parser
Avoid more clang casting warnings
Remove trailing whitespace
Ensure entities expand to utf8 sequences under 'utf8_mode' [RT#99755]
Copyright 2016
Release 3.72
Jacques Germishuys (1):
Silence clang warning
Jon Jensen (1):
Aesthetic change: remove extra ;
Lucas Kanashiro (1):
Imported Upstream version 3.72
Mike South (1):
Suppress warning when encode_entities is called with undef [RT#27567]
Nicholas Clark (1):
bleadperl 2154eca7 breaks HTML::Parser 3.66 [RT#60368]
Salvatore Bonaccorso (1):
Fixed endianness typo [RT#50811]
Ville Skyttä (12):
Spelling fixes.
Test multi-value headers.
Documentation improvements.
Do not terminate head parsing on the <object> element (added in HTML 4.0).
Add support for HTML 5 <meta charset> and new HEAD elements.
HTTP::Header doc typo fix.
Do not bother tracking style or script, they're ignored.
Bring HTML 5 head elements up to date with WD-html5-20090423.
Improve HeadParser performance.
Documentation fixes.
Trim surrounding whitespace from extracted URLs.
Merge pull request #6 from dsteinbrunner/patch-1
Yves Orton (1):
Fix Issue #3 / RT #84144: HTML::Entities::decode_entities() needs to call SV_CHECK_THINKFIRST() before checking READONLY flag
Zefram (1):
HTML::Parser doesn't compile with perl 5.8.0.
aas (110):
First revision.
Fake-compile regexps using anonymous subs. More documentation.
Removed trailing whitespace and unexpanded the text (replaced initial space with
Fixed copyright message.
Moved from ../base
Avoid quotes in hash key.
First revision.
Added test based on RFC1866
Included additional ISO-8859/1 entities listed in rfc1866 (section 14).
Typo fix by Bob Dalgleish <Bob_Dalgleish at Develcon.com>
First version. Posted on the mailing list 1996-07-08.
Clear links when calling parse_file().
Parse <link> attributes in head. Renamed header Base: to Content-Base:
Slightly better documentation.
Renamed Base: to Content-Base: and Implemented Link:
First revision.
Got Ambiguous use of {links} resolved to {"links"}
Added support for <embed src="..."> as suggested by Hans de Graaff
Added <frame src="..."> to the things recognized
Added an example to the documentation.
Added test to check that the links method works when there are no links in the parsed document.
Avoid 'Can't use an undefined value as an ARRAY reference message' when no links are found in the document.
Must escape literal $ in regular expression.
$p->eof instead of $p->parse(undef)
Support netscape_buggy_comment() and implement the eof() method.
Added two new start() parameters; $attrseq and $origtext.
First revision.
Allow "_" in attribute names since Netscape really uses this in their bookmarks.html
Initialize from all <meta> as X-Meta-Foo
Parser was very confused about "</" when it did not start an end tag.
$p->links now truncates the list.
Added SYNOPSIS to all libraries since perl5.003_97 warns if it
Updated the documentation.
Only modify arguments in void context. Requires 5.004
Doc bug spotted by Martijn Koster
Know about <applet code=URL>. Patch from Daniel V Klein <dvk at lonewolf.com>
Check for Bill Simpson-Young's problem.
Might introduce ";" for things that look like entities but is not. Reported by Bill Simpson-Young <bill.simpson-young at cmis.csiro.au>
Documentation update.
=head2 replaced by =item
Reformatting by Martijn.
Replaced netscape_buggy_comment() with strict_comment(). Documentation update.
Pass original text to end() method. Patch by Brian McCauley
First revision.
Added documentation.
Fix TableStripper example bug.
First revision.
Optimized by moving lookup of !$self->{'_strict_comment'} out of the
Document how chunk size influences efficiency. Reduce chunk size in
Special case for plain start tags give 2.5% speed up.
Use last instead of return to get out of the while-loop in parse().
Added a BUGS section.
Added $VERSION.
use strict;
Don't call the text() method with zero length text any more.
First revision.
Increment version number.
First revision.
Added Changes.
Added some more real content.
New (more interesting) date.
First revision.
Split test based on whether URI::URL is available or not.
Only make the URI::URL module required if a $base URL is given
Make it work even without HTTP::Headers installed.
Provide our own header object implementation. Does not depend on
First revision.
Make it work better.
New tests.
Documentation flikking.
2.15 changes.
Typo.
Tweaks.
Used to be called parser.t
Replaced with a real test.
Some more HTML.
Broke HEX entities &#xFF;
The old t/parser.t is now t/cases.t
Always clean up tmpfile.
Make it release 2.16 instead.
Updated manual page.
Never split words (a sequence of non-space) between two invocations of
2.17
parse_file now uses smaller chunks.
Document smaller chunk.
Incremented version number (sub-modules changed).
Make it better subclass-able by calling $self->_found_link each time a
Provide a parse_file method that cares about the return value from
Test $p->parse_file method
Documentation fix.
2.18 changes.
Don't leave space and end of chunk when trying to avoid breaking words.
2.19
First revision.
Added HTML::TokeParser
Much more stuff.
Reference to TokeParser
tokeparser.t
First revision.
Added documentation.
2.20
Added Author address
Updated with new manual page. Mention HTML::TokeParser.
More tests.
Support reading from plain strings and from globs.
Netscape comment patch by Peter Orbaek <poe at daimi.au.dk>.
2.21
Protect eval from $SIG{__DIE__}
2.22
Incremented version number.
bulk88 (1):
const+static-ing
gisle (892):
Removed wrong expired address
Various spell fixes.
Fixed my email address.
Documentation update.
New year.
Incremented version number.
2.23.
From: Clinton Wong <clintdw at netcom.com>
Better recognition of GLOBs in parse_file().
Added t/parsefile.t
First revision.
Test parsing of large inline documents too.
More efficient parsing of large inline documents.
Don't die just because the filename passed to $p->parse_file() can't
Document that the scalar passed to the constructor must stay the same
Get rid of the file in the end.
Documentation update.
Updated mailing list address. Removed formatted HTML::Parser manpage.
Get rid of $Id$ line again.
Summarized 2.24
Adjustment of parse_file() change description.
First revision.
-Wall
End tags are recognized.
Recognize processing instructions.
Beginning of declaration and comment matching.
Parse declarations.
Parse start tags too.
Push PL_sv_yes
More testing.
Free memory associated with token arrays for premature and error parsing.
Bye.
Updated.
First revision.
Makefile
Set DISTNAME.
First revision.
Added some real XS glue.
Small adjustments.
Real callbacks for text and end tags.
Added copyright notice.
Added rest of callbacks.
Set up method callbacks.
strict_comment(). A few small tweaks.
Callbacks now get a reference to the parser object as 1st argument.
Keep white space together.
Make test compatible with HTML::Parser 3 which have its own DESTROY method.
New parse_file() implementation to keep in sync with HTML::Parser's
Some tweaks here and there.
Attribute keys are now already lowercased
Reduction.
pass_cbdata()
pass_cbdata boolean
Added typemap.
First revision.
Added README
Also set up processing instructions.
Incremented version number.
Implemented strict comments.
Implemented keep_case option.
Added accum attribute.
Fill accum array as various tokens are found.
Incremented version number again.
Allow ':' in identifiers (isHALNUM).
Allow ":" in attribute names because it is used by Microsoft.
Version 2.25
Don't print filtered any more.
Check for $self->{parse_file_stop}
Avoid parse_file() duplication.
Summarized 2.25 changes.
Minor detail.
First revision.
Look for $self->{parse_file_stop} in $self->parse_file loop.
Added lib files and t files.
<XMP>...</XMP> support.
<xmp> support.
Increased version number again.
Replaced <xmp> support with the more general literal_mode.
Added TODO list.
We did not get out of literal mode as we should.
Another todo item.
More todo things.
Another break.
Killed some unneeded conditionals.
2.99_04 release.
2.99_05
New release again.
Blush!
Incremented version number.
Implemented xml_mode.
Implemented bool_attr_value
If no bool_attr_val is set, then it will take the value of the attribute.
First revision.
Added Solaris hints to avoid gcc compilator bug.
Inline decode_entities function.
Updated todo.
Load HTML::Entities.
2.99_06 release.
Rely on XS implementation of decode_entities_old.
Integrated HTML-Parser-XS version 2.99_06.
Version 2.99_07
Attribute values entities are now expanded in the start callback.
New bool attribute: decode_text_entities.
Call the bool_parser_attr() function strict_comment() in order to avoid
Got back old README text.
Updated bug section.
We got problems with ERROR. Trying with FAIL instead:
Tweaks to make it compile with perl5.004_04 too.
Avoid calling SvREFCNT_inc() in void context (mostly).
Make a copy of assigned 'bool_attr_val'.
Fix serious memory leak. We allocated an SV for text content twice.
In xml_mode, don't report empty start tags with an extra parameter,
Added line number counting as an option.
Summarized _07 changes.
Make it compile on perl5.004_05.
Need to push references to PVAVs onto the accum array.
More newRV-fixing when pushing array elements into an array.
Implemented v2_compat flag.
Rely on $p->v2_compat to set up method callbacks.
Implemented by taking advantage of $p->accum.
Also filter process instructions.
Moved to URI.pm
Set up start-callback function instead of relying on method callbacks.
Passing callbacks in ctor did not work (Need to try to set callbacks
Close file to make sure it is not empty..
Warn if unlink($filename) fails.
Close filehandle before trying to unlink it.
close files.
Better unlink warning.
Don't catch exceptions when trying to call ctor key arguments as a
Moved comment parsing out of html_parse_decl into its own procedure.
Added a process instruction to the stuff.
Rely on the complete process instructions to be available is second
Implemented 'default' handler. All document text is passed to this
Summarized 2.99_08
Grammar fixes by Michael A. Chase <mchase at ix.netcom.com>
Added binmode() to test since it was done to the $p->parse_file method
Incremented version number to 2.99_09
From: "John Hurst" <jbhurst at ibm.net>
close($io) as workaround for perl-close bug.
Some minor cleanup.
All specific parsing now delegated to parse functions. Simplifies
Select parse function by an array lookup instead of a series of if-tests.
First revision.
Set up dependency for pfunc.h
Added mkpfunc.
Use type 'bool' for boolean attributes in PSTATE
Added mkhctype.
#include "hctype.h"
First revision.
Build "hctype.h"
Use hctype-macros to implement strict names.
Prepare for 2.99_09
Avoid \z which did not do the right thing for perl5.004
Avoid \z which don't work for perl5.004
Better alpha release summary
2.99_10.
Summarized 2.99_10
The old POD is back.
Added documentation note.
Parse <!> as an empty comment. Hooks for marked_section implementation.
Incomplete marked section support.
Marked CDATA/RCDATA sections now work.
Make marked section support deselectable.
Don't leak any $@ messages.
Be case insensitive when matching the end tag in literal_mode.
2.99_11.
Added even more link tags as suggested by
Complete marked section support.
Put magic number into the header of p_state.
Ask if marked sections should be there.
Implemented unbroken_text option.
Implemented attr_pos().
Grammar changes from Michael A. Chase.
Grammar fixes by Michael A. Chase.
Text change.
Make attr_pos "work" for boolean attributes too.
Report end of previous attribute/tag as first number for attr_pos
Callbacks are now set up with _cb suffix.
For the constructor arguments, we now use _cb as suffix for those that
pass_cbdata renamed to pass_self.
pass_cbdata renamed as pass_self
Expanded TODO section.
One more optimization to think about.
Summarized 2.99_12.
2.99_13
Grammar corrections by Michael A. Chase
Case insensitive yes.
Documentation patch from Michael.
Various documentation updates.
More updates to documentation.
First revision.
First revision.
Test accum filling.
Added two new tests.
Make it possible to unset callbacks.
First revision.
HCTYPE_NOT_SPACE_EQ_SLASH_GT 0x40 was not initialized.
First revision.
Two more tests.
Summarize 2.99_13.
From: "Michael A. Chase" <mchase at ix.netcom.com>
Some more todo.
In perl5.004_05 we can't return PL_sv_undef safely.
Forgot a little detail.
Fixes by Michael A. Chase
Documentation update by Michael A. Chase.
One more todo option.
Incremented version number.
Prepare for 2.99_14.
Better warning if undefined document is passed in.
First revision.
First revision.
Renamed as tokenpos.h
Added another .h file. Made marked section support the default.
First take at normalizing everything to call html_handle(). We still
Now also html_parse_start() calls html_handle().
Version 2.99_15
Added handler struct array to pstate. Replaced $p->callback and
Basically set up callback loop.
Set up all basic arguments.
Trimmed out various boolean attributes. The ones eliminated are:
Implemented cdata argspec.
Updated TODO list.
Killed all the routines that were replaced by html_handle().
attrspec_compile()
Direct method calls.
Added MAC to copyright notice.
New callback interface.
token1 identifier in attrspec
Allow handler to be specified as an array of two values too.
Look for MS_IGNORE in html_handle().
New syntax.
Move to new syntax.
Better default handlers.
Took out accum test.
Fit with new way of doing things.
Avoid reporting empty text segments.
Set up our own accumulator array.
Changed sequence of handler arguments.
Reversed order of $p->handler arguments.
Added tokenpos.h
2.99_15
We did copy from the wrong place.
First revision.
Added largetags.
Killed unused $a
Support "event" in argspec.
2.99_16.
2.99_16
Test with ">" after ms.
Documentation update from MAC
MAC patch to support accumulator array in html_handle().
version => 3 ctor option.
Artificial end tag should have empty origtext.
Test that artificial end tag get empty origtext.
api_version.
api_version => 3
api_version => 3.
Don't ask about marked sections any more.
Don't eat newline after "]]>"
Fix some obvious memory leaks.
]]> dont swallow "\n" any more.
2.99_17
"realloc" as parameter name created problems. Fix by Paul Schinder <schinder at pobox.com>
Patch from MAC that makes it into a real test.
Documentation patch from MAC.
Working array dest.
Use internal array-as-handler-destination-support. Patch by MAC.
Since we are faster we need longer speed test.
Moved some functions out of Parser.xs
Prettifying.
Added copyright
Dropped html_ prefix.
Update.
First revision.
Moved stuff out of Parser.xs
More H files.
More stuff.
2.99_90
Some attrspec renaming.
2.99_90
Minor spellfix.
beta now
Does not make sense in XS parser world.
literal_mode_elem
Moved literal_mode_elem to hparser.c
Remove some commented-out code.
Documentation patch from MAC.
Updated it.
Reduce length of speed test.
Initial support for offset.
pending_text gone.
Update.
Added offset.
Document offset.
Working "offset" in attrspec.
First revision.
Added offset.
Updated.
2.99_91
First revision.
New case.
Added t/attrspec.t
Doc patch from MAC.
One more.
Typo fix by MAC.
Fix tokens reported in the artificial case. Patch by MAC.
<a "> core dump.
First revision.
Back out some more changes.
Take out linepos
For boolean attributes we could get very strange values unless
Bug tokens for artificial tag fixed by MAC.
Update.
Language fixes by Michael.
Documentation update from MAC.
Minor layout fixes by MAC.
Another DOC patch.
Don't make empty token/tokenpos arrays.
Changed behaviour.
Renamed token1 as token0
av_extend() token/tokenpos arrays.
token0
For artificial end tag we don't report any tokenpos, but report tokens.
Update from me.
Rename bool_attr_value
2.99_92
Doc patch from MAC.
Renamed attrspec.t as argspec.t
Renamed attrspec as argspec.
Introduced enum argspec_opcode.
Renamed opcode as argcode and OP_ as ARG_
enum argcode
Nothing much.
First revision.
Renamed bool_attr_value as boolean_attribute_value
Added eg/hrefsub
Added a BUGS section.
Updated.
2.99_93
argspec length
_94
Documented literal string in argspec.
Off by one error when reporting literal end token.
First revision.
shift2
Added htext.
First revision.
Added t/exit-via-next.t
IGNORE.
Argspec undef
First revision.
Added eg/hstrip
Doc patch from MAC.
Typo fixes.
One more attrspec cousin.
Simplified hrefsub by working right to left. Patch by MAC.
Protect " inside $new_v
Better fail message.
Taken out debug stuff.
Renamed cdata_flag as is_cdata
Updated.
Updated.
Added usage string.
Added short description of each file.
Need a statement after a label. Fix pointed out by
Some more thoughts.
MAC improvement (remove stuff from left)
A generic bug. Don't test for it any more.
t/exit-via-next.t gone
if we killed all attributed, kill any extra whitespace too
Some adjustments by MAC.
Fix core dump.
Simplified check_handler()
First revision.
Don't get double refcnt decrement if argspec_compile() or
Remove debugging output.
Allow h->argspec to be NULL in report_event()
Don't allow handler arguments to be grouped as an array reference.
First revision.
Added two more tests.
Yet another update.
Statement that is not correct any more.
Documentation update.
$self->{parse_file_stop}
Documented return value from $p->handler().
2.99_94
Doc patch from MAC.
Added <�� as test case.
A little more precision.
First revision.
Added a comment.
Fix core dump reported by Doug MacEachern.
First revision.
Test netscape_buggy_comment too.
Test process too.
carp about netscape_buggy_comment instead of a warning.
First revision.
Note about deprecated state of this module.
Updated.
Updated again.
2.99_95
Another update.
_hparser.
Changed name of hash entry to _hparser_xs_state.
Two more sections.
First revision.
Make \\ reserved in argspec literals so we can use it as escape character later.
More to go.
One more change.
Allow handlers to call $p->eof to abort parsing.
$p->eof in handlers is now supported.
Updates to the examples.
Handler $p->eof
First revision.
Added many new tests.
Added header.
Various documentation and english tweaks from MAC.
Don't use a Perl-hash for argspec any more. Instead we simply use a
I also decided to take a swing at the IGNORE handler. Any false value
Summarized 2.99_96
Minor tweak.
Yet another one of those useless tweaks.
Simplified.
Test patch from Michael:
Final POD tweaks from Michael.
3.00 and some minor doc tweaks.
Added MAC to Copyright messages
Avoid calling method callbacks as options.
Killed DISTNAME
Make '3.00' a string.
Removed beta blurb.
Added ANNOUNCEMENT
First revision.
After ispell
Use "" instead of &ignore. Patch by MAC.
One additional paragraph from MAC.
After MAC hacking.
3.00
3.00 ready.
Assertion was backwards.
The hash function has probably changed so we need sorting to ensure
Use ~-magic to trigger deallocation when IV that points to struct p_state goes away.
3.01
Summarized new stuff.
Tweaks before 3.01
Added an "also"
Make _hparser_xs_state into a reference to the IV-pointer
Adjusted because _hparser_xs_state is now a reference to the IV-pointer.
Introduced init().
Reuse earlier 'Not a reference to a hash'-message.
3.02
Rephrasing.
First revision.
Added comment parsing.
2000 copyright.
Version 3.03 (new year)
Prepare for 3.03
We did not get out of comment mode for comments ending with an
Try 3 dashes in a row.
Fixed marked_sections without an s
Back out option checking patch by MAC.
Kill documentation of init().
Minor doc tweaks by me.
Backed out some of 3.03 patch.
One more thing.
Some typos fixed.
xml_mode should prevent special treatment of <script>, <style>...
Fix example. Some more text.
Don't enter CDATA mode for some tags in XML mode.
Don't enter literal_mode when XML mode is enabled
No Literal mode for XML.
Special CDATA parsing for XML is gone now.
Moved HTML::Filter to Deprecated section.
Implemented unbroken_text.
Did not set is_cdata when we got out of outer level CDATA MS.
Get the offset correct when alternating between CDATA/!CDATA modes.
Don't initialize handler before we have to. I am still wondering
First revision.
Also try <xmp>...</xmp>
Don't keep text unbroken between unreported tags.
An extra newline...
New test.
Fix last test.
unbroken text done
3.05 soon ready.
require 3.00
From: James Walden <jamesw at ichips.intel.com>
First revision.
First revision.
Fixed warning.
Avoid some "statement not reached" from picky compilers.
From: Doug MacEachern <dougm at pobox.com>
Version number is now 3.06
3.06.
Added eg/htextsub
Typo.
Fix for 5.004. By avoiding OUTPUT: RETVAL we don't get sv_2mortal()
Incremented version number.
Copyright 2000.
Only continue with declaration parsing when we find "DOCTYPE" or "ENTITY". Based on patch by la mouton <kero at 3sheep.com>.
First revision.
Added t/declaration.t
3.07.
First revision.
A short comment.
Added hanchor.
Typo fix.
Fixed typo spotted by Jamie McCarthy <jamie at mccarthy.org>.
Match typo fix in Parser.pm.
Avoid access to freed() memory.
Version number is now 3.08
Changes for 3.08
ActiveState.com
Document that the $p->parse() argument should not be modified.
Added a little description of what 'token0' is for process and comment
Documentation update as suggested by Paul Makepeace <Paul.Makepeace at realprogrammers.com>.
3.09
Make a mortal copy of the self argument passed to a handler.
Another change in 3.09
More mortal copies. SPAGAIN after flush_pending_text()
3.10
Typo.
Get %linkElements from HTML::Tagset.
Grab link data from HTML::Tagset
3.11
Rely on HTML::Tagset
Spelling patch from David Dyck <dcd at tc.fluke.com>
PREREQ_PM HTML::Tagset.
3.12
3.12.
Get it to compile with "Optimierender Microsoft (R) 32-Bit
A change missing in the log.
Set up UNICODE_ENTITIES.
Deal with unicode entities.
Copyright 2000
Added unicode entities from HTML4.0.1 spec.
Deal with numification.
Added uentities.
Only 9 tests.
Check for overflow.
Better overflow check.
Test overflow detection.
Avoid failure under unicode.
Don't set UNICODE_ENTITIES if $] > 5.006.
3.13
Prompt for -DUNICODE_ENTITIES
UNICODE_SUPPORT
Don't test if UNICODE_SUPPORT is not enabled.
3.13
Fix infinite loop in case the handler triggered by ->eof
Incremented version number: 3.14
Allow declaration parsing to take place for lowercase <!doctype ...>
Release 3.14
Escape new hash keys that happen to be perl keywords.
$p->get_tag() can now take multiple tag names to match.
Test with multiple arguments to $p->get_tag
Really hide debugging code.
UTF8 entities have already been done.
Require 5.7.0 or better in order to offer "Unicode entities".
Disable GET_CONTEXT for threaded perls because "we want efficiency".
Get out a few more dTHXs by passing context with pTHX_ and aTHX_
Release 3.15.
Document that HTML::Tagset is a PREREQUISITE.
Weaken the libwww-perl PREREQUISITE.
Deleted note about v2 compatibility.
Use INT2PTR instead of cast directly between pointers and IV.
Set up INT2PTR unless perl provide it.
Version 3.16 and Copyright -2001.
A few more ideas.
use strict
unbroken_text now works across ignored tags.
unbroken text behaviour fixed.
Test one more range.
Fix decoding of unicode entities.
Copyright 2001.
Always update size.
Reindent.
Added _decode_entities(). Reindent.
Export _decode_entities()
Added t/entities2.t
Reindent.
3.16
Forgot about pTHX_ from grow_gap().
Release 3.17.
Removed ANNOUNCEMENT.
C++ comment left over from debugging removed.
Release 3.18.
Use get_hv() as documented in perlapi.
Avoid global entity2char. Patch by Sarathy. Version 3.19
Support @attr argspec.
Allow @{....} in argspec to signal flattening of array.
Implemented ignore_tags/ignore_elements/report_tags
Documents filter methods.
Added test for @attr and @{...}
Test new filter methods.
Renamed report_tags as report_only_tags.
Release 3.19_90
Allow array references passed into $p->ignore_tags.
Doc update about the effect on offset/length under unbroken_text
The netscape_buggy_comment now gives mandatory warning
Clear ignoring_element on eof.
Simplify ARG_ATTR code a bit.
Simplify by using ignore_tags/ignore_elements.
No need for end_h
Minor stylistic issue.
Simplify by using report_only_tags
Optimize tag reporting. Image text should not be array ref.
Doc tweak for report_only_tags()
Version 3.19_91
User filters.
Use filters.
Make it possible to pass key/value arguments to the constructor.
Attr needed for textify.
Introduced HTML::PullParser.
Support parsing from doc => $str
Test HTML::PullParser
Reference HTML::PullParser instead of HTML::TokeParser.
A clearer separation between 'doc' and 'file' parsing.
Release 3.19_92
s/report_only_tags/report_only/
Track unicode support as of perl at 9359
Avoid sv_catpvf(sv, "%c",...) as it wants to upgrade
Doc fix.
Release 3.19_93
Support "tag" argspec.
Document "tag" argspec.
Prev patch broke lowercasing of tagnames.
Test "tag" argspec
Example of PullParser usage.
Doc update.
Implemented tracing of line and column numbers.
Column numbers were off by one.
Print line/column numbers instead.
Test col/line.
Get offsets/line- and column- numbers correct when skipping
Release 3.19_94
Include description of HTML::PullParser. Remove description of HTML::Filter.
Ref hform example in doc.
Release 3.20
Don't promise any utf8 option.
Avoid compiler warnings on some compilers. The DEC C said:
Fix memory leak in filters.
Optimize: Reuse the same SV for filtering by tagnames.
Release 3.21
Decode &apos;
Parse <textarea> in literal mode, but not with is_cdata flag set.
Release 3.22
Moved filter testing code up a bit. The ignore_elements filter
Release 3.23
Support parsing from code.
use strict.
Added start_document and end_document events (as for SAX).
Implemented skipped_text argspec.
Fixed interaction between unbroken_text and skipped_text.
Implemented offset_end argspec.
Doc update. Release 3.24.
Test offset_end.
Release 3.24.
Fix plaintext parsing.
<plaintext> fixed.
Some more state that was not reset on EOF.
perl5.004_04 did not have ERRSV
croak(0) was not present for 5.6.0
From: "Stephane Barizien" <sba at ocegr.fr>
Release 3.25
Don't encode \r as suggested by Sean M. Burke.
Make 'make clean' also clean up generated *.h files
From: "Timur I. Bakeyev" <timur at gnu.org>
Another example program.
Avoid warnings emitted by perl-5.7.3
From: Guy Albertelli II <guy at albertelli.com>
Added a few tests. Resorted.
More doc updates explaining C<case sensitive>
Calling perl_call_* without G_EVAL always means trouble.
Don't get fooled by an empty http-equiv
We already had a RETHROW macro defined.
Release 3.26
First revision.
Added eg/hlc to the example programs.
Typo spotted by Marc Lehmann <pcg at goof.com>.
Typo.
From: "Sean M. Burke" <sburke at cpan.org>
Test encode_entities_numeric
Release 3.27
Fixed typo. Spotted by Sean.
Pass context around instead of using dTHX; This should be faster.
Make <!454554> be treated as a comment unless strict_comment is enabled.
Version 3.28.
avoid Visual C warning. Patch by gsar at activestate.com.
Don't use the pfunc by default. On Intel P4 that saves about 3000 bytes on the binary but there was no easy to measure speed difference.
xml_mode implies strict_names also for end tags.
64-bit fix from Doug Larrick <doug at ties.org>
Documentation patch: <textarea> is also literal mode.
MSIE compatibility stuff.
Need <!-- for strange <script> behaviour to show up.
Allow crap in end tags as MSIE does.
The token name 'empty' was not good.
Parse <! "<>"> as comment (MSIE compat).
Implement 'strict_end' to control acceptance of junk at the end of end tags.
Parse with <--comments> like this if we can't find the real thing.
Release 3.29.
From: Steve Hay <steve.hay at uk.radan.com>
Avoid RETVAL warnings as reported by Steve Hay <steve.hay at uk.radan.com>
Perl-5.7 should be gone by now.
Better fix for the RETVAL warnings. Use PPCODE for the parse functions.
Missing unicode support noted.
Also PPCODify handler(). Fixed return value for eof().
The assert() apparently needs my_perl so ignore it.
Documentation: Don't reference perl 5.7 any more.
Release 3.30.
Release 3.31
Stale stuff.
If the document ends with "some kind of unterminated markup", then
http://rt.cpan.org/Ticket/Display.html?id=3954
Show skipped reason in the official way.
Updated documentation.
Include $Id$.
Let the get_text() and get_trimmed_text() methods take multiple
Document the </script> inside quotes case as a BUG.
Typo spotted by S Page <spage at macromedia.com>
Apply patch (partly) from S Page <spage at macromedia.com> that adds some comments.
Note that parsing of Unicode does not work yet.
Added dump script.
Release 3.32.
Implement get_phrase().
Make get_text() expand most skipped tags to " "
We don't support 5.004 any more. For some strange reason the
Release 3.33
Fix release date for 3.33
Avoid core dump when the stack gets reallocated during the parse() call.
Added testcase for the stack realloc bug to the test suite.
Release 3.34
No need to redeclare SP.
From: "Croome, Paul" <Paul.Croome at softwareag.com>
Release 3.35
When an attribute occurs use the first one in 'attr' instead of
Compute hash only once.
Release 3.36
Silence 'gcc -Wall' - the prev_token might be a real issue.
Time to ditch the v2 synopsis.
Improve the handling of surrogate pairs. Based on patch by
Match perl's rules for Unicode non-chars.
Avoid temp modification of argspec strings.
Must also upgrade chars after the gap. Otherwise we might produce
Release 3.37
Make closing of <plaintext> configurable.
Release 3.38
Typo.
Parse <title> in literal mode.
Updated copyright year.
Make the UTF8-ness of strings parsed propagate.
Disable Unicode stuff for perl < 5.8. I still want HTML-Parser
Get offsets right for Unicode string.
Removed Unicode noop.
Test Unicode parsing behaviour.
Don't consider perl-5.6 Unicode capable.
Release 3.39_90
Usually there is only one <title>.
Unicode basically done.
Convert to use Test.pm
Header is not done if we see the Unicode BOM.
Unicode is not supported.
Unicode BOM tests.
UTF-8 BOM warning only when Unicode is available.
BOM tests.
Some behaviour seen in KHTML sources.
Implement quote behaviour for <script> tags.
Test quote behaviour.
Propagate UTF-8-ness during flushing at eot.
If literal tags are unterminated, flush them out with the text
Make Unicode BOM warnings optional and document them.
This change was supposed to go somewhere else.
Document that these modules need decoded chars to parse.
Release 3.39_91
Some new MSIE compatibility issues.
MSIE compatibility: Expand unterminated entities in 'dtext' and
Improve decode_entities() documentation.
Tweaks.
Simplify.
Test parsing of Unicode from file.
Try to describe Unicode issues better.
Added attribute 'utf8_mode'.
Sort documentation; boolean attributes, argspecs, events.
Test utf8_mode.
Fix utf8_mode semantics. The entities are now decoded as UTF-8.
Release 3.39_92.
Simpler HTML link.
Trigger UTF8 warning if anything in the first chunk looks like hibit UTF8.
The utf8_mode produces garbage for older perls.
Least expensive tests first.
Release 3.40.
Make it work with perl-5.005
Release 3.41
Use push_header for all headers added. Do not want to lose any values. Better to duplicate fields.
Silence warnings from the HP C compiler about char/U8 mismatches.
Typo in r2.26
Avoid sv_catpvn_utf8_upgrade; make us perl-5.8.0 compatible.
perl-5.8.0 does not have utf8::is_utf8.
Release 3.42.
Fix test failure on Windows.
Forgot to set repl_utf8 flag which might lead to utf8 corruption.
Release 3.43
Fix the handling of quoted strings.
Release 3.44.
Fix stack leak.
Release 3.45.
Explain affected code.
From APEE build log with the HP native C compiler.
Fix typo spotted by Stefan Funke <bundy at adm.arcor.net>.
From: Norbert Kiesel <nkiesel at tbdnetworks.com>
Test pod correctness and fix up missing =back.
use strict;
Don't treat 0xA0 as space, since it's not really, and XML agrees.
Try parsing of \x0420.
Release 3.46
From: Norbert Kiesel <nkiesel at tbdnetworks.com>
Make unbroken_text the default for HTML::TokeParser.
Silence all the diag noise.
Skip blocks need to be called SKIP for them to work.
perl-5.8.0 is just too buggy for HTML-Parser.
Faster load time with XSLoader.
Make the source ASCII only.
Better use of Test::More.
An explicit binmode() makes this test pass with perl-5.8.0
encode &apos by default.
Make tests pass for perl-5.6.
It seems to work with perl-5.8.0 now.
Typos.
Add empty_element_tag and xml_pic attributes.
xml_pic has been added
Need to look for '/>' in more places when strict_names isn't enabled.
Make empty_element_tag default on for HTML::TokeParser.
Documentation tweaks.
Add some empty elements tests.
Rename as empty_element_tags (with s)
Release 3.47.
Test empty_element_tags/xml_pic.
Fix typo.
Don't enable empty_element_tags by default. It breaks HTML::Form :(
Adjust token counts now that empty_element_tags is not the default.
marked_sections omit first 3 bytes "<![" from "skipped_text"
perl 5.6 is required.
Release 3.48
First revision.
Events could still fire after a handler has signaled eof.
marked_sections with text ending in square bracket parsed wrong
Release 3.49.
Updated copyright year.
From: Steve Hay <steve.hay at uk.radan.com>
Release 3.50.
Typos spotted by william at knowmad.com.
Improved MSIE compatibility. Only the Latin-1 entities
First revision.
More tests.
One more ref.
Updated documentation.
Release 3.51.
Typo fixes are also in 3.51.
Bye.
Add some results.
Link to search.cpan.org.
Added HTML-Parser to the result table.
Safari results.
Documentation typo fix.
Make sure 'start_document' is triggered exactly once per document.
Documentation tweaks. Recommend empty_element_tags.
Documentation typo fixes.
Release 3.52.
ignore_element treated </script> like <script>.
Release 3.53.
Enabling of empty_element_tag interacted badly with literal mode.
Release 3.54.
Yaakov Belch was responsible for releases 3.53 and 3.54.
Test that empty_element_tags works for <script/> too.
Consider <!a'b> a comment by itself.
From: Gisle Aas <gisle at ActiveState.com>
Treat <> at end as text.
Test <!a'b> comments.
Release 3.55.
Support threads cloning. Contributed by Bo Lindbergh.
New test file.
Release 3.56.
Restore perl-5.6 compatibility.
New year.
Remove debug printout.
State Test::More dependency.
Don't require whitespace between declaration tokens.
Extra plaintext test from Alex Kapranoff <kappa at rambler-co.ru>.
Alex Kapranoff claims the closing_plaintext behaviour only occurred
Implement backquote() attribute as requested by Alex Kapranoff.
-----------------------------------------------------------------------
No new revisions were added by this update.
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/pkg-perl/packages/libhtml-parser-perl.git