Opened 8 years ago

Closed 7 years ago

Last modified 7 years ago

#8468 closed enhancement (fixed)

xapian-core-1.4.1

Reported by: bdubbs@…
Owned by: bdubbs@…
Priority: normal
Milestone: 8.0
Component: BOOK
Version: SVN
Severity: normal
Keywords:
Cc:

Description (last modified by Douglas R. Reno)

New point version

Xapian-core 1.4.1 (2016-10-21):

API:

* Constructing a Query for a non-reference counted PostingSource object will
  now try to clone the PostingSource object (as happened in 1.3.4 and
  earlier).  This clone code was removed as part of the changes in 1.3.5 to
  support optional reference counting of PostingSource objects, but that breaks
  the case when the PostingSource object is on the stack and goes out of scope
  before the Query object is used.  Issue reported by Till Schäfer and analysed
  by Daniel Vrátil in a bug report against Akonadi:
  https://bugs.kde.org/show_bug.cgi?id=363741
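
The lifetime problem can be illustrated without Xapian itself. The following is a minimal C++ sketch in which `Source`, `ConstSource`, `QueryLike` and `make_query()` are hypothetical stand-ins, not Xapian's API. A query that merely kept a pointer to a caller-owned source would dangle once `make_query()` returned, whereas cloning the source when the query is built gives the query its own copy.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a posting source (not Xapian's PostingSource).
struct Source {
    virtual ~Source() = default;
    virtual int value() const = 0;
    // Return a heap-allocated copy owned by the caller.
    virtual std::unique_ptr<Source> clone() const = 0;
};

struct ConstSource : Source {
    int v;
    explicit ConstSource(int v_) : v(v_) {}
    int value() const override { return v; }
    std::unique_ptr<Source> clone() const override {
        return std::make_unique<ConstSource>(v);
    }
};

// Hypothetical query wrapper which clones the source at construction, so it
// never points at an object whose lifetime it does not control.
class QueryLike {
    std::unique_ptr<Source> src;
  public:
    explicit QueryLike(const Source& s) : src(s.clone()) {}
    int evaluate() const { return src->value(); }
};

// The source lives on this function's stack and is destroyed on return;
// the returned query is still safe to use because it owns a clone.
QueryLike make_query() {
    ConstSource local(42);
    return QueryLike(local);
}
```

The sketch shows only the cloning fallback; the reference-counted case mentioned above avoids the copy by sharing ownership instead.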

* Add BM25PlusWeight class implementing the BM25+ weighting scheme, implemented
  by Vivek Pal (https://github.com/xapian/xapian/pull/104).

* Add PL2PlusWeight class implementing the PL2+ weighting scheme, implemented
  by Vivek Pal (https://github.com/xapian/xapian/pull/108).

* LMWeight: Implement Dir+ weighting scheme as DIRICHLET_PLUS_SMOOTHING.
  Patch from Vivek Pal.
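
For reference, here is a minimal C++ sketch of the per-term BM25+ formula as introduced by Lv and Zhai. It adds a constant delta to BM25's term-frequency component, so a matching term is never scored like a missing one, even in a very long document. The function names and the defaults k1 = 1.2, b = 0.75 and delta = 1.0 are assumptions for illustration. Xapian's `BM25PlusWeight` has its own parameters and may compute idf differently.

```cpp
#include <cassert>
#include <cmath>

// Textbook BM25 term-frequency component with document-length normalisation.
double bm25_tf_part(double tf, double doclen, double avg_doclen,
                    double k1, double b) {
    double norm = k1 * ((1.0 - b) + b * doclen / avg_doclen);
    return (k1 + 1.0) * tf / (norm + tf);
}

// BM25+ term weight: adding delta to the TF component lower-bounds the
// contribution of a matching term at idf * delta, however long the document.
double bm25_plus_term(double tf, double doclen, double avg_doclen,
                      double N, double df,
                      double k1 = 1.2, double b = 0.75, double delta = 1.0) {
    if (tf <= 0.0) return 0.0;  // term absent: no contribution
    double idf = std::log((N + 1.0) / df);
    return idf * (bm25_tf_part(tf, doclen, avg_doclen, k1, b) + delta);
}
```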

* Add CoordWeight class implementing coordinate matching.  This can be useful
  for specialised uses - e.g. to implement sorting by the number of matching
  filters.
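
Coordinate matching amounts to counting matched terms. Below is a minimal sketch, assuming each document is represented by a set of terms. `coord_weight()` and the filter-style terms such as `colour:red` are hypothetical and do not reflect `CoordWeight`'s implementation.

```cpp
#include <cassert>
#include <set>
#include <string>

// Coordinate matching: each distinct query term the document contains adds
// exactly 1, so the weight counts matching terms (here, matching filters).
int coord_weight(const std::set<std::string>& query_terms,
                 const std::set<std::string>& doc_terms) {
    int matches = 0;
    for (const auto& term : query_terms)
        if (doc_terms.count(term)) ++matches;
    return matches;
}
```

Sorting documents by descending `coord_weight()` would then order them by how many of the requested filters they satisfy.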

* DLHWeight, DPHWeight, PL2Weight: With these weighting schemes, the formulae
  can give a negative weight contribution for a term in extreme cases.  We
  used to try to handle this by calculating a per-term lower bound on the
  contribution and subtracting this from the contribution, but this idea
  is fundamentally flawed as the total offset it adds to a document depends on
  what combination of terms that document matches, meaning in general the
  offset isn't the same for every matching document.  So instead we now clamp
  each term's weight contribution to be >= 0.
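
Some made-up numbers show why the per-term lower-bound offset was flawed. Document A matches only term 1 (contribution 1.0, lower bound -0.5). Document B matches term 1 and also term 2 (contribution 0.0, lower bound -2.0). The following minimal C++ sketch compares the old offset approach with clamping; the function names and values are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Old approach: subtract each matched term's lower bound from its
// contribution, so the total offset depends on which terms matched.
double offset_sum(const std::vector<double>& contrib,
                  const std::vector<double>& lower_bound) {
    double total = 0.0;
    for (std::size_t i = 0; i < contrib.size(); ++i)
        total += contrib[i] - lower_bound[i];
    return total;
}

// New approach: clamp each term's contribution at zero.
double clamped_sum(const std::vector<double>& contrib) {
    double total = 0.0;
    for (double c : contrib) total += std::max(c, 0.0);
    return total;
}
```

With offsets, A gains 0.5 but B gains 2.5 (`offset_sum()` gives 1.5 versus 3.5). B therefore outranks A partly because of term 2's lower bound rather than its actual contribution. With clamping, a term can only ever add a non-negative amount, and no document receives an artificial bonus.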

* TfIdfWeight: Always scale term weight by wqf - this seems the logical
  approach as it matches the weighting we'd get if we weighted every non-unique
  term in the query, as well as being explicit in the Piv+ formula.
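
A small sketch can check the equivalence behind this change. For the query `foo foo bar`, weighting each occurrence separately gives the same total as weighting each distinct term once and multiplying by its wqf. The per-term weights below are made-up numbers, and the functions are hypothetical rather than `TfIdfWeight` itself.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Weights = std::map<std::string, double>;

// Weight each occurrence of a term in the query separately.
double per_occurrence(const std::vector<std::string>& query, const Weights& w) {
    double total = 0.0;
    for (const auto& term : query) {
        auto it = w.find(term);
        if (it != w.end()) total += it->second;
    }
    return total;
}

// Weight each distinct term once, scaled by its within-query frequency (wqf).
double scaled_by_wqf(const std::vector<std::string>& query, const Weights& w) {
    std::map<std::string, int> wqf;
    for (const auto& term : query) ++wqf[term];
    double total = 0.0;
    for (const auto& entry : wqf) {
        auto it = w.find(entry.first);
        if (it != w.end()) total += entry.second * it->second;
    }
    return total;
}
```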

* Fix OP_SCALE_WEIGHT to work with all weighting schemes - previously it was
  ignored when using PL2Weight and LMWeight.

* PL2Weight: Greatly improve upper bound on weight:
  + Split the weight equation into two parts and maximise each separately as
    that gives an easily solvable problem, and in common cases the maximum is
    at the same value of wdfn for both parts.  In a simple test, the upper
    bounds are now just over double the highest weight actually achieved -
    previously they were several hundred times that.  This approach was suggested by
    Aarsh Shah in: https://github.com/xapian/xapian/pull/48
  + Improve upper bound on normalised wdf (wdfn) - when wdf_upper_bound >
    doclength_lower_bound, we get a tighter bound by evaluating at
    wdf=wdf_upper_bound.  In a simple test, this reduces the upper bound on
    wdfn by 36-64%, and the upper bound on the weight by 9-33%.
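
The splitting idea rests on the inequality max(f + g) <= max f + max g. This holds for any two functions and is tight when both parts peak at the same point. The following is a minimal C++ sketch. It uses two arbitrary example functions of wdfn rather than the actual PL2 terms, and it samples a grid where the real code would maximise each part analytically.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>

struct Bounds {
    double separate;  // max f + max g (the split bound)
    double joint;     // max (f + g), the quantity being bounded
};

// Sample f and g on [lo, hi] and compare the split bound with the true
// maximum of the sum. The split bound is never smaller than the joint max.
Bounds split_bound(const std::function<double(double)>& f,
                   const std::function<double(double)>& g,
                   double lo, double hi, int steps) {
    double max_f = f(lo), max_g = g(lo), max_sum = f(lo) + g(lo);
    for (int i = 1; i <= steps; ++i) {
        double x = lo + (hi - lo) * i / steps;
        max_f = std::max(max_f, f(x));
        max_g = std::max(max_g, g(x));
        max_sum = std::max(max_sum, f(x) + g(x));
    }
    return {max_f + max_g, max_sum};
}
```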

* PL2Weight: Fix calculation of upper_bound when P2>0.  P2 is typically
  negative, but for a very common term it can be positive and then we should
  use wdfn_lower not wdfn_upper to adjust P_max.

* Weight::unserialise(): Check serialised form is empty when unserialising
  parameter-free schemes BoolWeight, DLHWeight and DPHWeight.
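
Below is a minimal sketch of the check, using a hypothetical parameter-free scheme. A scheme with no parameters serialises to an empty string, so any leftover data means the input is corrupt or belongs to another scheme. Xapian itself reports this with `Xapian::SerialisationError`. The sketch throws `std::runtime_error` to stay self-contained, and the scheme name and message text are assumptions. Including the scheme name in the message is also what the testsuite item below checks for.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical parameter-free weighting scheme; mirrors the idea of the
// check, not Xapian's actual Weight::unserialise() code.
struct ParamFreeScheme {
    static const char* name() { return "ExampleBoolWeight"; }

    static ParamFreeScheme unserialise(const std::string& serialised) {
        // No parameters means the serialised form must be empty.
        if (!serialised.empty())
            throw std::runtime_error(std::string("Extra data in ") + name() +
                                     "::unserialise()");
        return ParamFreeScheme{};
    }
};
```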

* TermGenerator::set_stopper_strategy(): New method to control how the Stopper
  object is used.  Patch from Arnav Jain.

* QueryParser: Fix handling of CJK query over multiple prefixes.  Previously
  all the n-gram terms were AND-ed together - now we AND together for each
  prefix, then OR the results.  Fixes #719, reported by Aaron Li.
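
A helper that generates the textual form of the fixed query can sketch its structure. In this hypothetical C++ helper, the prefixes `S` and `XT` and the n-gram terms are illustrative only. QueryParser builds `Xapian::Query` objects internally rather than strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// AND the n-gram terms within each prefix, then OR across prefixes.
std::string cjk_query(const std::vector<std::string>& prefixes,
                      const std::vector<std::string>& ngrams) {
    std::string out;
    for (std::size_t p = 0; p < prefixes.size(); ++p) {
        if (p) out += " OR ";
        out += "(";
        for (std::size_t i = 0; i < ngrams.size(); ++i) {
            if (i) out += " AND ";
            out += prefixes[p] + ngrams[i];
        }
        out += ")";
    }
    return out;
}
```

Before the fix, all four terms would have been AND-ed together, so a document would have needed matches under both prefixes.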

* Add Database::get_revision() method which provides access to the database
  revision number for chert and glass, intended for use by xapiand.  Marked
  as experimental, so we don't have to go through the usual deprecation cycle
  if this proves not to be the approach we want to take.  Fixes #709,
  reported by German M. Bravo.

* Mark RangeProcessor constructor as `explicit`.
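
Marking a constructor `explicit` stops it from being used for implicit conversions. Below is a minimal sketch that assumes, for illustration only, a constructor taking a value slot number with its other parameters defaulted. `ImplicitProc` and `ExplicitProc` are hypothetical classes, not Xapian's declarations.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

struct ImplicitProc {
    ImplicitProc(unsigned slot, const std::string& str = std::string())
        : slot_(slot), str_(str) {}
    unsigned slot_;
    std::string str_;
};

struct ExplicitProc {
    explicit ExplicitProc(unsigned slot, const std::string& str = std::string())
        : slot_(slot), str_(str) {}
    unsigned slot_;
    std::string str_;
};

// Without `explicit`, a bare number silently converts into a processor.
static_assert(std::is_convertible<unsigned, ImplicitProc>::value,
              "implicit conversion allowed");
// With `explicit`, that conversion is rejected at compile time...
static_assert(!std::is_convertible<unsigned, ExplicitProc>::value,
              "implicit conversion rejected");
// ...but direct construction still works.
static_assert(std::is_constructible<ExplicitProc, unsigned>::value,
              "direct construction allowed");
```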

testsuite:

* OP_SCALE_WEIGHT: Check top weight is non-zero - if it is zero, tests which
  try to check that OP_SCALE_WEIGHT works will always pass.

* testsuite: Check SerialisationError descriptions from Xapian::Weight
  subclasses mention the weighting scheme name.

matcher:

* Fix stats passed to Weight with OP_SYNONYM.  Previously the number of
  unique terms was never calculated, and a term which matched all documents
  would be optimised to an all-docs postlist, which fails to supply the
  correct wdf info.

* Use floating point calculation for OR synonym freq estimates.  The division
  was being done as an integer division, which means the result was always
  getting rounded down rather than rounded to the nearest integer.
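
A hypothetical estimate formula shows the effect of integer division. The sketch below assumes the two terms occur independently, so the number of documents matching either term is about N - (N - a)(N - b) / N. This formula and the function names are assumptions, not necessarily the estimate Xapian uses.

```cpp
#include <cassert>
#include <cmath>

// Integer version: the division truncates, so the subtracted term is
// rounded down and the estimate comes out too high.
long long or_estimate_int(long long N, long long a, long long b) {
    return N - (N - a) * (N - b) / N;
}

// Floating point version: compute exactly, then round to the nearest integer.
long long or_estimate_fp(long long N, long long a, long long b) {
    double est = N - double(N - a) * double(N - b) / double(N);
    return std::llround(est);
}
```

With N = 10 and a = b = 3, the subtracted term is 4.9. Integer division truncates it to 4, giving 6, while the floating point version gives 5.1, which rounds to 5.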

glass backend:

* Fix allterms with prefix on glass with uncommitted changes.  Glass aims to
  flush just the relevant postlist changes in this case but the end of the
  range to flush was wrong, so we'd only actually flush changes for a term
  exactly matching the prefix.  Fixes #721.
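
Flushing all changes for terms starting with a prefix needs a range whose end lies beyond every such term, not at the prefix itself. A common way to compute that end is to drop any trailing 0xff bytes and then increment the last remaining byte of the prefix. Below is a minimal C++ sketch of this generic technique. `prefix_range_end()` is hypothetical, and glass's actual code may differ.

```cpp
#include <cassert>
#include <string>

// Smallest string greater than every string starting with `prefix`, i.e. an
// exclusive end for the prefix range. Returns "" when no such bound exists
// (empty prefix, or a prefix made entirely of 0xff bytes).
std::string prefix_range_end(std::string prefix) {
    while (!prefix.empty() &&
           static_cast<unsigned char>(prefix.back()) == 0xff)
        prefix.pop_back();
    if (prefix.empty()) return std::string();
    prefix.back() = static_cast<char>(
        static_cast<unsigned char>(prefix.back()) + 1);
    return prefix;
}
```

Every term starting with `XT`, such as `XTfoo`, then sorts at or after `XT` and before `XU`.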

remote backend:

* Improve handling of invalid remote stub entries: Entries without a colon now
  give an error rather than being quietly skipped; IPv6 isn't yet supported,
  but entries with IPv6 addresses now result in saner errors (previously the
  colons confused the code which looks for a port number).
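
Below is a minimal sketch of the stricter validation, assuming a simple `host:port` entry. `parse_host_port()` is hypothetical. It does not reproduce the remote stub file syntax or Xapian's parser.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

struct HostPort {
    std::string host;
    std::string port;
};

// Reject entries without a colon, and reject entries with more than one
// colon (such as IPv6 addresses) with a clear error instead of misreading
// part of the address as the port.
HostPort parse_host_port(const std::string& entry) {
    auto first = entry.find(':');
    if (first == std::string::npos)
        throw std::invalid_argument("missing ':' in \"" + entry + "\"");
    if (entry.find(':', first + 1) != std::string::npos)
        throw std::invalid_argument("IPv6 addresses not supported: \"" +
                                    entry + "\"");
    return {entry.substr(0, first), entry.substr(first + 1)};
}
```

Rejecting a second colon outright avoids treating part of an IPv6 address as the port number.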

build system:

* XO_LIB_XAPIAN: Check for user trying to specify configure for XAPIAN_CONFIG
  and give a more helpful error.

* Fix XO_LIB_XAPIAN to work without libtool.  Modern versions of GNU m4 error
  out when defn is used on an undefined macro.  Uncovered by Amanda Jayanetti.

* Clean build paths out of installed xapian-config, mostly in the interests of
  facilitating reproducible builds, but it is also a little more robust as the
  "uninstalled tree" case can't then accidentally be triggered.

* Drop compiler options that are no longer useful:
  + -fshow-column is the default in all GCC versions we now support
    (checked as far back as GCC 4.6).
  + -Wno-long-long is no longer necessary now that we require C++11 where
    "long long" is a standard type.

documentation:

* Add API documentation comments for all classes, methods, constants, etc which
  were lacking them, and improve the content of some existing comments.

* Stop hiding undocumented classes and members.  Hiding them silences doxygen's
  warnings about them, so it's hard to see what is missing, and the stub
  documentation produced is perhaps better than not documenting at all.
  Fixes #736, reported by James Aylett.

* xapian-check: Make command line syntax consistent with other tools.

* Note when MSet::snippet() was added.

* deprecation.rst: Recommend unsigned over useconds_t for timeout values (but
  leave the API using useconds_t for 1.4.x for ABI compatibility).  The type
  useconds_t is now obsolete and anyway was intended to represent a time in
  microseconds (confusing when Xapian's timeouts are in milliseconds).  The
  Linux usleep man page notes: "Programs will be more portable if they never
  mention this type explicitly."

portability:

* Suppress compiler warnings about pointer alignment on some architectures.
  We know the data is aligned in these cases.

* Fix replicate7 under Cygwin.

debug code:

* Add missing forward declaration needed by --enable-log build.

Change History (4)

comment:1 by bdubbs@…, 7 years ago

Owner: changed from blfs-book@… to bdubbs@…
Status: new → assigned

comment:2 by Douglas R. Reno, 7 years ago

Description: modified (diff)

Added changes

comment:3 by bdubbs@…, 7 years ago

Resolution: fixed
Status: assigned → closed

Fixed at revision 17923.

comment:4 by bdubbs@…, 7 years ago

Milestone: 7.11 → 8.0

Milestone renamed
