Changes between Initial Version and Version 2 of Ticket #8468


Timestamp:
10/29/2016 05:43:04 PM (8 years ago)
Author:
Douglas R. Reno
Comment:

Added changes

  • Ticket #8468

    • Property Owner changed from blfs-book@… to bdubbs@…
    • Property Status changed from new to assigned
  • Ticket #8468 – Description

New point version


{{{
Xapian-core 1.4.1 (2016-10-21):

API:

* Constructing a Query for a non-reference counted PostingSource object will
  now try to clone the PostingSource object (as happened in 1.3.4 and
  earlier).  This clone code was removed as part of the changes in 1.3.5 to
  support optional reference counting of PostingSource objects, but that breaks
  the case when the PostingSource object is on the stack and goes out of scope
  before the Query object is used.  Issue reported by Till Schäfer and analysed
  by Daniel Vrátil in a bug report against Akonadi:
  https://bugs.kde.org/show_bug.cgi?id=363741

* Add BM25PlusWeight class implementing the BM25+ weighting scheme, implemented
  by Vivek Pal (https://github.com/xapian/xapian/pull/104).

* Add PL2PlusWeight class implementing the PL2+ weighting scheme, implemented
  by Vivek Pal (https://github.com/xapian/xapian/pull/108).

* LMWeight: Implement Dir+ weighting scheme as DIRICHLET_PLUS_SMOOTHING.
  Patch from Vivek Pal.

* Add CoordWeight class implementing coordinate matching.  This can be useful
  for specialised uses - e.g. to implement sorting by the number of matching
  filters.

* DLHWeight, DPHWeight, PL2Weight: With these weighting schemes, the formulae
  can give a negative weight contribution for a term in extreme cases.  We
  used to try to handle this by calculating a per-term lower bound on the
  contribution and subtracting this from the contribution, but this idea
  is fundamentally flawed as the total offset it adds to a document depends on
  what combination of terms that document matches, meaning in general the
  offset isn't the same for every matching document.  So instead we now clamp
  each term's weight contribution to be >= 0.

* TfIdfWeight: Always scale term weight by wqf - this seems the logical
  approach as it matches the weighting we'd get if we weighted every non-unique
  term in the query, as well as being explicit in the Piv+ formula.

* Fix OP_SCALE_WEIGHT to work with all weighting schemes - previously it was
  ignored when using PL2Weight and LMWeight.

* PL2Weight: Greatly improve upper bound on weight:
  + Split the weight equation into two parts and maximise each separately as
    that gives an easily solvable problem, and in common cases the maximum is
    at the same value of wdfn for both parts.  In a simple test, the upper
    bounds are now just over double the highest weight actually achieved -
    previously they were several hundred times.  This approach was suggested by
    Aarsh Shah in: https://github.com/xapian/xapian/pull/48
  + Improve upper bound on normalised wdf (wdfn) - when wdf_upper_bound >
    doclength_lower_bound, we get a tighter bound by evaluating at
    wdf=wdf_upper_bound.  In a simple test, this reduces the upper bound on
    wdfn by 36-64%, and the upper bound on the weight by 9-33%.

* PL2Weight: Fix calculation of upper_bound when P2>0.  P2 is typically
  negative, but for a very common term it can be positive and then we should
  use wdfn_lower not wdfn_upper to adjust P_max.

* Weight::unserialise(): Check serialised form is empty when unserialising
  parameter-free schemes BoolWeight, DLHWeight and DPHWeight.

* TermGenerator::set_stopper_strategy(): New method to control how the Stopper
  object is used.  Patch from Arnav Jain.

* QueryParser: Fix handling of CJK query over multiple prefixes.  Previously
  all the n-gram terms were AND-ed together - now we AND together for each
  prefix, then OR the results.  Fixes #719, reported by Aaron Li.

* Add Database::get_revision() method which provides access to the database
  revision number for chert and glass, intended for use by xapiand.  Marked
  as experimental, so we don't have to go through the usual deprecation cycle
  if this proves not to be the approach we want to take.  Fixes #709,
  reported by German M. Bravo.

* Mark RangeProcessor constructor as `explicit`.

testsuite:

* OP_SCALE_WEIGHT: Check top weight is non-zero - if it is zero, tests which
  try to check that OP_SCALE_WEIGHT works will always pass.

* testsuite: Check SerialisationError descriptions from Xapian::Weight
  subclasses mention the weighting scheme name.

matcher:

* Fix stats passed to Weight with OP_SYNONYM.  Previously the number of
  unique terms was never calculated, and a term which matched all documents
  would be optimised to an all-docs postlist, which fails to supply the
  correct wdf info.

* Use floating point calculation for OR synonym freq estimates.  The division
  was being done as an integer division, which means the result was always
  getting rounded down rather than rounded to the nearest integer.

glass backend:

* Fix allterms with prefix on glass with uncommitted changes.  Glass aims to
  flush just the relevant postlist changes in this case but the end of the
  range to flush was wrong, so we'd only actually flush changes for a term
  exactly matching the prefix.  Fixes #721.

remote backend:

* Improve handling of invalid remote stub entries: Entries without a colon now
  give an error rather than being quietly skipped; IPv6 isn't yet supported,
  but entries with IPv6 addresses now result in saner errors (previously the
  colons confused the code which looks for a port number).

build system:

* XO_LIB_XAPIAN: Check for user trying to specify configure for XAPIAN_CONFIG
  and give a more helpful error.

* Fix XO_LIB_XAPIAN to work without libtool.  Modern versions of GNU m4 error
  out when defn is used on an undefined macro.  Uncovered by Amanda Jayanetti.

* Clean build paths out of installed xapian-config, mostly in the interests of
  facilitating reproducible builds, but it is also a little more robust as the
  "uninstalled tree" case can't then accidentally be triggered.

* Drop compiler options that are no longer useful:
  + -fshow-column is the default in all GCC versions we now support
    (checked as GCC 4.6).
  + -Wno-long-long is no longer necessary now that we require C++11 where
    "long long" is a standard type.

documentation:

* Add API documentation comments for all classes, methods, constants, etc which
  were lacking them, and improve the content of some existing comments.

* Stop hiding undocumented classes and members.  Hiding them silences doxygen's
  warnings about them, so it's hard to see what is missing, and the stub
  documentation produced is perhaps better than not documenting at all.
  Fixes #736, reported by James Aylett.

* xapian-check: Make command line syntax consistent with other tools.

* Note when MSet::snippet() was added.

* deprecation.rst: Recommend unsigned over useconds_t for timeout values (but
  leave the API using useconds_t for 1.4.x for ABI compatibility).  The type
  useconds_t is now obsolete and anyway was intended to represent a time in
  microseconds (confusing when Xapian's timeouts are in milliseconds).  The
  Linux usleep man page notes: "Programs will be more portable if they never
  mention this type explicitly."

portability:

* Suppress compiler warnings about pointer alignment on some architectures.
  We know the data is aligned in these cases.

* Fix replicate7 under Cygwin.

debug code:

* Add missing forward declaration needed by --enable-log build.

}}}
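
A note on the DLHWeight/DPHWeight/PL2Weight entry in the release notes: the sketch below is plain Python, not Xapian code, and every number and name in it (`RAW`, `LOWER_BOUND`, the three functions) is made up for illustration. It only tries to show why subtracting a per-term lower bound adds a different total offset to documents that match different sets of terms, whereas clamping each contribution to >= 0 adds no such offset.

```python
# Hypothetical raw per-term weight contributions for two documents.
# The negative value stands in for the "extreme cases" in the release notes.
RAW = {
    "doc_a": {"x": 0.9, "y": -0.2},  # matches terms x and y
    "doc_b": {"x": 1.0},             # matches term x only
}

# Hypothetical per-term lower bounds on the contribution.
LOWER_BOUND = {"x": -0.5, "y": -0.4}


def score_with_offset(contribs):
    """Old idea: subtract each matching term's lower bound from its weight."""
    return sum(w - LOWER_BOUND[t] for t, w in contribs.items())


def score_with_clamp(contribs):
    """New idea: clamp each term's contribution to be >= 0."""
    return sum(max(w, 0.0) for w in contribs.values())


def total_offset(contribs):
    """Total amount the lower-bound approach adds to a document's score."""
    return -sum(LOWER_BOUND[t] for t in contribs)


if __name__ == "__main__":
    for name, contribs in RAW.items():
        print(name,
              "offset score:", round(score_with_offset(contribs), 3),
              "added offset:", round(total_offset(contribs), 3),
              "clamped score:", round(score_with_clamp(contribs), 3))
```

With these made-up numbers, doc_a receives a larger offset than doc_b simply because it matches more terms, which is enough to move it above doc_b under the offset approach; the clamped scores are not affected by that kind of per-document shift.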