Opened 8 years ago
Last modified 8 years ago
#8468 closed enhancement
xapian-core-1.4.1 — at Version 2
Reported by: | | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | 8.0 |
Component: | BOOK | Version: | SVN |
Severity: | normal | Keywords: | |
Cc: | | | |
Description (last modified by )
New point version
Xapian-core 1.4.1 (2016-10-21):

API:

* Constructing a Query for a non-reference counted PostingSource object will now try to clone the PostingSource object (as happened in 1.3.4 and earlier). This clone code was removed as part of the changes in 1.3.5 to support optional reference counting of PostingSource objects, but that breaks the case when the PostingSource object is on the stack and goes out of scope before the Query object is used. Issue reported by Till Schäfer and analysed by Daniel Vrátil in a bug report against Akonadi: https://bugs.kde.org/show_bug.cgi?id=363741
* Add BM25PlusWeight class implementing the BM25+ weighting scheme, implemented by Vivek Pal (https://github.com/xapian/xapian/pull/104).
* Add PL2PlusWeight class implementing the PL2+ weighting scheme, implemented by Vivek Pal (https://github.com/xapian/xapian/pull/108).
* LMWeight: Implement Dir+ weighting scheme as DIRICHLET_PLUS_SMOOTHING. Patch from Vivek Pal.
* Add CoordWeight class implementing coordinate matching. This can be useful for specialised uses - e.g. to implement sorting by the number of matching filters.
* DLHWeight, DPHWeight, PL2Weight: With these weighting schemes, the formulae can give a negative weight contribution for a term in extreme cases. We used to try to handle this by calculating a per-term lower bound on the contribution and subtracting this from the contribution, but this idea is fundamentally flawed as the total offset it adds to a document depends on what combination of terms that document matches, meaning in general the offset isn't the same for every matching document. So instead we now clamp each term's weight contribution to be >= 0.
* TfIdfWeight: Always scale term weight by wqf - this seems the logical approach as it matches the weighting we'd get if we weighted every non-unique term in the query, as well as being explicit in the Piv+ formula.
* Fix OP_SCALE_WEIGHT to work with all weighting schemes - previously it was ignored when using PL2Weight and LMWeight.
* PL2Weight: Greatly improve upper bound on weight:
  + Split the weight equation into two parts and maximise each separately as that gives an easily solvable problem, and in common cases the maximum is at the same value of wdfn for both parts. In a simple test, the upper bounds are now just over double the highest weight actually achieved - previously they were several hundred times. This approach was suggested by Aarsh Shah in: https://github.com/xapian/xapian/pull/48
  + Improve upper bound on normalised wdf (wdfn) - when wdf_upper_bound > doclength_lower_bound, we get a tighter bound by evaluating at wdf=wdf_upper_bound. In a simple test, this reduces the upper bound on wdfn by 36-64%, and the upper bound on the weight by 9-33%.
* PL2Weight: Fix calculation of upper_bound when P2>0. P2 is typically negative, but for a very common term it can be positive and then we should use wdfn_lower not wdfn_upper to adjust P_max.
* Weight::unserialise(): Check serialised form is empty when unserialising parameter-free schemes BoolWeight, DLHWeight and DPHWeight.
* TermGenerator::set_stopper_strategy(): New method to control how the Stopper object is used. Patch from Arnav Jain.
* QueryParser: Fix handling of CJK query over multiple prefixes. Previously all the n-gram terms were AND-ed together - now we AND together for each prefix, then OR the results. Fixes #719, reported by Aaron Li.
* Add Database::get_revision() method which provides access to the database revision number for chert and glass, intended for use by xapiand. Marked as experimental, so we don't have to go through the usual deprecation cycle if this proves not to be the approach we want to take. Fixes #709, reported by German M. Bravo.
* Mark RangeProcessor constructor as `explicit`.
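The clamping change for DLHWeight, DPHWeight and PL2Weight can be illustrated with a toy example. This is a hedged sketch in plain Python, not Xapian code: the per-term contributions and lower bounds are made-up numbers, and `offset_scores()` and `clamped_scores()` are hypothetical helpers. It shows why subtracting a per-term lower bound is flawed: the total offset a document receives depends on which terms it matches, so two documents can swap order, whereas clamping each contribution at zero only changes contributions that are actually negative.

```python
# Hypothetical per-term lower bounds on the weight contribution.
LOWER_BOUNDS = {"t1": -2.0, "t2": -1.0}

# Hypothetical raw contributions: document -> {matched term: contribution}.
DOCS = {
    "A": {"t1": 1.0, "t2": 1.0},   # matches both terms, raw total 2.0
    "B": {"t1": 2.5},              # matches only t1, raw total 2.5
    "C": {"t1": -0.5, "t2": 1.5},  # has one negative contribution
}


def offset_scores(docs, lower_bounds):
    """Old idea: subtract each matched term's lower bound from its contribution.

    The total offset added to a document is the sum of the (negated) bounds
    of the terms it matches, so it differs from document to document.
    """
    return {
        doc: sum(w - lower_bounds[term] for term, w in contribs.items())
        for doc, contribs in docs.items()
    }


def clamped_scores(docs):
    """New approach: clamp each term's contribution to be >= 0, then sum."""
    return {
        doc: sum(max(w, 0.0) for w in contribs.values())
        for doc, contribs in docs.items()
    }


if __name__ == "__main__":
    print(offset_scores(DOCS, LOWER_BOUNDS))  # A gets +3.0 in total, B only +2.0
    print(clamped_scores(DOCS))
```

With these numbers B outranks A on raw weight (2.5 vs 2.0), but the offset scheme reverses that ({'A': 5.0, 'B': 4.5, 'C': 4.0}), while the clamped scores ({'A': 2.0, 'B': 2.5, 'C': 1.5}) keep B ahead.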
testsuite:

* OP_SCALE_WEIGHT: Check top weight is non-zero - if it is zero, tests which try to check that OP_SCALE_WEIGHT works will always pass.
* testsuite: Check SerialisationError descriptions from Xapian::Weight subclasses mention the weighting scheme name.

matcher:

* Fix stats passed to Weight with OP_SYNONYM. Previously the number of unique terms was never calculated, and a term which matched all documents would be optimised to an all-docs postlist, which fails to supply the correct wdf info.
* Use floating point calculation for OR synonym freq estimates. The division was being done as an integer division, which means the result was always getting rounded down rather than rounded to the nearest integer.

glass backend:

* Fix allterms with prefix on glass with uncommitted changes. Glass aims to flush just the relevant postlist changes in this case but the end of the range to flush was wrong, so we'd only actually flush changes for a term exactly matching the prefix. Fixes #721.

remote backend:

* Improve handling of invalid remote stub entries: Entries without a colon now give an error rather than being quietly skipped; IPv6 isn't yet supported, but entries with IPv6 addresses now result in saner errors (previously the colons confused the code which looks for a port number).

build system:

* XO_LIB_XAPIAN: Check for user trying to specify configure for XAPIAN_CONFIG and give a more helpful error.
* Fix XO_LIB_XAPIAN to work without libtool. Modern versions of GNU m4 error out when defn is used on an undefined macro. Uncovered by Amanda Jayanetti.
* Clean build paths out of installed xapian-config, mostly in the interests of facilitating reproducible builds, but it is also a little more robust as the "uninstalled tree" case can't then accidentally be triggered.
* Drop compiler options that are no longer useful:
  + -fshow-column is the default in all GCC versions we now support (checked as GCC 4.6).
  + -Wno-long-long is no longer necessary now that we require C++11 where "long long" is a standard type.

documentation:

* Add API documentation comments for all classes, methods, constants, etc which were lacking them, and improve the content of some existing comments.
* Stop hiding undocumented classes and members. Hiding them silences doxygen's warnings about them, so it's hard to see what is missing, and the stub documentation produced is perhaps better than not documenting at all. Fixes #736, reported by James Aylett.
* xapian-check: Make command line syntax consistent with other tools.
* Note when MSet::snippet() was added.
* deprecation.rst: Recommend unsigned over useconds_t for timeout values (but leave the API using useconds_t for 1.4.x for ABI compatibility). The type useconds_t is now obsolete and anyway was intended to represent a time in microseconds (confusing when Xapian's timeouts are in milliseconds). The Linux usleep man page notes: "Programs will be more portable if they never mention this type explicitly."

portability:

* Suppress compiler warnings about pointer alignment on some architectures. We know the data is aligned in these cases.
* Fix replicate7 under Cygwin.

debug code:

* Add missing forward declaration needed by --enable-log build.
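The matcher fix for OR synonym frequency estimates comes down to integer versus floating point division. The sketch below is a hedged Python illustration, not Xapian's code: it assumes an independence-based estimate of how many of `n` documents match at least one of two terms with frequencies `f1` and `f2`, and the function names are hypothetical. The estimator Xapian actually uses may differ; the point is only the difference in rounding.

```python
def or_freq_estimate_int(f1, f2, n):
    """Estimated OR frequency using integer (floor) division: always rounds down."""
    # Assumed independence model: n - (n - f1) * (n - f2) / n documents
    # match t1 OR t2, written here over a single common denominator n.
    return (n * n - (n - f1) * (n - f2)) // n


def or_freq_estimate_float(f1, f2, n):
    """Same estimate computed in floating point, then rounded to the nearest integer."""
    return round((n * n - (n - f1) * (n - f2)) / n)


if __name__ == "__main__":
    # 10 documents, terms occurring in 3 and 4 of them: the exact estimate is 5.8.
    print(or_freq_estimate_int(3, 4, 10))    # 5
    print(or_freq_estimate_float(3, 4, 10))  # 6
```

Floor division turns 5.8 into 5, whereas the floating point version gives the nearer value, 6.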
Change History (2)
comment:1 by , 8 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:2 by , 8 years ago
Description: | modified (diff) |
---|
Added changes