| 2 | |
| 3 | |
| 4 | {{{ |
| 5 | Xapian-core 1.4.1 (2016-10-21): |
| 6 | |
| 7 | API: |
| 8 | |
| 9 | * Constructing a Query for a non-reference counted PostingSource object will |
| 10 | now try to clone the PostingSource object (as happened in 1.3.4 and |
| 11 | earlier). This clone code was removed as part of the changes in 1.3.5 to |
| 12 | support optional reference counting of PostingSource objects, but that breaks |
| 13 | the case when the PostingSource object is on the stack and goes out of scope |
| 14 | before the Query object is used. Issue reported by Till Schäfer and analysed |
| 15 | by Daniel Vrátil in a bug report against Akonadi: |
| 16 | https://bugs.kde.org/show_bug.cgi?id=363741 |
| 17 | |
| 18 | * Add BM25PlusWeight class implementing the BM25+ weighting scheme, implemented |
| 19 | by Vivek Pal (https://github.com/xapian/xapian/pull/104). |
| 20 | |
| 21 | * Add PL2PlusWeight class implementing the PL2+ weighting scheme, implemented |
| 22 | by Vivek Pal (https://github.com/xapian/xapian/pull/108). |
| 23 | |
| 24 | * LMWeight: Implement Dir+ weighting scheme as DIRICHLET_PLUS_SMOOTHING. |
| 25 | Patch from Vivek Pal. |
| 26 | |
| 27 | * Add CoordWeight class implementing coordinate matching. This can be useful |
| 28 | for specialised uses - e.g. to implement sorting by the number of matching |
| 29 | filters. |
| 30 | |
| 31 | * DLHWeight, DPHWeight, PL2Weight: With these weighting schemes, the formulae |
| 32 | can give a negative weight contribution for a term in extreme cases. We |
| 33 | used to try to handle this by calculating a per-term lower bound on the |
| 34 | contribution and subtracting this from the contribution, but this idea |
| 35 | is fundamentally flawed as the total offset it adds to a document depends on |
| 36 | what combination of terms that document matches, meaning in general the |
| 37 | offset isn't the same for every matching document. So instead we now clamp |
| 38 | each term's weight contribution to be >= 0. |
| 39 | |
| 40 | * TfIdfWeight: Always scale term weight by wqf - this seems the logical |
| 41 | approach as it matches the weighting we'd get if we weighted every non-unique |
| 42 | term in the query, as well as being explicit in the Piv+ formula. |
| 43 | |
| 44 | * Fix OP_SCALE_WEIGHT to work with all weighting schemes - previously it was |
| 45 | ignored when using PL2Weight and LMWeight. |
| 46 | |
| 47 | * PL2Weight: Greatly improve upper bound on weight: |
| 48 | + Split the weight equation into two parts and maximise each separately as |
| 49 | that gives an easily solvable problem, and in common cases the maximum is |
| 50 | at the same value of wdfn for both parts. In a simple test, the upper |
| 51 | bounds are now just over double the highest weight actually achieved - |
| 52 | previously they were several hundred times. This approach was suggested by |
| 53 | Aarsh Shah in: https://github.com/xapian/xapian/pull/48 |
| 54 | + Improve upper bound on normalised wdf (wdfn) - when wdf_upper_bound > |
| 55 | doclength_lower_bound, we get a tighter bound by evaluating at |
| 56 | wdf=wdf_upper_bound. In a simple test, this reduces the upper bound on |
| 57 | wdfn by 36-64%, and the upper bound on the weight by 9-33%. |
| 58 | |
| 59 | * PL2Weight: Fix calculation of upper_bound when P2>0. P2 is typically |
| 60 | negative, but for a very common term it can be positive and then we should |
| 61 | use wdfn_lower not wdfn_upper to adjust P_max. |
| 62 | |
| 63 | * Weight::unserialise(): Check serialised form is empty when unserialising |
| 64 | parameter-free schemes BoolWeight, DLHWeight and DPHWeight. |
| 65 | |
| 66 | * TermGenerator::set_stopper_strategy(): New method to control how the Stopper |
| 67 | object is used. Patch from Arnav Jain. |
| 68 | |
| 69 | * QueryParser: Fix handling of CJK query over multiple prefixes. Previously |
| 70 | all the n-gram terms were AND-ed together - now we AND together for each |
| 71 | prefix, then OR the results. Fixes #719, reported by Aaron Li. |
| 72 | |
| 73 | * Add Database::get_revision() method which provides access to the database |
| 74 | revision number for chert and glass, intended for use by xapiand. Marked |
| 75 | as experimental, so we don't have to go through the usual deprecation cycle |
| 76 | if this proves not to be the approach we want to take. Fixes #709, |
| 77 | reported by German M. Bravo. |
| 78 | |
| 79 | * Mark RangeProcessor constructor as `explicit`. |
| 80 | |
| 81 | testsuite: |
| 82 | |
| 83 | * OP_SCALE_WEIGHT: Check top weight is non-zero - if it is zero, tests which |
| 84 | try to check that OP_SCALE_WEIGHT works will always pass. |
| 85 | |
| 86 | * testsuite: Check SerialisationError descriptions from Xapian::Weight |
| 87 | subclasses mention the weighting scheme name. |
| 88 | |
| 89 | matcher: |
| 90 | |
| 91 | * Fix stats passed to Weight with OP_SYNONYM. Previously the number of |
| 92 | unique terms was never calculated, and a term which matched all documents |
| 93 | would be optimised to an all-docs postlist, which fails to supply the |
| 94 | correct wdf info. |
| 95 | |
| 96 | * Use floating point calculation for OR synonym freq estimates. The division |
| 97 | was being done as an integer division, which means the result was always |
| 98 | getting rounded down rather than rounded to the nearest integer. |
| 99 | |
| 100 | glass backend: |
| 101 | |
| 102 | * Fix allterms with prefix on glass with uncommitted changes. Glass aims to |
| 103 | flush just the relevant postlist changes in this case but the end of the |
| 104 | range to flush was wrong, so we'd only actually flush changes for a term |
| 105 | exactly matching the prefix. Fixes #721. |
| 106 | |
| 107 | remote backend: |
| 108 | |
| 109 | * Improve handling of invalid remote stub entries: Entries without a colon now |
| 110 | give an error rather than being quietly skipped; IPv6 isn't yet supported, |
| 111 | but entries with IPv6 addresses now result in saner errors (previously the |
| 112 | colons confused the code which looks for a port number). |
| 113 | |
| 114 | build system: |
| 115 | |
| 116 | * XO_LIB_XAPIAN: Check for user trying to specify configure for XAPIAN_CONFIG |
| 117 | and give a more helpful error. |
| 118 | |
| 119 | * Fix XO_LIB_XAPIAN to work without libtool. Modern versions of GNU m4 error |
| 120 | out when defn is used on an undefined macro. Uncovered by Amanda Jayanetti. |
| 121 | |
| 122 | * Clean build paths out of installed xapian-config, mostly in the interests of |
| 123 | facilitating reproducible builds, but it is also a little more robust as the |
| 124 | "uninstalled tree" case can't then accidentally be triggered. |
| 125 | |
| 126 | * Drop compiler options that are no longer useful: |
| 127 | + -fshow-column is the default in all GCC versions we now support |
| 128 | (checked with GCC 4.6). |
| 129 | + -Wno-long-long is no longer necessary now that we require C++11 where |
| 130 | "long long" is a standard type. |
| 131 | |
| 132 | documentation: |
| 133 | |
| 134 | * Add API documentation comments for all classes, methods, constants, etc which |
| 135 | were lacking them, and improve the content of some existing comments. |
| 136 | |
| 137 | * Stop hiding undocumented classes and members. Hiding them silences doxygen's |
| 138 | warnings about them, so it's hard to see what is missing, and the stub |
| 139 | documentation produced is perhaps better than not documenting at all. |
| 140 | Fixes #736, reported by James Aylett. |
| 141 | |
| 142 | * xapian-check: Make command line syntax consistent with other tools. |
| 143 | |
| 144 | * Note when MSet::snippet() was added. |
| 145 | |
| 146 | * deprecation.rst: Recommend unsigned over useconds_t for timeout values (but |
| 147 | leave the API using useconds_t for 1.4.x for ABI compatibility). The type |
| 148 | useconds_t is now obsolete and anyway was intended to represent a time in |
| 149 | microseconds (confusing when Xapian's timeouts are in milliseconds). The |
| 150 | Linux usleep man page notes: "Programs will be more portable if they never |
| 151 | mention this type explicitly." |
| 152 | |
| 153 | portability: |
| 154 | |
| 155 | * Suppress compiler warnings about pointer alignment on some architectures. |
| 156 | We know the data is aligned in these cases. |
| 157 | |
| 158 | * Fix replicate7 under Cygwin. |
| 159 | |
| 160 | debug code: |
| 161 | |
| 162 | * Add missing forward declaration needed by --enable-log build. |
| 163 | |
| 164 | }}} |
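
The DLHWeight, DPHWeight and PL2Weight entry above says that subtracting a per-term lower bound adds an offset which depends on which terms a document matches, whereas clamping each contribution to be >= 0 does not. The following is a minimal sketch of that argument only, not Xapian's implementation: the term names, lower bounds, per-term contributions and function names are all hypothetical values chosen for illustration.

```python
# Illustrative sketch only: every number and name here is hypothetical and
# none of this is taken from Xapian's source code.

# Assumed per-term lower bounds on the weight contribution (both negative).
LOWER_BOUND = {"a": -1.0, "b": -0.2}

# Assumed raw per-term contributions for three documents. doc3 has a
# negative contribution from term "a", the "extreme case" in the entry.
DOCS = {
    "doc1": {"a": 0.5},
    "doc2": {"b": 0.9},
    "doc3": {"a": -0.3, "b": 0.4},
}


def raw_score(contribs):
    """Sum the contributions unchanged (individual terms may be negative)."""
    return sum(contribs.values())


def shifted_score(contribs):
    """Old approach: subtract each matching term's lower bound."""
    return sum(w - LOWER_BOUND[term] for term, w in contribs.items())


def clamped_score(contribs):
    """New approach: clamp each term's contribution to be >= 0."""
    return sum(max(w, 0.0) for w in contribs.values())


def ranking(score):
    """Return document names ordered from highest to lowest score."""
    return sorted(DOCS, key=lambda name: score(DOCS[name]), reverse=True)


if __name__ == "__main__":
    for label, fn in (("raw", raw_score),
                      ("shifted", shifted_score),
                      ("clamped", clamped_score)):
        print(label, ranking(fn))
```

With these made-up numbers, the shifted scores add 1.0 to doc1, 0.2 to doc2 and 1.2 to doc3, so doc1 overtakes doc2 purely because of which terms it matches. Clamping only alters the one negative contribution and leaves the relative order of doc1 and doc2 as it was.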