Hi,

I'm looking for advice on debugging full-text search in Tracker. By and large it works, apart from results in GNOME Shell coming back as "Untitled Document" (https://bugzilla.gnome.org/show_bug.cgi?id=789006). However, I've noticed that some of my PDFs are missing words from their full-text indexes, and I can't work out why, or how to debug it.

If I search for an affected document by name in Documents, it appears as expected. If I run tracker info <FILE> on it, I get a result with all the various properties, plus the full text listed under nie:plainTextContent. Stranger still, tracker search <TERM> returns the document for some words that appear in it, but not for others, including words that definitely appear in the nie:plainTextContent field and that Document Viewer finds when searching within the document. It looks as though the text content is only being partially indexed for some reason.

Things I’ve tried:

  • Running tracker reset -f <FILE> whilst watching with tracker daemon -w. This shows the document being reindexed, but nothing else particularly enlightening.
  • Changing ‘max-words-to-index’ to 100,000 using dconf-editor. This made no difference in behaviour.
  • Running tracker extract -v debug <FILE> directly on the file. This just gives me a "file not found" error for /usr/lib/tracker-extract, which indeed doesn’t exist. I'm not sure what’s going on there.
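
One more check I can think of is whether the missing words all sit beyond the word limit in the extracted text. Below is a minimal sketch in Python, assuming the nie:plainTextContent value has been copied into a string, and assuming a plain \w+ split roughly approximates how Tracker counts words (it may well not). The names first_positions and report are made up for illustration and are not part of Tracker.

```python
import re


def first_positions(text, terms):
    """Map each term to the 1-based index of its first occurrence, or None."""
    # Rough tokenisation only; Tracker's own parser may split words differently.
    words = [w.lower() for w in re.findall(r"\w+", text)]
    first = {}
    for index, word in enumerate(words, start=1):
        first.setdefault(word, index)  # keep only the earliest position
    return {term: first.get(term.lower()) for term in terms}


def report(text, terms, limit):
    """Describe where each term first appears relative to a word limit."""
    lines = []
    for term, pos in first_positions(text, terms).items():
        if pos is None:
            status = "not in text"
        elif pos > limit:
            status = f"first at word {pos}, beyond limit {limit}"
        else:
            status = f"first at word {pos}, within limit {limit}"
        lines.append(f"{term}: {status}")
    return lines


# Tiny made-up example; real input would be the extracted document text.
sample = "alpha beta gamma delta"
for line in report(sample, ["beta", "delta", "omega"], limit=3):
    print(line)
```

If every term that tracker search misses first appears beyond the limit, that would point at the word cap (or the new setting not being picked up) rather than at extraction itself.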

Any suggestions for how to progress here?