Sphinx manpages

indexer(1)    Sphinxsearch 2.0.2

indexer - Sphinxsearch fulltext index generator

indexer [--config CONFIGFILE] [--rotate] [--noprogress] [--quiet] [--all | INDEX ...]
indexer --buildstops OUTPUTFILE COUNT [--config CONFIGFILE] [--noprogress] [--quiet] [--all | INDEX ...]
indexer --merge MAIN_INDEX DELTA_INDEX [--config CONFIGFILE] [--rotate] [--noprogress] [--quiet]

Description

Sphinx is a collection of programs that aim to provide high-quality fulltext search. indexer is the first of the two principal tools in Sphinx. It can be invoked directly from the command line or as part of a larger script. indexer is solely responsible for gathering the data that will be searchable.

The calling syntax for indexer is as follows:

    $ indexer [OPTIONS] [indexname1 [indexname2 [...]]]

Essentially, you list the different indexes in sphinx.conf (the ones you will later make available to search). When calling indexer you need, at a minimum, to tell it which index (or indexes) you want to build. If sphinx.conf contained details on 2 indexes, mybigindex and mysmallindex, you could do the following:

    $ indexer mybigindex
    $ indexer mysmallindex mybigindex

In the configuration file, sphinx.conf, you specify one or more indexes for your data. You might call indexer to reindex one of them ad hoc, or you can tell it to process all indexes. You are not limited to calling just one, or all at once; you can always pick any combination of the available indexes.

Options

The majority of the options for indexer are given in the configuration file. However, some options may also need to be specified on the command line, because they affect how the indexing operation is performed. These options are:

--all
Tells indexer to update every index listed in sphinx.conf, instead of listing individual indexes. This is useful in small configurations, or in cron-type or maintenance jobs where the entire index set is rebuilt each day, each week, or at whatever interval suits you.
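For the cron-type jobs mentioned above, a wrapper script can assemble the indexer command line from the options on this page. The Python sketch below is illustrative only. The helper names build_indexer_argv and run_indexer are hypothetical, as is the config path. The sketch follows the synopsis in treating --all and explicit index names as alternatives, and it enforces the rule stated above that indexer needs at least one index to work on.

```python
import subprocess
from typing import List, Optional, Sequence


def build_indexer_argv(
    indexes: Sequence[str] = (),
    all_indexes: bool = False,
    config: Optional[str] = None,
    rotate: bool = False,
    quiet: bool = False,
) -> List[str]:
    """Assemble an indexer command line (hypothetical helper, not part of Sphinx)."""
    # This sketch treats --all and explicit index names as alternatives,
    # as the synopsis does.
    if all_indexes and indexes:
        raise ValueError("pass either all_indexes=True or index names, not both")
    if not all_indexes and not indexes:
        raise ValueError("indexer needs --all or at least one index name")
    argv = ["indexer"]
    if config is not None:
        argv += ["--config", config]
    if rotate:
        argv.append("--rotate")
    if quiet:
        argv.append("--quiet")
    argv += ["--all"] if all_indexes else list(indexes)
    return argv


def run_indexer(argv: List[str]) -> int:
    """Run indexer without a shell and return its exit status."""
    return subprocess.run(argv, check=False).returncode


if __name__ == "__main__":
    # Nightly rebuild of every index, rotated into a running searchd.
    cmd = build_indexer_argv(all_indexes=True,
                             config="/home/myuser/sphinx.conf",
                             rotate=True, quiet=True)
    print(cmd)
```

The argument list goes to subprocess.run without a shell, so index names and paths need no quoting.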
Example usage of --all:

    $ indexer --config /home/myuser/sphinx.conf --all

--buildstops outputfile.txt NUM
Reviews the index source as if it were indexing the data, and produces a list of the terms being indexed. In other words, it lists all the searchable terms that become part of the index. Note that it does not update the index in question. It simply processes the data as if it were indexing, including running the queries defined with sql_query_pre or sql_query_post.

outputfile.txt will contain the list of words, one per line, sorted by frequency with the most frequent first. NUM specifies the maximum number of words that will be listed; if NUM is large enough to cover every word in the index, all of the index's words are listed. Such a dictionary list could be used for "Did you mean..." features in client applications, usually in conjunction with --buildfreqs, below. Example:

    $ indexer myindex --buildstops word_freq.txt 1000

This would produce a file in the current directory, word_freq.txt, with the 1,000 most common words in 'myindex', ordered by most common first. When multiple indexes or --all are specified, the file will pertain to the last index indexed (with --all, the last one listed in the configuration file).

--buildfreqs
Used together with --buildstops (and ignored if --buildstops is not specified). Where --buildstops provides the list of words used within the index, --buildfreqs adds how many times each word occurs in the index. This helps you decide whether certain words are so prevalent that they should be considered stopwords. It also helps with "Did you mean..." features, where you need to know how much more common a given word is than another, similar one. Example:

    $ indexer myindex --buildstops word_freq.txt 1000 --buildfreqs

This would produce word_freq.txt as above, but each word would be followed by the number of times it occurred in the index in question.

--config CONFIGFILE, -c CONFIGFILE
Use the given file as configuration.
Normally, indexer looks for sphinx.conf in the installation directory (e.g. /usr/local/sphinx/etc/sphinx.conf if installed into /usr/local/sphinx), followed by the current directory you are in when calling indexer from the shell. This option is most useful in shared environments where the binaries are installed somewhere like /usr/local/sphinx/ but users need their own custom Sphinx set-ups. It is also useful if you want to run multiple instances on a single server. In such cases, users can create their own sphinx.conf files and pass them to indexer with this option. For example:

    $ indexer --config /home/myuser/sphinx.conf myindex

--dump-rows FILE
Dumps rows fetched by SQL source(s) into the specified file, in a MySQL-compatible syntax. The resulting dumps are an exact representation of the data as received by indexer, and help to reproduce indexing-time issues.

--merge DST-INDEX SRC-INDEX
Physically merges two indexes together. For example, in a main+delta scheme the main index rarely changes but the delta index is rebuilt frequently, and --merge would be used to combine the two. The operation moves from right to left: the contents of SRC-INDEX are examined and physically combined with the contents of DST-INDEX, and the result is left in DST-INDEX. In pseudo-code, it might be expressed as:

    DST-INDEX += SRC-INDEX

An example:

    $ indexer --merge main delta --rotate

Here main is the master, rarely modified index, and delta is the more frequently modified one. This command combines the contents of delta into main and rotates the indexes.

--merge-dst-range ATTR MIN MAX
Runs the given filter range upon merging. The filter applies to the destination index as part of --merge, and is ignored if --merge is not specified. While merging, indexer also filters the documents ending up in the destination index, so only documents that pass the given filter end up in the final index.
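The filtering rule above can be modeled in a few lines of Python. This is only an illustration of the documented effect, not indexer's implementation. Documents are represented as hypothetical dicts of attribute values. MIN and MAX are treated as inclusive bounds, an assumption consistent with the `deleted 0 0` usage. Every filter given must pass.

```python
from typing import Dict, List, Sequence, Tuple

# One --merge-dst-range filter: (ATTR, MIN, MAX), bounds treated as inclusive.
RangeFilter = Tuple[str, int, int]
Document = Dict[str, int]  # hypothetical: attribute name -> integer value


def passes_dst_ranges(doc: Document, filters: Sequence[RangeFilter]) -> bool:
    """True if the document satisfies every range filter (logical AND)."""
    return all(lo <= doc[attr] <= hi for attr, lo, hi in filters)


def filter_merged(docs: Sequence[Document], filters: Sequence[RangeFilter]) -> List[Document]:
    """Keep only the documents that would end up in the destination index."""
    return [doc for doc in docs if passes_dst_ranges(doc, filters)]


merged = [
    {"id": 1, "deleted": 0},
    {"id": 2, "deleted": 1},
    {"id": 3, "deleted": 0},
]
# Equivalent in spirit to: --merge-dst-range deleted 0 0
kept = filter_merged(merged, [("deleted", 0, 0)])
print([doc["id"] for doc in kept])  # [1, 3]
```

Giving several filters models repeating --merge-dst-range on the command line: a document must satisfy all of them to be kept.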
--merge-dst-range could be used, for example, in an index with a 'deleted' attribute, where 0 means 'not deleted'. Such an index could be merged with:

    $ indexer --merge main delta --merge-dst-range deleted 0 0

Any documents marked as deleted (value 1) would be removed from the newly merged destination index. --merge-dst-range can be given several times on the command line to add successive filters to the merge. A document must meet all of them to become part of the final index.

--merge-killlists, --merge-klists
Used together with --merge. Normally, when merging, indexer uses the kill-list of the source index (the one being merged in) to wipe out the matching docs from the destination index. The kill-list of the destination itself isn't touched at all. With --merge-killlists (or its shorter form, --merge-klists), indexer does not filter the destination index docs with the source index kill-list. Instead, it merges the two kill-lists together, so the final index has a kill-list containing the merged kill-lists of both indexes.

--noprogress
Don't display progress details as they occur. Instead, the final status details (such as documents indexed, speed of indexing and so on) are only reported at completion of indexing. When the script is not being run on a console (or 'tty'), this is on by default. Example usage:

    $ indexer --rotate --all --noprogress

--print-queries
Prints out the SQL queries that indexer sends to the database, along with SQL connection and disconnection events. This is useful to diagnose and fix problems with SQL sources.

--quiet
Tells indexer not to output anything unless there is an error. Again, this is mostly used for cron-type or other script jobs where the output is irrelevant or unnecessary, except in the event of some kind of error. Example usage:

    $ indexer --rotate --all --quiet

--rotate
Used for rotating indexes. Unless you can take the search function offline without troubling users, you will almost certainly need to keep search running whilst indexing new documents.
--rotate creates a second index, parallel to the first (in the same place, simply including .new in the filenames). Once complete, indexer notifies searchd by sending it the SIGHUP signal. searchd then attempts to rename the indexes: the existing ones get .old added, and the .new ones replace them. It then starts serving from the newer files. Depending on the seamless_rotate setting, there may be a slight delay before the newer indexes become searchable. Example usage:

    $ indexer --rotate --all

--sighup-each
Useful when you are rebuilding many big indexes and want each one rotated into searchd as soon as possible. With --sighup-each, indexer sends a SIGHUP signal to searchd after successfully completing the work on each index. (The default behavior is to send a single SIGHUP after all the indexes are built.)

--verbose
Guarantees that every row that caused problems indexing (duplicate, zero, or missing document ID; file field IO issues; etc.) will be reported. By default, this option is off, and problem summaries may be reported instead.

Author

Andrey Aksenoff (shodan@sphinxsearch.com). This manual page was written by Alexey Vinogradov (klirichek@sphinxsearch.com). It is based on the one written by Christian Hofstaedtler (ch+debian-packages@zeha.at) for the Debian system, but may be used by others. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU General Public License, Version 2 or any later version published by the Free Software Foundation.

On Debian systems, the complete text of the GNU General Public License can be found in /usr/share/common-licenses/GPL.

See also

searchd(1), search(1), indextool(1), spelldump(1)

Sphinx and its programs are documented fully by the Sphinx reference manual available in /usr/share/doc/sphinxsearch.

searchd(1)    Sphinxsearch 2.0.2

searchd - Sphinxsearch network daemon
searchd [--config CONFIGFILE] [--cpustats] [--iostats] [--index INDEX] [--port PORT]
searchd --status [--config CONFIGFILE] [--pidfile PIDFILE]
searchd --stop [--config CONFIGFILE] [--pidfile PIDFILE]

Description

Sphinx is a collection of programs that aim to provide high-quality fulltext search. searchd is the second of the two principal tools in Sphinx. It is the part of the system that actually handles searches. It functions as a server: it receives queries, processes them, and returns a dataset to the different APIs used by client applications.

Unlike indexer, searchd is not designed to be run from a regular script or command-line call. Instead, it runs either as a daemon started from init.d (on Unix/Linux type systems) or as a service (on Windows-type systems). Not all of the command line options will always apply; they are build-dependent.

Options

These programs follow the usual GNU command line syntax, with long options starting with two dashes (`--'). The options available to searchd on all builds are:

--config CONFIGFILE, -c CONFIGFILE
Tells searchd to use the given file as its configuration, just as with indexer.

--console
Forces searchd into console mode. Typically it runs as a conventional server application and writes information into the log files (as specified in sphinx.conf). When debugging issues in the configuration or the daemon itself, or diagnosing hard-to-track-down problems, it may be easier to have it write that information directly to the console from which it is being called. In console mode, the process is also not forked (so searches are done in sequence), and the logs are not written to. (Console mode is not the intended way to run searchd.)
You can invoke it as such:

    $ searchd --config /home/myuser/sphinx.conf --console

--cpustats
Reports actual CPU time (in addition to wall time) in both the query log file (for every given query) and the status report (aggregated). It depends on the clock_gettime() system call and might therefore be unavailable on certain systems. You might start searchd thus:

    $ searchd --config /home/myuser/sphinx.conf --cpustats

--help, -h, --?, -?
Lists all of the parameters that can be called in your particular build of searchd.

--index INDEX, -i INDEX
Serves only the specified index. Like --port, this is usually for debugging purposes; more long-term changes would generally be applied to the configuration file itself. Usage example:

    $ searchd --index myindex

--iostats
Used with the logging options (query_log must be enabled in sphinx.conf). It logs more detailed per-query information about the input/output operations carried out during each query. The cost is a slight performance hit and, of course, bigger logs. Further details are available under the query log format section. You might start searchd thus:

    $ searchd --config /home/myuser/sphinx.conf --iostats

--listen, -l ( address ":" port | port | path ) [ ":" protocol ]
Works like --port, but accepts a full listening address, not just a port. You can specify an IP address (or hostname) and port number, just a port number, or a Unix-domain socket path. If you specify a port number but no address, searchd listens on all network interfaces. A Unix path is identified by a leading slash. As the last parameter you can also specify a protocol handler (listener) to be used for connections on this socket. Supported protocol values are 'sphinx' (Sphinx 0.9.x API protocol) and 'mysql41' (MySQL protocol used since 4.1 up to at least 5.1).

--logdebug, --logdebugv, --logdebugvv
Enable additional debug output in the daemon log.
These options should only be needed rarely, to help debug issues that cannot easily be reproduced on request. --logdebug makes the daemon emit general debug messages. --logdebugv and --logdebugvv add 'verbose' and 'very verbose' debug info, respectively. The last one can really flood your log file.

--pidfile PIDFILE
Explicitly states a PID file, where searchd stores its process information for inter-process communication. For example, indexer needs to know the PID to contact searchd when rotating indexes. searchd normally uses a PID file when running in regular mode (i.e. not with --console). However, you may run it in console mode while the index is being updated and rotated, and that also needs a PID file. Example:

    $ searchd --config /home/myuser/sphinx.conf --pidfile /home/myuser/sphinx.pid

--port PORT, -p PORT
Specifies the port that searchd should listen on, usually for debugging purposes. The default is usually 9312, but sometimes you need to run it on a different port. Specifying it on the command line overrides anything specified in the configuration file. The valid range is 0 to 65535, but ports numbered below 1024 usually require a privileged account. See also the --listen option, which offers more ways to tune this. An example of usage:

    $ searchd --port 9313

--status
Queries the status of the running searchd instance, using the connection details from the (optionally) provided configuration file. It tries to connect to the running instance using the first configured UNIX socket or TCP port. On success, it queries a number of status and performance counters and prints them. You can use the Status() API call to access the very same counters from your application. Examples:

    $ searchd --status
    $ searchd --config /home/myuser/sphinx.conf --status

--stop
Asynchronously stops searchd, using the PID file specified in the sphinx.conf file. You may therefore also need to tell searchd which configuration file to use with the --config option.
NB: calling --stop also makes sure that any changes applied to the indexes with UpdateAttributes() are written to the index files themselves. Example:

    $ searchd --config /home/myuser/sphinx.conf --stop

--stopwait
Synchronously stops searchd. --stop essentially tells the running instance to exit (by sending it a SIGTERM) and then immediately returns. --stopwait also waits until the running searchd instance actually finishes the shutdown (e.g. saves all the pending attribute changes) and exits. Example:

    $ searchd --config /home/myuser/sphinx.conf --stopwait

Possible exit codes are as follows:
0 on success;
1 if connection to the running searchd daemon failed;
2 if the daemon reported an error during shutdown;
3 if the daemon crashed during shutdown.

--strip-path
Strips the path names from all the file names referenced from the index (stopwords, wordforms, exceptions, etc.). This is useful for picking up indexes built on another machine with possibly different path layouts.

Signals

Last but not least, like every other daemon, searchd supports a number of signals.

SIGTERM
Initiates a clean shutdown. New queries will not be handled, but queries that have already started will not be forcibly interrupted.

SIGHUP
Initiates index rotation. Depending on the seamless_rotate setting, new queries might be briefly stalled, and clients will receive temporary errors.

SIGUSR1
Forces searchd to reopen its log and query log files, letting you implement log file rotation.

Author

Andrey Aksenoff (shodan@sphinxsearch.com). This manual page was written by Alexey Vinogradov (klirichek@sphinxsearch.com). It is based on the one written by Christian Hofstaedtler (ch+debian-packages@zeha.at) for the Debian system, but may be used by others. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU General Public License, Version 2 or any later version published by the Free Software Foundation.

On Debian systems, the complete text of the GNU General Public License can be found in /usr/share/common-licenses/GPL.
See also

indexer(1), search(1), indextool(1)

Sphinx and its programs are documented fully by the Sphinx reference manual available in /usr/share/doc/sphinxsearch.

search(1)    Sphinxsearch 2.0.2

search - Sphinxsearch command-line index query

search [OPTIONS] word1 [word2 [word3 ...]]

Description

Sphinx is a collection of programs that aim to provide high-quality fulltext search. search is one of the helper tools within the Sphinx package. searchd handles searches in a server-type environment. search, in contrast, is for testing an index quickly from the command line, without building a framework to connect to the server and process its response.

Note: search is not intended to be deployed as part of a client application. It is strongly recommended that you do not build an interface on top of search instead of searchd, and none of the bundled client APIs support this. (In any event, search reloads the files on every run, whereas searchd caches them in memory for performance.)

That said, many types of query that you could build with the APIs can also be made with search. For very complex searches, though, it may be easier to use a small script and the corresponding API. Additionally, some newer features may be available in searchd that have not yet been brought into search.

When calling search, it is not necessary to have searchd running. Simply make sure that the account running search has read access to the configuration file and the index files.

By default, search looks for word1 (AND word2 AND word3..., as specified) in all fields of all indexes given in the configuration file. In the API, the equivalent is to pass SPH_MATCH_ALL to SetMatchMode and to specify * as the indexes to query in Query.

Options

There are many options available to search.
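To make the default behaviour described above concrete, the toy Python model below contrasts the default all-words rule with the any-word and phrase modes selected by the options that follow. It is a hypothetical illustration only. It lowercases text and drops punctuation with a regular expression, and ignores ranking, morphology, field weights, and everything else Sphinx's real matching does.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and keep runs of word characters, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def match_all(query: str, doc: str) -> bool:
    """Default behaviour: every query word occurs somewhere in the document."""
    return set(tokenize(query)) <= set(tokenize(doc))


def match_any(query: str, doc: str) -> bool:
    """--any: at least one query word occurs in the document."""
    return bool(set(tokenize(query)) & set(tokenize(doc)))


def match_phrase(query: str, doc: str) -> bool:
    """--phrase: the query words occur consecutively and in order."""
    q, d = tokenize(query), tokenize(doc)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


doc = "The quick brown fox, jumping over the lazy dog."
print(match_all("quick cat", doc), match_any("quick cat", doc))  # False True
print(match_phrase("brown fox", doc), match_phrase("fox brown", doc))  # True False
```

The query "quick cat" fails the default rule because "cat" is absent, yet passes the any-word rule because "quick" is present.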
Firstly, the general options:

--config CONFIGFILE, -c CONFIGFILE
Uses the given file as its configuration, just as with indexer.

--index INDEX, -i INDEX
Limits searching to the specified index only. Normally search would search all of the physical indexes listed in sphinx.conf (not any distributed ones).

--stdin
Accepts the query from standard input rather than the command line. This is useful for testing, because you can feed input via pipes and from scripts.

Options for setting matches:

--any, -a
Changes the matching mode to match any of the words in the query (word1 OR word2 OR word3). In the API this is equivalent to passing SPH_MATCH_ANY to SetMatchMode.

--phrase, -p
Changes the matching mode to match all of the words in the query, in the phrase given (not including punctuation). In the API this is equivalent to passing SPH_MATCH_PHRASE to SetMatchMode.

--boolean, -b
Changes the matching mode to Boolean matching. When using Boolean syntax on the command line, you may need to escape the symbols with a backslash so that the shell does not interpret them. For example, on a Unix/Linux system an unescaped ampersand makes the shell run the search process in the background. Using --stdin, as above, avoids this problem. In the API this is equivalent to passing SPH_MATCH_BOOLEAN to SetMatchMode.

--ext, -e
Changes the matching mode to Extended matching. In the API this is equivalent to passing SPH_MATCH_EXTENDED to SetMatchMode. Use of this mode is discouraged in favour of Extended2, below.

--ext2, -e2
Changes the matching mode to Extended matching, version 2. In the API this is equivalent to passing SPH_MATCH_EXTENDED2 to SetMatchMode. This mode is recommended over Extended because it is more efficient and provides other features.

--filter <attr> <v>, -f <attr> <v>
Filters the results so that only documents where the given attribute (attr) matches the given value (v) are returned. For example, --filter deleted 0 only matches documents with an attribute called 'deleted' whose value is 0.
You can also add multiple filters on the command line by specifying --filter multiple times. However, if you apply a second filter to the same attribute, it overrides the first one.

Options for handling the results:

--limit <count>, -l <count>
Limits the total number of matches returned to the number given. If a group is specified, this is the number of grouped results. If not specified, it defaults to 20 results (as do the APIs).

--offset <count>, -o <count>
Offsets the result list by the given count. This is used for pagination: with 20 results per 'page', the second page begins at offset 20, the third page at offset 40, and so on.

--group <attr>, -g <attr>
Specifies that results should be grouped together based on the given attribute. Like the GROUP BY clause in SQL, it combines all results that share the same value of that attribute, and returns the best result from each group. Unless otherwise specified, the best result is the best match on relevance.

--groupsort <expr>, -gs <expr>
When results are grouped with --group, the expression given in <expr> determines the order of the groups. Note that this does not specify which is the best item within each group, only the order in which the groups themselves are returned.

--sortby <clause>, -s <clause>
Specifies that results should be sorted in the order listed in <clause>, which lets you order results by different columns. For example, --sortby "@weight DESC entrytime DESC" sorts entries first by weight (or relevance). Entries with the same weight are then sorted by time, highest time (newest) first. You will usually need to put the items in quotes (--sortby "@weight DESC") or join them with commas (--sortby @weight,DESC) so that they are not treated as separate arguments.
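The quoting advice above comes down to shell word splitting. Python's standard shlex module (shlex.join needs Python 3.8+) splits strings using POSIX-like shell rules, which is enough to show the difference. The sort clause and the query word hello are hypothetical.

```python
import shlex

# Hypothetical sort clause and query word.
clause = "@weight DESC entrytime DESC"

# Unquoted, the shell splits the clause into separate words.
unquoted = shlex.split("search --sortby @weight DESC entrytime DESC hello")

# shlex.join quotes the argument containing spaces, so the clause survives
# word splitting as a single argument.
command = shlex.join(["search", "--sortby", clause, "hello"])
quoted = shlex.split(command)

# The comma form contains no characters the shell treats specially,
# so shlex.quote leaves it unchanged.
comma_form = shlex.quote("@weight,DESC")

print(unquoted)    # ['search', '--sortby', '@weight', 'DESC', 'entrytime', 'DESC', 'hello']
print(command)     # search --sortby '@weight DESC entrytime DESC' hello
print(quoted)      # ['search', '--sortby', '@weight DESC entrytime DESC', 'hello']
print(comma_form)  # @weight,DESC
```

In the unquoted case, search would receive the clause as several separate arguments rather than one.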
Additionally, as with the regular sorting modes, when --group (grouping) is used, --sortby determines the best match within each group.

--sortexpr <expr>, -S <expr>
Specifies that the search results should be presented in an order determined by an arithmetic expression, stated in expr. For example: --sortexpr "@weight + ( user_karma + ln(pageviews) )*0.1". Again, the expression must be quoted so that the shell does not expand the asterisk. Expression-based sorting is discussed in more detail in the Sorting modes section of the manual.

--sort=date
Specifies that the results should be sorted by descending (i.e. most recent first) date. This requires an attribute in the index that is set as a timestamp.

--rsort=date
Specifies that the results should be sorted by ascending (i.e. oldest first) date. This requires an attribute in the index that is set as a timestamp.

--sort=ts
Specifies that the results should be sorted by timestamp in groups. It first returns all the documents whose timestamp is within the last hour, sorted by relevance within that bracket. After that come the documents from the last day, sorted by relevance, then the last week, and then the last month. This is discussed in more detail under the SPH_SORT_TIME_SEGMENTS entry in the Sorting modes section of the manual.

Other options:

--noinfo, -q
Instructs search not to look up document data in your SQL database. Normally, when debugging with MySQL and search, you can provide search with a query that looks up the full article for each returned document ID. This is explained in more detail under the sql_query_info directive.

Author

Andrey Aksenoff (shodan@sphinxsearch.com). This manual page was written by Alexey Vinogradov (klirichek@sphinxsearch.com). Permission is granted to copy, distribute and/or modify this document under the terms of the GNU General Public License, Version 2 or any later version published by the Free Software Foundation.

On Debian systems, the complete text of the GNU General Public License can be found in /usr/share/common-licenses/GPL.
See also

indexer(1), searchd(1), indextool(1)

Sphinx and its programs are documented fully by the Sphinx reference manual available in /usr/share/doc/sphinxsearch.

spelldump(1)    Sphinxsearch 2.0.2

spelldump - Sphinxsearch tool to extract the contents of a dictionary file

spelldump [OPTIONS] dictionary affix [result [locale-name]]

Description

Sphinx is a collection of programs that aim to provide high-quality fulltext search. spelldump extracts the contents of a dictionary file in ispell or MySpell format. This helps build word lists for wordforms, because all of the possible forms are pre-built for you.

The two main parameters are the dictionary's main file and its affix file. These are usually named [language-prefix].dict and [language-prefix].aff, and are available with most common Linux distributions as well as in various places online.

result specifies where the dictionary data should be written, and locale-name specifies the locale details you wish to use. Examples of its usage are:

    spelldump en.dict en.aff
    spelldump ru.dict ru.aff ru.txt ru_RU.CP1251
    spelldump ru.dict ru.aff ru.txt .1251

The result file contains all the words in the dictionary in alphabetical order, written in wordforms file format, which you can then customise for your specific circumstances. An example of the result file:

    zone > zone
    zoned > zoned
    zoning > zoning

Options

-c [FILE]
Specifies a file for case conversion details.

Author

Andrey Aksenoff (shodan@sphinxsearch.com). This manual page was written by Alexey Vinogradov (klirichek@sphinxsearch.com). Permission is granted to copy, distribute and/or modify this document under the terms of the GNU General Public License, Version 2 or any later version published by the Free Software Foundation.

On Debian systems, the complete text of the GNU General Public License can be found in /usr/share/common-licenses/GPL.

See also

indexer(1), indextool(1).
Sphinx and its programs are documented fully by the Sphinx reference manual available in /usr/share/doc/sphinxsearch.

indextool(1)    Sphinxsearch 2.0.2

indextool - Sphinxsearch tool to dump miscellaneous debug information about the physical index

indextool command [options]

Description

Sphinx is a collection of programs that aim to provide high-quality fulltext search. indextool is one of the helper tools within the Sphinx package. It dumps miscellaneous debug information about the physical index. Apart from dumping, indextool can also verify indexes, hence the name indextool rather than just indexdump.

Commands

The commands are as follows:

--dumpheader FILENAME.sph
Quickly dumps the given index header file without touching any other index files or even the configuration file. The report breaks down all the index settings, in particular the entire attribute and field list. Prior to 0.9.9-rc2, this command was part of the CLI search utility.

--dumpconfig FILENAME.sph
Dumps the index definition from the given index header file in (almost) compliant sphinx.conf file format.

--dumpheader INDEXNAME
Dumps the index header by index name, looking up the header path in the configuration file.

--dumpdocids INDEXNAME
Dumps document IDs by index name. It takes the data from the attribute (.spa) file and therefore requires docinfo=extern to work.

--dumphitlist INDEXNAME KEYWORD
Dumps all the hits (occurrences) of a given keyword in a given index, with the keyword specified as text.

--dumphitlist INDEXNAME --wordid ID
Dumps all the hits (occurrences) of a given keyword in a given index, with the keyword specified as an internal numeric ID.

--htmlstrip INDEXNAME
Filters stdin using the HTML stripper settings of the given index and prints the result to stdout. Note that the settings are taken from sphinx.conf, not from the index header.

--check INDEXNAME
Checks the index data files for consistency errors, which might be caused by bugs in indexer or by hardware faults.

--strip-path
Strips the path names from all the file names referenced from the index (stopwords, wordforms, exceptions, etc.).
This is useful for checking indexes built on another machine with possibly different path layouts.

Options

The only option currently available applies to all commands and lets you specify the configuration file:

--config CONFIGFILE, -c CONFIGFILE
Overrides the built-in config file names.

Author

Andrey Aksenoff (shodan@sphinxsearch.com). This manual page was written by Alexey Vinogradov (klirichek@sphinxsearch.com). Permission is granted to copy, distribute and/or modify this document under the terms of the GNU General Public License, Version 2 or any later version published by the Free Software Foundation.

On Debian systems, the complete text of the GNU General Public License can be found in /usr/share/common-licenses/GPL.

See also

indexer(1), searchd(1), search(1)

Sphinx and its programs are documented fully by the Sphinx reference manual available in /usr/share/doc/sphinxsearch.