Man page - omindex(1)

NAME

omindex - Index static website data via the filesystem

SYNOPSIS

omindex [OPTIONS] --db DATABASE [BASEDIR] DIRECTORY

DESCRIPTION

DIRECTORY is the directory to start indexing from.

BASEDIR is the directory corresponding to URL (default: DIRECTORY).
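
As a minimal sketch of the simplest invocation, the function below prints a command line that indexes one directory tree into a database. Both paths are hypothetical placeholders, and the command is only printed, not run, so that it can be reviewed first.

```sh
# Print (not run) a basic omindex command line.
# Both default paths are hypothetical placeholders; adjust for your site.
basic_index_cmd() {
    db=${1:-/var/lib/omega/data/default}
    docroot=${2:-/var/www/html}
    # BASEDIR is omitted, so it defaults to DIRECTORY ($docroot),
    # and --url / says that directory corresponds to the URL "/".
    printf 'omindex --db %s --url / %s\n' "$db" "$docroot"
}

basic_index_cmd
```

Once the printed command looks right, it can be pasted into a shell to run it. The sketch assumes paths without spaces.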

OPTIONS

-d, --duplicates=ARG

set duplicate handling: ARG can be ’ignore’ or ’replace’ (default: replace)

-p, --no-delete

skip the deletion of documents corresponding to deleted files (--preserve-nonduplicates is a deprecated alias for --no-delete)

-e, --empty-docs=ARG

how to handle documents we extract no text from: ARG can be index, warn (issue a diagnostic and index), or skip (default: warn)

-D, --db=DATABASE

path to database to use

-U, --url=URL

base URL that BASEDIR corresponds to (default: /)

-M, --mime-type=EXT:TYPE

assume any file with extension EXT has MIME Content-Type TYPE, instead of using libmagic (empty TYPE removes any existing mapping for EXT; other special TYPE values: ’ignore’ and ’skip’)

-G, --mime-type-match=GLOB:TYPE

assume any file with leaf name matching shell wildcard pattern GLOB has MIME Content-Type TYPE (special TYPE values: ’ignore’ and ’skip’)
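
To illustrate how the two MIME mapping options might be combined, the sketch below prints a command line that maps one extension to a type, marks another extension with the special type 'ignore', and matches leaf names with a glob. The extensions, the glob and the paths are illustrative assumptions, not recommended settings.

```sh
# Print an omindex command line with example MIME overrides.
# The extensions (log, bak), the glob and the paths are illustrative only.
mime_override_cmd() {
    # The glob is single-quoted so the shell passes it to omindex unexpanded.
    printf '%s\n' "omindex --db ./db -M log:text/plain -M bak:ignore -G 'README*:text/plain' ./site"
}

mime_override_cmd
```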

-F, --filter=M[,[T][,C]]:CMD

process files with MIME Content-Type M using command CMD, which produces output (on stdout or in a temporary file) with format T (Content-Type or file extension; currently txt (default), html or svg) in character encoding C (default: UTF-8). E.g. -Fapplication/octet-stream:'strings -n8' or -Ftext/x-foo,,utf-16:'foo2utf16 %f %t'

--read-filters=FILE

bulk-load --filter arguments from FILE, which should contain one such argument per line (e.g. text/x-bar:bar2txt --utf8). Lines starting with # are treated as comments and ignored.
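
As a sketch of what such a filter file could look like, the snippet below writes one to a temporary file, using the same example filters that appear in the option descriptions, and counts its non-comment lines. The temporary file and the ./db and ./site paths are assumptions for illustration, and foo2utf16 and bar2txt are the placeholder commands from the examples above, not real tools.

```sh
# Write a sample filter file (temporary path chosen by mktemp).
filters=$(mktemp)
cat > "$filters" <<'EOF'
# One --filter argument per line, written unquoted as in the example above.
application/octet-stream:strings -n8
text/x-foo,,utf-16:foo2utf16 %f %t
text/x-bar:bar2txt --utf8
EOF

# Count the non-comment, non-empty lines in the file.
active=$(grep -v '^#' "$filters" | grep -c .)
echo "$active filter lines in $filters"

# Print (not run) a command line that would load them.
echo "omindex --db ./db --read-filters=$filters ./site"
```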

-l, --depth-limit=LIMIT

set recursion limit (0 = unlimited)

-f, --follow

follow symbolic links

-i, --ignore-exclusions

ignore meta robots tags and similar exclusions

-S, --spelling

index data for spelling correction

-m, --max-size=N[SUFFIX]

maximum size of file to index (in bytes or with a suffix of ’K’/’k’, ’M’/’m’, ’G’/’g’) (default: unlimited)

--sample=SOURCE

what to use for the stored sample of text for HTML documents - SOURCE can be ’body’ or ’description’ (default: ’body’)

-E, --sample-size=SIZE

maximum size for the document text sample (supports the same formats as --max-size) (default: 512)

-T, --title-size=SIZE

maximum size for the document title (supports the same formats as --max-size) (default: 128)
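
The three size options accept the same formats, so they can be mixed freely. As a small sketch, the function below prints a command line that uses a suffixed value for two of them and a plain byte count for the third. The specific limits and the ./db and ./site paths are arbitrary assumptions chosen for illustration.

```sh
# Print an omindex command line that sets all three size limits.
# The values (10M, 1k, 256) and the paths are arbitrary examples.
size_limits_cmd() {
    printf 'omindex --db ./db --max-size=%s --sample-size=%s --title-size=%s ./site\n' \
        10M 1k 256
}

size_limits_cmd
```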

-R, --retry-failed

retry files which omindex failed to extract text from on a previous run

--opendir-sleep=SECS

sleep for SECS seconds before opening each directory - sleeping for 2 seconds seems to reliably work around problems with indexing files on Microsoft DFS shares.

-C, --track-ctime

track each file’s ctime so we can detect changes to ownership or permissions.

--date-terms

ignored for forward compatibility with Omega 1.5.x.

--no-date-terms

don’t index the D-, M- and Y-prefixed terms which support date range filtering using terms (we now recommend using a value slot for this instead).

-v, --verbose

show more information about what is happening

--overwrite

create the database anew (the default is to update if the database already exists)

-s, --stemmer=LANG

set the stemming language (default: english). Possible values: arabic armenian basque catalan danish dutch earlyenglish english finnish french german german2 hungarian indonesian irish italian kraaij_pohlmann lithuanian lovins nepali norwegian porter portuguese romanian russian spanish swedish tamil turkish (pass ’none’ to disable stemming)

-h, --help

display this help and exit

-V, --version

output version information and exit

Please report bugs at: https://xapian.org/bugs