Man page - gt-fingerprint(1)

GT-FINGERPRINT

NAME

gt-fingerprint - Compute MD5 fingerprints for each sequence given in a set of sequence files.

SYNOPSIS

gt fingerprint [option ...] sequence_file [...]

DESCRIPTION

-check [ filename ]

compare all fingerprints contained in the given checklist file with the fingerprints computed from the given sequence_file(s). The comparison is successful if every fingerprint in the checklist file is found in the sequence_file(s) in exactly the same quantity, and vice versa. (default: undefined)

-duplicates [ yes|no ]

show duplicate fingerprints from given sequence_file(s). (default: no)

-extract [ string ]

extract the sequence(s) with the given fingerprint from sequence file(s) and show them on stdout. (default: undefined)

-width [ value ]

set output width for FASTA sequence printing (0 disables formatting) (default: 0)

-o [ filename ]

redirect output to specified file (default: undefined)

-gzip [ yes|no ]

write gzip compressed output file (default: no)

-bzip2 [ yes|no ]

write bzip2 compressed output file (default: no)

-force [ yes|no ]

force writing to output file (default: no)

-help

display help and exit

-version

display version information and exit

If neither option -check nor option -duplicates is used, the fingerprints for all sequences are shown on stdout.

The fingerprint of a sequence is case-insensitive. Thus the MD5 fingerprints of two identical sequences are the same even if one of them is soft-masked.
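The case-insensitivity described above can be illustrated with a minimal Python sketch. It assumes that the fingerprint is the MD5 digest of the upper-cased sequence characters, without header or line breaks. That normalization is an assumption made for illustration, and the function name `fingerprint` is hypothetical, so the digests are not guaranteed to match the output of gt fingerprint.

```python
import hashlib


def fingerprint(sequence: str) -> str:
    """Return an MD5 hex digest that ignores letter case (illustrative only)."""
    # Assumption: drop whitespace/line breaks and upper-case before hashing.
    normalized = "".join(sequence.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


unmasked = "ACGTACGTTTGA"
soft_masked = "ACGTacgtTTGA"  # lower-case letters mark a soft-masked region

# Both variants hash to the same value because case is normalized first.
print(fingerprint(unmasked) == fingerprint(soft_masked))  # prints True
```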

EXAMPLES

Compute a deduplicated list of fingerprints:

$ gt fingerprint U89959_ests.fas | sort | uniq > U89959_ests.checklist_uniq

Compare fingerprints:

$ gt fingerprint -check U89959_ests.checklist_uniq U89959_ests.fas
950b7715ab6cc030a8c810a0dba2dd33 only in sequence_file(s)
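The "exact same quantity and vice versa" rule of -check amounts to a multiset comparison. The following Python sketch shows that idea under stated assumptions: the function name `compare_fingerprints` is hypothetical, and the code is not the GenomeTools implementation. It also suggests why the check above reports a mismatch: the checklist was deduplicated with uniq, while the sequence file presumably contains one fingerprint twice.

```python
from collections import Counter


def compare_fingerprints(checklist, computed):
    """Compare two fingerprint collections as multisets.

    Returns a pair of sorted lists: surplus entries found only in the
    checklist, and surplus entries found only in the sequence files.
    """
    expected = Counter(checklist)
    actual = Counter(computed)
    # Counter subtraction keeps only positive counts, i.e. the surplus.
    only_in_checklist = sorted((expected - actual).elements())
    only_in_sequences = sorted((actual - expected).elements())
    return only_in_checklist, only_in_sequences


checklist = [
    "6d3b4b9db4531cda588528f2c69c0a57",
    "950b7715ab6cc030a8c810a0dba2dd33",
]
# The sequence data contains one fingerprint twice.
computed = checklist + ["950b7715ab6cc030a8c810a0dba2dd33"]

missing, extra = compare_fingerprints(checklist, computed)
for fp in extra:
    print(fp, "only in sequence_file(s)")
```

The comparison counts as successful only when both returned lists are empty.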

Make sure a sequence file contains no duplicates (not the case here):

$ gt fingerprint -duplicates U89959_ests.fas
950b7715ab6cc030a8c810a0dba2dd33 2
gt fingerprint: error: duplicates found: 1 out of 200 (0.500%)
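The duplicate report can be approximated by counting how often each fingerprint occurs. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that the reported number is the count of surplus copies (occurrences beyond the first), which the output above does not state explicitly.

```python
from collections import Counter


def find_duplicates(fingerprints):
    """Map each fingerprint that occurs more than once to its count."""
    counts = Counter(fingerprints)
    return {fp: n for fp, n in counts.items() if n > 1}


def duplicate_summary(fingerprints):
    """Return a summary line, or None if there are no duplicates."""
    dups = find_duplicates(fingerprints)
    if not dups:
        return None
    total = len(fingerprints)
    # Assumption: count every copy beyond the first as one duplicate.
    surplus = sum(n - 1 for n in dups.values())
    percent = 100.0 * surplus / total
    return f"duplicates found: {surplus} out of {total} ({percent:.3f}%)"


# 200 made-up fingerprints, one of which occurs twice.
fingerprints = [format(i, "032x") for i in range(199)] + [format(0, "032x")]

for fp, n in find_duplicates(fingerprints).items():
    print(fp, n)
print(duplicate_summary(fingerprints))
```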

Extract the sequence with a given fingerprint:

$ gt fingerprint -extract 6d3b4b9db4531cda588528f2c69c0a57 U89959_ests.fas
>SQ;8720010
TTTTTTTTTTTTTTTTTCCTGACAAAACCCCAAGACTCAATTTAATCAATCCTCAAATTTACATGATAC
CAACGTAATGGGAGCTTAAAAATA
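The effect of the -width option on extracted FASTA records can be sketched as follows. This is a minimal illustration with a hypothetical helper `format_fasta`, not the GenomeTools code: a width of 0 leaves the sequence on a single line, and any positive width breaks it into lines of at most that many characters.

```python
def format_fasta(header: str, sequence: str, width: int = 0) -> str:
    """Format one FASTA record; width 0 disables line wrapping."""
    if width <= 0:
        lines = [sequence]
    else:
        # Cut the sequence into consecutive chunks of `width` characters.
        lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + header] + lines)


print(format_fasta("SQ;8720010", "TTTTTTTTTTTTTTTTTCCTGACAAAACC", width=10))
```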

RETURN VALUES

• 0 everything went fine (-check: the comparison was successful; -duplicates: no duplicates found)

• 1 an error occurred (-check: the comparison was not successful; -duplicates: duplicates found)
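A script can act on these return values. The Python sketch below wraps the -check invocation from the EXAMPLES section. The helper name `sequences_match_checklist` and its `run` parameter are hypothetical, and the sketch assumes that gt is available on the PATH. Because status 1 covers both a failed comparison and other errors, the error output should be inspected to tell the two apart.

```python
import subprocess


def sequences_match_checklist(checklist, sequence_files, run=subprocess.run):
    """Return True if `gt fingerprint -check` exits with status 0.

    `run` defaults to subprocess.run and can be replaced for testing.
    """
    cmd = ["gt", "fingerprint", "-check", checklist, *sequence_files]
    result = run(cmd, capture_output=True, text=True)
    # 0: comparison successful; 1: comparison failed or another error occurred.
    return result.returncode == 0


# Example call (requires gt to be installed):
#   ok = sequences_match_checklist("U89959_ests.checklist_uniq",
#                                  ["U89959_ests.fas"])
```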

REPORTING BUGS

Report bugs to https://github.com/genometools/genometools/issues.