Similarity Search Programs

Introduction

This package contains programs similar to the ones used by ChemMine for similarity searches. Statically linked binaries are provided, compiled for use on x86 and x86_64 computers.

System Requirements

Notes

Running the programs

There are a total of 5 programs provided, each requiring its own specific set of parameters. These parameters must all be provided, in order.

(dbdir) directory that was created in step 1 of the example
(id list file) file with a list of identification numbers
(sdf dir) directory where all sdf files were placed
(setname) database name that you want to give to this group of compounds
(dbtype) type of database to be created; this can only be 1 or 2
  1 for atom pair, 2 for atom sequence
(create) whether to run the initial setup of the database; this can only be 0 or 1
  0 to not run the initial setup
  1 to run the initial setup; set this to 1 the first time you create a database
(sdf file) an sdf file containing the query compound
(cutoff) the program will only return scores higher than the number provided here
(sort) 1 to sort the results, 0 for unsorted
(setx) names of the sets to search
(smiles file) a file with the query smiles string
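
The allowed values for dbtype and create can be checked before descriptor_gen is called. The following is a minimal sketch only; the function name valid_gen_args is hypothetical and is not part of this package.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with the package: checks that dbtype and
# create fall inside the ranges documented in the parameter list.
valid_gen_args() {
    dbtype=$1   # 1 for atom pair, 2 for atom sequence
    create=$2   # 1 to run the initial setup, 0 to skip it

    case "$dbtype" in
        1|2) ;;
        *) echo "dbtype must be 1 (atom pair) or 2 (atom sequence)" >&2
           return 1 ;;
    esac

    case "$create" in
        0|1) ;;
        *) echo "create must be 0 or 1" >&2
           return 1 ;;
    esac

    return 0
}
```

A caller could run "valid_gen_args 1 1 && ./descriptor_gen ..." so that the program only starts when both values are acceptable.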

Example

The example subdirectory contains the sample data that this example follows.
The files myset and myset1 each contain a list of identification numbers assigned to the compounds.
The sdf directory contains the sdf files that are used.

  1. Create a directory for the database

    • This can be done with the mkdir command in Linux. This example assumes that there is an empty directory named db in the current directory.
  2. Creating a database

    Initialize the database and create the first compound set with the descriptor_gen and load_smi programs.
    • To generate the data for a compound set execute: "./descriptor_gen db/ example/myset example/sdf/ myset 1 1"
      This generates atom pair data for the compound set defined by the file myset and names this set myset.
    • To generate atom sequence data for the compound set execute: "./descriptor_gen db/ example/myset example/sdf/ myset 2 0"
      Note that the create parameter given to the program is now set to 0, since the database was already initialized by the previous command.
    • To generate the smiles data for the compound set execute: "./load_smi db example/myset example/sdf/ 0"
  3. Adding another compound set to the database

    This step is similar to the previous step, in which the database was created. The following commands are executed:
    • ./descriptor_gen db/ example/myset1 example/sdf/ myset1 1 0
    • ./descriptor_gen db/ example/myset1 example/sdf/ myset1 2 0
    • ./load_smi db example/myset1 example/sdf/ 0
    Notice that in this step the create parameter is set to 0 in every command, since the database has already been created.
  4. Searching the database

    Similarity searches are done by executing the descriptor_compare program.
    To do a similarity search with a compound against the set myset, execute:
    • ./descriptor_compare db/ example/sdf/1.sdf 1 0.3 1 myset
    To run the same search against both myset and myset1:
    • ./descriptor_compare db/ example/sdf/1.sdf 1 0.3 1 myset myset1
    To use atom sequence instead of atom pairs:
    • ./descriptor_compare db/ example/sdf/1.sdf 2 0.3 1 myset
    Substructure searches are done with the substructure program:
    • ./substructure db/ example/test.smi myset
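
The steps above can be sketched as a small shell script that prints the commands for loading a compound set and for running a similarity search, without executing them. The function names load_set_commands and search_command are hypothetical, and the argument order is inferred from the example commands rather than from any separate documentation of the programs.

```shell
#!/bin/sh
# Sketch only: prints the commands from the example instead of running them.
# The function names are illustrative and not part of the package.

# Print the three commands that load one compound set.
load_set_commands() {
    dbdir=$1     # database directory, e.g. db/
    idlist=$2    # id list file, e.g. example/myset
    sdfdir=$3    # sdf directory, e.g. example/sdf/
    setname=$4   # set name, e.g. myset
    create=$5    # 1 only for the first set loaded into a new database

    # Atom pair data; this is the only call that may carry create=1.
    echo "./descriptor_gen $dbdir $idlist $sdfdir $setname 1 $create"
    # Atom sequence data; the database exists by now, so create is 0.
    echo "./descriptor_gen $dbdir $idlist $sdfdir $setname 2 0"
    # Smiles data; the example passes the directory without a trailing
    # slash, so the slash is stripped here to mirror it.
    echo "./load_smi ${dbdir%/} $idlist $sdfdir 0"
}

# Print a descriptor_compare command for one or more set names.
search_command() {
    dbdir=$1; query=$2; dbtype=$3; cutoff=$4; sort=$5
    shift 5      # the remaining arguments are the set names to search
    echo "./descriptor_compare $dbdir $query $dbtype $cutoff $sort $*"
}

load_set_commands db/ example/myset example/sdf/ myset 1
load_set_commands db/ example/myset1 example/sdf/ myset1 0
search_command db/ example/sdf/1.sdf 1 0.3 1 myset myset1
```

Piping the output to sh (for example "sh thisscript.sh | sh", where thisscript.sh is whatever name the script was saved under) would run the printed commands, assuming the binaries are in the current directory and the db directory already exists.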
download x86 program
download x86_64 program

Brought to you by UCR::IIGB::CEPCEB