Similarity Search Programs
Introduction
This is a package of programs similar to the ones used by ChemMine for similarity searches. Provided are statically linked binaries compiled for use on x86 and x86_64 computers.
System Requirements
- Operating system: x86 Linux, x86_64 Linux
Notes
- These programs require each compound to be assigned a unique integer identification number greater than 0
- Results returned from the program are just a list of identification numbers
- Currently there is no way to delete what has been added to the database without deleting the entire database
- 64-bit version has only been tested on AMD processors
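The identification number constraint in the notes above can be checked before any compounds are loaded. The following is a minimal Python sketch and is not part of the package. It assumes the id list file holds one integer per line, which this document does not specify, and the function name check_ids is hypothetical.

```python
def check_ids(lines):
    """Return the IDs as ints; raise ValueError unless each is a unique integer > 0.

    Assumption: one identification number per line; blank lines are skipped.
    """
    seen = set()
    ids = []
    for lineno, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue
        try:
            value = int(text)
        except ValueError:
            raise ValueError(f"line {lineno}: {text!r} is not an integer") from None
        if value <= 0:
            raise ValueError(f"line {lineno}: ID {value} must be greater than 0")
        if value in seen:
            raise ValueError(f"line {lineno}: ID {value} is duplicated")
        seen.add(value)
        ids.append(value)
    return ids
```

Because the function accepts any iterable of lines, an open file object such as open("example/myset") can be passed to it directly.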
Running the programs
There are a total of 5 programs provided, each requiring its own specific set of parameters. All of these parameters must be provided, in the order shown:
- descriptor_gen (dbdir) (id list file) (sdf dir) (setname) (dbtype) (create)
- load_smi (dbdir) (id list file) (sdf dir) (create)
- descriptor_compare (dbdir) (sdf file) (dbtype) (cutoff) (sort) (set1) (set2)...
- substructure (dbdir) (smiles file) (set1) (set2)
- dbstat (dbdir)
(dbdir) | directory that was created in step 1 of the example |
(id list file) | file containing the list of identification numbers |
(sdf dir) | directory where all SDF files were placed |
(setname) | name that you want to give to this group of compounds |
(dbtype) | type of database to create; can only be 1 (atom pair) or 2 (atom sequence) |
(create) | whether to run the initial database setup; can only be 0 (do not run setup) or 1 (run setup). Set this to 1 the first time you create a database |
(sdf file) | an SDF file containing the query compound |
(cutoff) | the program only returns scores higher than the number provided here |
(sort) | 1 to sort the results, 0 for unsorted |
(setx) | names of the sets to search |
(smiles file) | a file containing the query SMILES string |
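Because the parameters are positional, it can help to assemble the command lines in code and validate the restricted values first. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, the binaries are assumed to be in the current directory (as in the example commands), and the requirement of at least one set name is an assumption rather than documented behavior. The parameter order and the allowed values for dbtype, create, and sort come from the table above.

```python
def descriptor_gen_args(dbdir, id_list_file, sdf_dir, setname, dbtype, create):
    """Build the descriptor_gen argument list in the documented order."""
    if dbtype not in (1, 2):
        raise ValueError("dbtype must be 1 (atom pair) or 2 (atom sequence)")
    if create not in (0, 1):
        raise ValueError("create must be 0 or 1")
    return ["./descriptor_gen", dbdir, id_list_file, sdf_dir, setname,
            str(dbtype), str(create)]


def descriptor_compare_args(dbdir, sdf_file, dbtype, cutoff, sort, *sets):
    """Build the descriptor_compare argument list in the documented order."""
    if dbtype not in (1, 2):
        raise ValueError("dbtype must be 1 (atom pair) or 2 (atom sequence)")
    if sort not in (0, 1):
        raise ValueError("sort must be 0 or 1")
    if not sets:
        # Assumption: the usage line lists (set1) (set2)..., so at least
        # one set name is expected.
        raise ValueError("at least one set name is required")
    return ["./descriptor_compare", dbdir, sdf_file,
            str(dbtype), str(cutoff), str(sort), *sets]
```

The returned lists only describe the command. They can be passed to subprocess.run to actually execute the programs.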
Example
The example subdirectory contains the sample data that this example follows.
The files myset and myset1 each contain a list of identification numbers assigned to the compounds.
The sdf directory contains the SDF files that are used.
- Create a directory for the database
  - This can be done with the mkdir command in Linux. This example assumes that there is an empty directory named db in the current directory.
- Creating a database
  Initialize the database and create the first compound set with the descriptor_gen and load_smi programs.
  - To generate atom pair data for a compound set, execute: "./descriptor_gen db/ example/myset example/sdf/ myset 1 1"
    This generates atom pair data for the compound set defined by the file myset and names this set myset.
  - To generate atom sequence data for the compound set, execute: "./descriptor_gen db/ example/myset example/sdf/ myset 2 0"
    Note that the create parameter given to the program is now set to 0.
  - To generate the SMILES data for the compound set, execute: "./load_smi db example/myset example/sdf/ 0"
- Adding another compound set to the database
  This step is similar to the previous step of creating the database. The following commands are executed:
  - ./descriptor_gen db/ example/myset1 example/sdf/ myset1 1 0
  - ./descriptor_gen db/ example/myset1 example/sdf/ myset1 2 0
  - ./load_smi db example/myset1 example/sdf/ 0
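- Searching the database
  Similarity searches are done by executing the descriptor_compare program, and substructure searches by executing the substructure program.
  - To do a similarity search with a compound against set myset, execute: ./descriptor_compare db/ example/sdf/1.sdf 1 0.3 1 myset
  - To search both myset and myset1, execute: ./descriptor_compare db/ example/sdf/1.sdf 1 0.3 1 myset myset1
  - To search myset using the atom sequence database (dbtype 2), execute: ./descriptor_compare db/ example/sdf/1.sdf 2 0.3 1 myset
  - To do a substructure search against myset with the query SMILES string in example/test.smi, execute: ./substructure db/ example/test.smi myset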
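The database-building commands in this example follow one pattern: create is 1 only on the very first descriptor_gen call for a new database and 0 on every call after that. A minimal Python sketch of that pattern follows. The function name load_set_commands and its new_database flag are hypothetical, and the sketch only builds the command lists without running anything.

```python
def load_set_commands(dbdir, id_list_file, sdf_dir, setname, new_database):
    """Return the three commands that load one compound set, in example order.

    new_database: True only for the first set loaded into an empty database
    directory, so that the initial setup runs exactly once.
    """
    first_create = "1" if new_database else "0"
    return [
        # Atom pair data (dbtype 1); the only call that may run initial setup.
        ["./descriptor_gen", dbdir, id_list_file, sdf_dir, setname, "1", first_create],
        # Atom sequence data (dbtype 2); setup has already been handled.
        ["./descriptor_gen", dbdir, id_list_file, sdf_dir, setname, "2", "0"],
        # SMILES data; the example passes the database directory to load_smi
        # without a trailing slash, which is mirrored here.
        ["./load_smi", dbdir.rstrip("/"), id_list_file, sdf_dir, "0"],
    ]
```

Calling load_set_commands("db/", "example/myset", "example/sdf/", "myset", True) reproduces the three commands used to create the database, and passing False with myset1 reproduces the commands for adding the second set. Each list can then be handed to subprocess.run(cmd, check=True) so that a failing step stops the sequence.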