The guide for Chemiverse.

This function creates a QSAR model.
The model can then be used to predict the properties of new compounds.

Screen Layout


1. Loading compounds
A file (*.sd, *.sdf) containing compound information is loaded to build the data set for the QSAR model.
2. Enter the QSAR model information
Enter information about the QSAR model.
3. Selection of molecular descriptors
Select the physicochemical molecular descriptors of the compounds; the appropriate descriptors are then determined according to the chosen selection method.
4. Enter learning method options
Enter the values required by the learning method chosen to create the QSAR model.

Tutorial 1 - Compound Loading

Loading screen
  • 1. Select 'Browse' to load the compound.
  • 2. The compound file (*.sd, *.sdf) should contain the structure of each compound and the experimental data for the property to be predicted.

Tutorial 2 - Enter Model Information

Entering model information
  • 1. Enter 'Project'.

    Projects make it easier to manage related models, such as those that share an endpoint or a data set.

  • 2. Enter 'Name'.
  • 3. Enter 'Property'.

    The compound file must contain an experimental value under the same name as the 'Property' entered.

  • 4. Enter 'Unit'.

    Do not use special characters in 'Unit'; write the unit name out in English.
    Ex) ℃ → Degree Celsius

  • 5. Enter 'Endpoint'.
  • 6. Enter 'Comment'.

    Enter brief information about the model to be created.
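
The requirement in step 3, that the compound file carries an experimental value named like 'Property', can be checked before uploading. The following is a minimal sketch, not part of Chemiverse: it assumes the usual SD file layout (records closed by a `$$$$` line, data fields introduced by a `> <FieldName>` header with the value on the next line) and assumes the name match is case-sensitive. The function name `sdf_property_coverage` is hypothetical.

```python
def sdf_property_coverage(sdf_text, property_name):
    """Return (number of records, number of records with a value for property_name)."""
    total = 0
    with_value = 0
    # Each record in an SD file is terminated by a line containing "$$$$".
    for record in sdf_text.split("$$$$"):
        if not record.strip():
            continue  # skip the empty remainder after the last terminator
        total += 1
        lines = record.splitlines()
        for i, line in enumerate(lines):
            # A data header looks like "> <FieldName>"; its value is on the next line.
            if line.startswith(">") and f"<{property_name}>" in line:
                if i + 1 < len(lines) and lines[i + 1].strip():
                    with_value += 1
                break
    return total, with_value


# Made-up two-record example; the structure blocks are shortened placeholders.
EXAMPLE_SDF = """cmpd-1

placeholder structure block
M  END
> <LogP>
1.23

$$$$
cmpd-2

placeholder structure block
M  END
> <Comment>
no measurement

$$$$
"""

total, found = sdf_property_coverage(EXAMPLE_SDF, "LogP")
print(f"{found} of {total} records carry a 'LogP' value")
```

If the two numbers differ, some compounds lack the experimental value and would need to be fixed or removed before the model is built.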

Tutorial 3 - Set up statistical analysis methods

Setting the statistical analysis method
  • 1. Set the statistical analysis method.

    Select regression or classification analysis, then select the appropriate learning technique.

  • 1-1. Enter additional values depending on the learning method. (Tutorial 3-1)
  • 2. Select the cross validation method.

    Three cross-validation methods are available for checking whether overfitting occurs.
    The data set is randomly split into training and validation groups.
    For classification analysis, however, the class ratio given by the experimental values is kept constant in each group.
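
The grouping rule for classification can be illustrated with a small sketch. It is not the Chemiverse implementation, and the page does not name the three methods; the code only shows one way to split samples at random while keeping the class ratio roughly equal across groups. The function name `stratified_folds` and the `n_folds` and `seed` parameters are assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds, seed=0):
    """Assign every sample index to one of n_folds groups, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled members of one class round-robin across the groups,
        # so each group receives about the same share of that class.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


# Made-up labels: 6 "active" and 4 "inactive" compounds.
labels = ["active"] * 6 + ["inactive"] * 4
for fold in stratified_folds(labels, n_folds=2):
    print(sorted(fold), [labels[i] for i in sorted(fold)])
```

Each group in turn serves as the validation group while the remaining groups form the training group.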

Tutorial 3-1 - Entering Setting Values for Each Learning Method

Entering option values
  • 1. Support Vector Regression

    1-1. ν-SVR
        0 < ε,    0 < ν < 1.0

    1-2. ε-SVR
        0 < ε,    0 < C

  • 2. Artificial Neural Network

    2-1. Traditional back-propagation algorithm
        Hidden Layers: Enter the number of hidden nodes needed for each hidden layer.
        Ex) 3,2,1 → Three hidden layers with 3, 2, and 1 hidden nodes, respectively.
        0 < Max Steps, 0 < Learning rate < 1.0

    2-2. Traditional back-propagation algorithm
        Hidden Layers and Max Steps: Same as above.

  • 3. Logistic Regression

        0 < Cut off < 1.0

  • 4. Support Vector Classification

    4-1. ν-SVC
        0 < ν < 1.0

    4-2. C-SVC
        0 < C
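
The ranges listed above are open intervals, and the 'Hidden Layers' field is a comma-separated list. A minimal sketch of how such inputs could be checked before submitting the form is shown below. The helper names `parse_hidden_layers` and `in_open_interval` are hypothetical and are not part of Chemiverse.

```python
def parse_hidden_layers(text):
    """Turn '3,2,1' into [3, 2, 1]; every entry must be a positive integer."""
    sizes = [int(part) for part in text.split(",")]
    if any(size <= 0 for size in sizes):
        raise ValueError("each hidden layer needs at least one node")
    return sizes


def in_open_interval(value, low, high=None):
    """True if low < value, and value < high when an upper bound is given."""
    return value > low and (high is None or value < high)


# Examples that follow the ranges listed above.
print(parse_hidden_layers("3,2,1"))        # three hidden layers
print(in_open_interval(0.01, 0.0, 1.0))    # learning rate inside (0, 1)
print(in_open_interval(1.0, 0.0, 1.0))     # cut off of exactly 1.0 is rejected
print(in_open_interval(10.0, 0.0))         # C only needs to be positive
```

Both bounds are exclusive, so values such as 0 or 1.0 for the learning rate or cut off fall outside the allowed range.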

Tutorial 3-2 - Choosing Cross-Validation

Selecting cross-validation
  • 1. Select the cross-validation method based on the number of compounds used to generate the QSAR model.

      The table below shows models generated by extracting the reference compounds at random and applying the MLR learning technique.
      The mean and standard deviation were obtained by repeating the procedure 100 times with the same compounds and the same number of compounds.
      Each sample shows values taken arbitrarily from the 100 models.
     ※ References: Ramamurthi Narayanan et al., Bioorg. Med. Chem. 2005, 13, 3017–3028

Tutorial 4 - Selecting molecular descriptors

Selecting molecular descriptors
  • 1. Select the molecular descriptors required to generate the QSAR model.

       When the number of molecular descriptors is large (1/5 or more of the number of compounds), the molecular descriptors are selected according to the following procedure.
        1. Keep descriptors for which fewer than 70% of the compounds share the same value.
        2. Keep descriptors whose correlation coefficient with one another is less than 0.8.
        3. Keep the descriptors that correlate most strongly with the experimental values (fewer than 1/5 of the number of compounds).

     ※ Even with the above procedure, the accuracy of the model may be low.
    It is recommended that the number of molecular descriptors be 1/10 or less of the number of compounds.

     ※ References: Mahyar Nirouei et al., Indian J. Biochem. Biophys. 2012, 49, 202-210; Sahebjamee Hassan et al., Iran. J. Chem. Chem. Eng. 2013, 32, 19-29.
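
The three-step reduction described in this item can be sketched in plain Python. This is an illustration under stated assumptions, not the Chemiverse algorithm: step 1 is read as "drop a descriptor when one value covers 70% or more of the compounds", correlations are Pearson coefficients compared by absolute value, and the names `pearson` and `select_descriptors` are hypothetical.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_descriptors(descriptors, target, same_value_limit=0.7, corr_limit=0.8):
    """descriptors maps a name to one value per compound; target holds the experimental values."""
    n = len(target)
    # Step 1: drop descriptors whose most frequent value covers 70% or more of the compounds.
    varied = {
        name: values
        for name, values in descriptors.items()
        if Counter(values).most_common(1)[0][1] / n < same_value_limit
    }
    # Rank the survivors by |correlation with the experimental values|, strongest first.
    ranked = sorted(varied, key=lambda name: abs(pearson(varied[name], target)), reverse=True)
    # Step 2: keep a descriptor only if |r| < 0.8 against every descriptor already kept.
    kept = []
    for name in ranked:
        if all(abs(pearson(varied[name], varied[other])) < corr_limit for other in kept):
            kept.append(name)
    # Step 3: keep fewer than n/5 descriptors (but at least one, an assumption of this sketch).
    max_count = max(1, (n - 1) // 5)
    return kept[:max_count]
```

Ranking by correlation with the experimental values before the pairwise check means that, of two strongly inter-correlated descriptors, the one more related to the property is the one retained.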

  • 2. Select the molecular descriptor selection method.

    The appropriate molecular descriptors are then selected by the algorithm you choose.

Tutorial 5 - My QSAR Model

My model

1. Find your QSAR model in the list.
2. If no time is displayed in the orange box, the model is still being prepared.
3. When creation is complete, click the model name in the red box to view the model you created.

Tutorial 6 - QSAR Model Information 1

Model information

1. The name of the QSAR model.
2. The name and unit of the property predicted by the QSAR model.
3. The endpoint of the QSAR model.
4. The learning method and analysis type of the QSAR model.
5. The number of compounds and the number of molecular descriptors used to generate the QSAR model.
6. The coefficient of determination or accuracy of the QSAR model.
Validation is performed by n-fold cross-validation, and the average value is displayed.
7. Information about the learning technique used to create the QSAR model.
The content varies with the type of learning technique.
  Ex) SVM: the number of support vectors, ANN: the number of hidden layers and neurons
8. Brief information about the QSAR model.
9. The correlation coefficients among the molecular descriptors used to generate the QSAR model.


Tutorial 7 - QSAR Model Information 2

Model information

1. Applicability domain (AD) chart.
If a test compound lies outside 95% of the AD of the training set, its prediction is less reliable.
  ※ Reference: Alexander Tropsha et al., QSAR Comb. Sci. 2003, 22, 69-77.
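
The page does not say how the applicability domain is computed, so the following is only a sketch of one common idea, under loud assumptions: the AD is taken as the distance from the training-set centroid in descriptor space, and the cut-off is the distance that covers 95% of the training compounds. The cited reference describes its own approach, which this code does not reproduce. All function names are hypothetical.

```python
from math import ceil, sqrt


def centroid(rows):
    """Column-wise mean of the descriptor rows."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def distance(a, b):
    """Euclidean distance between two descriptor vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ad_threshold(train_rows, coverage=0.95):
    """Return the training centroid and the distance that covers `coverage` of the training set."""
    center = centroid(train_rows)
    distances = sorted(distance(row, center) for row in train_rows)
    # Smallest distance such that the requested fraction of training compounds lies within it.
    index = max(0, ceil(coverage * len(distances)) - 1)
    return center, distances[index]


def inside_domain(row, center, threshold):
    """True if the compound lies within the assumed applicability domain."""
    return distance(row, center) <= threshold


# Made-up training set: 20 compounds described by two descriptors.
train = [[float(i), 0.0] for i in range(20)]
center, threshold = ad_threshold(train)
print(inside_domain([9.0, 0.0], center, threshold))
print(inside_domain([50.0, 0.0], center, threshold))
```

In practice the descriptors would usually be scaled first, since a raw Euclidean distance is dominated by the descriptors with the largest numeric range.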

Tutorial 8 - Predicting New Compounds

Model information

1. Import SD file.
Predictions are made on the user's own SD file.
2. My subset.
Predictions are made on a subset provided by Chemiverse.
3. Search Similarity.
Compounds similar to the query compounds are retrieved from the Chemiverse database, and predictions are made on them.
Depending on the similarity search results, this may take a long time.