WebCSD v1.0.6 FAQs

  1. How does the similarity search work?
  2. What are the differences between the Tanimoto and Dice similarity coefficients?
  3. What should I sketch for a similarity search?
  4. Why do I get different reduced cell search results in WebCSD compared to ConQuest?
  5. Why do I only see refcodes beginning with 'A' when I browse the database?
  6. What are the client-side technical requirements for WebCSD access?
  7. How can I speed up Java on my computer?
  8. Why is WebCSD always slow at the beginning of each session?
  9. My searches are running really slowly - what should I do?
  10. I have just started using a new version of WebCSD - why are some features behaving strangely or not working?
  11. How does author name searching work?
  12. How can I search for more complicated compound names?
  13. Why does WebCSD run searches in a new window each time?
  14. Why do my WebCSD applets stop appearing when I already have many WebCSD tabs/windows open?
  15. Why does WebCSD give a 'Socket is not connected' error every time I try to run a search?
  16. Why does the Jmol visualiser give an 'access denied' error when I try to view WebCSD structures?

How does the similarity search work?

The similarity calculation in WebCSD is based on molecular fingerprints, which are calculated from the chemical features of a molecule such as atom types, bond types and bonded paths through the molecule. When a molecule is drawn in the similarity sketcher, its molecular fingerprint is calculated and compared with the pre-calculated fingerprints of all the structures in the CSD. The comparison uses either the Tanimoto or the Dice coefficient, which gives a measure of the similarity between the molecules. Each coefficient produces a similarity value in the range 0 to 1, with 0 being completely dissimilar and 1 being identical. To produce a manageable set of similar structures, a cut-off value is applied to the similarity coefficient, and matches below it are discarded (the default cut-off is 0.7 for Tanimoto and 0.975 for Dice).

N.B. The two types of similarity coefficient are not directly comparable, so calculated similarity values cannot be compared between the two types in a quantitative fashion.
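
The screening step described above can be sketched as follows. This is only an illustration, not WebCSD's implementation: the fingerprints are stored as plain Python integers used as bit strings, and the refcodes, fingerprint values and function names are all hypothetical.

```python
def popcount(bits: int) -> int:
    """Number of set bits in a fingerprint stored as a Python int."""
    return bin(bits).count("1")


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Bits set in both fingerprints divided by bits set in either."""
    union = popcount(fp_a | fp_b)
    # Assumption: two empty fingerprints are scored 0.0 to avoid dividing by zero.
    return popcount(fp_a & fp_b) / union if union else 0.0


def screen(query_fp, database, cutoff=0.7):
    """Return (refcode, score) pairs at or above the cut-off, best first."""
    scored = [(ref, tanimoto(query_fp, fp)) for ref, fp in database.items()]
    kept = [(ref, score) for ref, score in scored if score >= cutoff]
    return sorted(kept, key=lambda hit: hit[1], reverse=True)


# Hypothetical pre-calculated fingerprints keyed by made-up refcodes.
database = {
    "AAAAAA": 0b11110000,
    "BBBBBB": 0b11111000,
    "CCCCCC": 0b00001111,
}
query = 0b11110000
hits = screen(query, database)  # the third entry shares no bits and is discarded
```

With the default Tanimoto cut-off of 0.7, only the first two entries survive; the third has no features in common with the query and falls below the cut-off.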

What are the differences between the Tanimoto and Dice similarity coefficients?

The Tanimoto coefficient is determined by looking at the number of chemical features that are common to both molecules (the intersection of the data strings) compared to the number of chemical features that are in either (the union of the data strings). The Dice coefficient also compares these values but using a slightly different weighting.

The Tanimoto coefficient is the ratio of the number of features common to both molecules to the total number of features, i.e.

( A intersect B ) / ( A + B - ( A intersect B ) )

The range is 0 to 1 inclusive.

The Dice coefficient is the number of features common to both molecules relative to the average number of features in the two molecules, i.e.

( A intersect B ) / ( 0.5 ( A + B ) )

The weighting factor comes from the 0.5 in the denominator. The range is 0 to 1 inclusive.
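
As a worked example of the two formulas, the sketch below treats each molecule as a set of feature labels, so that A and B are the set sizes and "A intersect B" is the number of shared labels. The feature labels and function names are made up for illustration and do not reflect how WebCSD encodes its fingerprints.

```python
def tanimoto(a: set, b: set) -> float:
    """( A intersect B ) / ( A + B - ( A intersect B ) )"""
    common = len(a & b)
    total = len(a) + len(b) - common
    # Assumption: two empty feature sets are scored 0.0.
    return common / total if total else 0.0


def dice(a: set, b: set) -> float:
    """( A intersect B ) / ( 0.5 ( A + B ) )"""
    common = len(a & b)
    total = len(a) + len(b)
    return common / (0.5 * total) if total else 0.0


# Hypothetical feature sets: 4 and 5 features, 3 of them shared.
mol_a = {"C-C", "C-O", "C=O", "C-N"}
mol_b = {"C-C", "C-O", "C=O", "C-Cl", "C-S"}

t = tanimoto(mol_a, mol_b)  # 3 / (4 + 5 - 3)
d = dice(mol_a, mol_b)      # 3 / (0.5 * (4 + 5))
```

For the same pair of molecules the two formulas as written here give different numbers (0.5 and roughly 0.67 in this example), and they are related by Dice = 2 x Tanimoto / (1 + Tanimoto). This is why a given cut-off value does not mean the same thing for both coefficients.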

What should I sketch for a similarity search?

The similarity search is based on a comparison of molecular fingerprints, so it is important to sketch a full molecule rather than a substructure. It is not crucial, however, to draw the hydrogens on your molecule because hydrogens are not included explicitly in the similarity calculation.

Why do I get different reduced cell search results in WebCSD compared to ConQuest?

If a particular unit cell is entered for a reduced cell search in either ConQuest or WebCSD, the search algorithms will not miss any matches that should be hit for that search. The ConQuest search, however, uses only the reduced unit cell lengths to find matches, because of known mathematical instabilities associated with including the unit cell angles (Andrews, Bernstein & Pelletier, Acta Cryst, 1980, A36, 248-252). The new implementation in WebCSD also takes the cell angles into account by using a more advanced methodology involving nearly Buerger-reduced cells (Andrews & Bernstein, Acta Cryst, 1988, A44, 1009-1018). This approach avoids the instability problems and means that the reduced cell search in WebCSD gives fewer false positive hits.
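
The toy comparison below illustrates only why a lengths-only test admits more false positives than a test that also considers angles. It is not the method used by either program: it is a naive tolerance check, it does not implement nearly Buerger-reduced cells, and a plain tolerance on angles is exactly the kind of comparison that is subject to the instabilities mentioned above. The cell values, tolerances and names are all hypothetical.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    a: float      # cell lengths (angstroms)
    b: float
    c: float
    alpha: float  # cell angles (degrees)
    beta: float
    gamma: float


def lengths_match(query, entry, length_tol=0.1):
    """Lengths-only comparison: angles are ignored entirely."""
    return all(abs(q - e) <= length_tol for q, e in zip(query[:3], entry[:3]))


def lengths_and_angles_match(query, entry, length_tol=0.1, angle_tol=1.0):
    """Naive comparison of lengths and angles (illustrative only)."""
    angles_ok = all(abs(q - e) <= angle_tol for q, e in zip(query[3:], entry[3:]))
    return lengths_match(query, entry, length_tol) and angles_ok


query = Cell(5.0, 6.0, 7.0, 90.0, 90.0, 90.0)
same_cell = Cell(5.02, 6.01, 6.98, 90.0, 90.0, 90.0)
different_angles = Cell(5.0, 6.0, 7.0, 90.0, 105.0, 90.0)

# The second entry is a hit on lengths alone but is rejected once angles count.
false_positive = (lengths_match(query, different_angles)
                  and not lengths_and_angles_match(query, different_angles))
```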

Why do I only see refcodes beginning with 'A' when I browse the database?

The scrollable list of refcodes in the Browse Database section has been designed so that it only loads the set of refcodes beginning with one particular letter at any time. This avoids overloading the JavaScript menu and also makes scrolling through the list easier and more useful. As a result, when you first enter the Browse Database page, the browser shows only the refcodes starting with the letter 'A'. The browser can be prompted to go to a particular section of the database by typing letters into the textbox - as you type, the browser will jump to the most relevant refcode.
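
The behaviour described above could be sketched along the following lines. This is an assumption about how such a list might work, not WebCSD's code: the refcode-like strings are placeholders, the function names are hypothetical, and "most relevant" is interpreted here as the first refcode at or after the typed text in sorted order.

```python
from bisect import bisect_left


def refcodes_for_letter(sorted_refcodes, letter):
    """Return only the refcodes beginning with the given letter."""
    letter = letter.upper()
    start = bisect_left(sorted_refcodes, letter)
    end = bisect_left(sorted_refcodes, chr(ord(letter) + 1))
    return sorted_refcodes[start:end]


def jump_to(sorted_refcodes, typed):
    """Return the first refcode at or after the typed text, or None."""
    index = bisect_left(sorted_refcodes, typed.upper())
    return sorted_refcodes[index] if index < len(sorted_refcodes) else None


# Placeholder strings in refcode-like form, kept sorted for bisecting.
refcodes = sorted(["AAAAAA", "AABBCC", "BAAAAA", "BAAAAA01", "CAAAAA"])

initial_view = refcodes_for_letter(refcodes, "A")  # what is loaded on first entry
after_typing = jump_to(refcodes, "ba")             # typing narrows to the 'B' section
```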

What are the client-side technical requirements for WebCSD access?

Supported Browsers

The following browsers are fully supported for WebCSD v1.0:

Apple Mac Users

Please note - we highly recommend the use of Safari on Mac OS X as this currently offers the best user experience on this platform.

Alternative Browsers

If none of the supported browsers are available, you could use one of the following alternatives to run WebCSD v1.0 even though they are not formally supported at this stage. You may notice some limitations when using one of these browsers - please let us know if you encounter any technical difficulties and we will endeavour to assist you.

Other Requirements

How can I speed up Java on my computer?

WebCSD relies heavily on Java technology. Java is used to power the chemical sketcher, the 3D visualiser and the results browser. There are four key factors that determine the speed of Java applications:
  1. The version of Java you are using - Generally speaking, the newer the better.
  2. The speed of your computer - This determines how quickly the Java Runtime Environment can be started at the beginning of each browser session and how quickly the applet can be initialised on the page.
  3. The speed of your internet connection - This determines how quickly the applet can be downloaded from the web server.
  4. The internet browser you use - Some internet browsers work better with Java applications than others. If you are having performance issues with one browser, it's worth trying a different one.
We recommend the use of Java Runtime Environment 6 (the current release version) which can be downloaded here.

Why is WebCSD always slow at the beginning of each session?

Before a Java application can run, the "Java Runtime Environment" (JRE) must be initialised. This can take quite a few seconds (depending purely on the speed of your client machine and the version of Java you are using). Until the JRE has completely initialised, the page you are trying to use will be inactive and will probably appear empty.

Once the JRE has loaded the page will come to life, the missing components will appear, and you will be able to use WebCSD. The JRE only needs to initialise once per browser session, the first time a Java application is run. The next time Java is used, the application should appear virtually immediately due to the internal caching that automatically takes place within the JRE.

My searches are running really slowly - what should I do?

There are many possible explanations for slow searches. The most common reasons are:

In order to diagnose the underlying cause of this problem, we have added a 'Socket Connection Test' mechanism to WebCSD. This test retrieves the first 100,000 database entries from the CSD via your network connection. Please allow the test to run to completion and retrieve all 100,000 entries. You can then send us an automated performance report by choosing the 'Send Search Statistics Report' option from the 'Help' menu of the result browser applet down the left-hand side. Please enter your name, email address and any other relevant information in the dialog that appears and then click 'Send Report'. The information will then be automatically sent to the CCDC support team for their prompt attention.

The information contained within this report should indicate where the performance bottleneck lies and therefore what needs to be done to resolve it. You may be asked to submit several performance reports in this way either in quick succession or at different times of day to give us a better average. For example, the general level of traffic on the internet varies throughout the day and can skew the results at certain times of day.

I have just started using a new version of WebCSD - why are some features behaving strangely or not working?

Of course it is possible that you have identified a genuine issue in WebCSD, but it is quite common for this to be caused by a web browser failing to notice that a file has changed on the WebCSD server and therefore continuing to use the old cached version. Before contacting us to report the problem, we recommend that you empty your browser's cache of temporary internet files and try WebCSD again just in case this provides a quick and easy solution.

How does author name searching work?

To search on author name, select the 'Author Name' query type and enter the required surname in the text/numeric 'Query' box. Optionally authors' initials may also be specified, but each must be followed by a full-stop with no spaces between initials or between initials and surname, e.g. 'F.H.Allen'. When initials are provided, all must match exactly, e.g. 'F.H.Allen' would not match 'F.Allen'.

When using the match anywhere option, the query 'Allen' would match names like 'Allenby' and 'Allenford'. Use the match exact word option to only allow exact name matches.
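
The matching rules above can be sketched as a small function. This is an illustration of the stated rules only, not the actual WebCSD search code: the function names are hypothetical, and the case-insensitive surname comparison and the treatment of 'match anywhere' as a plain substring test are assumptions.

```python
def split_author(name):
    """Split 'F.H.Allen' into (['F', 'H'], 'Allen'); 'Allen' gives ([], 'Allen')."""
    *initials, surname = name.split(".")
    return initials, surname


def author_matches(query, stored, exact_word=False):
    """Apply the initials rule, then compare surnames."""
    query_initials, query_surname = split_author(query)
    stored_initials, stored_surname = split_author(stored)
    # When initials are given in the query, all of them must match exactly.
    if query_initials and query_initials != stored_initials:
        return False
    q, s = query_surname.lower(), stored_surname.lower()  # assumption: ignore case
    return q == s if exact_word else q in s
```

For example, `author_matches("F.H.Allen", "F.Allen")` is rejected because the initials differ, while `author_matches("Allen", "J.Allenby")` succeeds with match anywhere and fails with `exact_word=True`.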

How can I search for more complicated compound names?

Here are some useful conventions and tips for compound name searching:

Why does WebCSD run searches in a new window each time?

WebCSD is designed to launch each search in a new 'pop-up'. This approach offers two key advantages:

You have some control over how your internet browser handles these pop-ups. Most modern tabbed browsers (including Internet Explorer, Firefox, Safari, Opera and Google Chrome) allow you to specify whether pop-ups should open in a new window or a new tab by default. We would recommend configuring your browser to open pop-ups in a new tab as this offers the best user experience in web applications such as WebCSD.

Why do my WebCSD applets stop appearing when I already have many WebCSD tabs/windows open?

Sun's Java Runtime Environment (JRE) applies a default limit on the maximum amount of memory made available to the Java applets running in your web browser. Depending on your browser and JRE version, this limit may be shared across all applets running in your browser, even if they are in different windows or tabs. If you open too many applets at once, you may run out of Java heap memory and be unable to open any more. When this happens, you will see an error message like "java.lang.OutOfMemoryError: Java heap space" in your Java console. To resolve it, please update your JRE to the latest version. If you are unable to run Java 6 Update 10 or later, please refer to this article.

Why does WebCSD give a 'Socket is not connected' error every time I try to run a search?

In order to run a search, WebCSD's result browser applet must make a TCP socket connection back to the CCDC's search server at webcsdserver.ccdc.cam.ac.uk. By default, it attempts to connect on port 80. However, some networks block direct port 80 access to the internet and force all traffic through an HTTP web proxy which is not suitable for WebCSD traffic. If port 80 is blocked, the applet will automatically try to connect on port 8765 instead. If it successfully connects to port 8765, it remembers to use that port by default for all subsequent searches in that session. Therefore, in order to run searches on the public internet version of WebCSD, you must ensure that your network allows your PC to connect to webcsdserver.ccdc.cam.ac.uk on either port 80 or port 8765.

If you want to use a different port to the one automatically selected by the result browser, you can manually override its selection by going to the 'Help/Settings' menu and choosing a new port number. Your selection will be saved in a browser cookie for future sessions.
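
To check from your own machine whether either port is reachable, something like the following could be used. It is a minimal sketch of the try-port-80-then-8765 behaviour described above, not the applet's own code: the function name, the timeout value and the injectable `connector` parameter are assumptions added for illustration.

```python
import socket

HOST = "webcsdserver.ccdc.cam.ac.uk"  # search server named in this FAQ
PORTS = (80, 8765)                    # default port first, then the fallback


def connect_with_fallback(host=HOST, ports=PORTS, timeout=5.0,
                          connector=socket.create_connection):
    """Try each port in order; return (connection, port) for the first success."""
    last_error = None
    for port in ports:
        try:
            return connector((host, port), timeout), port
        except OSError as exc:  # refused, timed out, unreachable, ...
            last_error = exc
    raise ConnectionError(f"could not reach {host} on any of {ports}") from last_error


# Example use (performs a real network connection, so it is left commented out):
# connection, port = connect_with_fallback()
# connection.close()
```

A caller that wants to mimic the "remember the working port for the session" behaviour could simply put the returned port first in `ports` on subsequent calls.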

Why does the Jmol visualiser give an 'access denied' error when I try to view WebCSD structures?

If you get an error message similar to:

access denied (java.net.SocketPermission 127.0.0.1:8081 connect,resolve)

at the top of the Jmol display window and no molecule appears, you may need to update your Java security policy to allow connections to the WebCSD server.

To do this, you will need to edit the java.policy file that your local computer is using - this will probably be in the lib/security subdirectory of your Java runtime installation. In the java.policy file, add a line like this:

permission java.net.SocketPermission "127.0.0.1:8081", "connect,resolve";

(or whatever address is used to connect to your WebCSD server) in the grant section. If this does not work, you can also try adding:

permission java.security.AllPermission;

in the grant section, but this disables the Java security mechanism and should ideally be avoided.


© Cambridge Crystallographic Data Centre 2006-2010
