WebCSD v1.1.1 FAQs

  1. How does the similarity search work?
  2. What are the differences between the Tanimoto and Dice similarity coefficients?
  3. What are the strengths/weaknesses of similarity searching in WebCSD?
  4. What should I sketch for a similarity search?
  5. Why do I get different reduced cell search results in WebCSD compared to ConQuest?
  6. Why do I only see refcodes beginning with 'A' when I browse the database?
  7. What are the client-side technical requirements for WebCSD access?
  8. How can I speed up Java on my computer?
  9. Why is WebCSD always slow at the beginning of each session?
  10. My searches are running really slowly - what should I do?
  11. I have just started using a new version of WebCSD - why are some features behaving strangely or not working?
  12. How does author name searching work?
  13. How can I search for more complicated compound names?
  14. Why does WebCSD run searches in a new window each time?
  15. Why do my WebCSD applets stop appearing when I already have many WebCSD tabs/windows open?
  16. Why does WebCSD give a 'Socket is not connected' error every time I try to run a search?
  17. Why does the Jmol visualiser give an 'access denied' error when I try to view WebCSD structures?
  18. What are CSD X-Press and Structures Pending?
  19. How do I reference WebCSD?
  20. What are Retracted CSD entries?
  21. What is the 'teaching subset' of the CSD?
  22. What are user accounts for and how do I get one?
  23. Are there any differences between substructure searches in WebCSD and ConQuest?

How does the similarity search work?

The similarity calculation in WebCSD is based on molecular fingerprints that are calculated from chemical features of the molecule, such as atom types, bond types and bonded paths through the molecule. When a molecule is drawn in the similarity sketcher, its fingerprint is calculated and compared with the pre-calculated fingerprints of all the structures in the CSD. The comparison uses either the Tanimoto or the Dice coefficient, which gives a measure of the similarity between the molecules. Each coefficient produces a similarity value in the range 0 to 1, with 0 meaning completely dissimilar and 1 meaning that the fingerprints are identical. To produce a manageable set of similar structures, a cut-off value is applied to the similarity coefficient, and matches that score below it are discarded (the default is 0.7 for Tanimoto and 0.975 for Dice).

N.B. The two types of similarity coefficient are not directly comparable, so calculated similarity values cannot be compared between the two types in a quantitative fashion.
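The screening step described above can be sketched in a few lines of Java. This is a minimal illustration and not the WebCSD implementation: the class name, the refcode-like labels and the fingerprint bits are all made up, and a plain java.util.BitSet stands in for whatever fingerprint format WebCSD actually uses.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: labels and fingerprint bits are invented for illustration.
class SimilarityScreen {

    // Tanimoto coefficient between two bit-string fingerprints.
    static double tanimoto(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);                                   // features set in both
        int c = common.cardinality();
        int union = a.cardinality() + b.cardinality() - c;
        return union == 0 ? 0.0 : (double) c / union;    // assumption: empty vs empty scores 0
    }

    // Keep only the entries whose similarity to the query reaches the cut-off.
    static List<String> screen(BitSet query, Map<String, BitSet> database, double cutoff) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, BitSet> entry : database.entrySet()) {
            if (tanimoto(query, entry.getValue()) >= cutoff) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    // Convenience helper to build a fingerprint from the positions of its set bits.
    static BitSet bits(int... positions) {
        BitSet set = new BitSet();
        for (int p : positions) {
            set.set(p);
        }
        return set;
    }

    public static void main(String[] args) {
        Map<String, BitSet> db = new LinkedHashMap<>();
        db.put("ENTRY01", bits(1, 2, 3, 4));   // identical to the query, score 1.0
        db.put("ENTRY02", bits(1, 2, 3, 9));   // 3 common of 5 distinct, score 0.6
        db.put("ENTRY03", bits(7, 8));         // nothing in common, score 0.0
        System.out.println(screen(bits(1, 2, 3, 4), db, 0.7));
    }
}
```

With the default Tanimoto cut-off of 0.7, only the first entry survives the screen in this toy data set.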

What are the differences between the Tanimoto and Dice similarity coefficients?

The Tanimoto coefficient is determined by looking at the number of chemical features that are common to both molecules (the intersection of the data strings) compared to the number of chemical features that are in either (the union of the data strings). The Dice coefficient also compares these values but using a slightly different weighting.

The Tanimoto coefficient is the ratio of the number of features common to both molecules to the total number of distinct features present in either molecule, i.e.

( A intersect B ) / ( A + B - ( A intersect B ) )

where A and B are the numbers of features in each of the two molecules and ( A intersect B ) is the number of features common to both. The range is 0 to 1 inclusive.

The Dice coefficient is the number of features common to both molecules relative to the average number of features in the two molecules, i.e.

( A intersect B ) / ( 0.5 ( A + B ) )

The weighting factor comes from the 0.5 in the denominator. The range is again 0 to 1 inclusive.
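As a worked example, the two formulas can be written directly in Java. The class and method names here are hypothetical, and the feature counts are invented. The last method is not taken from WebCSD: it is simply the algebraic relationship D = 2T / (1 + T) that follows from the two definitions above.

```java
// Hypothetical sketch: a and b are the feature counts of the two molecules,
// c is the number of features common to both (A intersect B).
class SimilarityCoefficients {

    static double tanimoto(int a, int b, int c) {
        int union = a + b - c;
        return union == 0 ? 0.0 : (double) c / union;
    }

    static double dice(int a, int b, int c) {
        return (a + b) == 0 ? 0.0 : c / (0.5 * (a + b));
    }

    // Derived from the two definitions: D = 2T / (1 + T).
    static double diceFromTanimoto(double t) {
        return 2.0 * t / (1.0 + t);
    }

    public static void main(String[] args) {
        int a = 4, b = 4, c = 3;   // two molecules with 4 features each, 3 shared
        System.out.println("Tanimoto = " + tanimoto(a, b, c));   // 3 / 5 = 0.6
        System.out.println("Dice     = " + dice(a, b, c));       // 3 / 4 = 0.75
    }
}
```

For the same pair of molecules the Dice value is never smaller than the Tanimoto value. For example, a Tanimoto value of 0.7 corresponds to a Dice value of about 0.82, which is one way to see why scores from the two coefficients should not be compared directly.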

What are the strengths/weaknesses of similarity searching in WebCSD?

With all fingerprint-based methods of similarity searching there are certain strengths and weaknesses inherent in the fingerprint definitions. The fingerprints used by WebCSD for similarity searching are created using atom types, bond types and bonded paths through the molecules. This definition for the fingerprints means that the search will tend to find matches that contain closely related scaffolds. There are, however, a number of weaknesses associated with the fingerprints and similarity calculations as they are implemented at the moment.

The first issue is that although the bond types are compared, cyclicity is not explicitly taken into account within the fingerprints. This means that cyclohexane will be indistinguishable from hexane in a similarity search. Molecules that contain fewer atoms will also be less well defined, and are therefore more prone to low similarity scores. Finally, no information is stored about chemically related elements, such as transition metals. This means that closely related metal complexes, for example, may not be listed with high similarity coefficients.

For further information about the similarity search calculation, see the following open access publication: Thomas et al., 2010, J. Appl. Cryst., 43, 362-366.

What should I sketch for a similarity search?

The similarity search is based on a comparison of molecular fingerprints, so it is important to sketch a full molecule rather than a substructure. It is not crucial, however, to draw the hydrogens on your molecule because hydrogens are not included explicitly in the similarity calculation.

Why do I get different reduced cell search results in WebCSD compared to ConQuest?

Firstly, if a particular unit cell is entered for a reduced cell search in either ConQuest or WebCSD, the search algorithms will not miss any matches which should be hit for that particular search. The ConQuest search, however, uses only the reduced unit cell lengths to find matches, due to known mathematical instabilities associated with inclusion of the unit cell angles (Andrews, Bernstein & Pelletier, Acta Cryst, 1980, A36, 248-252). The new implementation in WebCSD also takes the cell angles into account by using a more advanced methodology involving nearly Buerger-reduced cells (Andrews & Bernstein, Acta Cryst, 1988, A44, 1009-1018). This approach avoids the problems with instabilities and means that the reduced cell search in WebCSD gives fewer false positive hits.

Why do I only see refcodes beginning with 'A' when I browse the database?

The scrollable list of refcodes in the Browse Database section has been designed so that it loads only the set of refcodes beginning with one particular letter at any time. This avoids overloading the JavaScript menu and also makes scrolling through the list easier and more useful. As a result, when you first enter the Browse Database page, the browser shows only the refcodes starting with the letter 'A'. The browser can be prompted to go to a particular section of the database by typing letters into the textbox - as you type, the browser will jump to the most relevant refcode.
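The behaviour described above can be approximated with a sorted set. The sketch below is an assumption about how such a list might work, not the WebCSD code: the class name and the refcode-like strings are invented, and "most relevant" is interpreted here as the first refcode at or after the typed text in alphabetical order.

```java
import java.util.Locale;
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch of a letter-by-letter refcode list.
class RefcodeBrowser {
    private final NavigableSet<String> refcodes = new TreeSet<>();

    RefcodeBrowser(String... codes) {
        for (String code : codes) {
            refcodes.add(code.toUpperCase(Locale.ROOT));
        }
    }

    // The block of refcodes sharing one initial letter: the only part loaded at a time.
    SortedSet<String> block(char letter) {
        char c = Character.toUpperCase(letter);
        return refcodes.subSet(String.valueOf(c), String.valueOf((char) (c + 1)));
    }

    // First refcode at or after the typed text, or null when nothing follows it.
    String jumpTo(String typed) {
        return refcodes.ceiling(typed.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        RefcodeBrowser browser = new RefcodeBrowser("ABCDEF", "AXYZAB", "BCDEFG", "CAAAAA");
        System.out.println(browser.block('A'));    // the initial view
        System.out.println(browser.jumpTo("b"));   // typing 'b' moves to the B section
    }
}
```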

What are the client-side technical requirements for WebCSD access?

Supported Browsers

The following browsers are fully supported for WebCSD v1.0:

Apple Mac Users

We recommend the use of Safari on Mac OS X as this generally offers the best user experience on this platform. Please note - we no longer offer formal support for Mac OS X 10.4 ("Tiger").

Alternative Browsers

If none of the supported browsers are available, you could use one of the following alternatives to run WebCSD v1.0 even though they are not formally supported at this stage. You may notice some limitations when using one of these browsers - please let us know if you encounter any technical difficulties and we will endeavour to assist you.

Other Requirements

How can I speed up Java on my computer?

WebCSD relies heavily on Java technology. Java is used to power the chemical sketcher, the 3D visualiser and the results browser. There are four key factors that determine the speed of Java applications:
  1. The version of Java you are using - Generally speaking, the newer the better.
  2. The speed of your computer - This determines how quickly the Java Runtime Environment can be started at the beginning of each browser session and how quickly the applet can be initialised on the page.
  3. The speed of your internet connection - This determines how quickly the applet can be downloaded from the web server.
  4. The internet browser you use - Some internet browsers work better with Java applications than others. If you are having performance issues with one browser, it's worth trying a different one.
We recommend the use of Java Runtime Environment 6 (the current release version) which can be downloaded here.

Why is WebCSD always slow at the beginning of each session?

Before a Java application can run, the "Java Runtime Environment" (JRE) must be initialised. This can take quite a few seconds (depending purely on the speed of your client machine and the version of Java you are using). Until the JRE has completely initialised, the page you are trying to use will be inactive and will probably appear empty.

Once the JRE has loaded the page will come to life, the missing components will appear, and you will be able to use WebCSD. The JRE only needs to initialise once per browser session, the first time a Java application is run. The next time Java is used, the application should appear virtually immediately due to the internal caching that automatically takes place within the JRE.

My searches are running really slowly - what should I do?

There are many possible explanations for slow searches. The most common reasons are:

In order to diagnose the underlying cause of this problem, we have added a 'Socket Connection Test' mechanism to WebCSD. This test retrieves the first 100,000 database entries from the CSD via your network connection. Please allow the test to run to completion and retrieve all 100,000 entries.

You can then send us an automated performance report by choosing the 'Send Search Statistics Report' option from the 'Help' menu of the result browser applet down the left-hand side. Please enter your name, email address and any other relevant information in the dialog that appears and then click 'Send Report'. The information will then be automatically sent to the CCDC support team for their prompt attention.

The information contained within this report should indicate where the performance bottleneck lies and therefore what needs to be done to resolve it. You may be asked to submit several performance reports in this way either in quick succession or at different times of day to give us a better average. For example, the general level of traffic on the internet varies throughout the day and can skew the results at certain times of day.

I have just started using a new version of WebCSD - why are some features behaving strangely or not working?

Of course it is possible that you have identified a genuine issue in WebCSD, but it is quite common for this to be caused by a web browser failing to notice that a file has changed on the WebCSD server and therefore continuing to use the old cached version. Before contacting us to report the problem, we recommend that you empty your browser's cache of temporary internet files and try WebCSD again just in case this provides a quick and easy solution.

How does author name searching work?

To search on author name, select the 'Author Name' query type and enter the required surname in the text/numeric 'Query' box. Authors' initials may optionally be specified as well, but each must be followed by a full stop, with no spaces between initials or between the initials and the surname, e.g. 'F.H.Allen'. When initials are provided, all must match exactly, e.g. 'F.H.Allen' would not match 'F.Allen'.

When using the match anywhere option, the query 'Allen' would match names like 'Allenby' and 'Allenford'. Use the match exact word option to only allow exact name matches.
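The matching rules above can be illustrated with a small Java sketch. It is an assumption-laden illustration, not the WebCSD search code: the class and method names are hypothetical, case-insensitive comparison is assumed, and stored author names are assumed to follow the same 'F.H.Allen' layout as queries.

```java
import java.util.Locale;

// Hypothetical sketch of the author name matching rules.
class AuthorMatch {

    // Splits "F.H.Allen" into {"F.H.", "Allen"}; a bare surname gives {"", surname}.
    static String[] split(String name) {
        int lastDot = name.lastIndexOf('.');
        return new String[] { name.substring(0, lastDot + 1), name.substring(lastDot + 1) };
    }

    static boolean matches(String query, String author, boolean exactWord) {
        String[] q = split(query.trim());
        String[] a = split(author.trim());
        // When the query carries initials, all of them must match exactly.
        if (!q[0].isEmpty() && !q[0].equalsIgnoreCase(a[0])) {
            return false;
        }
        String querySurname = q[1].toLowerCase(Locale.ROOT);
        String authorSurname = a[1].toLowerCase(Locale.ROOT);
        return exactWord ? authorSurname.equals(querySurname)
                         : authorSurname.contains(querySurname);
    }

    public static void main(String[] args) {
        // "A.Allenby" is an invented author name used only for illustration.
        System.out.println(matches("Allen", "A.Allenby", false));      // match anywhere
        System.out.println(matches("Allen", "A.Allenby", true));       // match exact word
        System.out.println(matches("F.H.Allen", "F.Allen", true));     // initials differ
    }
}
```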

How can I search for more complicated compound names?

Here are some useful conventions and tips for compound name searching:

Why does WebCSD run searches in a new window each time?

WebCSD is designed to launch each search in a new 'pop-up'. This approach offers two key advantages:

You have some control over how your internet browser handles these pop-ups. Most modern tabbed browsers (including Internet Explorer, Firefox, Safari, Opera and Google Chrome) allow you to specify whether pop-ups should open in a new window or a new tab by default. We would recommend configuring your browser to open pop-ups in a new tab as this offers the best user experience in web applications such as WebCSD.

Why do my WebCSD applets stop appearing when I already have many WebCSD tabs/windows open?

Sun's Java Runtime Environment (JRE) applies a default limit on the maximum amount of memory made available to the Java applets running in your web browser. Depending on your browser and JRE version, this limit may be shared across all applets running in your browser, even if they are in different windows or tabs. If you open too many applets at once, you may run out of Java heap memory and be unable to open any more, in which case you will see an error message like "java.lang.OutOfMemoryError: Java heap space" in your Java console. If this occurs, please update your JRE to the latest version. If you are unable to run Java 6 Update 10 or later, please refer to this article.
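To see what heap limit a given Java installation applies, a short program such as the following can be run. The class name is hypothetical. Note that, run as a standalone program, it reports the limit of the JVM that runs it, which may differ from the limit the browser plug-in applies to applets.

```java
// Hypothetical sketch: prints the heap limit and current usage of the running JVM.
class HeapCheck {

    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime runtime = Runtime.getRuntime();
        long used = runtime.totalMemory() - runtime.freeMemory();
        System.out.println("Maximum heap: " + toMiB(runtime.maxMemory()) + " MiB");
        System.out.println("Currently used: " + toMiB(used) + " MiB");
    }
}
```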

Why does WebCSD give a 'Socket is not connected' error every time I try to run a search?

In order to run a search, WebCSD's result browser applet must make a TCP socket connection back to the CCDC's search server at webcsdserver.ccdc.cam.ac.uk. By default, it attempts to connect on port 80. However, some networks block direct port 80 access to the internet and force all traffic through an HTTP web proxy which is not suitable for WebCSD traffic. If port 80 is blocked, the applet will automatically try to connect on port 8765 instead. If it successfully connects to port 8765, it remembers to use that port by default for all subsequent searches in that session. Therefore, in order to run searches on the public internet version of WebCSD, you must ensure that your network allows your PC to connect to webcsdserver.ccdc.cam.ac.uk on either port 80 or port 8765.

If you want to use a different port to the one automatically selected by the result browser, you can manually override its selection by going to the 'Help/Settings' menu and choosing a new port number. Your selection will be saved in a browser cookie for future sessions.
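The port fallback described above can be sketched as follows, which may also be useful as a quick check of whether your network allows either connection. This is an illustration, not the applet's code: the class and method names are hypothetical and the three-second timeout is an arbitrary choice. It tests only whether a TCP connection can be opened, not whether WebCSD traffic passes through any intermediate proxy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.OptionalInt;
import java.util.function.IntPredicate;

// Hypothetical sketch of "try port 80, then fall back to port 8765".
class PortFallback {
    static final String HOST = "webcsdserver.ccdc.cam.ac.uk";
    static final int[] PORTS = {80, 8765};

    // Returns the first port for which the probe succeeds, or empty if none does.
    static OptionalInt firstReachable(int[] ports, IntPredicate probe) {
        for (int port : ports) {
            if (probe.test(port)) {
                return OptionalInt.of(port);
            }
        }
        return OptionalInt.empty();
    }

    // Probe: attempts a plain TCP connection and reports whether it succeeded.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;   // blocked, timed out, or host not resolvable
        }
    }

    public static void main(String[] args) {
        OptionalInt port = firstReachable(PORTS, p -> canConnect(HOST, p, 3000));
        System.out.println(port.isPresent()
                ? "Connected on port " + port.getAsInt()
                : "Neither port 80 nor port 8765 is reachable");
    }
}
```

Passing the probe in as a parameter keeps the selection logic separate from the network call, so the fallback order can be checked without a live connection.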

Why does the Jmol visualiser give an 'access denied' error when I try to view WebCSD structures?

If you get an error message similar to:

access denied (java.net.SocketPermission 127.0.0.1:8081 connect,resolve)

at the top of the Jmol display window and no molecule appears, you may need to update your Java security policy to allow connections to the WebCSD server.

To do this, you will need to edit the java.policy file that your local computer is using - this will probably be in the lib/security subdirectory of your Java runtime installation. In the java.policy file, add a line like this:

permission java.net.SocketPermission "127.0.0.1:8081", "connect,resolve";

(or whatever address is used to connect to your WebCSD server) in the grant section. If this does not work, you can also try adding:

permission java.security.AllPermission;

in the grant section, but this disables the Java security mechanism and should ideally be avoided.
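To check which connections a given host and port entry would cover before editing the policy file, the standard java.net.SocketPermission class can be queried directly. The class and method names below are hypothetical, and the address is the example one from the error message above. This checks only the permission logic and does not read your java.policy file.

```java
import java.net.SocketPermission;

// Hypothetical sketch: does a granted "host:port" entry cover a requested connection?
class PolicyCheck {

    static boolean covers(String grantedHostPort, String requestedHostPort) {
        SocketPermission granted = new SocketPermission(grantedHostPort, "connect,resolve");
        SocketPermission requested = new SocketPermission(requestedHostPort, "connect,resolve");
        return granted.implies(requested);
    }

    public static void main(String[] args) {
        System.out.println(covers("127.0.0.1:8081", "127.0.0.1:8081"));   // same host and port
        System.out.println(covers("127.0.0.1:8081", "127.0.0.1:8082"));   // different port
    }
}
```

The target is written as host:port, without a protocol prefix, and must match the address shown in the error message.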

What are CSD X-Press and Structures Pending?

Because the processing and curation of CSD structures takes a significant amount of time, the CCDC has decided to take advantage of the new Web-based architecture in WebCSD and start releasing structures to the public before they are fully curated. These structures have been automatically processed using our specialist in-house software to ensure a certain level of quality, but may not have had any manual input. As they have not yet been fully processed by a CCDC editor, it is likely that some will contain errors, and some entries will lack fields that are added during the curation process, such as 'recrystallisation solvent' and 'bioactivity'. New structures will be added in batches on a regular basis as they are received, and these uncurated structures will appear in WebCSD as a separate database designated "CSD X-Press". The intention is that early access to these structures will be beneficial to users in spite of the possible errors in the "Structures Pending", especially in the handling of disorder and in diagram generation.

More about Structures Pending:

How do I reference WebCSD?

WebCSD: the online portal to the Cambridge Structural Database
I. R. Thomas, I. J. Bruno, J. C. Cole, C. F. Macrae, E. Pidcock and P. A. Wood,
J. Appl. Cryst., 43, 362-366, 2010
DOI: 10.1107/S0021889810000452

What are Retracted CSD entries?

Evidence was discovered in 2010 proving that a substantial series of crystal structures published in Acta Crystallographica Section E were based on falsified data. These structures, primarily published in 2007, originated from research groups at the Jinggangshan University in China. The editors of the journal, along with Ton Spek (Utrecht University), identified the fraudulent structures and the papers have been retracted as described in this Editorial article. In order to accurately represent the situation, we have decided to flag each relevant entry as "Retracted" - all data for these structures have been removed, but the journal references remain in place. See this Statement by Dr. Colin Groom, Executive Director of the CCDC for further information on the matter.

What is the 'teaching subset' of the CSD?

The teaching subset of the Cambridge Structural Database comprises 500 structures chosen specifically for their educational value. The subset is freely available via the WebCSD interface. For further information see:

Teaching Three-Dimensional Structural Chemistry Using Crystal Structure Databases
1. An Interactive Web-Accessible Teaching Subset of the Cambridge Structural Database
G. M. Battle, F. H. Allen and G. M. Ferrence
J. Chem. Educ., 87, 809-812, 2010.
DOI: 10.1021/ed100256k

What are user accounts for and how do I get one?

The optional WebCSD database security mechanism can be used to protect access to WebCSD data on a per-database basis. When enabled, the WebCSD server administrator can restrict access to a predefined list of privileged user groups for each individual database.

There are two ways to determine from your Settings that the security mechanism is currently blocking your access to a specific database:

  1. You cannot select the database in question because it is 'Denied'.
  2. The database in question is not listed in the database selector (which means the security mechanism is in stealth mode and denied databases are hidden from non-privileged users).

In order to access a protected database, you must log in to a user account which is a member of a group with permission to access that database. If you do not yet have a user account on the WebCSD server in question, you must first request a new account from the WebCSD server administrator. Once you have a user account, you must then ask your administrator to add your account to an appropriate group which has permissions to access the database in question.

You can contact your server administrator via the Support Request form.

NOTE - If the database security mechanism is not enabled on your WebCSD server, or none of the databases you are interested in are protected, you do not require a user account.

Are there any differences between substructure searches in WebCSD and ConQuest?

Although the substructure search engine behind WebCSD is entirely new and does not work in the same way as ConQuest, the results of substructure searches using these two programs are generally identical.

It is important to note, though, that there is a subtle difference between the two search programs - a WebCSD substructure search currently requires that all the fragments drawn lie within the same covalently bonded unit. ConQuest, on the other hand, will by default allow a user to define fragments in unconnected moieties (e.g. co-crystals, solvates or salts) and even allows one to define contacts between these non-covalently-bonded fragments. Identical substructure search behaviour between the two systems can be achieved by using the option in the ConQuest sketcher to request "All Atoms in Same Molecule" under the "Atoms" menu.


© Cambridge Crystallographic Data Centre 2006-2014