WebCSD v1.0.6 FAQs
- How does the similarity search work?
- What are the differences between the Tanimoto and Dice similarity coefficients?
- What should I sketch for a similarity search?
- Why do I get different reduced cell search results in WebCSD compared to ConQuest?
- Why do I only see refcodes beginning with 'A' when I browse the database?
- What are the client-side technical requirements for WebCSD access?
- How can I speed up Java on my computer?
- Why is WebCSD always slow at the beginning of each session?
- My searches are running really slowly - what should I do?
- I have just started using a new version of WebCSD - why are some features behaving strangely or not working?
- How does author name searching work?
- How can I search for more complicated compound names?
- Why does WebCSD run searches in a new window each time?
- Why do my WebCSD applets stop appearing when I already have many WebCSD tabs/windows open?
- Why does WebCSD give a 'Socket is not connected' error every time I try to run a search?
- Why does the Jmol visualiser give an 'access denied' error when I try to view WebCSD structures?
How does the similarity search work?
The similarity calculation in WebCSD is based on molecular fingerprints
that are calculated using the chemical features of the molecule such as atom types,
bond types and bonded paths through the molecule. When a molecule is drawn in the
similarity sketcher, the molecular fingerprint for this molecule is calculated and
then it is compared to pre-calculated fingerprints of all the structures in the CSD.
The fingerprint comparison is performed using either the Tanimoto or the Dice coefficient,
which effectively gives a measure of the similarity between the molecules. Each of the
coefficients will produce a similarity value in the range of 0 to 1, with 0 being completely
dissimilar and 1 being identical. In order to produce a manageable set of similar structures
a cut-off value for the similarity coefficient is used, below which value matches are
discarded (the default for this is 0.7 for Tanimoto and 0.975 for Dice).
N.B. The two types of similarity coefficient are not directly comparable, so calculated
similarity values cannot be compared between the two types in a quantitative fashion.
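The screening step described above can be pictured with a short Python sketch. This is not WebCSD's implementation: fingerprints are modelled here as plain sets of integer feature identifiers, and the refcode-like keys and feature values are invented for illustration. Only the default Tanimoto cut-off of 0.7 comes from the text.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets (1.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def similarity_search(query, database, cutoff=0.7):
    """Return (key, score) pairs scoring at or above the cut-off, best first."""
    hits = []
    for key, fingerprint in database.items():
        score = tanimoto(query, fingerprint)
        if score >= cutoff:  # matches below the cut-off are discarded
            hits.append((key, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical pre-calculated fingerprints keyed by made-up identifiers.
DATABASE = {
    "AAAAAA": frozenset({1, 2, 3, 4, 5, 6, 7, 8}),
    "BBBBBB": frozenset({1, 2, 3, 4, 5, 6, 7, 9}),
    "CCCCCC": frozenset({1, 2, 20, 21, 22}),
}

QUERY = frozenset({1, 2, 3, 4, 5, 6, 7, 8})
HITS = similarity_search(QUERY, DATABASE)
```

In this toy data set the first entry is identical to the query, the second shares 7 of 9 distinct features (about 0.78) and is kept, and the third falls well below the cut-off and is dropped.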
What are the differences between the Tanimoto and Dice similarity coefficients?
The Tanimoto coefficient is determined by looking at the number of chemical
features that are common to both molecules (the intersection of the data
strings) compared to the number of chemical features that are in either (the
union of the data strings). The Dice coefficient also compares these values but
using a slightly different weighting.
The Tanimoto coefficient is the ratio of the number of features common to both
molecules to the total number of features, i.e.
( A intersect B ) / ( A + B - ( A intersect B ) )
The range is 0 to 1 inclusive.
The Dice coefficient is the number of features in common to both molecules relative
to the average size of the total number of features present, i.e.
( A intersect B ) / ( 0.5 ( A + B ) )
The weighting factor comes from the 0.5 in the denominator. The range is 0 to 1.
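As a worked illustration of the two formulas, the following Python sketch computes both coefficients for a pair of small feature sets. The feature labels are invented for the example and are not the features WebCSD actually uses; A and B in the formulas are taken to be the number of features in each molecule.

```python
def tanimoto(a, b):
    """Common features divided by features present in either set."""
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def dice(a, b):
    """Common features divided by the average size of the two sets."""
    common = len(a & b)
    return common / (0.5 * (len(a) + len(b)))


# Hypothetical feature sets; both must contain at least one feature.
A = {"C", "N", "O", "C-C", "C-N", "C=O"}
B = {"C", "N", "Cl", "C-C", "C-N", "C-Cl"}

T = tanimoto(A, B)  # 4 common / (6 + 6 - 4)
D = dice(A, B)      # 4 common / (0.5 * (6 + 6))
```

For this pair the Tanimoto value is 0.5 and the Dice value is about 0.67. For any given pair of fingerprints the Dice value is never lower than the Tanimoto value, which is one reason the two scales should not be compared directly.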
What should I sketch for a similarity search?
The similarity search is based on a comparison of molecular fingerprints, so
it is important to sketch a full molecule rather than a substructure. It is not crucial,
however, to draw the hydrogens on your molecule because hydrogens are not included
explicitly in the similarity calculation.
Why do I get different reduced cell search results in WebCSD compared to ConQuest?
Firstly, when a unit cell is entered for a reduced cell search in either ConQuest or
WebCSD, neither search algorithm will miss any match that should be hit
for that particular search. The ConQuest search, however, only uses the reduced unit cell
lengths to find matches due to known mathematical instabilities associated with inclusion of
the unit cell angles (Andrews, Bernstein & Pelletier, Acta Cryst, 1980, A36, 248-252).
The new implementation in WebCSD takes into account the cell angles as well by using
a more advanced methodology involving nearly Buerger-reduced cells (Andrews & Bernstein,
Acta Cryst, 1988, A44, 1009-1018). This approach avoids the problems with instabilities and
means that the reduced cell search in WebCSD gives fewer false positive hits.
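The effect on false positives can be pictured with a deliberately simplified Python sketch. It does not implement either cited method: the tolerances, cell values and names are invented, and a naive tolerance test on angles like the one below is exactly the kind of comparison that is unstable near cell boundaries, which the nearly Buerger-reduced cell approach is designed to avoid.

```python
def lengths_match(query, cell, tol=0.05):
    """Compare only the three cell lengths (a, b, c)."""
    return all(abs(q - c) <= tol for q, c in zip(query[:3], cell[:3]))


def lengths_and_angles_match(query, cell, length_tol=0.05, angle_tol=1.0):
    """Compare the lengths and also the three angles (alpha, beta, gamma)."""
    angles_ok = all(abs(q - c) <= angle_tol for q, c in zip(query[3:], cell[3:]))
    return lengths_match(query, cell, length_tol) and angles_ok


# Hypothetical cells as (a, b, c, alpha, beta, gamma).
QUERY = (5.00, 6.00, 7.00, 90.0, 90.0, 90.0)
CANDIDATES = {
    "CELL_1": (5.01, 6.00, 6.99, 90.0, 90.0, 90.0),
    "CELL_2": (5.01, 6.00, 6.99, 90.0, 105.0, 90.0),
}

LENGTH_ONLY_HITS = [n for n, c in CANDIDATES.items() if lengths_match(QUERY, c)]
FULL_HITS = [n for n, c in CANDIDATES.items() if lengths_and_angles_match(QUERY, c)]
```

Both candidates pass the length-only test, but only the first survives once the angles are considered, so the second is a false positive of the length-only search.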
Why do I only see refcodes beginning with 'A' when I browse the database?
The scrollable list of refcodes in the Browse Database section has been designed such
that it only loads the set of refcodes beginning with one particular letter at any time. This
has been done to avoid overloading the JavaScript menu and also to make scrolling through the
list easier and more useful. As such, when you first enter the Browse Database page, the browser
will be showing only the refcodes starting with the letter 'A'. The browser can be prompted to go
to a particular section of the database by typing letters into the textbox - as you type, the
browser will jump to the most relevant refcode.
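The jump-as-you-type behaviour can be approximated with a binary search over a sorted list. The Python sketch below is only an illustration under that assumption: the function names and the short sample list are hypothetical, and WebCSD's own browser may work differently.

```python
from bisect import bisect_left


def refcodes_for_letter(all_refcodes, letter):
    """Return only the refcodes starting with one letter, as the list does on load."""
    return [code for code in all_refcodes if code.startswith(letter.upper())]


def jump_to(sorted_refcodes, typed):
    """Return the first refcode sorting at or after the typed text, or None if empty."""
    if not sorted_refcodes:
        return None
    index = bisect_left(sorted_refcodes, typed.upper())
    return sorted_refcodes[min(index, len(sorted_refcodes) - 1)]


# A short, sorted sample of refcode-like strings for illustration.
SAMPLE = ["AABHTZ", "ABABEL", "ACSALA", "BENZEN", "BIPHEN", "CAFINE"]

FIRST_PAGE = refcodes_for_letter(SAMPLE, "A")
TARGET = jump_to(SAMPLE, "bi")
```

Here the initial page holds only the three entries beginning with 'A', and typing 'bi' moves to 'BIPHEN'.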
What are the client-side technical requirements for WebCSD access?
Supported Browsers
The following browsers are fully supported for WebCSD v1.0:
Apple Mac Users
Please note - we highly recommend the use of Safari on Mac OS X as this currently
offers the best user experience on this platform.
Alternative Browsers
If none of the supported browsers are available, you could use one of the following
alternatives to run WebCSD v1.0 even though they are not formally supported at
this stage. You may notice some limitations when using one of these browsers - please
let us know if you encounter any technical difficulties and we will endeavour to assist you.
Other Requirements
- Java Runtime Environment (JRE) v1.5 or later.
The latest JRE v1.6 is highly recommended for optimal performance.
- Your network must allow you to open TCP socket connections to webcsdserver.ccdc.cam.ac.uk on either port 80 or port 8765.
- You must allow pop-ups for the *.ccdc.cam.ac.uk domain.
- You must enable JavaScript in your web browser.
- Client-side cookies are used to store personal preferences within WebCSD. They may also
be used to store essential per-session data required for WebCSD access. If you disable cookies,
your preferences and interface settings will not be retained and you may not be able to access
WebCSD.
- You must accept the CCDC digital certificate when prompted to do so by the WebCSD Java applets.
The Java applets have been digitally signed to give them sufficient privileges to connect to the WebCSD
server. Failure to accept the certificate will prevent them from starting your searches. If you reject
the digital certificate, you will still be prompted to accept it next time you visit the site in a new
browser session.
How can I speed up Java on my computer?
WebCSD relies heavily on Java technology. Java is used to power the chemical sketcher, the 3D visualiser and the results browser. There are four key factors that determine the speed of Java applications:
- The version of Java you are using - Generally speaking, the newer the better.
- The speed of your computer - This determines how quickly the Java Runtime Environment can be started at the beginning of each browser session and how quickly the applet can be initialised on the page.
- The speed of your internet connection - This determines how quickly the applet can be downloaded from the web server.
- The internet browser you use - Some internet browsers work better with Java applications than others. If you are having performance issues with one browser, it's worth trying a different one.
We recommend the use of Java Runtime Environment 6 (the current release version) which can be downloaded here.
Why is WebCSD always slow at the beginning of each session?
Before a Java application can run, the "Java Runtime Environment" (JRE) must be initialised. This can take quite a few seconds (depending purely on the speed of your client machine and the version of Java you are using). Until the JRE has completely initialised, the page you are trying to use will be inactive and will probably appear empty.
Once the JRE has loaded, the page will come to life, the missing components will appear, and you will be able to use WebCSD. The JRE only needs to initialise once per browser session, the first time a Java application is run. The next time Java is used, the application should appear virtually immediately due to the internal caching that automatically takes place within the JRE.
My searches are running really slowly - what should I do?
There are many possible explanations for slow searches. The most common reasons are:
- Slow internet connection
- Slow client PC
- Slow Java performance
- Very busy server
In order to diagnose the underlying cause of this problem, we have added a 'Socket Connection Test' mechanism to WebCSD. This test retrieves the first 100,000 database entries from the CSD via your network connection. Please allow the test to run to completion and retrieve all 100,000 entries. You can then send us an automated performance report by choosing the 'Send Search Statistics Report' option from the 'Help' menu of the result browser applet down the left-hand side. Please enter your name, email address and any other relevant information in the dialog that appears and then click 'Send Report'. The information will then be automatically sent to the CCDC support team for their prompt attention.
The information contained within this report should indicate where the performance bottleneck lies and therefore what needs to be done to resolve it. You may be asked to submit several performance reports in this way either in quick succession or at different times of day to give us a better average. For example, the general level of traffic on the internet varies throughout the day and can skew the results at certain times of day.
I have just started using a new version of WebCSD - why are some features behaving strangely or not working?
Of course it is possible that you have identified a genuine issue in WebCSD, but it is quite common for this to be caused by a web browser failing to notice that a file has changed on the WebCSD server and therefore continuing to use the old cached version.
Before contacting us to report the problem, we recommend that you empty your browser's cache of temporary internet files and try WebCSD again just in case this provides a quick and easy solution.
- On Mozilla Firefox 3.0.*, go to the 'Tools' menu and choose 'Clear Private Data...'.
- On Mozilla Firefox 3.5.*, go to the 'Tools' menu and choose 'Clear Recent History...'.
Make sure the 'Cache' checkbox is selected in the 'Details' section before clicking 'Clear Now'.
- On Internet Explorer 7, go to the 'Tools' menu and choose 'Delete Browsing History...' and then click on the 'Delete Files' button.
- On Internet Explorer 8, go to the 'Safety' menu and choose 'Delete Browsing History...'.
How does author name searching work?
To search on author name, select the 'Author Name' query type and enter the required surname
in the text/numeric 'Query' box. Optionally, authors' initials may also be specified, but each
must be followed by a full-stop with no spaces between initials or between initials and surname,
e.g. 'F.H.Allen'. When initials are provided, all must match exactly, e.g. 'F.H.Allen' would not
match 'F.Allen'.
When using the match anywhere option, the query 'Allen' would match names like 'Allenby' and 'Allenford'.
Use the match exact word option to only allow exact name matches.
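The matching rules above can be summarised in a small Python sketch. It is an illustration of the described behaviour, not the WebCSD search code: the function names are hypothetical, and case-insensitive comparison is an assumption that the text does not confirm.

```python
def split_author(name):
    """Split 'F.H.Allen' into (('F', 'H'), 'allen'); initials may be absent."""
    parts = [part for part in name.strip().split(".") if part]
    if not parts:
        raise ValueError("an author name needs at least a surname")
    initials = tuple(part.upper() for part in parts[:-1])
    return initials, parts[-1].lower()


def author_matches(query, stored, exact_word=False):
    """Apply the initials rule, then 'match anywhere' or 'match exact word'."""
    query_initials, query_surname = split_author(query)
    stored_initials, stored_surname = split_author(stored)
    # When initials are given in the query, all of them must match exactly.
    if query_initials and query_initials != stored_initials:
        return False
    if exact_word:
        return query_surname == stored_surname
    return query_surname in stored_surname  # match anywhere
```

With these rules, author_matches('Allen', 'J.Allenby') is true under match anywhere but false with exact_word=True, and author_matches('F.H.Allen', 'F.Allen') is false because the initials differ.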
How can I search for more complicated compound names?
Here are some useful conventions and tips for compound name searching:
- Standard parenthesis and bracket characters can be used in WebCSD text/numeric searches, so you can search for 'cobalt(ii)' or 'bicyclo[3.3.1]nonane'.
- You can use '+' and '-' characters to define stereochemistry, e.g. '(+-)-Nefopam'.
- Lower case Greek characters are stored in the text using their Latin alphabet descriptions, e.g. alpha for α and mu for μ. Upper case Greek characters are spelt out and prefixed by c, e.g. cdelta for Δ.
- The names of the elements Al, Cs and S are spelt aluminium, cesium and sulfur.
- Bridging ligands in polymeric metal coordination complexes are identified by the bridging indicator μ, with the polymer identified by the prefix catena, e.g. catena-((μ2-2,5-dihydroxy-p-benzoquinonato)-zinc).
- Names of hydrates will contain the words hemihydrate, monohydrate, dihydrate, etc.; otherwise, just hydrate if the multiplier is a non-integer value.
- If other solvents are present, the name will contain the word solvate; clathrate is used for solvates which are clathrated, as in host-guest compounds.
- Deuterated species will always contain the name characters deuter.
- Characters which would normally be typeset as superscripts or subscripts are enclosed within the characters $ (up) and ! (down), e.g. 'eta$5!-cyclopentadienyl' will match strings including 'η⁵-cyclopentadienyl'.
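The Greek-letter and superscript conventions in the list above lend themselves to a small conversion helper. The Python sketch below is a hypothetical convenience function, not part of WebCSD: it covers only a handful of Greek letters and superscript digits, and it does not attempt subscripts because the text gives no subscript example.

```python
import re

# Partial table of lower case Greek letters and their Latin spellings.
GREEK_NAMES = {
    "α": "alpha", "β": "beta", "γ": "gamma", "δ": "delta",
    "η": "eta", "κ": "kappa", "μ": "mu", "π": "pi",
}
SUPERSCRIPT_DIGITS = str.maketrans("⁰¹²³⁴⁵⁶⁷⁸⁹", "0123456789")


def encode_char(ch):
    """Spell out a Greek letter; upper case letters get the 'c' prefix."""
    if ch in GREEK_NAMES:
        return GREEK_NAMES[ch]
    lower = ch.lower()
    if lower != ch and lower in GREEK_NAMES:
        return "c" + GREEK_NAMES[lower]
    return ch


def to_search_text(name):
    """Rewrite a typeset compound name using the text conventions listed above."""
    # Wrap each run of superscript digits in $ (up) and ! (down).
    name = re.sub(
        "[⁰¹²³⁴⁵⁶⁷⁸⁹]+",
        lambda m: "$" + m.group().translate(SUPERSCRIPT_DIGITS) + "!",
        name,
    )
    return "".join(encode_char(ch) for ch in name)
```

For example, to_search_text('η⁵-cyclopentadienyl') gives 'eta$5!-cyclopentadienyl', matching the query shown in the last list item.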
Why does WebCSD run searches in a new window each time?
WebCSD is designed to launch each search in a new 'pop-up'. This approach offers two key advantages:
- Your query is retained in the original window/tab so it can easily be modified, saved or run again.
- You can compare multiple search results side-by-side.
You have some control over how your internet browser handles these pop-ups. Most modern tabbed browsers
(including Internet Explorer, Firefox, Safari, Opera and Google Chrome) allow you to specify whether
pop-ups should open in a new window or a new tab by default. We would recommend configuring your browser
to open pop-ups in a new tab as this offers the best user experience in web applications such as WebCSD.
Why do my WebCSD applets stop appearing when I already have many WebCSD tabs/windows open?
Sun's Java Runtime Environment (JRE) applies a default limit on the maximum amount of memory made available
to the Java applets running in your web browser. Depending on your browser and JRE version, this
limit may be shared across all applets running in your browser, even if they are in different windows
or tabs. If you open too many applets at once, you may run out of Java heap memory and be unable to
open any more. If this occurs, you will see an error message like "java.lang.OutOfMemoryError: Java
heap space" in your Java console. In that case, please update your JRE to the latest version. If
you are unable to run Java 6 Update 10 or later, please refer to this article.
Why does WebCSD give a 'Socket is not connected' error every time I try to run a search?
In order to run a search, WebCSD's result browser applet must make a TCP socket connection back
to the CCDC's search server at webcsdserver.ccdc.cam.ac.uk. By default, it attempts to connect on
port 80. However, some networks block direct port 80 access to the internet and force all traffic through
an HTTP web proxy which is not suitable for WebCSD traffic. If port 80 is blocked, the applet will
automatically try to connect on port 8765 instead. If it successfully connects to port 8765, it remembers
to use that port by default for all subsequent searches in that session. Therefore, in order to run
searches on the public internet version of WebCSD, you must ensure that your network allows your PC
to connect to webcsdserver.ccdc.cam.ac.uk on either port 80 or port 8765.
If you want to use a different port to the one automatically selected by the result browser, you
can manually override its selection by going to the 'Help/Settings' menu and choosing a
new port number. Your selection will be saved in a browser cookie for future sessions.
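If you want to check from your own machine whether either port is reachable, the fallback order described in this answer can be imitated with a short Python sketch. This is not the applet's code: the function name and the timeout are assumptions, and only the host name and the two port numbers come from the text.

```python
import socket


def first_reachable_port(host, ports=(80, 8765), timeout=5.0):
    """Try each port in order and return the first that accepts a TCP connection."""
    for port in ports:
        try:
            # The connection is closed again as soon as it has been opened.
            with socket.create_connection((host, port), timeout=timeout):
                return port
        except OSError:
            continue  # blocked, refused or timed out: try the next port
    return None


# Example call (needs network access, so it is left commented out):
# print(first_reachable_port("webcsdserver.ccdc.cam.ac.uk"))
```

A return value of None would suggest that both ports are blocked from your network. Note that a successful connection through a transparent HTTP proxy on port 80 does not guarantee that WebCSD traffic will pass through it.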
Why does the Jmol visualiser give an 'access denied' error when I try to view WebCSD structures?
If you get an error message similar to:
access denied (java.net.SocketPermission 127.0.0.1:8081 connect,resolve)
at the top of the Jmol display window and no molecule appears, you may need to update your Java
security policy to allow connections to the WebCSD server.
To do this, you will need to edit the java.policy file that your local computer is using
- this will probably be in the lib/security subdirectory of your Java runtime installation.
In the java.policy file, add a line like this:
permission java.net.SocketPermission "127.0.0.1:8081", "connect,resolve";
(or whatever address is used to connect to your WebCSD server) in the grant section.
If this does not work, you can also try adding:
permission java.security.AllPermission;
in the grant section, but this disables the Java security mechanism and should
ideally be avoided.