- How does the similarity search work?
- What are the differences between the Tanimoto and Dice similarity coefficients?
- What are the strengths/weaknesses of similarity searching in WebCSD?
- What should I sketch for a similarity search?
- Why do I get different reduced cell search results in WebCSD compared to ConQuest?
- Why do I only see refcodes beginning with 'A' when I browse the database?
- What are the client-side technical requirements for WebCSD access?
- How can I speed up Java on my computer?
- Why is WebCSD always slow at the beginning of each session?
- My searches are running really slowly - what should I do?
- I have just started using a new version of WebCSD - why are some features behaving strangely or not working?
- How does author name searching work?
- How can I search for more complicated compound names?
- Why does WebCSD run searches in a new window each time?
- Why do my WebCSD applets stop appearing when I already have many WebCSD tabs/windows open?
- Why does WebCSD give a 'Socket is not connected' error every time I try to run a search?
- Why does the Jmol visualiser give an 'access denied' error when I try to view WebCSD structures?
- What are CSD X-Press and Structures Pending?
- How do I reference WebCSD?
- What are Retracted CSD entries?
- What is the 'teaching subset' of the CSD?
- What are user accounts for and how do I get one?
- Are there any differences between substructure searches in WebCSD and ConQuest?
How does the similarity search work?
N.B. The two types of similarity coefficient are not directly comparable, so similarity values calculated with one type cannot be compared quantitatively with values calculated with the other.
What are the differences between the Tanimoto and Dice similarity coefficients?
The Tanimoto coefficient is the ratio of the number of features common to both molecules to the total number of distinct features present in either molecule, i.e.
$$T = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}$$
where |A| and |B| are the numbers of features in molecules A and B, and |A ∩ B| is the number of features they share. The range is 0 to 1 inclusive.
The Dice coefficient is the number of features common to both molecules relative to the average number of features in the two molecules, i.e.
$$D = \frac{|A \cap B|}{0.5\,(|A| + |B|)}$$
The weighting factor comes from the 0.5 in the denominator. The range is 0 to 1 inclusive.
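The two formulas can be checked with a short Python sketch. It assumes each molecule's fingerprint is held as a set of feature identifiers; the feature strings below are invented for illustration and are not WebCSD's actual fingerprint features. Returning 0.0 for two empty fingerprints is also an assumed convention.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient: |A ∩ B| / (|A| + |B| - |A ∩ B|)."""
    common = len(a & b)
    denominator = len(a) + len(b) - common  # size of the union A ∪ B
    return common / denominator if denominator else 0.0


def dice(a: set, b: set) -> float:
    """Dice coefficient: |A ∩ B| / (0.5 * (|A| + |B|))."""
    common = len(a & b)
    denominator = 0.5 * (len(a) + len(b))  # average fingerprint size
    return common / denominator if denominator else 0.0


# Hypothetical feature sets for two molecules.
mol_a = {"C-C", "C-O", "C=O", "O-H"}
mol_b = {"C-C", "C-O", "C-N"}

print(round(tanimoto(mol_a, mol_b), 3))  # → 0.4
print(round(dice(mol_a, mol_b), 3))      # → 0.571
```

For any pair of molecules the Dice value is never lower than the Tanimoto value (here 0.571 versus 0.4), which illustrates why scores from the two coefficients should not be compared directly.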
What are the strengths/weaknesses of similarity searching in WebCSD?
One weakness is that, although bond types are compared, cyclicity is not explicitly taken into account within the fingerprints. This means that cyclohexane will be indistinguishable from hexane in a similarity search. Molecules that contain fewer atoms are also less well defined and are therefore more prone to low similarity scores. Finally, no information is stored about chemically related elements, such as transition metals; this means that closely related metal complexes, for example, may not be listed with high similarity coefficients.
For further information about the similarity search calculation, see the following open access publication: Thomas et al., 2010, J. Appl. Cryst., 43, 362-366.
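To see how ignoring cyclicity makes hexane and cyclohexane indistinguishable, here is a toy path-based fingerprint in Python. It is an assumption made for illustration, not the fingerprint WebCSD actually uses (see Thomas et al. for that): features are linear atom paths with bond types encoded, hydrogens are omitted, and nothing records whether a path lies in a ring.

```python
def path_fingerprint(atoms, bonds, max_atoms=6):
    """Toy fingerprint: the set of linear atom paths (up to max_atoms atoms).

    atoms: list of element symbols, indexed by atom number.
    bonds: dict mapping (i, j) atom pairs to a bond symbol ('-', '=', ...).
    Bond types are encoded in each path, but ring membership is not.
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for (i, j), bond in bonds.items():
        neighbours[i].append((j, bond))
        neighbours[j].append((i, bond))

    features = set()

    def walk(path, tokens):
        # Store each path in a direction-independent form, so C-O == O-C.
        features.add("".join(min(tuple(tokens), tuple(reversed(tokens)))))
        if len(path) == max_atoms:
            return
        for nbr, bond in neighbours[path[-1]]:
            if nbr not in path:  # simple paths only: never revisit an atom
                walk(path + [nbr], tokens + [bond, atoms[nbr]])

    for start, symbol in enumerate(atoms):
        walk([start], [symbol])
    return features


def tanimoto(a, b):
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0


# Hydrogen-suppressed carbon skeletons.
hexane_bonds = {(i, i + 1): "-" for i in range(5)}
cyclohexane_bonds = {**hexane_bonds, (5, 0): "-"}
hexane = path_fingerprint(["C"] * 6, hexane_bonds)
cyclohexane = path_fingerprint(["C"] * 6, cyclohexane_bonds)

print(tanimoto(hexane, cyclohexane))  # → 1.0
```

Both molecules produce exactly the same set of carbon chains, so the ring closure has no effect on the score; a fingerprint would need an explicit ring flag on its features to tell them apart.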
What should I sketch for a similarity search?
Why do I get different reduced cell search results in WebCSD compared to ConQuest?
Why do I only see refcodes beginning with 'A' when I browse the database?
What are the client-side technical requirements for WebCSD access?
Supported Browsers
The following browsers are fully supported for WebCSD v1.0:
Apple Mac Users
We recommend the use of Safari on Mac OS X as this generally offers the best user experience on this platform. Please note - we no longer offer formal support for Mac OS X 10.4 ("Tiger").
Alternative Browsers
If none of the supported browsers are available, you could use one of the following alternatives to run WebCSD v1.0 even though they are not formally supported at this stage. You may notice some limitations when using one of these browsers - please let us know if you encounter any technical difficulties and we will endeavour to assist you.
- Java Runtime Environment (JRE) v1.5 or later. The latest JRE v1.6 is highly recommended for optimal performance.
- Your network must allow you to open TCP socket connections to webcsdserver.ccdc.cam.ac.uk on either port 80 or port 8765.
- You must allow pop-ups for the *.ccdc.cam.ac.uk domain.
- Client-side cookies are used to store personal preferences within WebCSD. They may also be used to store essential per-session data required for WebCSD access. If you disable cookies, your preferences and interface settings will not be retained and you may not be able to access WebCSD.
- You must accept the CCDC digital certificate when prompted to do so by the WebCSD Java applets. The Java applets have been digitally signed to give them sufficient privileges to connect to the WebCSD server. Failure to accept the certificate will prevent them from starting your searches. If you reject the digital certificate, you will still be prompted to accept it next time you visit the site in a new browser session.
How can I speed up Java on my computer?
- The version of Java you are using - Generally speaking, the newer the better.
- The speed of your computer - This determines how quickly the Java Runtime Environment can be started at the beginning of each browser session and how quickly the applet can be initialised on the page.
- The speed of your internet connection - This determines how quickly the applet can be downloaded from the web server.
- The internet browser you use - Some internet browsers work better with Java applications than others. If you are having performance issues with one browser, it's worth trying a different one.
Why is WebCSD always slow at the beginning of each session?
Once the JRE has loaded, the page will come to life, the missing components will appear and you will be able to use WebCSD. The JRE only needs to initialise once per browser session, the first time a Java application is run. The next time Java is used, the application should appear almost immediately thanks to the caching that takes place automatically within the JRE.
My searches are running really slowly - what should I do?
- Slow internet connection
- Slow client PC
- Slow Java performance
- Very busy server
The information contained within this report should indicate where the performance bottleneck lies and therefore what needs to be done to resolve it. You may be asked to submit several performance reports in this way either in quick succession or at different times of day to give us a better average. For example, the general level of traffic on the internet varies throughout the day and can skew the results at certain times of day.
I have just started using a new version of WebCSD - why are some features behaving strangely or not working?
- On Mozilla Firefox 3.0.*, go to the 'Tools' menu and choose 'Clear Private Data...'
- On Mozilla Firefox 3.5.*, go to the 'Tools' menu and choose 'Clear Recent History...'. Make sure the 'Cache' checkbox is selected in the 'Details' before clicking 'Clear Now'.
- On Internet Explorer 7, go to the 'Tools' menu and choose 'Delete Browsing History...' and then click on the 'Delete Files' button.
- On Internet Explorer 8, go to the 'Safety' menu and choose 'Delete Browsing History...'
How does author name searching work?
To search on author name, select the 'Author Name' query type and enter the required surname in the text/numeric 'Query' box. Optionally authors' initials may also be specified, but each must be followed by a full-stop with no spaces between initials or between initials and surname, e.g. 'F.H.Allen'. When initials are provided, all must match exactly, e.g. 'F.H.Allen' would not match 'F.Allen'.
When using the match anywhere option, the query 'Allen' would match names like 'Allenby' and 'Allenford'. Use the match exact word option to only allow exact name matches.
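The matching rules above can be sketched in Python. This assumes that stored author names use the same 'F.H.Surname' form as queries and that matching is case-insensitive; parse_author and author_matches are hypothetical names, not part of WebCSD, and multi-word surnames are not handled.

```python
import re

# Zero or more single-letter initials, each followed by a full stop and
# no spaces, then the surname, e.g. 'F.H.Allen' or just 'Allen'.
AUTHOR_PATTERN = re.compile(r"((?:[A-Za-z]\.)*)([A-Za-z'-]+)")


def parse_author(name: str) -> tuple:
    """Split 'F.H.Allen' into (('F', 'H'), 'allen')."""
    match = AUTHOR_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"expected a name like 'F.H.Allen', got {name!r}")
    # 'F.H.' -> ['F', 'H', ''] -> drop the empty piece after the last stop.
    initials = tuple(match.group(1).upper().split(".")[:-1])
    return initials, match.group(2).lower()


def author_matches(query: str, stored: str, exact_word: bool = False) -> bool:
    q_initials, q_surname = parse_author(query)
    s_initials, s_surname = parse_author(stored)
    # If the query gives initials, every initial must match exactly.
    if q_initials and q_initials != s_initials:
        return False
    if exact_word:
        return q_surname == s_surname
    # 'Match anywhere': the query may occur anywhere inside the surname.
    return q_surname in s_surname


print(author_matches("Allen", "F.H.Allen"))                   # → True
print(author_matches("F.H.Allen", "F.Allen"))                 # → False
print(author_matches("Allen", "J.Allenby"))                   # → True
print(author_matches("Allen", "J.Allenby", exact_word=True))  # → False
```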
How can I search for more complicated compound names?
Here are some useful conventions and tips for compound name searching:
- Standard bracket characters can be used in WebCSD text/numeric searches, so you can search for 'cobalt(ii)' or 'bicyclo[3.3.1]nonane'.
- You can use '+' and '-' characters to define stereochemistry, e.g. '(+-)-Nefopam'.
- Lower case Greek characters are stored in the text using their Latin alphabet descriptions, e.g. alpha for α and mu for μ. Upper case Greek characters are spelt out and prefixed by c, e.g. cdelta for Δ.
- The elements Al, Cs and S are spelt aluminium, cesium and sulfur.
- Bridging ligands in polymeric metal coordination complexes are identified by the bridging indicator μ, with the polymer identified by the prefix catena, e.g. catena-((μ2-2,5-dihydroxy-p-benzoquinonato)-zinc).
- Names of hydrates will contain words such as hemihydrate, monohydrate or dihydrate, or just hydrate if the multiplier is a non-integer value.
- If other solvents are present, the name will contain the word solvate; clathrate is used for solvates which are clathrated, as in host-guest compounds.
- Names of deuterated species will always contain the character string deuter.
- Characters that would normally be typeset as superscripts or subscripts are enclosed within the characters $ (up) and ! (down), e.g. 'eta$5!-cyclopentadienyl' will match strings including 'η5-cyclopentadienyl'.
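The Greek-letter and superscript conventions in the list above can be applied mechanically. This Python sketch converts a displayed name into the text form used for searching. It handles superscript digits only, because the exact subscript form is not spelled out here, and the Greek spellings beyond the alpha, mu and cdelta examples given above are assumptions. to_csd_text is a hypothetical helper, not a WebCSD function.

```python
# Assumed Latin spellings for lower-case Greek letters.
GREEK = {
    "α": "alpha", "β": "beta", "γ": "gamma", "δ": "delta", "ε": "epsilon",
    "ζ": "zeta", "η": "eta", "θ": "theta", "ι": "iota", "κ": "kappa",
    "λ": "lambda", "μ": "mu", "ν": "nu", "ξ": "xi", "ο": "omicron",
    "π": "pi", "ρ": "rho", "σ": "sigma", "τ": "tau", "υ": "upsilon",
    "φ": "phi", "χ": "chi", "ψ": "psi", "ω": "omega",
}
SUPERSCRIPTS = "⁰¹²³⁴⁵⁶⁷⁸⁹"
TO_DIGITS = str.maketrans(SUPERSCRIPTS, "0123456789")


def to_csd_text(name: str) -> str:
    """Rewrite Greek letters and superscript digits using the conventions above."""
    out = []
    in_superscript = False
    for ch in name:
        is_sup = ch in SUPERSCRIPTS
        if is_sup and not in_superscript:
            out.append("$")  # start of a superscript run ("up")
        elif in_superscript and not is_sup:
            out.append("!")  # end of the run ("down")
        in_superscript = is_sup

        if is_sup:
            out.append(ch.translate(TO_DIGITS))
        elif ch in GREEK:
            out.append(GREEK[ch])
        elif ch.lower() in GREEK:
            out.append("c" + GREEK[ch.lower()])  # upper-case Greek: c prefix
        else:
            out.append(ch)
    if in_superscript:
        out.append("!")
    return "".join(out)


print(to_csd_text("η⁵-cyclopentadienyl"))  # → eta$5!-cyclopentadienyl
print(to_csd_text("Δ"))                    # → cdelta
```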
Why does WebCSD run searches in a new window each time?
WebCSD is designed to launch each search in a new 'pop-up'. This approach offers two key advantages:
- Your query is retained in the original window/tab so it can easily be modified, saved or run again.
- You can compare multiple search results side-by-side.
You have some control over how your internet browser handles these pop-ups. Most modern tabbed browsers (including Internet Explorer, Firefox, Safari, Opera and Google Chrome) allow you to specify whether pop-ups should open in a new window or a new tab by default. We would recommend configuring your browser to open pop-ups in a new tab as this offers the best user experience in web applications such as WebCSD.
Why do my WebCSD applets stop appearing when I already have many WebCSD tabs/windows open?
Sun's Java Runtime Environment (JRE) applies a default limit on the maximum amount of memory made available to the Java applets running in your web browser. Depending on your browser and JRE version, this limit may be shared across all applets running in your browser, even if they are in different windows or tabs. If you open too many applets at once, you may run out of Java heap memory and be unable to open any more; when this happens, an error message like "java.lang.OutOfMemoryError: Java heap space" will appear in your Java console. If you see this error, please update your JRE to the latest version. If you are unable to run Java 6 Update 10 or later, please refer to this article.
Why does WebCSD give a 'Socket is not connected' error every time I try to run a search?
In order to run a search, WebCSD's result browser applet must make a TCP socket connection back to the CCDC's search server at webcsdserver.ccdc.cam.ac.uk. By default, it attempts to connect on port 80. However, some networks block direct port 80 access to the internet and force all traffic through an HTTP web proxy which is not suitable for WebCSD traffic. If port 80 is blocked, the applet will automatically try to connect on port 8765 instead. If it successfully connects to port 8765, it remembers to use that port by default for all subsequent searches in that session. Therefore, in order to run searches on the public internet version of WebCSD, you must ensure that your network allows your PC to connect to webcsdserver.ccdc.cam.ac.uk on either port 80 or port 8765.
If you want to use a different port to the one automatically selected by the result browser, you can manually override its selection by going to the 'Help/Settings' menu and choosing a new port number. Your selection will be saved in a browser cookie for future sessions.
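The port selection described above can be sketched in Python. This is not the result browser applet's code (that is a Java applet); the class and parameter names are hypothetical, and the sketch only checks that a plain TCP connection can be opened without speaking the WebCSD protocol. With the default connect function, calling PortSelector("webcsdserver.ccdc.cam.ac.uk").open() would also serve as a rough check of whether your network permits connections on port 80 or 8765.

```python
import socket


class PortSelector:
    """Try each candidate port in turn and remember the first one that works.

    `connect` defaults to socket.create_connection but can be replaced,
    e.g. with a stub for testing.
    """

    def __init__(self, host, ports=(80, 8765), preferred_port=None,
                 connect=socket.create_connection, timeout=5.0):
        self.host = host
        self.ports = list(ports)
        self.remembered = preferred_port  # manual override, if any
        self.connect = connect
        self.timeout = timeout

    def _order(self):
        if self.remembered is None:
            return self.ports
        return [self.remembered] + [p for p in self.ports if p != self.remembered]

    def open(self):
        last_error = None
        for port in self._order():
            try:
                sock = self.connect((self.host, port), self.timeout)
            except OSError as exc:  # blocked, refused or timed out
                last_error = exc
                continue
            self.remembered = port  # try this port first for later searches
            return sock
        raise ConnectionError(
            f"cannot reach {self.host} on ports {self._order()}"
        ) from last_error
```

Keeping the working port in `remembered` mirrors the "use that port for the rest of the session" behaviour; persisting it between sessions, as the browser cookie does, is left out of the sketch.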
Why does the Jmol visualiser give an 'access denied' error when I try to view WebCSD structures?
If you get an error message similar to:
at the top of the Jmol display window and no molecule appears, you may need to update your Java security policy to allow connections to the WebCSD server.
To do this, you will need to edit the java.policy file that your local computer is using - this will probably be in the lib/security subdirectory of your Java runtime installation. In the java.policy file, add a line like this:
(or whatever address is used to connect to your WebCSD server) in the grant section. If this does not work, you can also try adding:
in the grant section, but this disables the Java security mechanism and should ideally be avoided.
What are CSD X-Press and Structures Pending?
As the processing and curation of CSD structures takes a significant amount of time, the CCDC has decided to take advantage of the new web-based architecture of WebCSD to start releasing structures to the public before they are fully curated. These structures have been processed automatically using our specialist in-house software to ensure a certain level of quality, but may not have had any manual input. As they have not yet been fully processed by a CCDC editor, some are likely to contain errors, and some entries will lack fields that are added during the curation process, such as 'recrystallisation solvent' and 'bioactivity'. New structures will be added in regular batches as they are received, and these uncurated structures will appear in WebCSD as a separate database designated "CSD X-Press". The intention is that early access to these structures will benefit users in spite of the possible errors in the "Structures Pending", especially in the handling of disorder and diagram generation.
More about Structures Pending:
- CSD X-Press: The structures that are pending curation are kept in a separate database within the WebCSD architecture named "CSD X-Press". This means that it is simple to perform searches or extract results based on only the fully curated CSD, only the structures pending, or both sets of structures using the checkboxes provided in the Settings tab of WebCSD.
- Refcode Format: The reference code for a regular CSD structure has the format of six letters followed by an optional two digits (e.g. AABHTZ or AACRUB01). For structures pending, the temporary refcodes assigned will always end in '00' to indicate that the structure has not yet been fully curated. Please bear in mind that these refcodes are temporary and the code will either be changed completely or the '00' will be removed.
- Citing a Structure Pending: If you would like to refer to a CSD X-Press entry within a scientific publication, please report the CCDC reference number (e.g. CCDC 747743) rather than the temporary refcode. For example, use one of the following styles:
- For published structures, write in the body text "(CCDC 747743)", then cite the original paper in your references section.
- Or, for private communications use a reference like so: "S. Parsons, C. Grant, R. Winpenny, R. Gould & P. Wood (2004). Private communication to CSD, CCDC 248052".
- Reliability Score: This score indicates the level of reliability assigned automatically to a structure based on the curation status of the entry and the likelihood of complications in the automatic processing. The reliability score does not reflect the quality of the crystallography/science and is purely based on the difficulty of processing the particular entry.
- 4 stars are given to all fully curated entries in the main CSD. This rating represents a wide-ranging set of professionally-edited structures containing molecules with a broad level of complexity.
- 3 stars will be given to CSD X-Press entries which encountered a low number of complications during automatic processing and typically represent entries with simple chemistry/crystallography.
- 2 stars will be given to CSD X-Press entries for which a moderate number of automatic processing problems were discovered, normally representing entries with more complicated chemistry/crystallography.
- 1 star will be given to CSD X-Press entries for which a high number of automatic processing problems were discovered, normally representing larger entries with complex chemistry/crystallography.
- 2D Diagrams: If there is a match in the CSD based on structural topology, the chemical diagram for this match will be used as a template - the majority of 2D diagrams are derived using this method. Diagram generation for any new structural topologies is automated using Marvin (a ChemAxon package); entries for which acceptable diagrams cannot be automatically generated are simply shown with no diagram. The automatically generated diagrams do not necessarily reflect the exact 3D geometry of the structure.
- Compound Name: This field will be populated if anything has been supplied in the author's original deposited CIF, or if a name can be generated automatically using ACD/Name Batch (an ACD/Labs product) based on the chemical connectivity. Clearly any automatically generated compound names will be subject to possible errors, especially in the case of disordered structures or highly complicated connectivities.
- Disorder: Any disorder will be identified and processed automatically, but it is likely that some structures will require manual checking and editing to fully characterise the disorder.
- Feedback: If you have any questions or comments relating to this new functionality, please e-mail the CSD X-Press Team at firstname.lastname@example.org.
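The Refcode Format rules in the list above translate into a simple check. This Python sketch assumes refcodes are upper case and, as stated above, that only pending entries carry the '00' suffix; the function names are hypothetical. Because pending refcodes are temporary, a check like this is only useful for spotting pending entries; for citation, the CCDC number should be used as described above.

```python
import re

# Six upper-case letters, optionally followed by two digits.
REFCODE = re.compile(r"([A-Z]{6})([0-9]{2})?")


def parse_refcode(code: str):
    """Return (letters, digits or None) for a refcode like 'AACRUB01'."""
    match = REFCODE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a refcode: {code!r}")
    return match.group(1), match.group(2)


def is_structure_pending(code: str) -> bool:
    """Temporary CSD X-Press refcodes always end in '00'."""
    return parse_refcode(code)[1] == "00"


print(parse_refcode("AACRUB01"))       # → ('AACRUB', '01')
print(is_structure_pending("AABHTZ"))  # → False
```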
How do I reference WebCSD?
WebCSD: the online portal to the Cambridge Structural Database
I. R. Thomas, I. J. Bruno, J. C. Cole, C. F. Macrae, E. Pidcock and P. A. Wood,
J. Appl. Cryst., 43, 362-366, 2010
What are Retracted CSD entries?
In 2010, evidence was discovered proving that a substantial series of crystal structures published in Acta Crystallographica Section E were based on falsified data. These structures, primarily published in 2007, originated from research groups at Jinggangshan University in China. The editors of the journal, along with Ton Spek (Utrecht University), identified the fraudulent structures, and the papers have been retracted as described in this Editorial article. In order to represent the situation accurately, we have decided to flag each relevant entry as "Retracted": all data for these structures have been removed, but the journal references remain in place. See this Statement by Dr. Colin Groom, Executive Director of the CCDC, for further information on the matter.
What is the 'teaching subset' of the CSD?
The teaching subset of the Cambridge Structural Database comprises 500 structures chosen specifically for their educational value. The subset is freely available via the WebCSD interface. For further information see:
Teaching Three-Dimensional Structural Chemistry Using Crystal Structure Databases
1. An Interactive Web-Accessible Teaching Subset of the Cambridge Structural Database
G. M. Battle, F. H. Allen and G. M. Ferrence
J. Chem. Educ., 87, 809-812, 2010.
What are user accounts for and how do I get one?
The optional WebCSD database security mechanism can be used to protect access to WebCSD data on a per-database basis. When enabled, the WebCSD server administrator can restrict access to a predefined list of privileged user groups for each individual database.
There are two ways to determine from your Settings that the security mechanism is currently blocking your access to a specific database:
- You cannot select the database in question because it is 'Denied'.
- The database in question is not listed in the database selector (which means the security mechanism is in stealth mode and denied databases are hidden from non-privileged users).
In order to access a protected database, you must log in to a user account which is a member of a group with permission to access that database. If you do not yet have a user account on the WebCSD server in question, you must first request a new account from the WebCSD server administrator. Once you have a user account, you must then ask your administrator to add your account to an appropriate group which has permissions to access the database in question.
You can contact your server administrator via the Support Request form.
NOTE - If the database security mechanism is not enabled on your WebCSD server, or none of the databases you are interested in are protected, you do not require a user account.
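The per-database rules described above can be modelled in a few lines of Python. This is an assumption about how such a check could work, not the WebCSD server's implementation; the Database class, the group names and the 'Available' label are hypothetical (only 'Denied' comes from the interface).

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Database:
    name: str
    # Groups allowed to access the database; None means it is not protected.
    allowed_groups: Optional[FrozenSet[str]] = None


def database_selector(databases, user_groups, security_enabled=True, stealth=False):
    """List (name, status) pairs as they might appear in the Settings tab."""
    rows = []
    for db in databases:
        unrestricted = not security_enabled or db.allowed_groups is None
        if unrestricted or db.allowed_groups & set(user_groups):
            rows.append((db.name, "Available"))
        elif not stealth:
            rows.append((db.name, "Denied"))
        # In stealth mode a denied database is simply left out of the list.
    return rows


databases = [Database("CSD"), Database("In-house", frozenset({"chemists"}))]
print(database_selector(databases, set()))
# → [('CSD', 'Available'), ('In-house', 'Denied')]
print(database_selector(databases, set(), stealth=True))
# → [('CSD', 'Available')]
```

Adding the user to the "chemists" group (the administrator's step described above) is what turns the second entry from 'Denied' into 'Available'.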
Are there any differences between substructure searches in WebCSD and ConQuest?
Although the substructure search engine behind WebCSD is entirely new and does not work in the same way as ConQuest, the results of substructure searches using these two programs are generally identical.
It is important to note, though, that there is a subtle difference between the two search programs: a WebCSD substructure search currently requires all the drawn fragments to be part of the same connected molecule. ConQuest, on the other hand, by default allows a user to define fragments in unconnected moieties (e.g. co-crystals, solvates or salts) and even allows one to define contacts between these non-covalently-bonded fragments. Identical substructure search behaviour in the two systems can be achieved by selecting the "All Atoms in Same Molecule" option in the "Atoms" menu of the ConQuest sketcher.
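The difference comes down to one extra constraint: whether all atoms matched by the query must belong to a single covalently bonded molecule. This Python sketch expresses that constraint as a filter on a candidate hit, using a breadth-first search over a hypothetical bond list; it is not the actual WebCSD or ConQuest search engine.

```python
from collections import deque


def all_in_same_molecule(bonds, matched_atoms):
    """True if every matched atom lies in one covalently bonded component.

    bonds: iterable of (i, j) atom-index pairs for the whole crystal structure.
    matched_atoms: atom indices hit by the query fragments.
    """
    targets = set(matched_atoms)
    if not targets:
        return True
    neighbours = {}
    for i, j in bonds:
        neighbours.setdefault(i, set()).add(j)
        neighbours.setdefault(j, set()).add(i)

    # Breadth-first search outwards from one matched atom.
    start = next(iter(targets))
    seen = {start}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in neighbours.get(atom, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return targets <= seen


# A salt: atoms 0-1-2 form the cation, atoms 3-4 the anion (no bond between them).
salt_bonds = [(0, 1), (1, 2), (3, 4)]
print(all_in_same_molecule(salt_bonds, [0, 2]))  # → True
print(all_in_same_molecule(salt_bonds, [0, 3]))  # → False
```

Applying this filter to hits that span several moieties would discard them, which corresponds to the behaviour requested by the "All Atoms in Same Molecule" option.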