Data on more than 22,000 cancer cases are now available for research by bona fide clinical and medical researchers. This repository is the first major output of the Clinical e-Science Framework (CLEF), an e-Science project funded by the Medical Research Council (MRC). Sophisticated security systems, also developed by CLEF, ensure secure and ethical access to the databank. Dr Catalina Hallett will demonstrate how to query the new database at the e-Science All Hands meeting in Nottingham on 20 September.

Patient records contain a wealth of information that could be very useful to medical research. To make this information accessible to researchers, however, it must first be extracted from records that are often free text, then presented in a form that can be compared with data from scientific and other databases. CLEF has developed techniques to capture relevant information from text automatically and enter it into a database. The project has also implemented stringent access control, authentication and secure transmission protocols, using strong encryption standards to protect against accidental disclosure.

Professor Alan Rector, CLEF's director, said: "The CLEF repository is optimised to treat electronic healthcare records as an interactive knowledge source for academic researchers and clinicians to help them access the latest medical information. Once fully deployed, it will lead to previously unthinkable, rapid advances in healthcare research by enabling researchers to analyse data stored in a wide range of geographically-spread databases, on-line."

Professor David Ingram's team at University College London built the repository using a new method for importing and structuring data so that users can run population-level queries over longitudinal data sets. The CLEF repository supports the large-scale analysis of patient records in a Grid environment. It can handle complex queries whilst retaining the critical semantic, structural and medico-legal integrity of the data.

The structuring process, developed in part by Professor Alan Rector's team at the University of Manchester, organises the source data in several steps so that users can put complex clinical questions to the repository. First, the data are structured longitudinally (over time), then by clinical context and finally by data type. Previously, retrieving similarly complex data would have required time-consuming manual searching and analysis. Drawing on the work of Professor Rob Gaizauskas' team at the University of Sheffield, the CLEF system can also extract key medical information from clinical records written in narrative form, such as medical letters, discharge summaries and radiology reports.

A new, generic WYSIWYM ("What you see is what you mean") interface, developed by Professor Donia Scott's team at The Open University, enables users to pose complex clinical queries in natural language and receive answers as plain English text or as simple tables and graphs. Users no longer need to learn "computer-speak" to communicate with an electronic database.

CLEF's future work includes extending its database and refining its use of knowledge resources to help both patients and professionals access the right information and interpret scientific data. The project aims to provide user-friendly and secure tools to improve clinical and research practices, teaching methods and care management processes.

