
Chapter 10

A Data Base System That Relies Heavily on Graphics

G. W. A. Milne

Downloaded by UNIV OF CALIFORNIA SANTA BARBARA on April 15, 2018 | https://pubs.acs.org Publication Date: June 15, 1987 | doi: 10.1021/bk-1987-0341.ch010

Information Technology Branch, Division of Cancer Treatment, National Cancer Institute, Bethesda, MD 20892

The National Cancer Institute operates several large numeric databases and is taking increasing advantage of graphics as a means of presenting large amounts of data. The technology and philosophy underlying such work are described in this paper.

Since 1955, the National Cancer Institute has supported a program in which large numbers of chemicals are tested in an attempt to identify compounds which possess activity against human cancer. The program has had some success in that, of the approximately 40 anti-cancer drugs that are currently commercially available in the U.S., almost all were discovered by or developed in this program. These 40 drugs emerged, however, from a starting group of about half a million compounds, and it is now clear that a relatively large number of compounds must be examined in order to find one useful agent. Large databases are therefore to be expected in this program and in fact the NCI Drug Information System (DIS), which carries all the data associated with this effort, is currently storing about 4 billion bytes of data. With such large files, even legitimate and correct queries can often produce prodigious amounts of information and accordingly, a major task for the DIS has been to design methods for presentation of data that provide concise reports. Graphics are very valuable in this connection, and the use of graphics in the DIS forms the subject of this paper.

The NCI Drug Information System

The operation of the drug screening program and the DIS is illustrated in Figure 1. The acquisition step (#2 in Figure 1) represents the first DIS operation for a compound. From a variety of sources, including literature surveillance and liaison with industry and academia, the program identifies each year some 50,000 structures judged to be potentially of interest in connection with cancer chemotherapy.

This chapter not subject to U.S. copyright. Published 1987 American Chemical Society.

Warr; Graphics for Chemical Structures ACS Symposium Series; American Chemical Society: Washington, DC, 1987.

Figure 1. Overall Schematic of the DIS.

These structures are all entered into the Preregistry database of the DIS. Using criteria such as structural uniqueness and estimated probability of activity, computer programs identify about 20,000 of the better candidates from this input and the NCI requests samples of these compounds for testing. Approximately half the compounds requested are received. As they are acquired, they are assigned a permanent "NSC Number" and their chemistry records are moved from the Preregistry to the Chemistry database. The physical samples are labeled with their NSC Numbers, in barcoded form, and transferred to a storage facility, where they are logged in. The storage contractor weighs the material and creates an Inventory record for that sample. A Shipping record is also begun at this point, reflecting the fact that the compound was shipped on given dates from the Supplier to the Acquisitions contractor and from the Acquisitions to the Storage contractor. These new records are used, respectively, to update the Inventory database and the Shipping History database.

For preliminary testing, which is against P388 leukemia in mice, the DIS controls the flow of compounds from the Storage contractor to the various screening laboratories. As a screener's load/capacity ratio drops, the DIS automatically directs more compounds to be sent to that screener. The capacity of a screener can be adjusted by NCI staff so as to reflect the screener's contractual obligation. The storage contractor receives such shipping requests from the DIS and fills them on a daily basis. Each year, there are some 10,000 such "automatic shipments", and in addition, some 2,000 individual shipments of compounds destined for secondary testing are ordered by NCI staff.
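The load/capacity rule for automatic shipments can be illustrated with a short sketch. The chapter does not give the actual DIS algorithm, so the greedy rule below (always send the next compound to the screener with the lowest load/capacity ratio) is an assumption, as are the class name Screener, the function assign_compounds, the laboratory names and all of the numbers.

```python
from dataclasses import dataclass


@dataclass
class Screener:
    """A screening laboratory (hypothetical record layout)."""
    name: str
    capacity: int  # contractual capacity, adjustable by staff; assumed > 0
    load: int = 0  # compounds currently assigned to the laboratory

    @property
    def ratio(self) -> float:
        return self.load / self.capacity


def assign_compounds(compounds, screeners):
    """Send each compound to the screener whose load/capacity ratio is lowest."""
    shipments = {s.name: [] for s in screeners}
    for nsc in compounds:
        target = min(screeners, key=lambda s: s.ratio)
        target.load += 1  # the new assignment raises that screener's ratio
        shipments[target.name].append(nsc)
    return shipments


# Invented example: Lab B is far below capacity, so it receives the shipments.
labs = [Screener("Lab A", capacity=100, load=90),
        Screener("Lab B", capacity=50, load=10)]
plan = assign_compounds(["NSC-1", "NSC-2", "NSC-3"], labs)
print(plan)
```

Raising a laboratory's capacity lowers its ratio, so a staff adjustment of the kind described above would immediately attract more shipments under this rule.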

The screening laboratories use a full-screen edit program operating on a Hewlett-Packard HP-2645A terminal to collect the data from completed screening experiments (#6 in Figure 1). Once all the data have been entered, they are written in condensed form onto a tape cassette in the terminal. At regular intervals, typically daily, the terminal is logged onto the NIH computer facility and the contents of the tape are downloaded into the NIH IBM 370 computers. There, the downloaded files are used as input to a program (#8 in Figure 1) which examines all new data for internal consistency and freedom from logical errors, and then calculates the final test results from the raw data. Errors that can be corrected on the spot are resolved; other errors that are detected are passed back to the screener. When the screener next logs on, the calculated data and the errors are presented for resolution before more data entry begins. When data have been finally validated in this way, they are written to a staging area to await the next master file update. These updates are carried out every two weeks and trigger an update of the online searchable files in the DIS. Such a biweekly update is reflected in the content of the searchable biology database (#9 in Figure 1).

DIS Databases

There are 24 linked databases in the DIS.

Many of these are quite small, and a few are not directly accessible to users. The major files are all interactively searchable, and these are shown in Figure 2. Each of the databases contains some number of fields, and each field is identified by means of a "field mnemonic", which is usually a four-letter code, such as ADDR for address, or MOLF for molecular formula. There are 360 distinct fields in the DIS; 232 of these are searchable and all of them can be displayed on command. Every one of these field mnemonics is unique; it is therefore unnecessary for a user to remember which database is being addressed, because the DIS can recognize the field mnemonic and search the appropriate database.

DIS Computers

The DIS runs on computers of the NIH Computer Center, which are shown schematically in Figure 3. Most of the DIS code and data reside on a DEC System 10 computer (#2 in Figure 3). This is linked to an IBM 3091 system (#3) which, in turn, has a Hewlett-Packard 2680 high-speed (0.7 sec/page) laser printer (#5) along with its controller, an HP-3000 minicomputer (#4), configured as a peripheral device. This design is somewhat complicated but is mandated by various constraints that are beyond NCI's control. Delivery of graphics from the DEC-10 to the laser printer is generally handled by macros that are built into the DIS, and the operation is sufficiently smooth that the printer can be regarded as though it were a DEC-10 peripheral. Laser printers of this sort have full graphics capability with moderately high resolution, and it is on this printer that all the graphics from the DIS are printed. Other output devices are accessible from the DIS and these include Calcomp and Zeta plotters, as well as an Apollo workstation, which is used to support molecular modeling.
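The field-mnemonic lookup described under DIS Databases works because every mnemonic is unique across all databases, so a query can be routed without the user naming a database. The sketch below illustrates that idea only. The mnemonics ADDR and MOLF come from the text; the assignment of each mnemonic to a database, the database name "Supplier", and the function names are assumptions made for the example.

```python
# Hypothetical catalogue: database name -> field mnemonics it holds.
# Which database really holds ADDR or MOLF is not stated in the chapter.
DATABASES = {
    "Supplier": ["ADDR"],
    "Chemistry": ["MOLF"],
}


def build_field_index(databases):
    """Map each mnemonic to its database, insisting that mnemonics are unique."""
    index = {}
    for db_name, mnemonics in databases.items():
        for mnemonic in mnemonics:
            if mnemonic in index:
                raise ValueError(f"field mnemonic {mnemonic} is not unique")
            index[mnemonic] = db_name
    return index


FIELD_INDEX = build_field_index(DATABASES)


def route_query(mnemonic):
    """Return the database to search for a given field mnemonic."""
    return FIELD_INDEX[mnemonic.upper()]


print(route_query("molf"))
```

The uniqueness check in build_field_index is what makes the routing unambiguous; a duplicated mnemonic would force the user to name the database again.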
Graphics from the DIS

This section contains a description of three distinct areas where the DIS makes extensive use of the graphics printing capability that has been described.

Printing of Letters. Each year, NCI generates as many as 20,000 letters to suppliers of compounds. Such a volume of correspondence must be generated automatically, and programs to handle this have been installed in the DIS. The quality and style of the letters are important because NCI maintains a collegial relationship with its suppliers. The letters therefore are personalized to some extent, routinely cite prior correspondence, carry structure diagrams and are often written in languages other than English.

The overall function of the letter generating program is shown in Figure 4. Once a decision has been made to try to acquire a compound, it is determined whether this is a first order for the compound or a refill. In either case, the prior records for the material are reviewed and data that are to be cited in the new letter are retrieved. Then the language for the letter is selected. Where appropriate, the DIS uses French, German, Spanish or Japanese. Otherwise, English is used. Once the language is chosen, a font selection must be made, the correct "canned" text retrieved, name,

Figure 2. Major DIS Databases. [Pie chart of database sizes; the segment labels that survive are Miscellaneous (0.1%), Order (0.4%), Preregistry (0.8%) and a truncated label "amecodes" (0.2%).]

Figure 3. Computers Used by the DIS. [Schematic of the NIH Computer Center and NCI showing a DEC-10 triple KL CPU, a 3033 system, an HP-3000 computer and an HP-2680 laser printer, joined by links labeled in kb.]

Figure 4. The DIS Letter Generator. [Flow chart with boxes labeled DIS, Acquire, New Order, Refill, Old File, New File, Language, Font, Text, Salutation, Address, Dates, Letter and Printer.]

address, and salutation added and dates that are to be cited in the text must be inserted. Then the letter is sent to the printer, usually as one of a large batch of letters. The printer prints a letterhead on the first page, then prints the date and the name and address of the recipient. It then switches language and font, as necessary, prints the letter, and finishes the first page by printing a signature block using the normal pica font. A signature, which is just another graphic but one that is stored under password control, may or may not be included in the signature block. The second page is an "attachment" which carries details of a chemical. No letterhead is printed, but the chemical structure and other identifying data are printed. Here a pica font is used, irrespective of what was used on the first page. Thus the letters with their respective attachment sheets are printed in order and can be mailed directly. The recipient's address is always printed in English and positioned so that a window envelope may be used for mailing. Such letters are produced routinely by the DIS at the rate of a few hundred per week. A letter and attachment to a Spanish supplier are shown in Figure 5, and Figure 6 shows a letter to a Japanese investigator.

Chemical Structures. The DIS is required to print hundreds of chemical structure diagrams each week. These diagrams are used in all manner of reports and a basic requirement of the system is that the diagrams be of a high graphical quality. To meet this demand, the DIS proceeds as follows. Structures are entered into the database at the time the chemical is being considered for acquisition (#2 in Figure 1). Structure entry is carried out using a microcomputer which is on-line to the DEC-10. The flow of data during this process is shown in Figure 7.
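The central step in this flow is the conversion of a drawn structure (a set of vectors) into a connection table, that is, a list of atoms and the bonds joining them. The DIS conversion routine is not described in this chapter, so the following is only a minimal sketch under assumed data layouts: atoms as (element, x, y) tuples, bonds as line segments between atom positions, and a simple valence test standing in for the unspecified checking functions. All names and the valence limits are illustrative.

```python
# Assumed valence limits for a simple sanity check (illustrative only).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "Cl": 1}


def to_connection_table(atoms, bonds):
    """Convert drawn atoms and bond segments into a connection table.

    atoms: list of (element, x, y)
    bonds: list of ((x1, y1), (x2, y2), bond_order)
    """
    index_at = {(x, y): i for i, (_, x, y) in enumerate(atoms, start=1)}
    table = {i: {"element": el, "bonds": []}
             for i, (el, _, _) in enumerate(atoms, start=1)}
    for start, end, order in bonds:
        a, b = index_at[start], index_at[end]
        table[a]["bonds"].append((b, order))  # record the bond on both atoms
        table[b]["bonds"].append((a, order))
    return table


def over_valent_atoms(table):
    """Return the numbers of atoms whose bond orders exceed the assumed limit."""
    bad = []
    for i, atom in table.items():
        used = sum(order for _, order in atom["bonds"])
        if used > MAX_VALENCE.get(atom["element"], used):
            bad.append(i)
    return bad


# Heavy atoms of ethanol drawn as two line segments: C-C-O.
atoms = [("C", 0, 0), ("C", 1, 0), ("O", 2, 0)]
bonds = [((0, 0), (1, 0), 1), ((1, 0), (2, 0), 1)]
table = to_connection_table(atoms, bonds)
print(table[2]["bonds"])
print(over_valent_atoms(table))
```

Note that the coordinates are discarded once the table is built, which is consistent with keeping the original drawing separately when the exact appearance of the diagram matters.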
A program on the microcomputer supports entry of the structure as a vector diagram which can be modified by the user until it is chemically and esthetically satisfactory. Then the vector set is uploaded to the host where it is transformed to a standard connection table. The connection table is used to perform a number of checking functions. Once the structure has been accepted as correct, the connection table is passed to numerous DIS programs which use it to generate search keys, structure diagrams which can be typed on a non-graphics terminal and diagrams which can be drawn on a CRT. Meanwhile the vector set is passed forward unchanged and is stored as a part of the compound's permanent record. It is used whenever the structure of the compound is subsequently designated for printing. These three different types of structure output are shown in Figure 8, from which it might be noted that the quality of the diagrams is roughly proportional to the cost of the output device that is used. The vector diagram printed by a laser printer is clearly the optimum for high quality structure diagrams and it is used throughout the DIS when structure output is to be printed.

Representation of Biological Data. Biological data are recorded by the DIS in minute detail. Even a simple preliminary test with a compound in cancer-bearing mice leads to several hundred lines of data and by the time an active compound comes into consideration


20 August, 1986

Dr. Benjamin Rodriguez
Chief, Natural Products Dept.
Institute of Organic Chemistry
CSIC
Juan de la Cierva, 3
Madrid 6, Spain

Estimado Dr. Rodriguez:

Incluimos el archivo químico preparado para uno de sus activos potenciales contra el cáncer sometidos 07/29/86. Este archivo contiene la información química y el número NSC correspondiente para sus muestras. Este número NSC les identificará su compuesto en toda futura correspondencia y aparecerá en los datos de selección. Reportes de los datos de selección se enviarán a ustedes tan pronto sean disponibles. Gracias por su interés y participación en el programa de Quimioterapia del Cáncer. Si tuvieran alguna pregunta o comentarios sobre esto en el futuro sírvase comunicarse con nosotros.

Sinceramente,

Matthew Suffness, Ph.D., Chief
Natural Products Branch
National Cancer Institute
Landow Building, Room 5C-09
Bethesda, Maryland, 20892 U.S.A.

Figure 5. Letter to a Spanish supplier.


DRUG SYNTHESIS AND CHEMISTRY BRANCH, DTP, DCT, NCI
NSC NUMBER LIST

Dr. Benjamin Rodriguez
Chief, Natural Products Dept.
Institute of Organic Chemistry
CSIC
Juan de la Cierva, 3
Madrid 6, Spain

Transmittal Letter or Collection Date: 07/29/86
Supplier Compound ID: Capitatin, 7-Deacetyl
NSC Number: 61085[last digit illegible]-0
Molecular Formula: C22H26O[subscript illegible]

[Structure diagram]

Figure 5. Continued.


DEPARTMENT OF HEALTH & HUMAN SERVICES
Public Health Service
National Institutes of Health
National Cancer Institute
Bethesda, Maryland 20892
TLX: 908111

20 August, 1986

Dr. Yoshiyasu Shitori
MECT Corporation
Mitsui Bldg. 5F, P.O. Box 212
2-1-1, Nishishinjuku
Shinjuku-ku, Tokyo 160, Japan

Dr. Shitori

[Body of the letter, printed in Japanese characters; the text, which cites the date 07/31/86 and an NSC number, is not legible in this copy.]

V. L. Narayanan, Ph.D., Chief
Drug Synthesis & Chemistry Branch
National Cancer Institute
Landow Building, Room 5C-18B
Bethesda, Maryland, 20892 U.S.A.

Figure 6. Letter to a Japanese Supplier.


Figure 7. Entry of Structures into the DIS.

Figure 8. Structure Output from the DIS.

for human trials, the accumulated biological data that have been measured on it typically will require hundreds or even thousands of pages for a listing. It is almost impossible to assimilate such quantities of data and many years ago, NCI staff designed a "Screening Data Summary" which condenses the data greatly, reducing the page count of an alphanumeric report by about one order of magnitude. Such a summary may still be 100 pages in length and difficult to digest, and so recourse has been made to graphical representations of the data to condense it still further.

A bar-graph format was developed for the screening data and is shown in Figure 9. Each tumor system tested has one or more bars associated with it. The identity of the tumor system is under the bar and the width of the bar is proportional to the number of tests carried out in that system. The height of the bar gives the highest observed activity, or %T/C. This is measured with reference to the left vertical axis (for survival, or life-span, systems) and the right vertical axis in solid tumor regression systems. The arrow under the bar serves to remind one as to which axis is applicable. The center of the "X" in the bar represents the mean of all the %T/C values obtained and the vertical height of the X provides the standard deviation in the data. Other legends in this diagram include the device (square or triangle) at the foot of the bars. This indicates the drug administration route; the square means intraperitoneal, the triangle, intravenous. The black horizontal bars show the level of activity that is the pass/fail criterion currently in use at NCI. The smaller bar graph at the bottom of the Figure provides dose level information.
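The quantities encoded by each bar can be computed from the list of %T/C values for a tumor system. The sketch below is an illustration rather than the DIS code: the function name bar_parameters, the tumor-system labels and the %T/C values are invented, and the use of the population standard deviation is an assumption, since the chapter does not say which form is used.

```python
from statistics import mean, pstdev


def bar_parameters(tc_values):
    """Summarize the %T/C results for one tumor system as bar-graph parameters."""
    return {
        "width": len(tc_values),        # proportional to the number of tests
        "height": max(tc_values),       # highest observed %T/C
        "x_center": mean(tc_values),    # center of the X: mean %T/C
        "x_height": pstdev(tc_values),  # height of the X: standard deviation
    }


# Invented results for two hypothetical tumor systems.
results = {"SYSTEM-A": [120, 140, 160], "SYSTEM-B": [150, 170]}
bars = {system: bar_parameters(values) for system, values in results.items()}
for system, params in bars.items():
    print(system, params)
```

Four numbers per tumor system replace the full list of test results, which is the source of the compression that the bar graph provides.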
Each single dose is drawn in this smaller graph, referenced to the ordinate labelled "DOSE AMT" and the abscissa calibrated in days. Thus in the first case, Q01DX09 is standard medical terminology for "once per day on days 1 through 9", and there are therefore 9 vertical bars, one for each day beginning on day 1; the height of each bar corresponds to a dose amount of 150 mg/kg of mouse body weight.

Figure 9. Bar-Graph Format for Biological Data.

A more adventurous variation on the bar-graph format is shown in Figure 10. Here, instead of a bar, we use a flower to carry the data. The height of a flower's stem indicates the median %T/C found with that drug against the tumor, whose identity can be found at the foot of the flower. The stem height is referenced to the vertical axis towards which the flower is leaning. The vertical measure of the bud, or center, of the flower shows the standard deviation in the data while the number of petals possessed by the flower conveys the number of completed experiments. The horizontal width of the bud is inversely related to the dose amount used and finally, the triangle drawn on the stem of a flower, if present, says that drug administration in this case was intravenous.

Both the bar graph and the flower garden provide for significant compression of data. Just as the screening data summary was about 10% of the size of the raw data dump, so these graphics both are about 10% of the screening data summary, in terms of page count. A more subtle property of the flower garden, which is not possessed by the bar graph, is that because readers all can distinguish healthy flowers from others, it is possible to make "successful" tests identify themselves. In Figure 10, the two most robust-looking flowers are the first and the seventh, i.e. those with many petals and at least modest height. These are in fact the two most significant tests; it was this compound's activity against 3PS31 (leukemia) that led to its continued testing and the discovery of major activity in 3C872 (colon). The compound is now being actively tested against colon cancer in humans. The point of interest is that someone with little scientific background would probably have nominated these two flowers as representative of the "best" tests, and they would have been right. Not only then is there much data in this diagram, there is also an implicit "data key" which makes assimilation of the information easier.

Expert Systems. A quite different way to deal with the problem of voluminous output is to condense it by means of an intellectual summary. The Screening Data Summary described above is non-intellectual in that it merely discards some data and reformats the remainder. A role that can be played by computer programs, sometimes termed "expert systems", is to analyze all the data for significant content, using a set of rules, and then generate a report based upon that significant content.

In the NCI, one is frequently asked for the status of a chemical which is somewhere in the multi-year testing cycle. In such a context, "status" implies a skeletal description of the data on the compound along with its position in a temporal sense (what has been done? what remains to be done?). Also loaded into the idea is some sort of a performance rating. Has it failed trials? Is it expected to fail trials? Will it make it as far as the clinic? And so on. A program developed within the DIS makes a start on responding to this sort of query.
The program simply reviews all the data on a compound and then produces a one-page report like the one shown in Figure 11. The basic information on the compound, its identity and its supplier, is given. Then current inventories are reported and finally, a very concise history of its biological testing is developed and used to finish out the report. The document that is produced is very short, always less than one page, and as such is popular with senior management because it tells what they need to know without descending into excessive detail. A document of this sort cannot be produced readily if graphical printing and variable fonts are not readily accessible and therefore this should be regarded as another type of graphical output.

Summary

If language or numbers represent artificial means of human communication, then graphic display is a more fundamental mode. It is at the same time more powerful and easier to understand, an unusual combination of positives. Computer manipulation of graphics has always been more difficult than the common alphanumeric computation and this has been an impediment. As technology improves, however, generation of graphics has become simpler


NSC-600000

The compound with NSC number 600000 is 1,3-Diazaspiro[4.5]decane-2,4-dione, 3-[4-[bis(2-chloroethyl)amino]butyl]-. Its molecular formula is C16H27Cl2N3O2. The compound's formula weight is 364 and its structure is shown below:

[Structure diagram]

The first sample of NSC-600000 was obtained by NCI on 22-Feb-85 from:

Dr. John Driscoll
Drug Design & Chemistry Section
LMCB, DTP, DCT, NCI, NIH
Bldg. 37, Room 6D24
Bethesda, MD 20892

On 7-Aug-86, the total inventory of NSC-600000 was 800 mg, in one sample.

NSC-600000 is currently classified as ID (Deferred: Does not meet DN-2 activity criteria), on the basis of its activity in P388 Leukemia. The compound was tested in only one tumor system.

Figure 11. An Executive Summary Produced from the DIS.
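A report of the kind shown in Figure 11 could be assembled from a compound's record by a small routine such as the one sketched below. This is an illustration only: the record layout (CompoundRecord), the function name and the sentence templates are assumptions modeled on the wording of Figure 11, and the rules by which the DIS assigns a status are not given in the chapter, so the status is taken here as stored data.

```python
from dataclasses import dataclass, field


@dataclass
class CompoundRecord:
    """Hypothetical subset of the data held on one compound."""
    nsc: int
    received: str
    inventory_date: str
    inventory_mg: int
    samples: int
    status: str
    basis: str
    tumor_systems: list = field(default_factory=list)


def executive_summary(rec):
    """Build a short plain-text status report for one compound."""
    tag = f"NSC-{rec.nsc}"
    n = len(rec.tumor_systems)
    scope = "only one tumor system" if n == 1 else f"{n} tumor systems"
    sample_text = "one sample" if rec.samples == 1 else f"{rec.samples} samples"
    return "\n".join([
        f"The first sample of {tag} was obtained by NCI on {rec.received}.",
        f"On {rec.inventory_date}, the total inventory of {tag} was "
        f"{rec.inventory_mg} mg, in {sample_text}.",
        f"{tag} is currently classified as {rec.status}, "
        f"on the basis of its activity in {rec.basis}.",
        f"The compound was tested in {scope}.",
    ])


# Values taken from Figure 11.
rec = CompoundRecord(
    nsc=600000, received="22-Feb-85", inventory_date="7-Aug-86",
    inventory_mg=800, samples=1,
    status="ID (Deferred: Does not meet DN-2 activity criteria)",
    basis="P388 Leukemia", tumor_systems=["P388 Leukemia"],
)
report = executive_summary(rec)
print(report)
```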


and it is becoming possible for computer systems such as the DIS to take much greater advantage of graphics. Our ideas are still primitive (it is difficult to believe just how exploitable graphics are), but already it begins to appear, as we had suspected, that, faced with data presented in graphical form, humans can absorb it at astonishing rates, and with astonishing discrimination. If we can think how to express a megabyte of data graphically, the reader can absorb it with little thought. Examples abound (1) of graphics which, though they contain the equivalent of between 20,000 and a half million numbers on a single page, can be taken in at a glance by the reader. The same volume of data, expressed in alphanumeric form, might spend months seeking a reader with adequate diligence, competence and stamina.

The lessons are simple: the technical difficulties associated with graphics are rapidly evaporating; the end-user, any end-user, functions extraordinarily well with graphics presentations; most end-users lack the attention span to deal with large amounts of alphanumeric output. The challenge that remains is to devise a clear, communicative graphical vehicle for your message. If you can do this, your computer and your reader can do the rest.

Literature Cited

1. Tufte, E. R. The Visual Display of Quantitative Information; Graphics Press: Cheshire, CT, 1983; pp 16, 155.

RECEIVED April 10, 1987