
Chapter 8

Physical Property Data
Capabilities for Search and Retrieval

Andreas Barth

Downloaded by CORNELL UNIV on August 6, 2016 | http://pubs.acs.org Publication Date: August 17, 1990 | doi: 10.1021/bk-1990-0436.ch008

STN International, FIZ Karlsruhe, D-7514 Eggenstein-Leopoldshafen 2, Federal Republic of Germany

A comprehensive description of the different capabilities to search and retrieve physical properties in the Beilstein database is presented in this paper. The techniques for searching numeric fields are discussed and the logic of numeric range retrieval is explained in detail. A new concept of range matching, called numeric range overlap detection, is introduced and illustrated with several examples. A new class of search fields has been developed which allows for more sophisticated property searches, e.g., a restriction of the answer sets to critically evaluated (handbook) data. New Messenger capabilities supporting the special requirements of numeric databases are briefly described. Among them are the unit conversion feature, the capability to search for missing values, and the concept of tolerance specification.

The Beilstein database is the largest source of physical property data with respect to both the number of substances and the number of different physical properties. There are about 70 properties indexed in numeric fields

0097-6156/90/0436-0113$06.00/0 © 1990 American Chemical Society

Heller; The Beilstein Online Database ACS Symposium Series; American Chemical Society: Washington, DC, 1990.


and about 240 different keywords corresponding to physical entities stored in textual fields as controlled vocabulary. For all physical properties and controlled terms there is at least one literature reference pointing to the source of information.

In Beilstein the physical properties are conceptually ordered in a hierarchical manner, i.e., like a thesaurus file. This ordering is also reflected in the order of the display formats, as depicted in Figure 1.

Physical properties are measured or calculated quantities recorded as numeric values associated with an uncertainty and a physical unit. The numeric values including the uncertainty are stored as numeric ranges in the database. There is an implicit physical unit corresponding to each search and display field, which can be changed by the customer. In general, physical properties also depend on a set of parameters such as temperature or pressure. With the Messenger software it is possible to perform parameter-dependent searches of properties using a proximity operator.

The design of physical properties mimics the data structure (for an example see Table I). There is an entity name, e.g., Enthalpy of Formation, and a corresponding field qualifier HFOR. The field qualifiers for the parameters are built by using the abbreviation for the entity, a dot ('.'), and an abbreviation for the parameter. Thus, the field qualifier for the Temperature (T) of the Enthalpy of Formation is HFOR.T. The main qualifier, e.g., HFOR, is also the display format for this entity. Each numeric field is associated with a unique physical unit, i.e., all values in a field are given in the same unit.

In the following sections the retrieval of physical property information is described in detail.

Table I. Design of Physical Data

Field Name               Field Qualifier   Unit
Enthalpy of Formation    HFOR              J/mol
Temperature              HFOR.T            Cel
Pressure                 HFOR.P            Torr
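The qualifier scheme of Table I can be sketched in a few lines of Python. This is only an illustration of the naming rule (entity abbreviation, dot, parameter abbreviation); the function and type names are hypothetical and are not part of the Messenger software.

```python
from typing import NamedTuple, Optional


class FieldQualifier(NamedTuple):
    """Hypothetical container for the two parts of a qualifier."""
    entity: str                # main qualifier, e.g. "HFOR"
    parameter: Optional[str]   # parameter abbreviation, e.g. "T", or None


def build_qualifier(entity: str, parameter: Optional[str] = None) -> str:
    """Join the entity and parameter abbreviations with a dot."""
    return entity if parameter is None else f"{entity}.{parameter}"


def parse_qualifier(qualifier: str) -> FieldQualifier:
    """Split a qualifier such as 'HFOR.T' into entity and parameter."""
    entity, _, parameter = qualifier.partition(".")
    return FieldQualifier(entity, parameter or None)


# Units taken from Table I, keyed by field qualifier.
TABLE_I_UNITS = {"HFOR": "J/mol", "HFOR.T": "Cel", "HFOR.P": "Torr"}

print(build_qualifier("HFOR", "T"))
print(parse_qualifier("HFOR.P"))
print(TABLE_I_UNITS[build_qualifier("HFOR", "P")])
```

Because every field carries exactly one unit, a lookup keyed by qualifier is enough to recover the unit of any value in that field.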

Format     Content

PHY        Physical Properties
  CTUNCH     Controlled Terms of Unchecked Data
  ECB        Electrochemical Behaviour
  ELE        Electrical Data
  MAG        Magnetic Data
  MCS        Multi-Component System Data (MCS)
    ADSM       Adsorption Data of MCS
    ASSM       Association Data of MCS
    BSPM       Boundary Surface Phenomena of MCS
    ENEM       Energetic Data of MCS
    GASM       Gas Phase System Data of MCS
    LLSM       Liquid/Liquid System Data of MCS
    LSSM       Liquid/Solid System Data of MCS
    LVSM       Liquid/Vapour System Data of MCS
    SOLM       Solution Behaviour of MCS
    TRAM       Transport Phenomena of MCS
  MEC        Mechanical Properties
  OPT        Optical Data
  SAG        State of Aggregation
    CRY        Crystal Phase
    GAS        Gas Phase
    LIQ        Liquid State
  SEP        Structure and Energy Parameters
    CFM        Conformation
    CPL        Coupling Phenomena
    ELM        Electrical Moment
    ELP        Electrical Polarizability
    MEN        Molecular Energy
    SKC        Skeletal Characteristics
  SPE        Spectral Data
    CTNQR
    EMS        Emission Spectrum
    ESP        Electronic Spectrum
    ESR        Electron Spin Resonance Spectrum
    NMR        Nuclear Magnetic Resonance
    OSM        Other Spectroscopic Methods
    ROT        Rotational Spectrum
    VIB        Vibrational Spectrum
  THE        Thermodynamic Data
    CAL        Calorific Data
    HCP        Heat Capacity
    THF        Thermodynamic Functions
  TRA        Transport Phenomena
    CND        Thermal Conductivity
    DIF        Diffusion
    VIS        Viscosity

Figure 1. Hierarchical structure of properties in Beilstein


Retrieval of Physical Property Information

Definition of Notations. In the Beilstein database, most of the physical properties are given as numeric data. For historical reasons, some properties are recorded as exact values without an uncertainty. These properties are called single point entities and they can be treated like simple numeric fields in bibliographic databases, i.e., only single values are stored. Physical properties which are recorded with an associated uncertainty are referred to as numeric range entities. A numeric range consists of a lower and an upper value, which is equivalent to a value plus/minus an uncertainty. For these entities, the endpoints of the ranges are indexed in a numeric field.

In some cases, the knowledge about a property of a substance is very fuzzy and it is only known whether it is below or above a certain limit. As an example, some melting points are recorded to be greater than or equal to a specific value. In other words, the corresponding numeric range is an open range, since one endpoint is open (infinite). Ranges where both endpoints are finite are referred to as closed ranges. In general, it is of minor importance for a user whether the stored ranges are open or closed. However, there are some cases where it is necessary to include or exclude one type or the other and, therefore, it is important to understand the difference.

The concept of 'range searching' is associated with several different meanings. Firstly, the word 'range' could refer to both the query and the indexed range. In many databases it is possible to split the file into segments by ranges of field values. This feature is called file segmentation. A commonly used field for building these ranges in bibliographic databases is the Publication Year (PY). In this case, the user can specify a range of publication years using the SET command and all subsequent searches are limited to this range of years. This capability is often used to restrict the number of hits for queries with a huge number of intermediate answers. Without this segmentation, the search query would exceed the system limits and finally abort. In structure-oriented databases like the Beilstein or Registry file it is also possible to perform range searches using a range of Registry Numbers. These two possibilities are actually equivalent since they are both based on the primary file key, i.e., the Registry Number. In the case of the segmentation using the publication year, the range of years is converted into a


range of registry numbers. The file segmentation feature will not be discussed further in this paper.

Another concept of range searching is based on the formulation of a query as a numeric range. In numeric fields a search can be performed either as a search for a single value or as a search for a range of values. A simple example can be formulated for the field Publication Year (PY). For example, one could search for all documents published in 1988 (PY = 1988) or for those which have been published between 1986 and 1989 (1986 [...] = 108) is also part of the answer set.

The last example (Figure 4) is a search of an open range, i.e., a Boiling Point BP >= 110 °C. Again, this interval is represented by a shaded rectangle but the right limit is at infinity (open). For this example the answer set contains the ranges 90-110, 95-115, 108-112, and >=108.

In the previous searches the answer set can become rather diffuse due to the broadness of the stored ranges, and it may be necessary to reduce the answer set by combining it with other factual information. However, it is important to note that the answer set is complete in the sense that no possible hit will be missed. Even an imprecise specification like Melting Point MP >= 50 °C will be found by every query overlapping with the indexed range from 50 to infinity, e.g., a search for a Melting Point MP = 1000 °C will also retrieve this 'range'.
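The overlap logic for the open-range example can be sketched as follows. This is a minimal illustration, assuming that range endpoints are inclusive and that an open upper end is modelled as infinity; the class and helper names are hypothetical and do not describe how Messenger is implemented. The stored ranges are those drawn in Figure 4.

```python
import math
from typing import NamedTuple


class NumericRange(NamedTuple):
    low: float
    high: float  # math.inf marks an open upper end

    def overlaps(self, other: "NumericRange") -> bool:
        """Two inclusive intervals intersect if neither lies wholly beyond the other."""
        return self.low <= other.high and other.low <= self.high


def point(value: float) -> NumericRange:
    """Single point entity: lower and upper value coincide."""
    return NumericRange(value, value)


def at_least(value: float) -> NumericRange:
    """Open range with an infinite upper boundary."""
    return NumericRange(value, math.inf)


# Stored boiling point ranges shown in Figure 4.
STORED = {
    "90-110": NumericRange(90, 110),
    "90-95": NumericRange(90, 95),
    "95-115": NumericRange(95, 115),
    "98-100": NumericRange(98, 100),
    "100": point(100),
    "105": point(105),
    "108-112": NumericRange(108, 112),
    ">=108": at_least(108),
}

query = at_least(110)  # BP >= 110
hits = [label for label, stored in STORED.items() if stored.overlaps(query)]
print(hits)  # ['90-110', '95-115', '108-112', '>=108']

# A stored open range MP >= 50 is retrieved by the point query MP = 1000.
print(at_least(50).overlaps(point(1000)))  # True
```

The range 90-110 is a hit only because the endpoints are treated as inclusive, which matches the answer set quoted in the text.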


More sophisticated property searches. Standard numeric range searching with the overlap detection feature yields, in general, rather satisfying results. However, it is sometimes necessary to obtain more specific answer sets. For this purpose, a new subfield has been introduced for all physical properties, containing a number of keywords describing the type of data. This subfield is addressed by using the main field qualifier for the entity and concatenating it with '.KW' (Keyword), e.g., for the Boiling Point (BP) the abbreviation for this field is BP.KW. It is only searchable and may contain any subset of the keywords listed in Table III.

Table III. List of Keywords for Physical Properties

Keyword        Meaning
HANDBOOK       critically evaluated data from the handbook
UNCHECKED      non-evaluated data from the literature excerpts
EXACT          single point value
RANGE          numeric range value
CLOSED         numeric range with finite boundaries
OPEN           numeric range with an infinite boundary
EXPERIMENTAL   experimentally determined value

Currently all physical property values in the Beilstein database are experimentally determined.

It is obvious that these keywords are grouped in pairs: a numeric property may originate from the handbook or from the literature excerpts, it could be either an exact value or a range, and the range could be either closed or open. The keyword 'EXPERIMENTAL' has been introduced for future use in case the Beilstein Institute decides to include theoretically computed properties in their database. Using these keywords, it is possible to obtain more specific results than in the case of a pure numeric range search. The keywords have to be interpreted formally as a parameter of the physical property; hence, they must be combined with the (P)-proximity.

In Figure 5 an example is presented that restricts the numeric search to data from the Beilstein handbook using the keyword 'HANDBOOK'. The result from this search is obtained from critically evaluated data only. However, there may be other measurements for the substances of the answer set which are in the same range but originate from non-evaluated data. It may also be that there are completely different values for these substances which do not intersect at all with the numeric range of the query. The answer set is built by matching the query against the stored numeric values and the keywords, and the result is a set of Beilstein substances satisfying the query.

An interesting point is the restriction of searches to exact values. As described in the previous subsection, a numeric search is performed by invoking the overlap detection procedure and the result comprises all the ranges intersecting the search query. Using the keyword 'EXACT' it is now possible to restrict the search to a single exact value. In Figure 6 an example for such a search is presented. If this keyword is used in conjunction with a query range, the result is a numeric range where both endpoints are exactly identical to the endpoints of the query.

The last example, in Figure 7, represents a possibility to exclude open ranges from the numeric range overlap detection. This means that the diffuse ranges are not taken into account when the answer set is built. In this case one has to search for exact values ('EXACT') or closed ranges ('CLOSED'). Analogously, it is also possible to restrict the search to retrieve only


[Figure 4: number line from 90 to 115 showing the stored ranges 90-110, 90-95, 95-115, 98-100, 100, 105, 108-112, and >=108 together with the shaded query interval.]

Figure 4. Schematic representation of an open ended range search

    => s 3 - 5/orp (p) handbook/orp.kw
              2256 3 - 5/ORP
             33253 HANDBOOK/ORP.KW
    L5         892 3 - 5/ORP (P) HANDBOOK/ORP.KW

    => d hit L5

    ANSWER 1 OF 892

    Optical Rotatory Power:
    ORP     3.550 deg
    Type:
    Solv:   aq. NaOH
    Wavel:  589.00 nm
    Temp:   20.0 Cel
    Reference(s):
    1. E. Fischer, Chem.Ber. 40, 1758, CODEN: CHBEAM
    Note(s):
    2. Handbook Data
    3. 0.325 g in 4.0908 g Loesung.
    4. <d-.alpha.-bromo-isocaproyl>-hexaglycylglycine

Figure 5. Search example to restrict the answers to evaluated (Handbook) data

    => s 1650 - 1700/irm (p) exact/irm.kw
              4786 1650 - 1700/IRM
               190 EXACT/IRM.KW
    L1          29 1650 - 1700/IRM (P) EXACT/IRM.KW

    => s 1650 - 1700/irs (p) exact/irs.kw
              9040 1650 - 1700/IRS
                80 EXACT/IRS.KW
    L2           8 1650 - 1700/IRS (P) EXACT/IRS.KW

    => s l1 or l2
    L3          37 L1 OR L2

    => d hit L3

    ANSWER 1 OF 37

    Infrared Maximum:
    IRM     1667 cm**-1
    Reference(s):
    1. Aelion, Champetier, Bull.Soc.Chim.Fr. 1949 529, CODEN: BSCFA
    Note(s):
    2. Handbook Data
    3. Absorption.

Figure 6. Search of exact values

    => s 20 - 30/mp (p) (exact or closed)/mp.kw
              4306 20 - 30/MP
            396045 EXACT/MP.KW
            819485 CLOSED/MP.KW
    L4        4211 20 - 30/MP (P) (EXACT OR CLOSED)/MP.KW

    => d cn hit L4

    ANSWER 1 OF 4211

    CN   di-undec-10-enoyl peroxide
         Di-undec-10-enoyl-peroxid

    Melting Point:
    Value (MP)       Solv. (MP.SOL)              Ref.   Note
    (Cel)
    23.00 - 24.00    benzene, ethanol            2      1
    23.00 - 24.00    benzene, petroleum ether    2      1

    Reference(s):
    2. Cooper, J.Chem.Soc. 1951 3106,3112, CODEN: JCSOA9
    Note(s):
    1. Handbook Data

Figure 7. Search to exclude open ranges

open ranges. It should be noted, however, that open ranges occur only for a very limited number of fields.

Finally, it should be noted that there is an additional subfield for physical properties containing the interval length. This is the difference between the upper and lower limit of the numeric range, which is equal to the uncertainty of the measurement multiplied by two. The new field is built from the main qualifier concatenated with '.RAN', e.g., BP.RAN for the corresponding subfield of the Boiling Point. An EXPAND of this field shows the precision of the measurements for this property.

Specification of Tolerances. A search query for a numeric range can be specified either as a range or as a value plus/minus a tolerance. Both forms are treated by the Messenger software as equivalent, i.e., the following queries will result in the same answer set:

    => search 4 - 10 /ri

    => search ri = 7 +- 3

In the latter case, the search query is first transformed into a range and then the numeric range is searched. As described above, the answer set comprises all indexed ranges overlapping the range. The tolerance may be expressed in one of the following ways:

- as an absolute value, e.g., 132.09 +- 0.02, or
- as a percentage, e.g., 45 +- 2%.
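The transformation of a tolerance query into a range can be sketched as below. The parser is a hypothetical illustration of the two forms listed above (absolute and percentage tolerance) and is not the Messenger query syntax handler; it assumes the value and the tolerance are written in the same unit.

```python
from typing import Tuple


def tolerance_to_range(text: str) -> Tuple[float, float]:
    """Turn 'value +- tolerance' into (lower, upper).

    A tolerance ending in '%' is read as a percentage of the value;
    otherwise it is taken as an absolute amount.
    """
    value_part, separator, tolerance_part = text.partition("+-")
    if not separator:
        raise ValueError("expected a query of the form 'value +- tolerance'")
    value = float(value_part)
    tolerance_part = tolerance_part.strip()
    if tolerance_part.endswith("%"):
        delta = abs(value) * float(tolerance_part[:-1]) / 100.0
    else:
        delta = float(tolerance_part)
    return value - delta, value + delta


print(tolerance_to_range("7 +- 3"))          # same range as the query 4 - 10
print(tolerance_to_range("132.09 +- 0.02"))  # absolute tolerance
print(tolerance_to_range("45 +- 2%"))        # percentage tolerance
```

Once the range has been computed, the query is handled like any other numeric range search, so the overlap detection applies unchanged.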

It should be noted that the physical units for the value and the tolerance must be identical; a mixing of units cannot be recognized by the software. The possibility for tolerance specification will be introduced in the Messenger software in early 1990.

Special Features Supporting the Retrieval of Physical Properties

Physical Units and Unit Conversion. Most physical quantities are associated with a unit serving as the standard measure for this entity. Even though much standardization has been done, there are still several different unit systems in use. Of course, this also depends upon the area of application. In the Beilstein database most properties are measured in SI units, but there are still some quantities which are given in other units. In Table IV the main units used in the Beilstein file are listed.

Table IV. List of Main Units used in Beilstein

Quantity               Unit                       Unit Symbol
Amount of Substance    mole                       mol
Angle                  degree                     deg
Concentration          percentage                 %
                       gram/liter                 g/L
Density                gram/cubic centimeter      g/cm**3
Electric Potential     Volt                       V
Energy                 Joule/mole                 J/mol
Entropy                Joule/mole*Kelvin          J/mol*K
Heat Capacity          Joule/mole*Kelvin          J/mol*K
Length                 milli-, centimeter, ...    mm, cm, ...
Mass                   gram                       g
Moment (Dipole)        Debye                      D
Pressure               Torr                       Torr
Temperature            degree Celsius             Cel
Time                   second                     s
Wavelength             nanometer                  nm

To overcome the difficulty of remembering all the units used in different numeric databases, STN has developed a feature for the conversion of units. This enables the customer to work in his preferred set of units independent of the units used in the file. The Messenger software will automatically do the unit conversion and search or display the data in the appropriate units. The unit conversion capability enables the user:

- to specify units with numeric search terms
- to display the default unit for a property
- to set the system unit for a numeric property according to his/her convenience
- to select a common standard for the system units.

In particular, the customer may work in the default units, he may globally set units, or he could overwrite a unit for a specific field. To illustrate this feature, a few examples are presented in Figure 8. In the first example, a search using the default units for the database is presented. If the customer formulates his query without specifying any units, then the default units are taken and no unit conversion is performed.
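The three ways of working with units (default unit, unit given in the query, globally set unit) can be sketched as follows, using the Heat Capacity field CP with its default unit J/mol*K. The class and the conversion table are hypothetical; in particular, the factor 4184 J per kcal assumes the thermochemical calorie, which the source does not specify.

```python
from typing import Optional, Tuple

# Factors into J/mol*K. The kcal factor is an assumption (thermochemical calorie).
TO_J_PER_MOL_K = {"J/MOL*K": 1.0, "KJ/MOL*K": 1000.0, "KCAL/MOL*K": 4184.0}


class UnitSettings:
    """Hypothetical per-field unit state for one online session."""

    def __init__(self, default: str) -> None:
        self.default = default.upper()
        self.current = self.default

    def set_unit(self, unit: str) -> None:
        """Global setting: stays in force until changed again."""
        self.current = unit.upper()

    def query_in_default(self, low: float, high: float,
                         unit: Optional[str] = None) -> Tuple[float, float]:
        """Express a query range in the field's default unit.

        A unit given with the query overrides the session unit.
        """
        used = (unit or self.current).upper()
        factor = TO_J_PER_MOL_K[used] / TO_J_PER_MOL_K[self.default]
        return low * factor, high * factor


cp = UnitSettings("J/MOL*K")
print(cp.query_in_default(120, 140))                   # default units, no conversion
print(cp.query_in_default(0.05, 0.08, "kcal/mol*k"))   # unit given in the query
cp.set_unit("kj/mol*k")
print(cp.query_in_default(0.05, 0.1))                  # globally set unit
```

Keeping the stored data in one unit per field and converting only the query (and, symmetrically, the display) means the index itself never has to change when a user switches units.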


Consequently, the display of this property shows the property in the original unit. In the second example, the explicit specification of the unit in the search statement overrides the default unit of this field. The numeric range is converted from the default unit into the unit of this search field. A subsequent display of this property shows the original units again. In the third example, the unit for Heat Capacity is globally set by the Messenger SET command. In this case, the unit is changed for the rest of the session unless the customer changes it again or overrides it explicitly. All subsequent searches and displays are now presented in the new unit. As for the SEARCH command, the user could also override the unit for a display of a property. This is done in the DISPLAY command shown in the last example in Figure 8.

Retrieval of Missing Values. In large numeric databases like Beilstein or Gmelin (Handbook of Inorganic Chemistry), not all properties are available for each substance (see Section 3, 'The STN Implementation of the Beilstein Factual and Structure Database'). In other words, there are 'holes' in the file. This means that for a given substance, some properties are not present, either because they have not yet been measured, or because they were not known at the literature closing date for recording the respective substance. When a customer is searching for a property with specific values, he may miss potential hits because some property values are not in the database, although they could possibly overlap with the search range. In a normal numeric search these potential hits are not included in the answer set. However, in STN databases there is a possibility to search for these missing values or 'holes'.

In Figure 9 there is an example for a search of the 'holes' in the field Melting Point (MP). This is done by ORing the numeric search with the term 'MP/FNA' (Field Not Available). It should be noted, however, that a simple search for missing values should not be performed because, in some cases, the number of holes is greater than the number of substances with values for this property and the system limits are easily exceeded. Thus, it is strongly recommended to use the 'FNA-Search' only if it is really necessary and always to combine the search with some restrictions to keep the answer sets within the system limits.
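The set logic behind the FNA term can be sketched with ordinary Python sets. The substance identifiers and melting points below are invented for illustration; only the relation MP/FNA = ALL/FA NOT MP/FA is taken from the search transcript in Figure 9.

```python
from typing import Dict, Optional, Set, Tuple

Range = Tuple[float, float]

# Hypothetical file: substance id -> stored MP range in Cel, None marks a 'hole'.
melting_points: Dict[str, Optional[Range]] = {
    "S1": (12.0, 14.0),
    "S2": (80.0, 82.0),
    "S3": None,
    "S4": (29.0, 31.0),
    "S5": None,
}


def overlaps(stored: Range, low: float, high: float) -> bool:
    """Inclusive interval intersection test."""
    return stored[0] <= high and low <= stored[1]


all_fa: Set[str] = set(melting_points)                           # ALL/FA
mp_fa = {s for s, r in melting_points.items() if r is not None}  # MP/FA
mp_fna = all_fa - mp_fa                                          # MP/FNA

numeric_hits = {s for s in mp_fa if overlaps(melting_points[s], 10.0, 30.0)}
answer = numeric_hits | mp_fna                                   # 10 - 30/MP OR MP/FNA

print(sorted(numeric_hits))  # ['S1', 'S4']
print(sorted(mp_fna))        # ['S3', 'S5']
print(sorted(answer))        # ['S1', 'S3', 'S4', 'S5']
```

Since the complement can be larger than the set of substances that have a value, the answer set should be intersected with some other restriction before it is displayed, as recommended above.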


Searching for Property Names. In addition to the numeric search capabilities for the properties, there is


Example: Default Units

    => s 120 - 140/cp
    L1           3 120 J/MOL*K - 140 J/MOL*K /CP

Example: User Defined Units

    => s 0.05 - 0.08 kcal/mol*k /cp
    L2           2 0.05 - 0.08 KCAL/MOL*K /CP

Example: Global Setting of Units

    => set unit cp=kj/mol*k
    SET COMMAND COMPLETED

    => dis unit cp
    CP   DEFAULT:  J/MOL*K
    CP   CURRENT:  KJ/MOL*K

    => s 0.05 - 0.1 /cp
    L3           2 0.05 KJ/MOL*K - 0.1 KJ/MOL*K /CP

    => d cp L3

    ANSWER 1 OF 2

    Heat Capacity Cp:
    Value (CP)           Temp. (CP.T)        Ref.   Note
    (KJ/MOL*K)           (Cel)
    0.12996              26.7                2      1
    0.16513              93.0                2      1
    0.06131 - 0.11887    -179.5 - -6.6       3      1
    0.00314 - 0.12658    -258.1 - 26.9       5      1, 4

    Reference(s):
    2. Schlinger, Sage, Ind.Eng.Chem. 442454,2456, CODEN: IECHAD
    3. Todd, Parks, J.Amer.Chem.Soc. 58134, CODEN: JACSAT
    5. Scott, Ferguson, Brickwedde, J.Res.Natl.Bur.Stand.(U.S.) 334, CODEN: JRNBAG
    Note(s):
    1. Handbook Data
    4. cp :beim Saettigungsdruck.

Figure 8. Examples for using the unit conversion capability

    => s 10 - 30/mp or mp/fna
               137 10 CEL - 30 CEL /MP
             38697 ALL/FA
             26568 MP/FA
             12129 MP/FNA
                   (ALL/FA NOT MP/FA)
    L4       12266 10 CEL - 30 CEL /MP OR MP/FNA

Figure 9. Example for the retrieval of documents with missing values

also the possibility to search for the availability of the property itself. For all properties which are available for a given substance, the property name is indexed in the field FA (Field Availability) and in PH (Property Hierarchy). Properties that have only a literature reference but no numeric value in the database have an index entry both in CT/CTM (Controlled Terms/Multi-Component System) and in PH. Hence, the three fields can be used for different kinds of availability searches:

- to retrieve a numeric value and/or a literature reference (Property Hierarchy: PH)
- to retrieve a numeric value (Field Availability: FA)
- to retrieve a literature reference only (Controlled Terms: CT/CTM).

Expanding the Field Availability field provides the list of property names and the corresponding number of occurrences, i.e., the number of substances for which the property is available. The strategy for searching property names is very simple and corresponds to searching controlled vocabulary in bibliographic databases. An example is given in Figure 10, where we have searched for the availability of numeric data for Molar Polarization.

If one is interested in either evaluated or non-evaluated property information, there is another way to search for the availability of numeric data, using the keyword subfields ('xx.KW'). As described previously, the keywords 'HANDBOOK' and 'UNCHECKED' are indexed for each numeric entry in the corresponding keyword field. A search for these keywords together with the numeric values restricts the answer set to either evaluated or non-evaluated data satisfying the numeric query. If the query contains only the keyword term and not the numeric term, then this is equivalent to a search for the availability of this property restricted to either handbook or unchecked data. The second example of Figure 10 shows this type of search. It should be noted that this kind of availability search cannot be done with the field FA.
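The indexing rules described above can be illustrated with a minimal Python sketch. It is not the Messenger implementation: the class `PropertyEntry`, the functions `index_terms`, `search` and `search_numeric`, the substance names and the numeric values are all hypothetical and exist only for illustration. The sketch also assumes that each keyword is attached to the individual numeric value it describes, and it uses the single field name CT to stand for CT/CTM.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyEntry:
    """One property record of a substance (hypothetical model)."""
    code: str                                    # property code, e.g. "MPOL"
    values: list = field(default_factory=list)   # (numeric value, keyword) pairs


def index_terms(entry):
    """Return the (term, field) index entries generated for one property."""
    terms = {(entry.code, "PH")}                 # PH: value and/or reference
    if entry.values:
        terms.add((entry.code, "FA"))            # FA: a numeric value exists
        for _, keyword in entry.values:
            terms.add((keyword, entry.code + ".KW"))  # HANDBOOK or UNCHECKED
    else:
        terms.add((entry.code, "CT"))            # CT/CTM: reference only
    return terms


def search(substances, term, fld):
    """Availability search: substances that carry the index entry term/fld."""
    return sorted(name for name, entries in substances.items()
                  if any((term, fld) in index_terms(e) for e in entries))


def search_numeric(substances, code, low, high, keyword=None):
    """Numeric range search, optionally restricted by a keyword (assumed
    to be tied to the same numeric entry as the matching value)."""
    hits = []
    for name, entries in substances.items():
        for e in entries:
            if e.code == code and any(
                    low <= v <= high and keyword in (None, kw)
                    for v, kw in e.values):
                hits.append(name)
                break
    return sorted(hits)


# Made-up sample data, not taken from the Beilstein file.
substances = {
    "substance A": [PropertyEntry("MPOL", [(25.0, "HANDBOOK")])],
    "substance B": [PropertyEntry("MPOL", [(31.2, "UNCHECKED")])],
    "substance C": [PropertyEntry("MPOL")],      # literature reference only
}

print(search(substances, "MPOL", "PH"))          # value and/or reference
print(search(substances, "MPOL", "FA"))          # numeric value available
print(search(substances, "MPOL", "CT"))          # reference only
print(search(substances, "HANDBOOK", "MPOL.KW")) # handbook values available
print(search_numeric(substances, "MPOL", 20.0, 40.0, "UNCHECKED"))
```

In this model the keyword-only search returns a subset of the FA search, which matches the relation between the two answer sets in Figure 10 (64 of 156 substances). The FA entry itself carries no keyword, which is why the restriction to handbook or unchecked data has to go through the keyword subfield.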

The Beilstein database contains a comprehensive collection of different physical property data. There are many numeric databases publicly available through online services, but none of them can compete with Beilstein with respect to both the number of substances and the number of properties. The implementation of the Beilstein database on STN has provided a number of new features and


Example: Availability of Data

    -> s mpol/fa
    L6      156 MPOL/FA

Example: Availability of Handbook Data

    -> s handbook/mpol.kw
    L7       64 HANDBOOK/MPOL.KW

    -> d mpol

    L7   ANSWER 1 OF 64

    Molar Polarization:

    Value (MPOL)     Ref.   Note
    (cm**3/mol)
                     3      1, 2
                     3      1, 4

    Reference(s):
    3. Ebert, Eisenschitz, v. Hartel, Ph.Ch. 1,110

    Note(s):
    1. Handbook Data
    2. Molekularpolarisation im festen und im fluessigen Zustand.
    4. Molekularpolarisation von Loesungen in Benzol und Tetrachlorkohlenstoff.

Figure 10. Search for the availability of physical properties

additional fields supporting, in particular, the numeric entities. In addition, the Messenger software has been enhanced with new capabilities to provide a numeric data service within STN. Among these capabilities are numeric range overlap detection, a concept for the specification of tolerances, a unit conversion feature, and the ability to search for missing values. As shown in this paper, there are many ways to perform rather sophisticated searches of physical property information in the Beilstein database. Together with the features for the retrieval of substance and reaction information, the database offers the customer different, complementary choices of data access. In this respect, it can be stated that the Beilstein database may provide answers to questions in almost any area of organic chemistry.

Acknowledgment

The funding of this work by the Federal German Ministry of Research and Technology is gratefully acknowledged.

RECEIVED May 17, 1990
