Machine Learning List: Vol. 1 No. 14
Sunday, Dec 3, 1989

Contents:
Book announcement: Connectionistic Problem Solving
IJCAI Review: representation change in induction
CFP: First Japanese Knowledge Acquisition Workshop (JKAW90)

The Machine Learning List is moderated. Contributions should be relevant to
the scientific study of machine learning. Mail contributions to ml@ics.uci.edu.
Mail requests to be added or deleted to ml-request@ics.uci.edu. Back issues
of Volume 1 may be FTP'd from /usr2/spool/ftp/pub/ml-list/V1/<N> or N.Z where
N is the number of the issue; Host ics.uci.edu; Userid & password: anonymous

----------------------------------------------------------------------
Subject: Book announcement: Connectionistic Problem Solving
From: Steve Hampson <hampson@ICS.UCI.EDU>
Date: Tue, 28 Nov 89 13:36:25 -0800

Connectionistic Problem Solving
Steven Hampson, UC Irvine
Birkhauser: Boston
ISBN 0-8176-3450-9

is now available from:
Birkhauser Boston, Inc. Order Dept.
c/o Springer Verlag
44 Hartz Way
Secaucus, NJ 07094
1-800-777-4643

This book develops a connectionistic model of adaptive problem solving. In
the process, various computational issues are identified that are
susceptible to formal analysis and that can be related directly to the
biological structures and processes involved in biological problem solving.
Abstract problem solving is a well-defined domain for the formal
investigation of intelligent behavior, and maze learning, used here as a
concrete example of problem solving, is a traditional tool for
psychologists and neuroscientists studying animal behavior. Consequently, a
substantial body of physiological and behavioral data is available to
motivate the construction of computational models. Conversely, functional
issues arising in model building help organize these often disparate
biological results. At the most general level, the book addresses issues
concerning the efficient learning and representation of categories, and the
assembly of action sequences to achieve specific goals. Various
approaches, often with complementary characteristics, are considered. An
extensive bibliography, ranging from the work of Tolman to recent
connectionist theory, provides access to relevant work in the
neural-modelling and animal learning-theory literature.
----------------------------------------------------------------------
Subject: IJCAI Review: representation change in induction
Date: Sat, 02 Dec 89 17:09:45 -0800
From: "David W. Aha" <aha@ICS.UCI.EDU>

Long ago, I volunteered to provide brief reviews of the four papers published
at IJCAI 1989 that were presented in the machine learning/knowledge
acquisition session on representation change in induction. Here are outlines
and brief comments on these articles. While I hope they are relatively
accurate, I invite the authors to respond with corrections and comments.

1. Learning DNF by Decision Trees
Giulia Pagallo, UC Santa Cruz

This paper introduces the FRINGE algorithm for learning decision trees. It
adaptively enlarges the given attribute set with higher-order attributes.
The purpose is to solve the _replication problem_, which arises when the
tests for whether an instance satisfies a term must be replicated in
multiple places in the d-tree.
FRINGE builds the d-tree, analyzes it to find candidates for higher-order
features, re-describes the training instances in terms of the extended
feature set, and builds another tree from them. This process repeats
until no new candidate features are found or until a maximum (parameterized)
number of features has been created. Each added feature is the Boolean
conjunction of the final two feature/value pairs in the path to a positively-
classified leaf. Disjunctions arise through combinations of conjunctions
involving negated features.
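
As a concrete illustration of the fringe operation, here is a minimal sketch
(in Python; my own reconstruction, not Pagallo's code) of a single iteration:
walk an existing decision tree, collect the last two tests on each path to a
positive leaf, conjoin each pair into a new Boolean feature, and re-describe
the instances with the enlarged feature set. The Node structure and attribute
names are hypothetical.

# Sketch of FRINGE's feature-construction step (illustrative only).
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Test = Tuple[str, object, bool]          # (attribute, value, branch outcome)

@dataclass
class Node:
    test: Optional[Tuple[str, object]] = None   # (attribute, value) tested here
    yes: Optional["Node"] = None                 # subtree when the test succeeds
    no: Optional["Node"] = None                  # subtree when the test fails
    label: Optional[bool] = None                 # class at a leaf, else None

def fringe_features(root: Node) -> List[Tuple[Test, Test]]:
    """Collect the final two tests on every path to a positive leaf."""
    found: List[Tuple[Test, Test]] = []
    def walk(node: Node, path: List[Test]) -> None:
        if node.label is not None:
            if node.label and len(path) >= 2:
                found.append((path[-2], path[-1]))
            return
        attr, val = node.test
        walk(node.yes, path + [(attr, val, True)])
        walk(node.no, path + [(attr, val, False)])
    walk(root, [])
    return found

def add_features(instance: Dict[str, object],
                 new_feats: List[Tuple[Test, Test]]) -> Dict[str, object]:
    """Re-describe one instance with the constructed conjunctive features."""
    out = dict(instance)
    for (a1, v1, o1), (a2, v2, o2) in new_feats:
        name = "(%s=%s)is%s AND (%s=%s)is%s" % (a1, v1, o1, a2, v2, o2)
        out[name] = ((instance[a1] == v1) == o1) and ((instance[a2] == v2) == o2)
    return out

A full run would rebuild the tree on the re-described instances and repeat
until no new fringe conjunctions appear or the feature limit is reached.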

FRINGE and a generic d-tree algorithm were applied to 9 domains whose
target concepts are describable in DNF. The results for final classification
accuracy and tree size (number of internal nodes) are impressive: FRINGE
outperformed the other algorithm in each domain. They were also applied to
one domain where the training set's noise level (varied once over all
attributes and once over the class label) was the independent variable.
FRINGE's performance
wasn't as impressive in this demonstration, but it still significantly
outperformed the other algorithm. Pagallo noted similar comparative results
in applications to one of Quinlan's thyroid disease domains, the noisy LED
Display domain, and Schlimmer's mushroom domain, which suggests that
FRINGE performs well in applications with nominal and continuous-valued
features.

Notes and Questions:
1. Pagallo notes that the approach is applicable to learning concepts,
possibly noisy, describable by "small" DNF expressions. It won't
improve decision trees otherwise.
2. Can a variant of FRINGE work incrementally? What are the tradeoffs?
3. How does FRINGE process the missing attribute values in the thyroid
diagnosis database?
4. While FRINGE's results are given for classification accuracy and
d-tree size, no results are given for some other interesting
performance measures (e.g., number of iterations, number of constructed
features).
5. The scales in Figure 3 appear to have been mis-identified and switched,
as were the values in the lowest pair of leaf nodes in Figure 2.

2. Constructive Induction on Decision Trees
Christopher J. Matheus and Larry Rendell, University of Illinois

This paper also presents a d-tree algorithm, CITRE, that constructs
features. The authors also give a four-part framework for feature
construction. Given a set of constructor operators and an (initial) set of
features, the framework and some typical instantiations are:

1. Detection of when feature construction should be performed
A. none
B. analysis of the initial data set
C. analysis of the concept description
2. Selection of constructors (operator/feature tuples): ordering biases
A. algorithm biases
B. training set biases
C. concept-based biases
D. domain-based biases
3. Generalization of the constructors
A. none
B. constructive compilation (deductive)
C. constructive induction
4. Evaluation of the new features
A. none
B. request evaluation from teacher
C. automatically order the features and keep the best ones

The authors survey the tradeoffs involved with each option for each of
the four framework components. Their algorithm, CITRE, is similar to
FRINGE in that it iterates through the two-step process of constructing
a d-tree and using it to suggest new features. However, CITRE differs in
its use of domain knowledge, feature generalization, and feature
construction. CITRE is defined as follows:

1. Detection: whenever disjuncts are found
2. Selection:
A. Propose conjoining all possible pairs of attribute-values
that lie along a path from the root to a positive leaf
B. Use simple domain knowledge to constrain the selection
3. Generalization: changing constants to variables
4. Evaluation: at most a user-defined maximum number of features is retained,
where the ordering is determined by the information-theoretic gain from
using each feature to partition the _entire_ training set

CITRE stops iterating when no new features are constructed.
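
To make the evaluation step concrete, here is a small sketch (in Python; my
illustration, not the authors' code) of ranking candidate Boolean features by
their information-theoretic gain over the entire training set and retaining at
most a user-defined maximum. Names such as max_features are hypothetical.

# Sketch of CITRE's evaluation step (illustrative only).
import math
from typing import Callable, Dict, List, Tuple

def entropy(labels: List[bool]) -> float:
    """Binary entropy of a list of class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def information_gain(feature: Callable[[Dict], bool],
                     data: List[Tuple[Dict, bool]]) -> float:
    """Gain from splitting the WHOLE training set on one Boolean feature."""
    labels = [y for _, y in data]
    true_part = [y for x, y in data if feature(x)]
    false_part = [y for x, y in data if not feature(x)]
    n = len(data)
    remainder = (len(true_part) / n) * entropy(true_part) + \
                (len(false_part) / n) * entropy(false_part)
    return entropy(labels) - remainder

def keep_best(candidates: List[Callable[[Dict], bool]],
              data: List[Tuple[Dict, bool]],
              max_features: int) -> List[Callable[[Dict], bool]]:
    """Order candidates by gain and retain at most max_features of them."""
    ranked = sorted(candidates, key=lambda f: information_gain(f, data),
                    reverse=True)
    return ranked[:max_features]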

CITRE was applied to a tic-tac-toe domain, where the training instances
are examples of a winning position for "x" or "o". The generalization
and selection strategies were both shown to significantly improve
the accuracy of CITRE, but only the selection strategy resulted in a
significantly reduced number of terms considered at each iteration
(down from an average of 367 to 27). There was no significant difference
between using only the selection strategy and using both strategies
for any of the performance measures (i.e., accuracy, number of constructors
considered, and number of internal nodes). The authors also note that
CITRE improved classification accuracies in some experiments with learning
random, Boolean functions.

Notes and Questions:
1. CITRE's choices when evaluating the proposed features might differ if
their information gain were evaluated on subsets of the training data.
How can this be performed efficiently?
2. CITRE's results were detailed only on the tic-tac-toe domain. How will
its components change for other domains? Will its choice for
selection result in far too many proposed new features? How will
the maximum number of retained features vary with regard to the
characteristics of the application domain?
3. Is most of the power of CITRE in its selection component? While the
generalization strategy resulted in improved accuracies, it did
not improve CITRE's efficiency. What domain characteristics are
required for the benefits of constructor generalization to be
realized?
4. Optimal behavior on the application domain was not described. How
close is CITRE's behavior to optimal? How well do comparable
algorithms perform in this domain?
5. CITRE may be too inefficient (propose too many new constructors) when
the selection strategy is weak, which will occur whenever the domain
knowledge is insufficient. Perhaps the criteria for proposing
new features should be more conservative in these situations.
6. Were the fringe feature conjunctions more useful than other interior
conjunctions? Under what conditions will this occur? When will
CITRE require fewer iterations than FRINGE? How many iterations did
CITRE require when applied to the tic-tac-toe domain?

3. Principled Constructive Induction
Pankaj Mehra, Larry Rendell, and Benjamin W. Wah, University of Illinois

This paper suggests using Watanabe's notion of inverted spaces to guide
the constructive induction process. In an inverted space, the examples
define the axes and the features define points in the space. The authors
attempt to explain how a hyper-cube in the inverted space can be used
to determine whether features are linear separators in the (original)
feature space. If feature f lies outside the hyper-cube in the inverted
space defined by the positive examples and inside the hyper-cube in
the inverted space defined by the negative examples, then f is a
separator of these examples.

One problem with this approach is that complex computations can result
when large numbers of examples yield inverted spaces with high
dimensionality. The authors are confident that methods for selecting
representative examples will sufficiently reduce their number to simplify
these computations.

In an inverted space, the equiparameter line is the line along which a feature
takes the same value on every example; features near it have maximally uniform
values over that set of examples. The goal
of this work is to define an algorithm that constructs features which (1)
lie close to the equiparameter lines in both of the inverted spaces (i.e.,
the one for positive and the one for negative instances) and (2) satisfy
the inclusion/exclusion test for the separating hyper-cube. A brief
example is given in which a feature is constructed for the Xor concept.
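
The separator test can be pictured with a small sketch (in Python; my reading
of the description above, not the authors' formulation). A feature is treated
as a real-valued function of an example, its inverted-space coordinates are its
values on the examples, and theta fixes the hyper-cube vertex. The closing
usage lines check a constructed feature for the Xor concept mentioned above.

# Sketch of the inverted-space separator test (illustrative only).
from typing import Callable, Dict, List

Example = Dict[str, float]
Feature = Callable[[Example], float]

def inverted_point(f: Feature, examples: List[Example]) -> List[float]:
    """Coordinates of feature f in the inverted space whose axes are the examples."""
    return [f(e) for e in examples]

def separates(f: Feature, positives: List[Example], negatives: List[Example],
              theta: float) -> bool:
    """True if f lies outside the positives' hyper-cube (all coordinates > theta)
    and inside the negatives' hyper-cube (all coordinates <= theta); f then
    separates the two sets by thresholding at theta."""
    pos_coords = inverted_point(f, positives)
    neg_coords = inverted_point(f, negatives)
    return all(c > theta for c in pos_coords) and all(c <= theta for c in neg_coords)

if __name__ == "__main__":
    # Tiny Xor-flavoured usage: x1 + x2 - 2*x1*x2 separates the Xor-positive
    # points from the Xor-negative points at theta = 0.5.
    pts = [{"x1": a, "x2": b} for a in (0.0, 1.0) for b in (0.0, 1.0)]
    pos = [p for p in pts if p["x1"] != p["x2"]]
    neg = [p for p in pts if p["x1"] == p["x2"]]
    xor_feature = lambda e: e["x1"] + e["x2"] - 2 * e["x1"] * e["x2"]
    print(separates(xor_feature, pos, neg, theta=0.5))   # True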

Notes and Questions:
1. No empirical evidence is given suggesting that their approach will
efficiently learn useful features.
2. This is an exciting approach, derived from the pattern recognition
literature, that proposes to tackle several problems simultaneously.
If it proves successful, it may lead to a highly sophisticated and
mathematically insightful means for inducing features.
3. Why does the hyper-rectangle always point downwards? Why is the
hyper-rectangle's vertex fixed at (Theta,Theta,...,Theta)?
4. The authors pointed out one mistake in their paper: What was said in
the paper (at different points) is that (i) we are interested in
features that lie at small angles to the equiparameter line; and,
(ii) we are interested in features that lie inside the hyper-cube in one
space and outside it in another.

Consider the following picture:

                     |
          IV         |         II
                     |
     ----------(theta,theta)----------
                     |
                     |
          I          |         III
                     |

Features that lie in regions of type I in one inverted space and in
regions of type II in the other GUARANTEE separability. In general,
nothing can be said about features that lie inside/outside the cube in
one space and outside/inside the corresponding cube in the other space
(for example, features in regions III and IV). (Interested readers can
contact mehra@aquinas.csl.uiuc.edu for further details.)
5. Figure 5: E- space is on the left and E+ is on the right.

4. Improving Efficiency by Learning Intermediate Concepts
James Wogulis and Pat Langley, UC Irvine

This paper introduces RINCON (Retaining INtermediate CONcepts), which
deductively extends domain theories from examples with the goal of
maximizing classification efficiency. Their thesis is that
efficiency benefits result from saving intermediate concepts. All
instances and concepts are represented as logical implications.
These are partially ordered by generality in a hierarchy which allows
nodes to have multiple parents. RINCON incrementally updates this
hierarchy when it processes an instance. When an instance does not
exactly match some node, a search is made for the least upper bounds
(LUB) of the instance and the set of most general concepts that do
not match it. A LUB is chosen (from this set of LUBs) that can be
used to re-express the greatest number of concepts. The chosen LUB and
the instance, re-expressed in terms of that LUB, are then
added to the hierarchy. All concepts/nodes in the hierarchy that can
be simplified by using the newly found LUB are re-expressed in terms of it.
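
As a rough sketch of this re-expression idea (in Python; my illustration, not
Wogulis and Langley's code), concepts can be treated as sets of literals, an
intermediate concept as a shared subset, and re-expression as replacing that
subset with a reference to the new concept. The literals and the name "c1"
are hypothetical.

# Sketch of choosing a LUB and re-expressing concepts with it (illustrative only).
from typing import FrozenSet, List

Concept = FrozenSet[str]

def usable_in(lub: Concept, concepts: List[Concept]) -> int:
    """How many existing concepts could be re-expressed using this LUB."""
    return sum(1 for c in concepts if lub < c)

def choose_lub(candidates: List[Concept], concepts: List[Concept]) -> Concept:
    """Pick the candidate LUB that re-expresses the greatest number of concepts."""
    return max(candidates, key=lambda lub: usable_in(lub, concepts))

def re_express(concept: Concept, lub: Concept, name: str) -> Concept:
    """Replace the LUB's literals in a concept with a single reference to it."""
    if lub <= concept:
        return frozenset(concept - lub) | {name}
    return concept

if __name__ == "__main__":
    # Hypothetical usage: two concepts share {p(x), q(x)}; naming that shared
    # part "c1" lets both be expressed more compactly.
    theory: List[Concept] = [frozenset({"p(x)", "q(x)", "r(x)"}),
                             frozenset({"p(x)", "q(x)", "s(x)"})]
    lub = choose_lub([frozenset({"p(x)", "q(x)"}), frozenset({"r(x)"})], theory)
    print([sorted(re_express(c, lub, "c1")) for c in theory])
    # [['c1', 'r(x)'], ['c1', 's(x)']]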

The authors summarize two empirical studies, where the dependent variable
is the amount of work required to match or reject an instance, as measured
by the number of "joins" (a join occurs when two lists of bindings are
combined). They
show that the flat domain theory, in which no learning takes place, is less
efficient at classifying instances in two applications than is the theory
which results from learning intermediate concepts, both for processed and
new instances.

RINCON's behavior can be viewed as an inverse of EBL. While EBL
yields operational concept descriptions given domain theories, RINCON
yields domain theories given operational concept descriptions
(instances). RINCON is a purely deductive, incremental learning
algorithm which works with relational data. It updates the nodes in
its domain theory as new intermediate concepts are found.

Notes and Questions:
1. It's odd that new instances are, on the average, classified more
efficiently than processed instances. This seems to be what Figure 2
shows. However, this interpretation is wrong. The lines for the
unseen instances are simply the time required to FAIL to match.
(Recall that, since RINCON doesn't infer beyond its data, it will
match on only previously processed instances.)
2. Is there a point at which too many intermediate concepts result in
a decrease in efficiency?
3. Under what conditions can RINCON be expected to result in increased
matching efficiency?
4. An evaluation function is needed to judge the utility of the
intermediate concepts to determine which ones are worth saving.
5. How should RINCON learn terms other than conjuncts?
6. The authors plan to use RINCON in a combined empirical/analytic
learning algorithm. How will inductions be performed? How will
the resulting concepts be evaluated?
7. The concept hierarchy's internal nodes are always conjunctions.
However, the highest-level nodes can be disjunctive. The authors plan
to develop extensions of RINCON to allow internal nodes to be disjunctive.

----------------------------------------------------------------------
From: motoda@harl86.harl.hitachi.co.jp (Motoda Hiroshi)
Newsgroups: news.announce.conferences
Subject: CFP: First Japanese Knowledge Acquisition Workshop (JKAW90)
Date: 30 Nov 89 01:00:38 GMT


Call for Participation

FIRST JAPANESE KNOWLEDGE ACQUISITION FOR KNOWLEDGE-BASED SYSTEMS
WORKSHOP

Co-Sponsored by
Advanced Research Laboratory, Hitachi, Ltd.
Kansai Institute of Information System

In Cooperation with
American Association for Artificial Intelligence
Information Processing Society of Japan
Japanese Society for Artificial Intelligence
Japan Society for Software Science and Technology
The Institute of Electronics, Information and Communication
Engineers

Kyoto International Conference Hall (Kyoto)
October 25 - 26, 1990

Advanced Research Laboratory, Hitachi Ltd. (Tokyo)
October 29 - 31, 1990

A problem in the process of building knowledge-based systems is
acquiring and modeling appropriate problem-solving knowledge. The
objective of this workshop is to assemble theoreticians and
practitioners of AI who recognize the need for developing methods and
systems that assist the knowledge acquisition process.

The workshop will be in two parts: a two-day open meeting in Kyoto and a
three-day closed workshop in Tokyo. To encourage vigorous interaction
and exchange of ideas, the closed workshop will be kept small - about
40 participants, one author for each accepted paper. Some papers will
be presented at the open meeting and the remainder at the closed
workshop. There will be tutorial and invited-talk sessions at the open
meeting.


Papers are invited on all aspects of knowledge
acquisition for knowledge-based systems, including (but not restricted
to):

o Transfer/modeling of expertise - systems that obtain and model
knowledge from experts.
o Transfer/modeling of expertise - manual knowledge acquisition
methods and techniques.
o Apprenticeship, explanation-based, and other learning systems;
integration of such systems with other knowledge acquisition
techniques.
o Methods for capturing design knowledge and requirements.
o Issues in cognition and expertise that affect the knowledge
acquisition process.
o Extracting and modeling of knowledge from text.
o Eliciting and modeling knowledge from multiple sources.
o Integration of knowledge acquisition techniques within a single
system; integration of knowledge acquisition systems with other
systems (hypermedia, database management systems, simulators,
spreadsheets...).
o Knowledge acquisition methodology and training.
o Validation of knowledge acquisition techniques; the role of
knowledge acquisition techniques in validating knowledge-based
systems.

Five copies of a draft paper (up to 20 pages) should be sent to
Hiroshi Motoda before February 28th, 1990. Acceptance notices will be
mailed by May 30th. Camera-ready copies should be returned before
August 15th. A preprint volume will be distributed at the workshop.

There will be a travel-and-expense award for the best paper submitted
by students from overseas, covering part of their travel expenses.
Please indicate whether the paper should be considered for this award.


Workshop Co-chairmen:

John Boose
Advanced Technology Center
Boeing Computer Services
john@boeing.com

Brian Gaines
Department of Computer Science
University of Calgary
gaines@calgary.cdn

Hiroshi Motoda
Advanced Research Laboratory, Hitachi, Ltd.
Kokubunji, Tokyo 185, Japan
motoda@harl.hitachi.co.jp

Riichiro Mizoguchi
Institute of Scientific and Industrial Research
Osaka University
miz@ei.sanken.osaka-u.ac.jp


International Program Committee

Tom Addis, University of Reading, UK
Guy Boy, Centre d'Etudes et de Recherches de Toulouse, France & NASA AMES
Jeffrey Bradshaw, Boeing Computer Services
B. Chandrasekaran, Ohio State University
William Clancey, Institute for Research on Learning, CA
Jean-Gabriel Ganascia, University Pierre et Marie Curie, France
Thomas Gruber, Stanford University
Koichi Hori, University of Tokyo
Nancy Johnson, Brunel University, UK
Georg Klinker, Digital Equipment Corp.
Shigenobu Kobayashi, Tokyo Institute of Technology
Yves Kodratoff, CNRS, France, & George Mason University
Marc Linster, GMD, Bonn, Germany
John McDermott, Digital Equipment Corporation
Ryszard Michalski, George Mason University
Katharina Morik, GMD, Bonn, Germany
Toyoaki Nishida, Kyoto University
Mark Musen, Stanford University
Bruce Porter, University of Texas at Austin
Ross Quinlan, Sydney University, Australia
Alain Rappaport, Neuron Data, USA
Mildred Shaw, University of Calgary
Hirokazu Taki, Institute for New Generation Computer Technology
Masanobu Watanabe, NEC Corporation
Bob Wielinga, University of Amsterdam, Holland
----------------------------------------------------------------------
END of ML-LIST 1.14

