------------------------------------------------------------------------

                      BIOWULF'S MATHEMATICAL TOOLS:
                      -----------------------------

                 NEW BREAKTHROUGHS IN MACHINE LEARNING
                 -------------------------------------

                              screenplay by

                            Alan B. Scrivener

                                                             19 July 2001
                                                             second draft

------------------------------------------------------------------------

S-___. TITLE CARD

   BIOWULF LOGO

S-___. TITLE CARD YELLOW LETTERS ON WATERFALL BACKGROUND: BIOwulf's Mathematical Tools: New Breakthroughs in Machine Learning NARRATOR BIOwulf's mathematical tools: new breakthroughs in machine learning S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) Hi, I'm Mollie Tobin, and in the next thirty minutes I want to briefly explain some of the mathematical tools used by BIOwulf for machine learning, especially the technique known as "Support Vector Machines," or "SVM" for short. This is a rather obscure phrase, becasue it's hard to understand the meaning without a lot of explanation. To illustrate this, we asked a 7 year old child to draw what she imagined when she heard the words, "Support Vector Machine." We did have to explain that a vector is sort of like an arrow. S-___. CLOSE UP: NARRATOR HOLDING DRAWING NARRATOR (CONTINUING) Here you can see there is a machine, held up by this support, which shoots arrows. S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) So first we may have to wipe away some miconceptions before we present some replacement concepts. We'll get to that in a minte. First let me tell you what we've assumed about you. We're assuming that you know the concepts of algebra; maybe you took calculus in college and then forgot it all, but you can still look at an equation like this... S-003. CARD YELLOW LETTERS ON BLUE BACKGROUND: a x2 + b x + c = 0 NARRATOR (CONTINUING) ... and you realize that x is the variable and a, b and c are constants. You remember that x superscript 2 means x squared, or x time x, and you remember that b right next to x there means b times x. S-004. CLOSE-UP: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) Extra credit if you recognize it as "the quadratic equation." I'd also like to apologize in advance to those of you who are mathematicians, quantitative scientists or engineers -- for you this information may seem overly basic, at least at first. We want to make sure we bring everyone along. But hang in there and I bet you'll learn something yet! And just so you are clear on the scope of this video, this is not to inform you of BIOwulf's business plans, key people, product offerings or marketing message. That information is communicated in other ways. This video will cover the following topics... S-005. CARD: YELLOW LETTERS ON BLUE BACKGROUND: Why this is important. machine learning generalization overfitting state space combinatorial explosion correlation orthogonal linear weights recursion recursive feature elimination support vectors kernels Does it work? NARRATOR (CONTINUING) First we'll get a quick reminder why this stuff is important, and then we'll review the he concepts of: machine learning, generalization, overfitting, a state space, a combinatorial explosion, what is correlation, what does orthogonal mean, linear functions, weights, recursion, recursive feature elimination, support vectors, and kernels, followed by a brief discussion of perhaps the most important question: "Does it work?" S-006. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) Now we're not going to explain how you can use these mathematical tools yourself -- we have some papers you should read if you want to get into that level. We're just going to explain in broad outlines how they work. This quote from Richard Feynman's brilliant little book on quantum electrodynamics... (HOLDS UP 'QED') -- QED is the title -- explains well what we are trying to do: (READING FROM THE BOOK) "How am I going to explain to you things I don't explain to my students until they are third year graduate students? Let me explain it by analogy. The Maya Indians were very interested in the rising and setting of Venus as a morning star and as an evening star -- they were very interested in when it would appear. After some years of observation they noted that five cycles of Venus were very nearly equal to eight of their years of 365 days. To make calculations, the Maya had invented a system of bars and dots to represent their numbers -- including zero -- and had rules by which to calculate and predict not only the risings and settings of Venus, but of other celestial phenomena, such as Lunar eclipses. "In those days, only a few Maya priests could do such elaborate calculations. Now, suppose we were to ask one of them how to do just one step in the predicting when Venus will next rise as a morning star -- subtract two numbers. How would the priest explain what subtraction is? "He could either teach us the numbers represented by the bars and dots and the rules for subtracting them, or he could tell us what he was really doing: 'Suppose we want to subtract 236 from 584. First count out 584 beans and put them in a pot. The take out 236 beans and put them to one side. Finally, count the beans left in the pot. That number is the result of subtracting 236 from 584.' "You might say, 'What tedium -- counting beans, putting them in, taking them out -- what a job!' "To which the priest would reply, 'That's why we have the rules for the bars and dots. The rules are tricky, but they are a much more efficient way of getting the answer than by counting beans. The important thing is, it makes no difference as far as the answer is concerned: we can predict the appearance of Venus by counting beans (which is slow, but easy to understand) or by using tricky rules (which is much faster, but you must spend years in school to learn them).'" (LOWERS BOOK) Got it? Okay, let's get started. S-007. CARD: YELLOW LETTERS ON BLUE BACKGROUND: Why this is important. S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR I don't mean to be melodramamtic, but what we're trying to do here is cure cancer, among other things. The point of all this esoteric math is to improve medical diagnosis. S-___. LONG SHOT: NARRATOR AT MISSION NARRATOR (CONTINUING) I'm here at Mission San Juan Capistrano, you know, where the swallows return every year. S-___. MEDIUM SHOT: NARRATOR IN ALCOVE NARRATOR (CONTINUING) This little alcove is all that's left of the oldest building in California; today it is used as the shrine to Saint Peregrine, the Patron Saint of Cancer Suffers. S-___. CLOSE UP: CANDLES BURNING NARRATOR (CONTINUING) Over half a million people die of cancer every year. S-___. lONG SHOT: PEOPLE AT SPORTS STADIUM NARRATOR That's almost ten times the number of people this stadium holds. S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR Did you know the state of Georgia has an initiative to cure cancer? Thye'd like to go down in history as the state it happened in. Some thing to be proud of, to tell the grand kids. S-___. CARD: FIGURE F-_

NARRATOR (CONTINUING) Biowulf is working with the state of Georgia's Department of Community Health, which gives them access to cancer diagnosis data for over 2 million people, in order to improve cancer diagnosis. That's why this is important. Okay, let's get to our first concept. S-___. CARD: YELLOW LETTERS ON BLUE BACKGROUND: machine learning S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR I want to make sure you aren't confused by the word 'machine' here. What we mean is a algorithm, or precise technique, for solving a problem. The mathematical literature is full of abstract machines, like the Markhov Machine and the Turing Machine, which are just mathematical models. There usually isn't a special physical machine built, it's just simulated by programming a digital computer. In the case of machine learning, we're talking about writing a computer program that can learn rules by studying example data. Lets really spell this out, because this is the problem definition. I'm going to give you two made-up examples, just because they're easy to talk about, and then a third example that is the kind of work BIOwulf actually does with machine learning. First, imagine that you've discovered some unusual kind of rat nobody knows anything about, and you want to know what to feed it. Fortunately you have some recent dietary experiment results for these rats. Some students had a bunch of rats and for each one they randomly chose a particular canned food from a grocery store and fed it only that food daily for a month. (HOLDS UP A CAN WITH NUTRITIONAL CONTENT LABEL) They added up the nutritional contents on the sides of the cans... S-___. CLOSE UP: NUTRITIONAL LABEL NARRATOR (CONTINUING) ...you know, the amount of vitamins, minerals, proteins, fat and so on... S-___. CARD: TABLE T-1

rat # Carb. Fat Cholest. Sodium Protein Vitamin A Vitamin C Calcium Iron +/-
rat #1 0 g 0.5 g 30 mg 200 mg 13 g 0% 0% 0% 2% -
rat #2 20 g 0 g 0 mg 10 mg 1 g 0% 20% 0% 2% -
rat #3 25 g 2 g 0 mg 45 mg 0 g 0% 0% 0% 2% -
rat #4 4 g 0 g 0 mg 0 mg 1 g 4% 8% 2% 4% +
% = per cent daily human requirements


         NARRATOR (CONTINUING)
      .. and for each rat, they made a list of all of the nutrients
      it received, and a notation as to whether it lived or died.
      You can see on the list here, each row is a rat.  The leftmost
      column gives the rat number, the columns in the middle are the
      nutrients, and the last column is a plus if the rat lived and
      a minus if it died.

S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR
      The challenge then is to figure out from the list -- and only
      from the list -- what is the best diet to feed the rats.
      And to be able to take a proposed diet and predict whether
      a rat will live or die on it.

S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR (CONTINUING)
      We want to produce the formula automatically from the numbers
      in the list.  In other words, we need a program that gets
      trained by the data and then can classify new data.

      The second example is not made up, but represents actual
      research done by BIOwulf's mathematicians.

      We were fortunate to receive 62 frozen tissue samples from
      ____ hospital that were biopsies of patients with suspected
      colon cancer.  Along with the samples was a file on each
      patient covering 30 years of follow-up testing, so that it
      was possible for each sample to know the "right answer" in
      advance -- whether the patient got colon cancer or not.

      With modern genetic analysis techniques we were able to
      subject each sample to 2000 gene probes, to determine for
      each of 2000 genes whether that gene was silent, active or
      hyperactive in the sample.  In each case this was turned
      into a number from zero to one.

S-___. CARD: TABLE T-3

patient # income gene #1 gene #2 gene #3 gene #4 *** +/-
patient #1 0.56 0.45 0.55 0.21 0.98 *** -
patient #2 0.66 0.89 0.47 0.82 0.38 *** +
patient #3 0.01 0.09 0.66 0.63 0.54 *** +


         NARRATOR (CONTINUING)
      You can see that for each patient we have a row of data on
      which genes were expressed, and by how much, and in the rightmost
      column a plus sign if the patient got colon cancer and a minus
      if they didn't.

S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR (CONTINUING)
      Once again the goal is to produce a formula -- what we
      call the Decision Function -- to sort patients correctly
      into  those who will and won't get the cancer, using only
      the information on the list.

      Okay, a third example: this one is also made up, but it's
      designed to help you visualize what's going on more
      easily.

      This one is really simple.  There's only two columns in the chart.
      In fact, I'm not even going to show you a chart for this one.
      You're sick of charts anyway, right?

      Imagine there's this landscape someplace that youy can't see.
      Some mountains, valleys, water, and you're trying to avoid the
      water, but you have no idea what the shape of the terrain is.
      So you drop probes.  Each probe tells you its latitude and
      longitude.  If we assume the earth is flat that's just 2
      dimensional grid coordinates.  And we also find out if the
      probe got wet or not.  We get -- guess what? -- a plus if
      it's dry and a minus if it's wet.

S-___. CARD: FIGURE F-__

NARRATOR (CONTINUING) So we draw a map of these plusses and minuses. Essentially we're trying to guess the terrain from the pluses and minuses. S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) Let's dig into this metaphor a little more, and take a field trip. S-___. LONG SHOT: MISSION TRAILS SIGN NARRATOR (CONTINUING) We're here at a canyon in San Diego called Misssion Gorge. S-___. MEDIUM SHOT: NARRATOR BEFORE GRANITE MOUNTAINS NARRATOR (CONTINUING) All around me you can see the striking granite geography. Some of those same mission swallows nest in these walls, by the way. S-___. MEDIUM SHOT: NARRATOR AT STREAM SIDE NARRATOR (CONTINUING) And isn't this a magical spot, down by the river? Father Serra, founder of the California Missions, surely walked along these banks, and before that Kumayaay natives used grinding holes here to gring acorn flour. But I'm letting myself get distracted. Here we can observe water flowing, seeking its own level. To make this metaphor work, we'll have to assume that water flows extremely fast, seeking its own level instantly. S-___. CLOSE UP: NARRATOR DIPPING HANDS IN STREAM NARRATOR (CONTINUING) So we wouldn't see waterfalls like this. All this water would already be out to sea. Imagine if the oceans rose, though, and filled this canyon. No surf, no waves, just a calm sea intersecting the canyon walls. Remember our goal, to determine for any arbitrary point whether it is wet or dry. Of course, the perfect solution would be if we knew the actual elevation at each point. S-___. CLOSE UP OF TOPO LINES ON MAP NARRATOR (CONTINUING) This is the kind of data represented by the contours on a topological map. S-___. ANIMATION A-_ MISSION TRAILS FLY-OVER

NARRATOR (CONTINUING) With elevation data we can program a computer fly-over of the terrain. Here we see a simulation of flying over this same geography. S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) Is this strating to make sense? You get this probe data, giving you a pair of coordinates and a plus or minus. You get a whole list of this data. S-___. INSERT: PROBE ON PARACHUTE FALLING NARRATOR (CONTINUING) See, there's a probe dropping in. S-___. MEDIUM SHOT: NARRATOR HOLDING CLIPBOARD NARRATOR (CONTINUING) And here's my list of the probe data. And now I have to predict for any new postiiton, will it be wet or dry? You would be able to give perfect answers if you had the actual terrain data. S-___. ANIMATION A-_ MISSION TRAILS SLICED BY BLUE PLANE

NARRATOR (CONTINUING) Here you see the terrain data in another computer-generated picture, a little less realistic this time, and with a blue plane slicing through it to represent the water level. In this case the alitude can be treated as a confidence value. If the water rose a hundred feet, we'd be confident that something above that level would stay dry. S-___. MEDIUM SHOT: NARRATOR HOLDING CLIPBOARD NARRATOR (CONTINUING) But of course we don't have the terrain data. That's the whole point. All we have is these probe results. I don't have to remind you that the once again the goal is to produce a formula -- what we call the Decision Function -- to sort locations on the mountain correctly into those that are underwater and those which are above water, using only the information on the list. Now you might wonder, why is this so important? Why can't we use additional information? For example, looking at the rat diets again... S-___. CARD: TABLE T-4

rat # Carb. Fat Cholest. Sodium Protein Vitamin A Vitamin C Calcium Iron +/-
rat #1 0 g 0.5 g 30 mg 200 mg 13 g 0% 0% 0% 2% -
rat #2 20 g 0 g 0 mg 10 mg 1 g 0% 20% 0% 2% -
rat #3 25 g 2 g 0 mg 45 mg 0 g 0% 0% 0% 2% -
rat #4 4 g 0 g 0 mg 0 mg 1 g 4% 8% 2% 4% +
% = per cent daily human requirements


         NARRATOR (CONTINUING)
      From this admittedly small sample rat #4 is the only survivor.
      It might help to know that rat #4 was fed canned string beans.
      It is also tempting to make guesses based on human nutritional
      needs, or -- even better -- the needs of other known rats.

S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR (CONTINUING)
      But there is a big problem with this.  You will recall that
      there are 2000 genes represented in the tissue samples.
      For some of the experiments we haven't talked about yet
      there can be up to 22,000 columns of data.

      It was Lenin who said, "Quantity has a quality all its own."
      Human intuition and judgment are very good at operating on
      dozens or maybe even sometimes hundreds of factors, but
      thousands of them overwhelm us.

      So we put all the information we think might be relevant in
      the table.  Age, weight, number of known relatives with
      similar cancers, all could be added to the patient data.
      But then we have to let a computer program do the learning
      and the deciding, because the problem is simply too big for
      the human mind.  One of the things we want the computer to do
      is tell us which columns we can toss out -- which ones aren't
      needed to create the decision function.

S-___. FIGURE F-1

_


      So here in a nutshell is how machine learning is done.
      You have some data like the examples above where you know
      the right answer.  Typically, you break these into two groups,
      a training group and a testing group.  You run all the
      training data through a machine learning program and it
      "learns" by coming up with a formula for classifying the
      data, which we call a decision function. Built into that
      function is the information about which columns n the table
      which to throw out.

      Next you run the testing data through the decision function
      and see if it gets the right answer.  When it all seems to
      be working correctly you can real data through -- in other
      words, data without the pluses and minuses, without the
      right answers supplied.  That's when you have to wait 30
      days and see if the rat dies, or 10 years to see if the
      credit card customer pays their bills, or 30 years to see
      if the patient gets colon cancer.

S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR (CONTINUING)
      The only other complication is that usually you really don't just
      want pluses and minuses, it would be nice to know how sure
      you are, so the decision function usually returns a number.  If
      the number is positive, that's a plus, and if it's negative,
      that's a minus, and the magnitude tells us the confidence.

      So there you have machine learning in a nutshell -- at least
      the problem definition.  The hard work comes in trying to
      design a program which can turn the training examples into the
      decision function, and that's what the rest of this video is
      about.
.
      For the last 50 years or so the most popular traditional approach
      to the machine learning problem has been to use neural networks...

S-___. FIGURE F-2: PHOTO OF NEURONS


         NARRATOR (CONTINUING)
      ... This involves simulating the actual neurons in the human
      nervous system, and this approach has done a pretty good job.
      But as we will see, the "SVM" method does better in many cases.
      To continue the terrain metaphor, neural nets can find hills and
      valleys, but they don't always find the highest hill or the
      deepest valley.

S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR (CONTINUING)
      But before I can explain the SVM method, I need to introduce a
      few more concepts.

S-___. CARD YELLOW LETTERS ON BLUE BACKGROUND:

   generalization

         NARRATOR (CONTINUING)
      Generalization.

S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR (CONTINUING)
      If a learning machine doesn't generalize, it is useless.
      The whole point is to form generalizations that can be used
      to make new decisions.  For that reason it is important to
      find some kind of generlaized portrait of the data boundary,
      the water line if you will.

S-___. CARD YELLOW LETTERS ON BLUE BACKGROUND:

   overfitting

         NARRATOR (CONTINUING)
      Overfitting.

S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR (CONTINUING)
      This is the thing we want to avoid.  It's sort of like overreacting
      to clues, or getting too hung up on details.  When a learning
      machine gets too specific, and thinks only certain exacxt data
      values are in plus territory, overfitting has occured.

S-___. CARD YELLOW LETTERS ON BLUE BACKGROUND:

   state space

         NARRATOR (CONTINUING)
      State space.

S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR (CONTINUING)
      Normally when we use the word space we're talking about the
      space that fills the universe we live in.  We usually call
      it a three-dimensional space, because we can specify a point
      in it with three coordinates.  This GPS...

S-___. CLOSE-UP: NARRATOR HOLDING UP GPS

         NARRATOR (CONTINUING)
      ... a Global Positioning System receiver, tells me my latitude
      north or south, my longitude east or west, and my elevation.
      This completely defines my position in 3D space.

      Sometimes people say time is the fourth dimension, so the GPS
      also tells time, which locates my in the 4th dimension as well.

S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR (CONTINUING)
      But mathematicians like to think about lots of other
      spaces besides physical space.  A state space is an imaginary
      space where each point represents a possible state of a system.

      They have lots of uses, but the way we're going to use them
      here is to represent rows in out data lists.  To start with,
      let's pretend all of our data has only two columns of values
      for each row.  That will allow us to represent each row -- each rat
      or patient or probe -- as a point in a two-dimensional state space.

S-___. CARD: FIGURE F-3

NARRATOR (CONTINUING) Here you can see three points represented, colored red, green and yellow just to tell them apart. So this is a graphical display of three rows of data. S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) If we add another column, so we have three columns of data, now each row can be a point in 3D space... S-___. ANIMATION A-1





NARRATOR (CONTINUING) This animation shows the space turning so we can get views from different angles. One of the problems of the 3D view is can be hard to see what's behind. Using time as an added dimension and turning a model can help solve this problem if we only want to see a few points in the state space. But if we want to plot every possible point, we can't see a whole volume of points directly, any more that we can see inside a stack of pingpong balls.. A more fundamental problem is that -- remember -- we have up to 22,000 dimensions in our data. We're already hitting some problems with 3 dimensions, and we've run out. Clearly we can't directly visualize higher-dimensional state spaces. But there are some tricks we can do, as we'll see later. Still another problem is that we want to be able to display our decision function, the output of the Support Vector Machine method. It's just a one-dimensional function, but that does use a dimension. So the most common approach is like the one in this figure... S-___. CARD: FIGURE F-4

NARRATOR (CONTINUING) ... two dimensions are used for the state space and one is used for the decision function. Here the horizontal dimensions represent the state space, like latitude and longitude on a map, and the third dimension, the altitude, is the decision function. The best choice is the lowest in this representation, so we're looking for the bottom of the bowl. But there is a problem with seeing the whole state space and decision function at once. Because paper is flat and screens are flat, it is most common to display a 2D state space in a 2D image, with color as the third dimension. S-___. FIGURE F-5


from http://www.agso.gov.au/geophysics/date050397.html


         NARRATOR (CONTINUING)
      That's how the government of Australia produced this image.
      Ranges of altitudes are shown, starting at sea level, as blue,
      light blue, green, yellow, orange, red, violet and gray for
      the highest.

      This is just another way to show terrain data -- or state space data.

      [PAUSE]

S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR (CONTINUING)
      So let's see some decision function data plotted this way.

S-___. FIGURE F-6




         NARRATOR (CONTINUING)
      Here is a decision function for some 2D data, where
      dark blue is for the lowest negative values and dark
      red is for the highest positive values.  The black
      line is where the decision function is zero, and so the
      data is "right on the line" between +, live rat or good
      credit, and -, dead rat or bad credit.

S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR (CONTINUING)
      Remember, we're trying to show some analogies here by
      boiling 22,000 dimensions of data down to 2, and then
      using the third for the decision function.  And I want to
      emphasize that we do this not because it's a great ideas,
      but because we don't know of a better way to display
      these fundamental concepts.  If we knew how to display
      the 22,00 dimensions without boiling anything down we'd
      certainly prefer it.

      One thing this exercise does illustrate is the
      importance of reducing dimensions if you can. 
      Many techniques in machine learning, including SVMs,
      begin by attempting to eliminate as many dimensions
      -- in other words as many columns of data -- as possible
      without sacrificing quality in the decision function.
      In fact, recent research by BIOwulf mathematicians
      has shown that coming up with which columns to throw out
      is actually more critical than coming up with an otherwise
      good decision function.  So we'll look at tossing out
      columns first.

      Assume we have some sort of workable program for creating
      decision functions from training data.  The most
      straightforward approach would be to try every possible
      subset of columns and each time, produce the decision
      function using the training data and then test it using
      the testing data.  The find the smallest subset that
      gives right answers.  Simple!

      Unfortunately, it's also impossible in a practical sense,
      because of something called the combinatorial explosion.

S-___. CARD YELLOW LETTERS ON BLUE BACKGROUND:

   combinatorial explosion

         NARRATOR (CONTINUING)
      To understand the combinatorial explosion, let's take a
      momentary diversion into the world of subsets.

S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR (CONTINUING)

      First lets consider some small sets and their subsets.  If you
      have a set with only one thing in it, there are two possible
      subsets: the original set with one thing in it and the
      empty set, which you will recall is the name for a set with
      nothing in it.

S-___. ANIMATION A-2



         NARRATOR (CONTINUING)
      This animation is cycling through these two options. 
      One thing, then nothing.  Or if you prefer, nothing,
      then one thing.  The sphere is the one object, and
      we use red to show not being in the set and green to
      show being in the set.  Just like red means stop and
      green means go at a traffic light, here red means "no"
      and green means "yes."

      So far with this example it is no problem to just breeze
      through all the combinations.

S-___. ANIMATION A-3



         NARRATOR (CONTINUING)
      Next consider a set of two objects.  There are four subsets:
      no objects, one object, the other object, or both objects.
      We highlight the connection between them to emphasize that
      they are in a subset together.

S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR (CONTINUING)
      So one object had 2 subsets, while 2 objects had 4 subsets.
      What do you think, will 3 objects have 6 subsets?

S-___. ANIMATION A-4



         NARRATOR (CONTINUING)
      As you can see in this animation, there is one way to take
      no things, three ways to take one thing, three ways to
      take two things, and one way to take three things.
      That adds up to eight subsets.

S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR (CONTINUING)
      Eight?  The sequence goes 2, 4, 8 so far.  Is it doubling
      every time? Let's see.

S-___. ANIMATION A-5



         NARRATOR (CONTINUING)
      Sure enough.  When there are four objects, there is
      one way to have no thing, four ways to have one thing,
      six ways to have two things -- four around the square
      and the two diagonals -- four ways to have three things
      and one way to have four things.

S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR (CONTINUING)
      This is getting boring.  You know why?  Because it's
      taking so long!  It is doubling every time.  That was
      sixteen subsets, and the next one, seven objects, would
      be 32 subsets.  So we're not going to do that one.  At the
      2 frames per second we've been using, it would take over
      a minute.  We promised this would be short.

      So let's jump ahead.  What if we have 30 objects?
      That's not a very big number.  A classroom of 30
      students is a common thing.  How many possible
      clubs could you form out of a classroom of thirty
      students?  If we try to count them up I don't think
      we'll make it.

S-___. ANIMATION A-6

* * *



         NARRATOR (CONTINUING)
      It starts out easy -- there's one way to have a club
      with no students.  Thirty ways to have a club with one student.
      335 ways to have a club with 2 students.  This is going
      to take a while.  Almost 3 minutes just for the clubs of two
      students.  While we're watching the animation run, let me talk
      about how we figure this out.    There's 30 ways to pick the
      first student, and because they can't be in a club twice you
      exclude them from the second choice, and have 29 ways to pick
      the second student.  But that means  we've counted each club twice,
      so the formula is 30 times 29 divided by 2, which is how we get 335.
      As you can imagine, this is quite a pain to do for all the sizes of
      clubs.

      Luckily there's another way.  If it doubles each time, the
      general rule is that the number of subsets is just 2 to the
      power of the number of objects.  Call this N for number of
      objects.  The number of subsets is 2 raised to the N power.
      2 raised to the 30th power is over a billion (that's American
      billion, 1 followed by 9 zeroes).  At this frame rate it will take us
      17 years to watch this animation.  So get comfy!

      Just kidding.

   [ANIMATION IS INTERRUPTED]

S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR (CONTINUING)
      Now you can see why we call it a combinatorial explosion. 
      Remember, we're dealing with data with up to 22,000 things to
      correlate in subsets.  Long before we get up that high
      we'll be dealing with more susbsets than there are particles
      in the known universe.  People like to use that phrase a lot:
      "more than the number of particles in the known universe." 
      But that's just peanuts to the combinatorial explosion.  These
      numbers are staggeringly bigger than that!

      Allow me to quote from Douglas Adams, in The Hitchhiker's
      Guide to the Galaxy.:

   [HOLDS UP BOOK]

      "Space is big.  Really big.  You just won't believe how
      vastly hugely mind-bogglingly big it is.  I mean, you may
      think it's a long way down the road to the chemist, but
      that's just peanuts to space."

      Well, as you know, space is mostly devoid of matter -- the
      known universe is mostly empty space.  But imagine if the same
      sized known universe was full of particles, as many as it could
      hold.  The number of subsets of 22,000 objects is mind-numbingly
      bigger than the number of particles the universe could hold.

      So my point, and I do have one, is that because of the
      combinatorial explosion there is no way we are going
      to try all possible subsets of what we are calling the
      columns or the dimensions of data.  So, we have to do
      something else.  Something smarter.

      Now that we've returned from our diversion through the world
      of subsets, our next concept is...

S-___. CARD YELLOW LETTERS ON BLUE BACKGROUND:

   correlation

         NARRATOR (CONTINUING)
      ...correlation.  One of the most important questions
      people can ask is, "why?"  We want to establish causality. 
      The first step in that process is to establish a correlation. 

S-___. CARD: FIGURE F-7

NARRATOR (CONTINUING) In this state space plot all of the data points occur pretty much on a line. This shows a strong correlation. Low values of one variable always seem to be associated with low values of the other, and high values of one variable always seem to be associated with high values of the other. What we are hoping to do is to find a small number of columns among the 22,000 that are highly correlated with the decision function, so we can throw away the rest. S-___. CARD YELLOW LETTERS ON BLUE BACKGROUND: orthogonal NARRATOR (CONTINUING) This is related to the idea of being orthogonal. Now don't get confused. The word orthogonal is also used in geometry to mean at right angles. A flagpole is orthogonal to the ground. But here it has a different, more subtle meaning. It means roughly the opposite of correlated. Two variables are orthogonal if they are completely uncorrelated, if they don't affect each other at all. This means there's no causality between them. S-___. CARD: FIGURE F-8

NARRATOR (CONTINUING) If you make a state space plot -- also called a phase plot -- of two orthogonal variables, it will look like a random scatter. These variables were actually plotted by rolling dice, and you can see how uncorrelated they are. S-___. CARD YELLOW LETTERS ON BLUE BACKGROUND: linear NARRATOR (CONTINUING) Our next concept is linear. S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) This is a very important concept but it comes with a lot of baggage. The term "non-linear" comes around every 20 years or so and makes a big splash. It got a lot of attention in the late 70s and early 80s with the beginnings of chaos theory: a whole zoo of self-similar fractals, strange attractors and chaotic bifurcations. Unfortunately, an awful lot of people used the terms linear and non-linear without really understanding what they meant. Consider this simple linear equation: y = ax - b S-___. CARD YELLOW LETTERS ON BLUE BACKGROUND: y = ax - b NARRATOR (CONTINUING) Remember that x is the variable and a and b are constants. This is a linear equation because the most complicated term is 'a' times 'x,' a linear relationship. S-___. ANIMATION A-7 [Note: In this animation the narrator's finger should be represented on a dial or knob, or else editing should be used to incorporate live action of the narrator moving a mouse.]

NARRATOR (CONTINUING) The graph of this equation looks like this. You can see that it is a straight line no matter what values 'a' and 'b' have. You may recall the idea of slop and intercept from algebra. 'a' controls the slope of the line, and 'b' controls the Y intercept, the place where the line intersects the vertical axis here; we show a blue sphere at that point. S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) This shows how a linear equation of one variable looks. (By the way, in higher mathematics 'x' could also be a vector or a function.) S-___. CARD: FIGURE F-9 [THE SUM OF TWO LINEAR FUNCTIONS]

NARRATOR (CONTINUING) An almost magical property of linear equations is that if you add two or more of them in any linear combination you always get another linear equation. Here we show visually how two linear equations are added to produce a third. S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) A lot of work has been done over a long period of time coming up with techniques and tricks for solving linear equations. At this point they represent a sort of double-edged sword. Whenever a problem can be reduced to a linear form it becomes much easier to analyze and solve. But most real-world systems cannot be described by linear equations. So we much chose between ease of solution and accuracy. In fact, saying linear equations are easier to solve is a vast understatement. Consider this chart from the book General System Theory by Ludwig von Bertalanffy, 1968. The chapter is called "The Difficulty of Solving Equations." S-___. TABLE T-5

         NARRATOR (CONTINUING)
      Take a moment to look at this.  Down the left column are
      listed types of equations.  Of the other two columns to
      the right, one is for linear and the other for non-linear
      equations.  [PAUSE]
      
      Notice that all the non-linear equations are very
      difficult or worse.  Even the linear equations are pretty
      bad.  Everything to the right of the thick black line is
      pretty useless.

S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

      Here's what the author had to say in his own dryly
      humorous style: "[these equations are], if linear,
      tiresome to solve even in the case of a few variables;
      if nonlinear, they are unsolvable except in special cases."

      So, to paraphrase an old song, we will use linear
      equations when we can and non-linear equations when we must.

S-___. CARD YELLOW LETTERS ON BLUE BACKGROUND:

   weights

         NARRATOR (CONTINUING)
      Building on the concept of linearity, our next
      concepts is weights.

S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR (CONTINUING)
      Our best hope is that we can produce a linear decision
      function, with weights for each data dimension.  We
      use the term weights to mean a set of constants, like
      'a' in the last example, only a vector of them.  We call
      this vector 'w' for weights, and it consists of a set of
      constants named w-sub-1 through w-sub-22,000 or whatever.
      So our decision function d becomes this.

S-___. CARD YELLOW LETTERS ON BLUE BACKGROUND:

   vector form

   D = w.x

   expanded

   D = w1 x1 + w2 x2 + w3 x3 +     

         NARRATOR (CONTINUING)
      We use bold in the vector form to show that w and x are vectors.
      The w's and x's in the expanded form are the components of the
      vectors.

S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR (CONTINUING)
      We can visualize each x vector as a bunch of strips of cloth
      tied onto a string to become the tail of a kite.

S-___. MEDIUM SHOT: NARRATOR WITH KITE

         NARRATOR (CONTINUING)
      Didi you get that?  One vector is a kite tail, with lots of cloth strips.  Each cloth strip stands for a number.  They're called components of the vector.  One kite, one vector.  And remember, there will be lots of vectors.

      Each kite tail lands, in turn, next to the same set of weight
      values as you see in this animation...

S-___. ANIMATION A-8

   [Note: Again, in this animation the narrator's finger should be
   represented on a dial or knob, or else editing should be
   used to incorporate live action of the narrator moving
   a mouse.]


NARRATOR (CONTINUING) ... each weight -- the red bars -- multiplies by its corresponding x -- the blue bars -- to make a weighted result -- the purple bars -- which are then all added together to make the decision function. The only challenge is to choose the values of the w's correctly. S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) Of course, that's the whole game. Assuming you can live with a linear result, and we know how much grief that will save us, the decision function we just gave is the holy grail of machine learning. How do you set the weights? From this point we'll be looking at ways that Biowulf's mathematicians solve this problem. The first important tool is... S-___. CARD YELLOW LETTERS ON BLUE BACKGROUND: recursion NARRATOR (CONTINUING) ... recursion. S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) This is actually an idea that is actually easier to understand than it is to explain. S-___. CARD: FIGURE F-10 20 MULE TEAM BORAX BOX

NARRATOR (CONTINUING) There's a famous old box label for Twenty Mule Team Borax that shows a little girl holding a box of Twenty Mule Team Borax, and of course on that box is another little girl holding another box, and so on. S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) Or perhaps this old Mad Magazine cover will help. S-___. CARD: FIGURE F-10

NARRATOR (CONTINUING) As you can see, recursion is a sort of magic trick that seems to create something out of nothing. But it actually just involves use of the magic words "and so on." In this case, Alfred E. Neuman is holding a hat that contains a rabbit, and the rabbit is holding a hat that contains a smaller Alfred, and so on. In this case the "and so on" only goes two levels down, but it's easy to see how it could go any number of levels, like on the Borax box. S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) So, armed with that concept we move on to... S-___. CARD YELLOW LETTERS ON BLUE BACKGROUND: recursive feature elimination NARRATOR (CONTINUING) ... recursive feature elimination. S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) This is one of the methods that Biowulf's mathematicians have refined. Perhaps the best way to explain it is to compare it to that recent hit TV show in the United States -- Survivor. Rather than trying to evaluate each contestant's survival skills alone, they tested them as a team. This may not be the way to identify the best survivor, but it makes it pretty easy to identify the worst. Similarly, it is easier to analyze a set of columns of data in our chart to see which is least useful for deciding the decision function. Then that column, or feature as it is called here, is eliminated and the same procedure is done on the remaining group, and so on. There's those magic words, which make it recursive feature elimination. S-___. CARD YELLOW LETTERS ON BLUE BACKGROUND: support vectors NARRATOR (CONTINUING) Now we get to the fundamental tool used by BIOwulf's mathematicians: support vectors. S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) And it's only fair to warn you, this is the weird part. What's weird about it is that it involves higher dimensions. Higher even than the 22,000 we mentioned before. But first, let me clear up some possible confusion about terminology. We've been throwing a lot of specialized mathematical terms around here, sometimes with subtle distinctions. Just like Eskimos have many words for snow, mathematicians have many words for dimensions. S-___. CARD: TABLE T-6

rat # Carb. Fat Cholest. Sodium Protein Vitamin A Vitamin C Calcium Iron +/-
rat #1 0 g 0.5 g 30 mg 200 mg 13 g 0% 0% 0% 2% -
rat #2 20 g 0 g 0 mg 10 mg 1 g 0% 20% 0% 2% -
rat #3 25 g 2 g 0 mg 45 mg 0 g 0% 0% 0% 2% -
rat #4 4 g 0 g 0 mg 0 mg 1 g 4% 8% 2% 4% +
% = per cent daily human requirements


         NARRATOR (CONTINUING)
      Recall our table of which nutrients the rats were fed.  We
      called the rows in this table data points, points in the
      state space, vectors, and sometimes test data or training
      data.  We called the columns in this tables variables,
      dimensions and features.  They're dimensions when we want
      to represent them in a state space, and features when we
      want to eliminate them with recursive feature elimination.

      Now we're going to call them dimensions again as we do some
      math tricks with them.

S-___. ANIMATION A-9

[NOTE: THIS FOOTAGE HAS ALREADY BEEN PRODUCED BY BEOWULF]

NARRATOR (CONTINUING) Here is a 2D state space plot of some data. We'd like to be able to find a linear function decision function that cleanly separates the +'s from the -'s. That would mean finding a line that separates them in this plot, but there isn't one. Now you can see as the animation rotates to show a third dimension, that in a 3D state space plot there is a linear decision function that cleanly separates the space, which means in 3D that a plane can separate them, and you see that one can. S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) But what if 3 dimensions doesn't cut it? Well, we can just add more. The problem is, as we've mentioned, we can't show you what that looks like. The best we can do is use some toothpicks and some miniature marshmallows to evoke the idea. Here I have one marshmallow. It stands for a one dimensional point. It has -- humor me on this -- no height, width or depth. I move the point in some direction and trace out a line segment. We can build that with a toothpick and another marshmallow. [CUT] S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR IS HOLDING MODEL OF LINE SEGMENT MADE OF TWO MARSHMALLOWS AND A TOOTHPICK NARRATOR (CONTINUING) So this represents a 1-dimensional line segment which has length, but no width or height. If I move it at right angles the same distance, it traces out a square. I can build that with two more marshmallows and three more toothpicks. [CUT] S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR IS HOLDING MODEL OF SQUARE MADE OF FOUR MARSHMALLOWS AND FOUR TOOTHPICKS NARRATOR (CONTINUING) This represents a 2-dimensional square which has length and width but no height. If I once again move it at right angles the same distance it traces out a cube. I can build a cube by adding four marshmallows and eight toothpicks. [CUT] S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR IS HOLDING MODEL OF CUBE MADE OF EIGHT MARSHMALLOWS AND TWELVE TOOTHPICKS NARRATOR (CONTINUING) This represents a three-dimensional cube which has length, width and height. Now we'd like to do this trick again but we can't -- we've run out of dimensions. I can't even imagine what the next step would be. There are people who claim to be able to visualize higher dimensions, but other people wonder if they really can. Still, there is something we can do. We can study each transition from one dimension to the next, and then say what would happen by analogy. S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) So without a visual model to guide us, we can say: "Imagine moving the cube at right angles again, the same distance, and tracing out a hypercube, a 4-dimensional cube. We could build one of those with eight more marshmallows and 20 more toothpicks." Now, I have no idea what that means, and we don't have that extra fourth dimension to try it in. But the point is, a lot can be figured out by analogy. We know how the math works at any dimension, up to infinite dimensions, believe it or not. And luckily, we don't need to visualize the problem to solve it -- only to explain it! That's why we're better at solving it than explaining it. Okay, I've been jollying you along here with this higher dimension stuff to prepare you for this next statement. It probably won't make perfect sense, but it's the best we can do. The Support Vector Machine -- SVM -- method, works this way. It keeps adding dimensions until it can find a hyperplane to split the state space, which means there is a linear decision function. Then it finds all the data vectors that are closest to this hyper-plane. These are called the support vectors. Then the support vectors are used to do recursive feature elimination and to create the decision function. Let me run through that again. The SVM method keeps adding dimensions until it can find a hyper-plane -- a higher-dimensional plane -- to split the state space, which means there is a linear decision function. Then it finds all the data vectors that are closest to this hyper-plane. These are called the support vectors. Then the support vectors are used to do recursive feature elimination and to create the decision function. Now one thing that's interesting about this is that we started out wanting to throw away columns, and we did, but we also threw away rows! Any vectors which are not support vectors are thrown away and not used for the calculating the decision function. Okay, I know your mind is swirling. Maybe some pictures will help. S-___. CARD: FIGURE F-11

NARRATOR (CONTINUING) The following images are taken from an interactive Java Applet written by Hong Zhang of BIOwulf. Now once again we have the advantages and disadvantages of being back in 2 dimensions. It's good because we can see the whole state space at once, but it's bad because it can't show any of the higher dimensional data. The way this Applet works is the user places data points in a white rectangle, black dots for the pluses and open dots for the minuses. Then the Applet chooses the support vectors and computes the decision function using the SVM method. Next the decision function is displayed as color, red being the low negative values and blue the high positive values. The data points chosen to be support vectors are circled. S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) This was for a linear decision function. Here's how it looks if we allow a non-linear decision function. S-___. CARD: FIGURE F-12

NARRATOR (CONTINUING) You can see that the data won't permit a line to divide the state space, so a curve is needed. In this cave the curve is defined by a third order polynomial. This is an equation with no more complex terms than x cubed. S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) Let's take a look at an animated sequence of these. S-___. CARD: FIGURE F-13 (f01.gif)

NARRATOR (CONTINUING) We started out with a grid of nine data points. The three upper rightmost were minuses while the rest were pluses. Three points near the dividing line were chosen by the Applet to be support vectors, and the decision function follows a slight curve down the middle, though it bends more elsewhere. Next we added another "plus" data point, near the center, just up and to the right of the center point. We let the Applet compute again, and... S-___. CARD: FIGURE F-14 (f02.gif)

NARRATOR (CONTINUING) ... now you can see the Applet has chosen the new point to replace the center point as a support vector, and moved the decision function a little, but there are still only three support vectors. S-___. CARD: FIGURE F-15 (f03.gif)

NARRATOR (CONTINUING) Now we add a few more data points, this time minuses, farther to the upper right of the center. We let the Applet compute once again ... S-___. CARD: FIGURE F-16 (f04.gif)

NARRATOR (CONTINUING) ... and now you can see the Applet has added one of the new points as a support vector, and dropped the one in the upper left, but still has only three of them. Of course it has moved the decision function again. S-___. CARD: FIGURE F-17 (f05.gif)

NARRATOR (CONTINUING) Here's another positive data point added... S-___. CARD: FIGURE F-18 (f06.gif)

NARRATOR (CONTINUING) ... and it gets chosen to be a support vector. The decision function nows jumps more dramatically. S-___. CARD: FIGURE F-19 (f07.gif)

NARRATOR (CONTINUING) Here's another positive data point added... S-___. CARD: FIGURE F-20 (f08.gif)

NARRATOR (CONTINUING) ... and it gets chosen to be a support vector. The decision function is tightening up as we give it finer and finer distinctions. S-___. CARD: FIGURE F-21 (f09.gif)

NARRATOR (CONTINUING) Now we've hopped down to region lower and right of center, and put another negative data point near the dividing line. S-___. CARD: FIGURE F-22 (f10.gif)

NARRATOR (CONTINUING) ... and it gets chosen to be a support vector. But note that now there are four circled points -- four support vectors. The support vector machine needs more data points to define the decision function. S-___. CARD: FIGURE F-23 (f11.gif)

NARRATOR (CONTINUING) Here's another negative data point added, advancing into what was clearly positive territory. S-___. CARD: FIGURE F-24 (f12.gif)

NARRATOR (CONTINUING) ... and it gets chosen to be a support vector, replacing the previous point. But notice that the positive data point in the lower right has also bbeen made a support vector, totaling five. The decision function is really tightening up now. S-___. CARD: FIGURE F-25 (f13.gif)

NARRATOR (CONTINUING) Once again we add another negative data point in what was previously positive territory. S-___. CARD: FIGURE F-26 (f14.gif)

NARRATOR (CONTINUING) Woah! The decision function just got wider and more curved! And the number of support vectors has increased again -- now there are six. Each I time I add a point in the wrong territory I'm violating its assumptions, so it has to make a subtler decision function. S-___. CARD: FIGURE F-27 (f15.gif)

NARRATOR (CONTINUING) Okay, we add one last negative data point the same way. S-___. CARD: FIGURE F-28 (f16.gif)

NARRATOR (CONTINUING) ... and it gets chosen to be a support vector, along with another point -- now there are eight!. The decision function has twisted and widened to accommodate the more stringent data. S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) Well, we could go on like this all day. It is fascinating to watch, and even more fascinating to interact with the Applet yourself. But now lets look at something you don't see when you play with the Applet. The following animation shows those frames we just went through speeded up, with the compute time removed. S-___. ANIMATION A-10

NARRATOR (CONTINUING) It's a thing of beauty. Sometimes the eye can pick out patterns in motion that we don't see any other way. Well, enough of that. S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) I hope this is giving you a sense of what the Support Vector Machine method does. Unfortunately, we can show you 2 dimensions but all the real benefits occur with much higher dimensions, which we can't show you. S-___. CARD YELLOW LETTERS ON BLUE BACKGROUND: kernels NARRATOR (CONTINUING) Kernels. S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) This is the last concept. Is your brain full yet? We told you earlier how a Support Vector Machine is looking for correlations. Well, the way a SVM measures correlations is with a function, called K, the Kernel function, which the user must provide. It turns out there's more than one good way to measure correlation, it depends on the data. Some data worls better with one, other data with another. This is one reason why mathematical experts like BIOwulf's are still needed, and the whole process can't be autoimated, yet. For any given type of data, like rat diets or a particular tissue sample, there is an optimum kernel, which an experienced mathematician can determine. Well, I'm sure some of this made perfect sense, and some of it seemed like mumbo jumbo. It's that way to most of us, too. But the proof, they say, is in the pudding. Now that we've done a survey of the mathematical tools used by BIOwulf, let's ask the most important question: S-___. CARD YELLOW LETTERS ON BLUE BACKGROUND: Does it work? NARRATOR (CONTINUING) Does it work? S-___. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) Well, the short answer to this is: it sure does! In the paper "Gene Selection for Colon Classification using Support Vector Machines by Isabelle Guyon, Jason Weston, Stephen Barnhill, and Vladimir Vapnik, the story is told of these 62 biopsies samples we mentioned earlier. They report that SVM is better than the next best known technique with 95.8% confidence. Furthermore, the SVM used 8 instead of 32 genes to make the classification, after recursive feature elimination, and still had a lower error rate. Remember that this was from frozen tissue samples that were combined with 30 years of follow-up patient data. This stuff can save lives! It was also interesting to find that other methods falsely associated the presence of skin cell genes with cancerous cells, just because cancerous tissue samples tended to have more skin cells while healthy samples had more muscle cells. The SVM method did not make this false correlation. Also, the SVM method did correlate lack cancer with the presence of genes from a tiny parasite, a protozoa named Trypanosoma, indigenous to Africa and South America. To quote the paper: "We thought this may be an anomaly of our gene selection method or an error in the data, until we discovered that clinical studies report that patients infected by Trypanosoma -- a colon parasite -- develop resistance against colon cancer." Clearly we need to put this new tool to work on medical diagnosis right away! Thank you for your time in viewing this explanatory video. For more information about BIOwulf, see our website at www.biowulf.com. S-___. SCROLLING TITLES: CREDITS DOWNWARD SCROLLING YELLOW LETTERS ON WATERFALL BACKGROUND:
Written, Directed and Edited by Alan Scrivener. Narrated by Mollie Tobin. Computer graphics by Alan Scrivener. Geographical visualizations by Jeff Sale. Special thanks to Wayne Holder and Bob Zawalnicki. Produced by Human Interface Prototypes for Mindtel Corp. Filmed at the San Diego Public Library, Mission San Diego de Alcala, and Mission Trails Regional Park, San Diego, California. (C) 2001 Mindtel Corp. Keep wondering.

Last update 19-Jul-2001 11:59 PM PDT by ABS.