Document to collaborators on a method to extract estimates of plant leaf area from image analysis

 

1. Apologia pro vita sua, or something very similar: the quick story behind developing the method of leaf-area estimation by image analysis, and why it took so long:

 

            These data from creosotebush on the Mojave Global Change Facility have certainly been the acid test - every possible limitation of the method of leaf-are analysis had to be discovered and addressed:

  * Leaf colors for nominally live leaves that matched the color of much soil (just discovered today; pecans and unstressed Larrea were so much easier!)...and leaves that are litterally partly dead but have to be counted as live when put through analysis, with a standard leaf-area meter.

  * Concerns about camera resolution, chromatic aberration (in an earlier camera), and white balance (some cameras have pitfalls about reverting to defaults)...and about color shifts with under- or over-exposure (baseless, but it took a long time to verify this)

  * Leaf clumping that requires new methods of estimating corrections from the basic model for converting green fraction in view to leaf area index (again, pecans were trivial in comparison).  These data are an ideal test ground for how clumping affects resolution of "hidden" leaves.

            Earlier studies on our own Larreas brought out the raft of other challenges, so that we now know how to take the pictures for maximum reliability and how to write efficient, user-friendly programs. We also learned to:

  * Take pictures in a reproducible geometry for every repeated measurement.

  * Avoid the biggest geometric distortions in leaf views, and a number of potential problems in

   automatic exposure in the camera.

  * Deal with the problem of mixed pixels - part leaf, part non-leaf; this demands special color

   indices (linear differences).

In addition, I had to develop novel algorithms of general utility (for deciding in a fast computation if a point lies inside a polygon, and for removing the effects of leaf clumping with a theoretically sound algorithm based on detailed simulations).  I also wrote up useful guides to color theory, in considerable detail, guides that we use in other research.

            In the end, it appears that we have a method to analyze leaf area under a range of plant structure, leaf optics, stress conditions, lighting conditions, and camera operation quirks.

 

3. Results of calibrations of the image-analysis method with MGCF

  data on Larrea leaves, from a shrub disassembled progressively

 

            This is the cut to the chase.

            The Denude team (Fig. 1) pursued a large and complex operation to measure leaf areas in a manner that gave images suitable for comparison.  They removed 24 branch sections, one at a time.  Each such branch was photographed and then its leaves were removed for analysis by an optical scanner, the Scion platform.  The remaining whole shrub was also imaged.  The geometry of views was highly reproducible, simplifying the analysis, and in each image a physical scale of distance (a ruler0 was included.

             Ideally,

            1) The area of each branch obtained by image analysis should match the area measured by taking all the leaves off and scanning them.

           

Text Box:  
Fig. 1. Denude group.  L. to r.: Amrita de Soyza, Walt Whitford, Beth Newingham, ??, and Karen Brown (neé Nielsen).






Fig. 1. The Denude team: (l. to r.:) Amrita de Soyza, Walter Whitford, Beth Newingham, XXX, and Karen Nielsen (Brown)
2) The area of the whole shrub at each stage of disassembly can be reconstructed in reverse order, as the sum of branch areas remaining.  After the last branch (number 24) is removed, one knows that the area on the shrub before this removal was the area on branch 24.  Before branch 23 was removed, the whole shrub had the area of branches 23 plus 24, and so on.

            This ambitious program could not be followed entirely.  It takes so long to pick off individual leaves and place them in nonoverlapping fashion on the scanner platform that the backlogged branches were at risk of dehydrating and or molding.  A number of branches (12 of the 24) were stripped of leaves that were only weighed, not scanned.  I used the weights to estimate leaf areas, based on the mass-area regression obtained for leaves that were both weighed and scanned.  (This regression, with a forced intercept of zero area at zero mass, is area(cm2) = 54.03*mass(g).

            The Nevada site data provide the most extreme challenge one is likely to encounter in estimating leaf area by image analysis.  Under extreme water stress, leaves discolor at the margins and grade continuously (in both color and physiological function) toward dead leaves…which also have the color of desert soils.   Even the nominall green areas of leaves are not truly green: in the digital imagery, the green channel value is very often lower than the value in the red channel.  These leaves have been described oxymoronically but accurately as “red green.”  Fig. 2 shows the whole shrub that was progressively dismantled for leaf area measurement by a standard optical tray, plus one of the separate branches.  Fig. 3 shows an enlarged view of a branch, where one can see the individual pixels that range from a tepid green to a brown.  Both these figures also show the use of a very distinct saturated-blue background.  This background isolates the image of the one shrub from other shrubs, and also provides a strong contrast to leaf, stem, and soil (see Sec. 4.A.2 below on use of backgrounds).  Fig. 4 shows soil, with pixels in the same color space as dying leaves (color as measured in the most useful coordinates, green minus red and green minus blue; see Appendix I).  The two major challenges in estimating leaf area are then:

            1) At what degree of color change should a leaf be considered dead?  In color space alone, the gradation is continuous.

            2) Many leaves have both green centers and red margins.  Does one count the whole area of a leaf as live, if the majority is green?

            I was not present at the destructive analysis to make these judgments.  In my image analyses, I chose to cut off the color space for live leaves at the visibly gray, not reddish hue.

I chose also to count the entire area of a leaf, up to such hue.  This is not a fully satisfactory solution (Appendix II.2), but it is the best compromise.  I can provide more theoretical analyses of likely biases, if desired.  This solution to the dilemma requires the actions described in Appendix II.2:

            1) Choose the color space that defines live leaves, including their dying margins.  The concept of a color space is detailed in Appendix I.

            2) Find the color space occupied by soil pixels, and in particular that part that overlaps the “live leaf”color space.

Text Box:  
Fig. 2a. Whole shrub, partially disassembled for destructive leaf area measurement (first 14 of 24 branches removed).  Note blue background for isolating shrub and providing color contrast, and ruler for establising physical scale.
Text Box:  
Fig. 3. Enlarged view of a branch on the whole shrub, showing the color variations and especially the gray to tan margins on nominally live leaves.
Text Box:  
Fig. 2b. One of the branches removed, imaged against a blue background with a ruler present to establish physical scale.
            3) Determine what fraction of “live leaf” area is not counted when the color space for live leaves is replaced by the color space that strictly excludes soil pixels.  Do this by analyzing a photo or a section of photo that has branch and background but no soil.  In the Nevada dataset, 31% of leaf area is lost, so that when one uses the stringent color space, one should multiply the reported leaf area by 1/(1-0.31) = 1.45.   This multiplier is actually a modest function of the actual leaf area index, so using it for all images is a further approximation.

 

            A further consideration is that, when leaves are so very marginally green, any modest change in the color quality of lighting can shift recorded color of any pixel from values indicative of leaf to values indicative of non-leaf (soil, stem), or vice versa.  This happened clearly on 7 images of the whole plant.  Compare Fig. 5 with Fig. 2 to see the shift in color and in lighting geometry (diffuse vs. direct sun).   Late or early sunlight is notably more red than is midday sunlight; overcast skies give sharply increased blue.  One must commonly accept whatever natural lighting conditions occur, but the solution is to adjust the white balance of the digital image for each significant change in lighting conditions.  Alas, the first 7 images of the whole plant were taken in light overcast conditions, as indicated by the lack of branch shadows…and by the shift in color of a white cloth tag as well as of the blue background (assessed visually and by computing regressions of G/R and G/B).   I could not derive a reliable post facto correction, so I did not use these images in the final analysis.

Text Box:  
Fig. 4. Enlarged image of soil from same shrub photo, displaying range of colors that overlaps colors of leaf margins in Fig. 3.  One can also see streaks of green from understory vegetation.
            Accepting the limitations and solutions above, I processed all the images of branches and all the images of the “whole” shrub.  All areas estimated by image analysis were multiplied by 1.45 to account for the dying margins (item 3 a few paragraphs earlier).  The regressions are shown in Figs. 6and 7.  They are reasonably satisfactory, with r2 values of 0.80.  The outliers have not been reanalyzed in more detail (for further color corrections), given that the most extensive corrections have been applied and insufficient information exists on lighting conditions to make the most meaningful further corrections.  With accurate white-balance control in the imagery, I would anticipate a major gain in r2, and with greener leaves, even more (to 0.95? a data set on green leaves with accurate whole-plant and branch area determination is lacking).  Fig. 8 shows an image of an isolated pecan tree with the background sky color-shifted to white; leaves are totally unambiguous.  Fig. 9 shows a row crop, arugula, which introduces new considerations about leaf angle and (non)-randomness of leaf placement, to be discussed in Appendix II.3.

            One final note on the shrub imagery is that ground cover vegetation shows up in the analyses.  The intercept is nonzero, in the regression of image-analysis area as a function of directly-measured area.  Close inspection of the images shows fine green elements, which cannot be identified in the images but which are almost certainly Cryptantha or cryptogamic crust or both.  In future imagery, it would be useful to cover the soil by the shrub base with a blue fabric background, as well.

            For the Nevada team: I can provide all the detailed notes and spreadsheets as desired; they are too extensive to include here.

 

4. Overview of the user operations (details will be in a manuscript and also

  in a full manual)

 

  A. Taking the images in the field

 

Considerations:

  “0.” A high-quality digital camera is easiest to use.  A film camera is usable, if one

Text Box:  
Fig. 9. Image of row crop (araugula) with strongly horizontal trend in leaf angle and overdispere placement of both plants and leaves within a plant, thereby requiring modifications in image analysis.
Text Box:  
Fig. 8.Cropped image of isolated young pecan tree in Las Cruces, NM, photographed against blue sky.  Complete color distinction between leaves and background is evident, in contrast to color confusion with Larrea tridenta against soil.
Text Box:  
Fig. 7. Regression of leaf area (single branches) estimated from image analysis against area measured by destructive harvest.  The color space is as in Fig. 6. The slope is 8% low relative to the expectation of unity.  The intercept is very small, as expected.
Text Box:  
Fig. 6. Regression of leaf area (whole plant) estimated from image analysis against area measured by destructive harvest.  Points in blue are included; points in pink are excluded by reason of shift in image white balance that cannot be simply corrected.  The color space (polygon in G-R, G-B) referred to is one that excludes leaf colors that overlap soil colors, with subsequent multiplicative correction for area, as noted in text.  The slope is 20% higher than the expected unity.  The intercept of 213 cm2 is consistent with presence of low herbaceous vegetation.
   

      is prepared to scan images and perhaps rectify exposure errors after the fact.

      (We used a film camera for years, but now prefer digital.)  If you use a film

      camera, be sure to use the same kind of film each time (for reproducible color

      balance) and to use a reliable photofinisher (ditto).  I can send you scanning

      instructions for film images.

  1. The geometry must be known and also reproducible

     That is, you must know how many pixels represent 1 cm at the leaves, so we can

           convert from pixel coverage to cm2 of leaves

    The view angle must be reproducible, so that the whole shrub is seen, and each

           part is seen at the same distance each time

    It is possible to analyze images that violate this reproducibility, but the analysis

           time is very much longer (10 to 100 times), involving trips back to the field.

    Action item: install survey markers at each shrub to be imaged repeatedly, and

          have notes about camera placement (height above the ground, pointing angle)

          and lens focal length used.

  2. The viewpoint for each shrub must be chosen so that only the one shrub is imaged,

      if at all possible (it is tedious and sometimes dubious to mask out the other plants

      digitially, if they show up behind, around, or in front of the desired shrub).

     For isolating a shrub, including from soil background, it is usually desirable to

       impose a background of distinct color, a color that cannot be confused with any

       plant part.  We try to use a very saturated blue (pure blue, not baby blue).

     In pathological cases, where stressed/ dying areas of live leaves are as brown as

       some soil areas, you will also want the blue background to cover the soil.

  3. Even exposure of leaves make the analysis a bit more accurate. If an image is

       overexposed or underexposed, pixels with leaves in them may look so bright or so dim that

       they cannot be recognized as leaves in the image analysis program.

     Use a camera mode with accurate exposure at the leaves - if you can choose a

         spot-metering mode, do so.  You don’t care if the background is exposed normally,

         only the leaves.

     Sometimes the camera must be lifted above one’s head to take the picture.  If you

         must do this, be sure to block the camera viewfinder (eyepiece) with the attachment

         provided with the camera (certainly on a single-lens reflex).  If you don’t block this, light

         will enter the finder and trick the camera into thinking it’s lighter than it is, and

         the image will be underexposed, sometimes severely.

     Minimal shadowing is desired.  It adds difficulty or uncertainty for the computer analysis

       to try to distinguish leaf from non-leaf in deep shadows.  We try to shoot in midday,

       within two hours before or after noon.

  4. Reproducible color balance is critical.  The color balance of natural light varies greatly

        over the day and with weather conditions - sunrise and sunset give reddish light,

        overcast gives very blue light.  The color shifts are accommodated in our eyes but

        not reliably in images (even digital images with automatic white balance). 

     The reliable way to adjust for shifts in light quality is to use custom white balance on a

        digital camera (the analogous thing can be done for film images, with a bit more

        effort and a modification of our computer program that I have not done to date).

        The grey-card method is explained below.  If you don’t do this, it will take a goodly

        effort to correct for color shifts in individual images.

  5. The ideal case is to have all parts of the shrub within a narrow range of distances from the

      lens.  Otherwise, the parts that are near look artificially bigger than the distant parts and

      their contributions to leaf area are spuriously increased.

        The ideal way would be to use a long (telephoto) focal length and move just far enough

      away to see the whole shrub and not much more.  This is often impractical - to see all of a big

      shrub, one may have to place the camera so far away that one must stand on a ladder…which

      one might not have.  Use judgment, and use the longest practical focal length.

  6. Leaves ofter show up as only a few pixels across.   This makes many pixels at the edge of

     the leaf occur as mixed pixels, part leaf and part background.  We have made the

     classification of mixed pixels as reliable as we can, but it is not as good as classification

     of pure pixels.  Thus, we want leaves to be as many pixels across as is practical.

        One part of this is getting the highest resolution possible

        * a camera with sufficient  megapixels (and true resolution  à a good lens),

        * but also storing images at the highest JPEG quality setting.  Images at lesser resolution

     look OK to the eye but they have blended fine details among pixels,

        * and finally, using only optical zoom, never digital zoom (which only interpolates cruder optical images and loses information in the broader scene).

 

 

As a result of the considerations above (which were derived after failures), we proposed the

following protocol:

 

1. Pre-field preparation.

   A. Assemble equipment that will be needed:

       Camera, with enough memory space left for number of images anticipated.

       Camera manual, so that one can have complete mastery of functions needed here,

           including custom white balance and spot-metering.

       Grey card for setting white balance.

       Blue backdrop that can be placed under and around shrubs (not critical unless

           some shrubs have undergrowth that can’t be put out of view by judicious choice of

           viewpoint…or if leaves are so stressed that they almost merge into soil color, esp.

           at leaf margins).

       Rigid ruler to determine pixels per cm in the scene.

       For the first setup: survey markers, to mark the locations for taking each shrub’s

          picture.

       Field notebook, including:

          Notes from previous imaging session, in which you recorded the viewpoint and the

              lens focal length you used for each shrub (obviously not possible the first time).

   B. Schedule the trip properly:

       Be ready to take pictures between about 10 AM and 2 PM, local solar time.

2. Set up a unique viewpoint for each shrub - one that shows the shrub and only the shrub,

        has ready access, does not cause the camera operator to throw a shadow on the

        leaves. 

     Record the location, the camera angle, and the lens focal length that had to be used.

3. Adjust the white balance of the camera to that of a grey card (custom white balance).  Do

      this at the beginning, and every hour or when the light changes.

    Follow the instructions for the camera; shoot an image with the grey card and use this to

        set custom white balance.

4. Set the exposure mode to automatic mode.  Be sure that no one has left on any exposure

       compensation shifts from the previous use.  If you can set the aperture (Av mode on

       most cameras), use a moderately small aperture (say, f-stop of 8 or smaller, meaning the

       number 8 or larger - 9.5, 11, 13, or even 16).  Use the spot-metering option, on the leaves.

4. Take the pictures of each shrub.

   A. Place the blue background, if needed (recommended in general).

   B. Find the proper viewpoint for the shrub.

   C. Set the proper focal length for the view (for this particular shrub):

       Generally, this is mild telephoto (ca. 80-100 mm equivalent 35mm-camera focal length;

            about 50-60 mm on digital SLRs that have lens-multiplier factors near 1.6 because of

            use of small CCD image-capture chips).

       This may be impractical in some cases-  esp. to see all of a big shrub, one may have to have

           the camera tooo far away; use judgment. 

       In any case, it’s best to always use the same focal length for a given shrub each time;

           record the length you used the first time and use it every succeeding time.

   E. Take the picture.

       If you have to hold the camera away from your eye (say, you have to hold it above your

          head to image the whole shrub), then you must block the viewfinder with the camera-

          supplied blocker (usually on the carrying strap; sometimes there’s a shutter you can flick

          closed on the viewfinder).  This is necessary because sunlight will enter the viewfinder

          and trick the camera into thinking there is more light that the lens will see; the camera will

          reduce its exposure and you will get a dark image, possibly an unusable one.

   F. Check that the image is OK- that it has a reasonable image density, and that it shows the

          complete shrub, and that color is not odd.

 

  B. Protocol for image analysis on the computer

 

            From here on, I’ll assume that you have digital images - straight from the camera, or scanned in from film images (at high resolution and with color management - I can generate instructions if you pursue this method).  The images must be TIFF images.  Many cameras store JPEGs only, or RAW.  Use your software (supplied with the camera, or freebies like xv, GIMP, or IrfanView, or proprietary stuff such as PhotoShop) to make the conversion, if necessary.

It is OK if the file comes out with LZW compression.

 

            Your first task is to find out the colors of pixels that qualify as live leaves (and dead leaves, if you also want those).  I have found that the best way to describe the allowed colors is to find a representative number of pixels that are definitely leaves, and find their values of G-R and G-B (the primary colors R, G, and B are recorded in a digital image as values from 0 to 255 each, at least in 8-bit color; 255,255,255 is pure bright white, 0,0,0 is pitch black, 255,0,0 is pure red, 100,100,0 is a medium yellow, and so on; it might help to look at my writeup on color theory on the Web, at http://biology-web.nmsu.edu/vince.

            All the allowed values of G-R and G-B will be found to lie in a cluster.  We can define a simple 4-sided polygon in the space of (G-R, G-B) that encloses this cluster of  points. Appendix I shows an example.  The program has a very rapid way of using the specifications of the polygon to determine if any other value of (G-R, G-B) fits inside the polygon (Appendix I again).

            Basis steps are:

            1. Open the image in an image viewer that can report RGB values of a selected pixel.  For example, in PhotoShop, on the toolbar, click on the eyedropper; on the upper toolbar, click on info; move the eyedropper so that its lower left end is on the pixel you chose.  The RGB values, and the x,y coordinates, will be displayed in a dialog box.

            2. Wander around the image, selecting pixels known to be leaf, and record their values of R, G, and B (and x, y, so you can relocate them for double-checking later).  Enter these values into a spreadsheet, where you will set up columns to compute G-R and G-B.

            3. Do the same for pixels representing the other cagegories you do NOT want to have counted as leaf - stem, background, soil, dead leaves.  I make sections for each of these.

            4. Plot the G-R, G-B values in some plotting package.  I find the shareware package DPlot extremely convenient.  Give each category a distinct color - e.g., green for live leaves, gray for stem, red for soil, …  You will see distinct clusters.   The results are detailed in Appendix I; see esp. Fig. AI.1.

            5. Figure out a 4-sided polygon in this plot that will enclose live leaf only (or almost always; one might need to accept a few overlaps with other categories, in rare cases…such as the Nevada data, where it is really a judgment call to say where live leaves end and dead leaves begin!).  This must be non-re-entrant (no interior angle can exceed 180º = no angular “boomerang” shape).  Record the coordinates of the four vertices for use in my image-processing program.

 

            Image analysis per se: the program is set up for batch processing of any number of images in sequence (up to 1,000).  Of course, you can do single images.

            You can run the program in any of several ways:

   1. In a DOS window, invoking the executable LA_analysis(.exe).

       A.  You need to install LA_analysis.exe in the same directory as your images, or,      

         alternatively, in your path (edit C:\autoexec.bat to do this). 

       You will also need to install the compiled C program, scanline3.exe, in the

         same directory (or in your path, again).  I recommend using just the executable.

         I can give you the source code, but you need the complete TIFF libraries, and

         there is some customization to be done.  It took us a couple of days to figure this out!

      B. If you have an earlier version of Windows, with DOS windows that are not

         scrollable, it will be difficult to check your output.  I heartily recommend getting

         a copy of Commander11 (cmd.exe) and installing it; it provides a scrollable DOS

         window.

    2. From a Fortran compiler, such as Compaq Visual Fortran, which has a graphical user

        interface.   I can supply the Fortran (and C) source code.

 

            The basic operation is to enter line commands, including the names of the TIFF images to be processed, the pixels per cm in the image, etc.  There are two ready ways to do this:

    1. Manually: invoke the program with the command “LA_analysis” and then respond to each

        prompt from the program.  I will explain what each one means.

    2. From a file in which you composed all the commands (and can save, esp. saving the

        effort of retyping when you change processing options).  Create a file, as in Notepad (a

        plain ASCII text file, unformatted only).  In the DOS window, use an input redirect:

           > LA_analysis < myfile.txt

        This will read in the contents of myfile.txt as commands.

        You can do an output redirect, too:

           > LA_analysis < myfile.txt > myresults.txt

        The file myresults will have all the printouts from the program; you won’t see them on

        the screen (I don’t know if DOS has a “tee” utility that lets output go to both the screen

        and a file, as does UNIX).

 

            Here’s the deal, prompt by prompt:

   1. Enter the name of the file where the RESULTS will be stored.  Be sure this is not the same name as any other file you want to keep, because that file will be overwritten.  In Windows, I usually give it a .txt extension, since Windows is so brain-dead that it doesn’t have the ability to open a file of arbitrary name with an editor.

       The program will record the date that you did this analysis.

  2. Begin a loop, to repeat as often as you want, once for each successive image:

   A. Enter the name of the TIFF file (64 characters or less, in length).   The program will open the file for reading, and then will invoke the C program, scanline3, to create a “raw” file (“scratch.out”) that has only the RGB values of each pixel, concatenated into one long line for each line of the image.  This will show up in your directory.  The program will proceed to read this new file to find the (x,y) size of the image, which it will echo to you on the screen.  It will have the qualifier “including border areas.”  You can blank out areas you do not want analyzed; see Appendix II on special handling of images.

  B. Set the processing parameters.

   If this image is the first one to be processed, you have to set each parameter.  If it is the 2nd or later, you have the option of changing none, one, or many parameters; the program prompts explain these choices.

    The color space for live leaves: Enter the coordinates (G-R, G-B) of the four vertices found to delimit live leaves.  These must be entered in clockwise order, starting from the leftmost (lower or upper, it does not matter).  The program will ask you to repeat the entry if you don’t follow this convention.  The program will report the direction cosines as angles as it prepares its internal criteria.

    igreencut: Enter the minimum value of green (G) that a pixel may have and still be considered as leaf.  In deep shadow, when one uses some films, there is a color shift, and a neutral gray can come out green by accident.  I often use 15 for this parameter.  For digital images, I use 0 (there is no bias between R, G, and B channels at low light).

    maxgreen: Enter the maximum value of green for a leaf.  This is used to prevent specular highlights counting as leaves, in pathological cases (mostly they look plain white, which does not nearly qualify as leaf, but rarely this happens).  I set this to 220 (out of 255, this is really bright!).

   The program will then read all the pixel RGB values, and will use the criteria above to identify each pixel tentatively as leaf or not leaf.  I will explain the “tentatively” part shortly.

   enx1cm: Enter the number of pixels that span 1 cm in the image.  This item is used to convert leaf area in pixels to physical leaf area in cm2.  Of course, in a real image, the scale varies from center to edge; pick the middle.  Also, the scale varies for leaves closer and farther than average from the mean distance to the camera.  Again, pick the middle (this is why it is important to use as distant a view as is practical, to get the least variation in leaf-to-camera distances among all the leaves).  If enx1cm is entered as a negative number, it is interpreted as pixels per inch, not cm, and then it is converted to pixels per cm.

   leafsizepixels and startingmult:  The full theory behind these parameters is noted in the Appendix III on turbid-medium theory of image analysis.  The program has to analyze the image in small sections or “tiles,” and add them up.  The smallest tiles have to be at least several times as wide as the average leaf, in pixel count.  You specify this smallest tile size by entering the average pixel span of leaves (the ones viewed flat, with maximum dimension) and the multiplier of this to use (say, 2, to get the smallest tile as twice this size).

   minside: Enter the smallest partial tile to use in analysis.  For an arbitrary image, a given tile size is unlikely to cover the image exactly with an integral number of tiles.  There are also complications when complex areas are blocked out (see below).  In either case, one may end up with tiny “orphan” regions that cannot be analyzed reliably.  I use minside = 10, so that an area of less than 100 pixels is not used.

   nnotgreen: Enter the maximum number of adjacent pixels that can be non-leaf, for a given pixel to count as leaf.  Here is the explanation of “tentatively” from above.  Particularly with film images, one can get noise pixels that are soil, stem, or the like but are green by error.  In order to discount these isolated pixels, I introduced this parameter.  A real leaf pixel will always be connected to other leaf pixels, commonly on all sides, but usually on 2 or 3 other sides (up, down, left, right) even if the pixel is one the edge.  This parameter is not significant for digital images (at normal ISO), so I set it to 1, or maybe 2.

   At this time, the program will complete the analysis.  It will analyze the image for leaf area index in a series of tesselations or tiles covering the whole image.  In each tile, it will estimate the leaf area.  It will then use the pixels per cm information to convert this to real leaf area.  Note that the turbid-medium model (Appendix III) accounts for leaves hidden behind other leaves.  It does so only roughly at first, because clumping of leaves (densely here, sparsely there) violates the assumptions of the theory.  The program corrects for almost all of this bias by redoing the analysis at several tile sizes and then extrapolating to zero tile size (Appendix III).  The program reports the estimates of leaf area at each tile size, plus the final extrapolation.  It also writes the information to the file you specified at the beginning.

 

   If this in not the 1st image, you will be asked only for changes in the parameters.

 

Appendix I. Examples of clustering of leaf colors into a polygon in the space (G-R, G-B), and an algorithm for determining if a point lies inside a polygon

 

1. Observations that leaf colors cluster well

 

            Many individual images have been analyzed.  Particularly illustrative is a composite analysis of three separate images that differed considerably in exposure of the leaves, ranging from dark (underexposed) to normal to light (overexposed).   Figure AI.1 shows the results, which will be commented upon more extensively in the next paragraph.  This comparison was part of the inquiry into whether or not exposure affected the pattern of leaf colors, i.e., the polygon in (G-R, G-B) into which the leaf pixels fell.  The comparison revealed some useful regularities (exposure hardly matters), and some pitfalls (dead and live leaves grade into each other and one must make a judgment call about the dividing line).  It also revealed that the space (G-R, G-B) is very effective, more so than simper alternatives such as (G,R) and more so than complex alternatives such as dividing G-R and G-B each by the total value, R+G+B.  The latter alternative was considered for the potential of removing bias from overall exposure level.  However, 2 problems arose: 1) this normalized color space actually mixed the pure live-leaf pixels with the pure pixels of other items (stem, soil,….); Fig. AI.2; 2) inherently, the new coordinates are nonlinear in R, G, and B; consequently, mixed pixels (say, half leaf, half stem) will not be recorded with proportional probability as being either element; the brighter part of the mixed pixel will dominate, for one.  We therefore stuck with the use of (G-R, G-B).  Live leaves are quite distinct in this space, even for Larrea plants.  Live leaves are higher in G-R than dead leaves, and they are higher in G-B than are stems.  The saturated blue background is far away from live leaves in this space.  The color of soil pixels can overlap the colors of very stressed leaves or dying portions of dead leaves, calling for special treatment noted in Appendix II.  For most other cases, such as healthy leaves of agricultural crops (pecans, in another case we treated), the contrast of leaves to all other elements is extreme.  There are no problems in analyzing leaf area in such cases.

Fig. AI.1. Clustering of color coordinates of live leaves, apart from dead leaves, stems, and background.  See text for an explanation of terms.  Images that are sources of data are indicated by number (branches 6-2, 4-4, and 16, respectively with high, normal, and low overall exposure).  Two types of data points are not noted in the legend, which is limited to 16 items: shaded soil (green open stars); soil shaded by blue backdrop (red X’s)

 

2. Effective computation of whether a point lies in a polygon

 

            Let it be accepted that leaves have colors that lie inside a polygon in the coordinates G-R and G-B.  Now we are faced with computing if any new pixel with some values of G-R and G-B lies inside the selected polygon.  There are hard/slow and easy/fast ways to do this.  I found a rapid way, undoubtedly known to geometers and people who do image simulations.  Simply,  one has to determine, for each side in turn, if the point lies on the interior of the side or the exterior.  This is simple: construct for each side a direction cosine that is perpendicular to the side and that points to the interior of the polygon (Fig. AI.3).  (Knowing which side is interior is easiest if we require that the sides be specified in a regular order; hence, the restrictions noted in Sec. 4.B above.).  This can be done once and for all after the polygon has been specified.  Then, for any given point, construct the vector from the midpoint of the side and reaches to the point in question (Fig. AI.3).  If the dot product of this vector and the direction cosine from the side is positive, then the two vectors are on the same side of the polygon side, and the point can be inside the polygon.  I say “can be” because we must verify this for all 4 sides; if the test fails for any side, the point is outside the polygon.  Only a few multiplications, additions, and if-statements are needed. Voila.

Fig. AI.2. Failure of brightness-normalized color coordinates to cluster points representing live leaves.  Data are from image of branch 4-4, same as used in Fig. AI.1.

 

Fig. AI.3. Construction of vectors allowing determination that a point (x,y) is or is not inside a polygon.  Vector vdc is the direction cosine from the midpoint of side A; vector vp is from the midpoint of side A to the point in question.  If these vectors have a component in common (dot product is positive), they lie on the same side (inside) from side A.

 

 

 

 

 

Appendix II. Special handling of images that require elaborate cropping patterns or that have leaves with dying margins not so different in color from soil

 

1. Challenges in cropping some images to isolate one desired shrub

 

            With some luck and care, each shrub can be imaged separately from others and from undergrowth and any distracting artificial items.  This is not always the case.  Editing the image by cropping is possible, but most cropping options involve rectangular selections.  More flexible is the use of Bezier curves, but these cannot be used as cropping margins in all image processing programs such as PhotoShop.  There is a work-around:

            1. Use the Bezier curve tool to outline the desired shrub.

            2. Make this the selection.

            3. Invert the selection (so that everything outside is selected).  The way of doing this varies with the program one is using.

            4. Use the paint tool to turn this inverse selection (the area to be rejected) a pure white (RGB = 255, 255, 255).  My LA_analysis program will ignore all the pure white areas.

 

2. Challenges in dealing with images where live leaves have dying margins nearly the color of soil

 

            Let us return to the problem of the stressed leaves.  Assume, as in the Nevada data, that the margins of apparently live leaves are the color of some dead leaves and, thereofore, also of the soils in Nevada (or Las Cruces).  Assume also, as is the case in the data on hand (which were obtained at great effort over a period of weeks), that we can’t retake the pictures with the soil covered up with a distinct backdrop, such as a blue sheet (the best alternative for future pictures!).  Assume, finally, that we do not want to manually color all the soil pixels blue in PhotoShop!  We must then make our criterion for live leaves more stringent, so soil pixels will not be misidentified as leaves.  This means moving the polygon so that only pixels with even more green will count as leaves.  This introduces a problem: dying margins of live leaves will not count as leaf area.  If leaves did not overlap, this would be no problem.  (If leaves did not overlap, image analysis would be trivial, too.  It is not.)  We have to count dead margins as part of live leaves to get the correct leaf-interception probability “fgreen” for computing leaf area index.  The simplest way is to estimate the fraction, fdying, of pixels on live leaves that will not meet the new, stringent criteria, and then multiply the raw value of fgreen by the correction factor 1/(1-fdying).

            There are then two extra steps in image analysis for such stressed plants.  First, one must define a more stringent live-leaf criterion that excludes soil pixels (within a small margin of error; no classification is perfect).  One needs to examine soil areas, well-lit and shaded both, in an image-manipulation program and find maybe 30-100 pixels.  Record the RGB values (and the x,y coordinates), and compute G-R and G-B.  Plot the latter values, as done in Sec. 4.B.  Now, redraw the polygon for live leaves so that these soil points are excluded.   Second, estimate the fraction fdying of apparently live leaf area that will now NOT be identified as leaf.  One can try this visually, though guessing fractional area coverage is notoriously difficult in complex paatterns.  An alternative is to analyze a small part of the original image known to have only “live” leaves.  That is, pick an area with leaves shown only against the blue backdrop, if one is using this.  Crop the original image to this section.  Analyze it once with the liberal polygon and once with the stringent polygon.  The latter will give an estimated area that is smaller by the factor 1-fdying = Astringent/Aliberal.  Solve this for fdying.  It is advisable to do this exercise with areas of the image that include both well-lit and shaded leaves.

XXXXXXXXX Add this to program instructions!

 

3. Other species and growth conditions: effects of leaf angle

 

Text Box:  
Fig. AII.1. Angle of view by observer affects average fractional view of flat leaf area (average cosine; see text), and differently according to distribution of leaf zenith angles.
            In all the analyses done on creosotebush, I have assumed that the leaf orientations are globally random.  Locally by branch or height, the angle distributions may have some structure, as Cunningham and others have found (refs.).  However, the only effect that matters is that the mean fraction of flat leaf area presented at the view angle of the image is ½, to use our basic program.  This fraction is expressed geometrically as the average of the absolute value of the cosine between leaf normal and view angle (camera angle).   Let us call this xLv for future reference.  How accurate is this assumption for some drastically restricted distributions of leaf angles?  It is important to consider that, while leaf zenith angles (relative to the vertical) may be restricted in distribution, in all field studies, the distribution of leaf azimuths is very close to random.  (The azimuth is the angle between the projection of the leaf normal horizontally onto the ground  and a reference compass direction, say, east.)  Figure AII.1 shows how xLv varies with camera view angle for several extreme cases: leaves that are exactly horizontal, exactly vertical, or at only 45º.  The random zenith case, our standard, is also shown.  Note that at the so-called magic angle near 55º the effect of leaf zenith orientation essentially disappears.  This view angle is therefore highly recommended for analyzing canopies of vegetation in which zenith angle varies by position.  An example we have encountered is deciduous forest in the northeastern U. S.; top layers have nearly random leaf orientations, while deep layers are horizontal.  Note also that for leaves that range from 45º to vertical, view angles greater than about 40º give xLv very close to ½.

            For plants that nonetheless have problematic leaf angle distributions, there is a solution: choose a view angle and request that I modify the program, or generalize it, for this case.  The view angle may be constrained - e.g., for large trees one cannot often get views from above at small zenith angles. 

            An example of a problematic case is the leafy vegetable, arugula, grown as a row crop.  With images such as Fig. 9 analyzed naively in my program, the results were not directly usable.  The destructively measured leaf area of the 9 plants in the figure was very close to 6000 cm2.  In the image of the intact plants, my program estimated an extrapolated area of 15000 cm2, 2.5 times larger.  One factor in the inflation is that the mean cosine, xLv, is not ½ but close to 1.  A second factor is that the plants were deliberately spaced or overdisperse, vs. random.  Even within the individual plant there is significant overdispersion.  When the plants were cut and redistributed at random over a slightly larger area, the overdispersion was reduced and the naïve estimate of area fell to slightly over 10000 cm2.  The relationship between plant density and apparent LAI (assuming random placement) is a determinate one and could be used for correcting the areal estimates.  More satisfactory for users concerned with row crops would be a restructured analysis program.  With sufficient interest, I could generate such a program.

 

Appendix III: brief explanation of the turbid-medium analogy, why it must be corrected for leaf clumping, and how it may be corrected

 

                        Image analysis has to account for two aspects of leaf presentation: leaf angular orientation and leaves that hide each other from the line of sight.  We want the total one-sided area of leaves viewed flat-on, while leaves on a shrub occur at a variety of angles.  The simple solution to the orientation problem is to assume random angles, such that the average fraction of flat area presented to view, among all leaves, is ½ (Ross, 1981).  This assumption is quite good even for rather biased leaf orientations that may be preferential relative to the sun, for example.

                        The hidden-leaf problem is also soluble, very simply in ideal cases of leaves whose centers are positioned randomly in 3-D space.  These leaves (or other objects) form a “turbid medium.”  One can readily derive the theory for the fraction of view occluded, focc (fgreen in our case and in the program) as a function of the total “leaf area index” (leaf area per unit view area, L or LAI):

 

                                                                                                (AIII.1)

 

Here, K is the average fraction of flat area presented along the view direction; we take it as 1/2 , as just noted.  We can derive this form by considering the leaves as being presented in thin layers, one on top of another, each having the same total leaf area per view area (same leaf area index, L).  We choose very thin layers, so that this leaf area index is small, dL <<< 1.  Now, the fraction of the view blocked is simply K*dL.  The fraction not occluded is 1-K*dL.  With two layers, assumed not to have leaves preferentially located relative to one another (here’s the assumption of randomness, of not clumping), the fraction not occluded is the product of the independent probabilities, or (1-K*dL)2.  We can continue to N layers, with N large, to get the probability of non-occlusion as

 

                                                                      (AIII.2)

 

The rightmost expression just indicates that the total leaf area index in all the layers is our original consideration, L.  Now take the limit of this as we make many fine layers (Nà ∞).  There is a well-known relation:

 

                                                                                  (AII.3)

 

Identifying x with K*L, we immediately get the penetration fraction as exp(-K*L) and the occlude (green-leaf) fraction as in Eq. (AIII.1).

 

            This formula assumes that leaves (or other objects we are interested in) are not preferentially clumped (or dispersed, which is negatively clumped).  We know this is not true.  Leaves are clumped onto branches, and branches onto bigger branches, etc.  This clumping introduces a serious bias into our estimates of leaf area.  For positive clumping, we call the error the sieve effect: the object (the shrub) is a network of dense areas with intervening holes or sparse areas.  We really want to know the average LAI over this system, but we can only average the green fraction, fgreen.  The relation of fgreen to LAI is nonlinear, so that the function (LAI) of the average (fgreen) is not the average of the function (LAI).  A simple numerical example wil suffice.  Suppose we have equal areas, one with LAI = 3 and one with LAI = 0.  The true average LAI is 1.5.  The average fgreen is 0.5*[ (1-exp(-0.5*3)] + 0] = 0.5*[0.777+0] = 0.388.  This corresponds to LAI = -2*ln(1-0.388) (if we invert Eq. AII.1), or LAI = 0.98.  This is a serious understimate.  You can appreciate the problem in more general cases: no matter how huge an LAI goes into the one half of the area, fgreen will always be no more than 0.5, corresponding to an “average” LAI of 1.38.

            One putative solution to the dilemma is to determine LAI on very small regions so that we are never averaging disparate pieces.  The smallest region would be the individual pixel.  This fails for clear reasons: each pixel shows up either as leaf (fgreen=1) or nonleaf (fgreen=0).  At fgreen=1, LAI is indeterminate (infinity, in principle).  We need to pick regions large enough that fgreen is never 1, or rarely so (in the rare cases, we set LAI to a practical upper limit, typically in the realm of 4 to 10).  The regions have to be of a width that covers a number of leaf lengths.  This already puts us in the position of sampling both the finest clumps and some of their interspaces.   We require a way to estimate clumping and to correct for it.

            We cannot describe this clumping a priori, without a very detailed knowledge of the growth habits (this has been done, as a tour de force we want to avoid, esp. to avoid redoing for every species and growth condition).   We have a few options.  First, we can disassemble a shrub and get the real leaf area, and then compare it to the turbid-medium estimate.  This is the gold standard and needs to be done at least once to test the method.  However, the discrepancy will be a function of the scale on which we analyzed the image (how big our tiles are), and we don’t know the inherent scale of clumping in the shrub without an incredibly tedious task of plotting in 3-D where every leaf occurs.  Second, we can analyze the apparent leaf area, A(S), for a variety of tile sizes, S, and then extraopolate to zero tile size.  This is the approach I have taken.

            One can readily appreciate that A(S) decreases with increasing S.  It also reaches an asymptote, where it does not decrease any more with increasing S.  We want to extrapolate to S=0, but there are many possible functional forms one might consider.  Linear extrapolation should not work, because A(S) has a flat asymptote yet curves upward at small S.  A quadratic fit is more attractive but has similar problems in the end.  I decided to make a theoretical model of clumped vegetation, find out how A(S) behaves, and fit an appropriate empirical form.  The model should be fairly universal, so that the empirical form should have general validity.  The clumping model is given in Appendix IV.  I present here the best empirical fit, which is simply

 

                                                                                                     (AIII.4)

 

Here, A is the asymptote, the apparent area at very large tile size, and b and c are constants.  We wish to extrapolate to A(0).  The practical way to use Eq. AII.4 is to rewrite it as

 

                                                                         (AIII.5)

 

                                                     (AIII.6)

 

A linear regression of ln (A(S)/ A-1) versus S gives ln b as the intercept.  From b and the asymptote A, one can simply calculate A(0). 

            One runs image analyses at 4 or more tile sizes, one at a very large size to give an estimate of A and three more at small tile sizes to use in the regression.  The large tile size should still allow several tiles to fit across the original image, though this is not really a restriction.  The other 3 (or more) values of S are reasonably chosen in a geometric series, each succeeding one twice as large as the preceding.  The smallest tile size must be chosen to span several leaf lengths, to avoid the problem of obtaining tiles that are all leaf and thus of indeterminate (potentially infinite) leaf-area index.  For this reason, the analysis program requests the user to specify the pixel size of the average leaf, and a multiple of this to use for the smallest tile size.

            This theory of clumping corrections may be useful in more general situations, such as inferring leaf area index in hemispherical photos in mixed canopies.  Moreover, the ability to estimate the clumping factor will be useful in models of light interception and photosynthesis (Gutschick, 1991).  Note that the parameter c obtained in the regression above is a direct measure of the physical scale on which clumping occurs (as 1/c, multiplied by the pixels per cm in the image).

 

Appendix IV.  A theoretical model of leaf clumping and its effect on apparent leaf area

 

            There are a number of ways, more or less complicated, to model clumped vegetation for the purpose of estimating the effect of clumping on light penetration, and, for the present inquiry, estimating the correction for clumping when inverting the problem to solve for leaf area from the degree of light penetration.  I examined a number of models and found that a simple one sufficed.

            Consider a view area divided into square pixels.  This squareness is no mathematical restriction, in the limit of small squares that can cover any area.  Now consider clumps of vegetation, also square (again, not a theoretical limitation).  Each clump has a size Lw, as the length of its side in pixels.  The corresponding area of a clump is simply Ac=Lw2. (All the clumps are the same size; I believe the same results will be obtained with clumps of varying size, but I haven’t done the full test.)  Each clump has a leaf area index Lc, and a corresponding probability of light penetration (non-occlusion, in the terminology used earlier), x=exp(-0.5*Lc).  The factor 0.5 applies to randomly oriented leaves.

            Now place a large number of clumps, Nc, at random centers all over a large view area, of total area A.  The average leaf area index, LAI, will be the sum of the clump areas, NcAc, multiplied by the leaf-area index within each clump, Lc, and divided by the total area, A:

 

                                                                                                           (A.IV.1)

 

I enforce periodic boundary conditions: a clump that extends over one edge of the large area reappears at the other edge.  This effectively considers the large area as a repeating pattern, and it eliminates edge effects.

            For a practical computation, we can choose a variety of values of average LAI, and each one achieved with clumps of different density (denser clumps = greater clumping factor, at the same LAI and linear size of clumps).  We choose a large area, A, to simulate, big enough to have many clumps but small enough to have fast computation.  I chose an area of 1 million pixels, 1000 on a side.  The size of the individual clumps was chosen to be a small fraction of this.  I chose 41, giving the clumps a lot of possible degrees of overlap with each other.  The odd number 41 was chosen so that each clump has a center with an integer pixel number (21,21). 

            Here is where the clumping enters: each of the Nc clumps has its center location chosen at random.  Some clumps will be isolated from others (less and less likely, as Nc and LAI increase).  Others will overlap each other, perhaps multiply.  At each pixel, I compute the probability of light penetrating, as x to the power n, where n is the number of clumps that have some part on this pixel.  This “population” of  clump parts at each pixel is easily tabulated.  The average penetration probability, fpen, over the whole area, A, is simply the sum of the pixel penetration probabilities, divided by the number of pixels.  Then, 1-fpen is used as is the probability of hitting a leaf, fgreen, in the program.  I use this to calculate the apparent LAI,

 

                                                                                                     (A.IV.2)

 

The ratio of LAIapp to true LAI is a measure of the clumping effect.  This ratio is the same as A(S)/A(0) in the terminology of Appendix III.

 

            Some results:

 

            1. When the tile size for analyzing such a clumped image is less than the size of the clumps, the error is linear in tile size, S.  I used the clump size Lw =41, as above, and clump leaf area index Lc = 1, and a total true LAI = 2:

   Tile size       20         10           5            2          1

   LAIapparent  1.8602  1.9245  1.9610  1.9868  1.9987 (exact…after rounding of Nc)

 

            2. At tile sizes larger than the clump size, the effect reaches an asymptote, as we know:

   Tile size       40           80        160           

   LAIapparent   1.7432  1.6358  1.5949

 

            From these 2 results, generalizing, I derived the exponential form of Eq. AIII.4.

 

            3. A separate line of inquiry, not immediately relevant to the current empirical analysis (or so I believe), is how the clumping effect varies with clump density and clump LAI, Lc.  It turns out that the clumping effect decreases as the density of clumps increases, as long as the placement of the clumps is not itself clumped (it is random).

             I developed a closely-related model, semi-analytic in form, for clumps placed randomly in thin layers, as described in the arguments for the turbid medium model:

   Each clump has area Ac, or, as a fraction of a large area we’re considering, it occupies a fraction Ac/A of total area A. 

   Each clump has a real leaf area index Lc, and thus a penetration fraction through it alone of exp(-0.5*Lc) = = x.

   Consider, as with the argument for random placement of individual leaves, the penetration through the first layer, with enough clumps to give a fractional coverage of area Ac/A.  Light will penetrate in two regions: where there are no clumps, and where there is a single clump (the clumps are so “dilute” that they never overlap).  Then,

   fpen = (probability of no clumps)*(pen.=1) + (prob. of 1 clump)*(pen= x)                   (A.IV.3)

          = (1-Ac/a) + (Ac/A)x

Now add a second layer.  The probabilities for finding areas with 0, 1, or 2 clumps in the line of sight are, respectively,

    f0 = (1-Ac/A)2                                                                                                          (A.IV.4)

    f1 = (Prob clear 1)(Prob occluded 2) + (Prob occluded 1)(Prob clear 2)

      à 2 (1-Ac/A)(Ac/A), since we’ll put equal areas in each layer

    f2 = (Prob occluded 1)(Prob occluded 1) = (Ac/A)2

   It’s easy to show that these sum to 1.

Now add a third layer:

    f0 = (1-Ac/A)3                                                                                                           (A.IV.5)

    f1 = (Prob. clear 1)(Prob clear 2)(Prob occluded 3) + 2 other permutations

        = 3 (1-Ac/A)2 (Ac/A)

    f2 = 3 permuations with only 1 layer clear

        = 3 (1-Ac/A) (Ac/A)2

    f3 = (Ac/A)3

   Again, these sum to 1 explicitly.

  In general, with N layers:

    f0 = (1-Ac/A)N                                                                                                          (A.IV.6)

    f1 = N (1-Ac/A)N-1 (Ac/A)

    f2 = [N (N-1)/2] (1-Ac/A)N-2 (Ac/A)2

    ….

    fn = (1-δ)N-n δn, where is the binomial coefficient and δ = Ac/A.          (A.IV.7)

    and                                                                   (A.IV.8)

   There is probably an analytic form for this sum, but I found it easy to do numerically, in the program overlap.f (wombat, ~/amrita). 

    Note that the number of layers, N, determines the actual leaf area index,

      LAItotal = N (Ac/A) Lc,

    or, if we specify LAItotal, we must set N = LAItotal/[N (Ac/A) Lc].  This is just Eq. (A.IV.1) again.

  Now that we can calculate fpen for the whole scene, we can calculate the apparent LAI as -2 ln fpen.  If we divide this by LAItotal, we get the composite clumping factor.


   Results

    Ac/A     Lc    LAItotal  à  fpen           LAappar    fclump,indiv fclump,composite

        0.01     1.      0.01         0.99607    0.0079    0.78694    0.78849 (only n=1 contributes)

      0.01     2.      2.             0.53040   1.2683     0.63212    0.63413 (contribs. to n=6)

      0.2       2.      2.             0.50875   1.3516     0.63212    0.67580 (contribs. to n=5)

      0.4       2.      2.             0.55284   1.1659     0.63212    0.72872 (contribs. to n=2)

      0.9       2.      3.6           0.18584   3.3657     0.63212    0.93493 (only have 2 layers)

          (and same result for LAItotal = 7.2, twice as many layers)

   So, the limit of dispersed clumps, giving no change in clumping factor, is correct (first result).  For small clumps of high LAI, assembling them effectively makes no change in clumping factor.  For larger clumps of high LAI, assembling them into a high total LAI actually improves the clumping factor (makes it closer to 1, by about 6% to almost 50%!).