Document to collaborators on a method to extract estimates of plant leaf area from image analysis

**1. Apologia pro vita sua, or something very
similar: the quick story behind developing the method of leaf-area estimation
by image analysis, and why it took so long:**

These data from creosotebush on the Mojave Global Change Facility have certainly been the acid test - every possible limitation of the method of leaf-area analysis had to be discovered and addressed:

* Leaf colors for nominally live leaves that matched the color of much soil (just discovered today; pecans and unstressed Larrea were so much easier!)...and leaves that are literally partly dead but have to be counted as live when put through analysis with a standard leaf-area meter.

* Concerns about camera resolution, chromatic aberration (in an earlier camera), and white balance (some cameras have pitfalls about reverting to defaults)...and about color shifts with under- or over-exposure (baseless, but it took a long time to verify this)

* Leaf clumping that requires new methods of estimating corrections from the basic model for converting green fraction in view to leaf area index (again, pecans were trivial in comparison). These data are an ideal test ground for how clumping affects resolution of "hidden" leaves.

Earlier studies on our own Larreas brought out the raft of other challenges, so that we now know how to take the pictures for maximum reliability and how to write efficient, user-friendly programs. We also learned to:

* Take pictures in a reproducible geometry for every repeated measurement.

* Avoid the biggest geometric distortions in leaf views, and a number of potential problems in automatic exposure in the camera.

* Deal with the problem of mixed pixels - part leaf, part non-leaf; this demands special color indices (linear differences).

In addition, I had to develop novel algorithms of general utility (for deciding in a fast computation if a point lies inside a polygon, and for removing the effects of leaf clumping with a theoretically sound algorithm based on detailed simulations). I also wrote up useful guides to color theory, in considerable detail, guides that we use in other research.

In the end, it appears that we have a method to analyze leaf area under a range of plant structure, leaf optics, stress conditions, lighting conditions, and camera operation quirks.

**3. Results of calibrations of the image-analysis method with MGCF data on Larrea leaves, from a shrub disassembled progressively**

This is where we cut to the chase.

The Denude team (Fig. 1) pursued a large and complex operation to measure leaf areas in a manner that gave images suitable for comparison. They removed 24 branch sections, one at a time. Each such branch was photographed and then its leaves were removed for analysis by an optical scanner, the Scion platform. The remaining whole shrub was also imaged. The geometry of views was highly reproducible, simplifying the analysis, and in each image a physical scale of distance (a ruler) was included.

Ideally,

1) The area of each branch obtained by image analysis should match the area measured by taking all the leaves off and scanning them.

2) The area of the whole shrub at each stage of disassembly can be reconstructed in reverse order, as the sum of branch areas remaining. After the last branch (number 24) is removed, one knows that the area on the shrub before this removal was the area on branch 24. Before branch 23 was removed, the whole shrub had the area of branches 23 plus 24, and so on.
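
The reverse-order bookkeeping in (2) is just a running sum over branch areas; a minimal sketch (the branch areas here are invented placeholders, not the measured data):

```python
# Reconstruct whole-shrub leaf area before each branch removal.
# Branch areas are listed in removal order (branch 1 removed first).
# The shrub's area just before removing branch k is the sum of the
# areas of branches k..N (the ones still attached at that moment).

def areas_before_removal(branch_areas):
    """Return, for each branch in removal order, the whole-shrub
    leaf area immediately before that branch was removed."""
    remaining = []
    total = 0.0
    for area in reversed(branch_areas):  # start from the last branch
        total += area
        remaining.append(total)
    remaining.reverse()  # back to removal order
    return remaining

# Example with three hypothetical branch areas (cm^2):
print(areas_before_removal([120.0, 80.0, 50.0]))  # [250.0, 130.0, 50.0]
```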

This
ambitious program could not be followed entirely. It takes so long to pick off individual leaves
and place them in nonoverlapping fashion on the scanner platform that the
backlogged branches were at risk of dehydrating and/or molding. A number of branches (12 of the 24) were
stripped of leaves that were only weighed, not scanned. I used the weights to estimate leaf areas,
based on the mass-area regression obtained for leaves that were both weighed
and scanned. (This regression, with a
forced intercept of zero area at zero mass, is area(cm^{2}) = 54.03*mass(g).)
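
The forced-zero-intercept slope is ordinary least squares through the origin, b = sum(m_i*a_i)/sum(m_i^2); a sketch with made-up leaf data, not the actual MGCF measurements:

```python
def slope_through_origin(masses, areas):
    """Least-squares slope for area = b*mass with the intercept
    forced to zero: b = sum(m_i * a_i) / sum(m_i**2)."""
    num = sum(m * a for m, a in zip(masses, areas))
    den = sum(m * m for m in masses)
    return num / den

# Hypothetical leaves that were both weighed (g) and scanned (cm^2):
masses = [0.10, 0.25, 0.40]
areas  = [5.3, 13.6, 21.7]
b = slope_through_origin(masses, areas)
# Estimate the area of a weighed-only leaf stripping of 2.1 g:
print(round(b * 2.1, 1))
```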

The destructive analysis posed two judgment questions:

1) At what degree of color change should a leaf be considered dead? In color space alone, the gradation is continuous.

2) Many leaves have both green centers and red margins. Does one count the whole area of a leaf as live, if the majority is green?

I was not present at the destructive analysis to make these judgments. In my image analyses, I chose to cut off the color space for live leaves at a visibly gray, rather than reddish, hue.

I chose also to count the entire area of a leaf, up to such hue. This is not a fully satisfactory solution (Appendix II.2), but it is the best compromise. I can provide more theoretical analyses of likely biases, if desired. This solution to the dilemma requires the actions described in Appendix II.2:

1) Choose the color space that defines live leaves, including their dying margins. The concept of a color space is detailed in Appendix I.

2) Find the color space occupied by soil pixels, and in particular that part that overlaps the “live leaf”color space.

3) Determine what fraction of “live leaf” area is not counted when the color space for live leaves is replaced by the color space that strictly excludes soil pixels. Do this by analyzing a photo or a section of photo that has branch and background but no soil.

A further consideration is that, when leaves are so very
marginally green, any modest change in the color quality of lighting can shift
recorded color of any pixel from values indicative of leaf to values indicative
of non-leaf (soil, stem), or vice versa.
This happened clearly on 7 images of the whole plant. Compare Fig. 5 with Fig. 2 to see the shift in
color and in lighting geometry (diffuse vs. direct sun). Late or early sunlight is notably more red
than is midday sunlight; overcast skies give sharply increased blue. One must commonly accept whatever natural
lighting conditions occur, but the solution is to adjust the white balance of
the digital image for each significant change in lighting conditions. Alas, the first 7 images of the whole plant
were taken in light overcast conditions, as indicated by the lack of branch
shadows…and by the shift in color of a white cloth tag as well as of the blue
background (assessed visually and by computing regressions of G/R and G/B). I could not derive a reliable *post facto* correction, so I did not use
these images in the final analysis.

Accepting the limitations and solutions
above, I processed all the images of branches and all the images of the “whole”
shrub. All areas estimated by image
analysis were multiplied by 1.45 to account for the dying margins (item 3 a few
paragraphs earlier). The regressions are
shown in Figs. 6 and 7. They are
reasonably satisfactory, with r^{2} values of 0.80. The outliers have not been reanalyzed in more
detail (for further color corrections), given that the most extensive
corrections have been applied and insufficient information exists on lighting
conditions to make the most meaningful further corrections. With accurate white-balance control in the
imagery, I would anticipate a major gain in r^{2}, and with greener
leaves, even more (to 0.95? a data set on green leaves with accurate
whole-plant and branch area determination is lacking). Fig. 8 shows an image of an isolated pecan
tree with the background sky color-shifted to white; leaves are totally
unambiguous. Fig. 9 shows a row crop,
arugula, which introduces new considerations about leaf angle and
(non)-randomness of leaf placement, to be discussed in Appendix II.3.

One final
note on the shrub imagery is that ground cover vegetation shows up in the
analyses. The intercept is nonzero, in
the regression of image-analysis area as a function of directly-measured area. Close inspection of the images shows fine
green elements, which cannot be identified in the images but which are almost
certainly *Cryptantha* or cryptogamic
crust or both. In future imagery, it
would be useful to cover the soil by the shrub base with a blue fabric
background, as well.


**4. Overview of the user operations (details will be in a manuscript and also in a full manual)**

*A. Taking the images in the field*

Considerations:

“0.” A high-quality digital camera is easiest to use. A film camera is usable, if one is prepared to scan images and perhaps rectify exposure errors after the fact. (We used a film camera for years, but now prefer digital.) If you use a film camera, be sure to use the same kind of film each time (for reproducible color balance) and to use a reliable photofinisher (ditto). I can send you scanning instructions for film images.

1. The geometry must be known and also reproducible.

That is, you must know how many pixels represent 1 cm at the leaves, so we can convert from pixel coverage to cm^{2} of leaves.

The view angle must be reproducible, so that the whole shrub is seen, and each part is seen at the same distance each time.

It is possible to analyze images that violate this reproducibility, but the analysis time is very much longer (10 to 100 times), involving trips back to the field.

__Action item: install survey markers__ at each shrub to be imaged repeatedly, and __have notes about camera placement__ (height above the ground, pointing angle) __and lens focal length used__.

2. The viewpoint for each shrub must be chosen so that only the one shrub is imaged, if at all possible (it is tedious and sometimes dubious to mask out the other plants digitally, if they show up behind, around, or in front of the desired shrub).

For isolating a shrub, including from soil background, it is usually desirable to __impose a background of distinct color__, a color that cannot be confused with any plant part. We try to use a very saturated blue (pure blue, not baby blue).

__In pathological cases__, where stressed/dying areas of live leaves are as brown as some soil areas, __you will also want the blue background to cover the soil__.

3. Even exposure of leaves makes the analysis a bit more accurate. If an image is overexposed or underexposed, pixels with leaves in them may look so bright or so dim that they cannot be recognized as leaves in the image analysis program.

__Use a camera mode with accurate exposure at the leaves__ - if you can choose a __spot-metering mode__, do so. You don’t care if the background is exposed normally, only the leaves.

Sometimes the camera must be lifted above one’s head to take the picture. If you must do this, be sure to __block the camera viewfinder__ (eyepiece) with the attachment provided with the camera (certainly on a single-lens reflex). If you don’t block this, light will enter the finder and trick the camera into thinking the scene is brighter than it is, and the image will be underexposed, sometimes severely.

__Minimal shadowing is desired__. It adds difficulty or uncertainty for the computer analysis to try to distinguish leaf from non-leaf in deep shadows. We try to shoot in midday, within two hours before or after noon.

4. __Reproducible color balance is critical__. The color balance of natural light varies greatly over the day and with weather conditions - sunrise and sunset give reddish light, overcast gives very blue light. The color shifts are accommodated in our eyes but not reliably in images (even digital images with automatic white balance).

The reliable way to adjust for shifts in light quality is to use __custom white balance on a digital camera__ (the analogous thing can be done for film images, with a bit more effort and a modification of our computer program that I have not done to date). The grey-card method is explained below. If you don’t do this, it will take a goodly effort to correct for color shifts in individual images.

5. The ideal case is to __have all parts of the shrub within a narrow range of distances from the lens__. Otherwise, the parts that are near look artificially bigger than the distant parts, and their contributions to leaf area are spuriously increased.

The ideal way would be to use a long (telephoto) focal length and move just far enough away to see the whole shrub and not much more. This is often impractical - to see all of a big shrub, one may have to place the camera so far away that one must stand on a ladder…which one might not have. Use judgment, and use the longest practical focal length.

6. Leaves often show up as only a few pixels across. This makes many pixels at the edge of the leaf occur as mixed pixels, part leaf and part background. We have made the classification of mixed pixels as reliable as we can, but it is not as good as classification of pure pixels. Thus, __we want leaves to be as many pixels across as is practical__. One part of this is getting the highest resolution possible:

* a __camera with sufficient megapixels (and true resolution → a good lens)__,

* but also __storing images at the highest JPEG quality__ setting. Images at lesser resolution look OK to the eye, but they have blended fine details among pixels,

* and finally, using only optical zoom, never digital zoom (which only interpolates cruder optical images and loses information in the broader scene).

As a result of the considerations above (which were derived after failures), we proposed the following protocol:

1. Pre-field preparation.

A. Assemble equipment that will be needed:

Camera, with enough memory space left for the number of images anticipated.

Camera manual, so that one can have complete mastery of the functions needed here, including custom white balance and spot-metering.

Grey card for setting white balance.

Blue backdrop that can be placed under and around shrubs (not critical unless some shrubs have undergrowth that can’t be put out of view by judicious choice of viewpoint…or if leaves are so stressed that they almost merge into soil color, esp. at leaf margins).

Rigid ruler to determine pixels per cm in the scene.

For the first setup: survey markers, to mark the locations for taking each shrub’s picture.

Field notebook, including notes from the previous imaging session, in which you recorded the viewpoint and the lens focal length you used for each shrub (obviously not possible the first time).

B. Schedule the trip properly: be ready to take pictures between about 10 AM and 2 PM, local solar time.

2. Set up a unique viewpoint for each shrub - one that shows the shrub and only the shrub, has ready access, and does not cause the camera operator to throw a shadow on the leaves. Record the location, the camera angle, and the lens focal length that had to be used.

3. Adjust the white balance of the camera to that of a grey card (custom white balance). Do this at the beginning, and every hour or when the light changes. Follow the instructions for the camera; shoot an image with the grey card and use this to set custom white balance.

4. Set the exposure mode to automatic mode. Be sure that no one has left on any exposure-compensation shifts from the previous use. If you can set the aperture (Av mode on most cameras), use a moderately small aperture (say, f-stop of 8 or smaller, meaning the number 8 or larger - 9.5, 11, 13, or even 16). Use the spot-metering option, on the leaves.

5. Take the pictures of each shrub.

B. Find the proper viewpoint for the shrub.

C. Set the proper focal length for the view (for this particular shrub): generally, this is mild telephoto (ca. 80-100 mm equivalent 35mm-camera focal length; about 50-60 mm on digital SLRs that have lens-multiplier factors near 1.6 because of their small CCD image-capture chips). This may be impractical in some cases - esp. to see all of a big shrub, one may have to have the camera too far away; use judgment. In any case, it’s best to always use the same focal length for a given shrub each time; record the length you used the first time and use it every succeeding time.

E. Take the picture. If you have to hold the camera away from your eye (say, you have to hold it above your head to image the whole shrub), then you must block the viewfinder with the camera-supplied blocker (usually on the carrying strap; sometimes there’s a shutter you can flick closed on the viewfinder). This is necessary because sunlight will enter the viewfinder and trick the camera into thinking there is more light than the lens will see; the camera will reduce its exposure and you will get a dark image, possibly an unusable one.

F. Check that the image is OK - that it has a reasonable image density, that it shows the complete shrub, and that the color is not odd.

*B. Protocol for image analysis on the computer*

From here on, I’ll assume that you have digital images - straight from the camera, or scanned in from film images (at high resolution and with color management - I can generate instructions if you pursue this method). The images must be TIFF images. Many cameras store JPEGs only, or RAW. Use your software (supplied with the camera, or freebies like xv, GIMP, or IrfanView, or proprietary stuff such as PhotoShop) to make the conversion, if necessary.

It is OK if the file comes out with LZW compression.

Your first task is to find out the colors of pixels that qualify as live leaves (and dead leaves, if you also want those). I have found that the best way to describe the allowed colors is to find a representative number of pixels that are definitely leaves, and find their values of G-R and G-B (the primary colors R, G, and B are recorded in a digital image as values from 0 to 255 each, at least in 8-bit color; 255,255,255 is pure bright white, 0,0,0 is pitch black, 255,0,0 is pure red, 100,100,0 is a medium yellow, and so on; it might help to look at my writeup on color theory on the Web, at http://biology-web.nmsu.edu/vince).

All the allowed values of G-R and G-B will be found to lie in a cluster. We can define a simple 4-sided polygon in the space of (G-R, G-B) that encloses this cluster of points. Appendix I shows an example. The program has a very rapid way of using the specifications of the polygon to determine if any other value of (G-R, G-B) fits inside the polygon (Appendix I again).

Basic steps are:

1. Open the image in an image viewer that can report RGB values of a selected pixel. For example, in PhotoShop, on the toolbar, click on the eyedropper; on the upper toolbar, click on info; move the eyedropper so that its lower left end is on the pixel you chose. The RGB values, and the x,y coordinates, will be displayed in a dialog box.

2. Wander around the image, selecting pixels known to be leaf, and record their values of R, G, and B (and x, y, so you can relocate them for double-checking later). Enter these values into a spreadsheet, where you will set up columns to compute G-R and G-B.

3. Do the same for pixels representing the other categories you do NOT want to have counted as leaf - stem, background, soil, dead leaves. I make sections for each of these.

4. Plot the G-R, G-B values in some plotting package. I find the shareware package DPlot extremely convenient. Give each category a distinct color - e.g., green for live leaves, gray for stem, red for soil, … You will see distinct clusters. The results are detailed in Appendix I; see esp. Fig. AI.1.

5. Figure out a 4-sided polygon in this plot that will enclose live leaf only (or almost always; one might need to accept a few overlaps with other categories, in rare cases…such as the Nevada data, where it is really a judgment call to say where live leaves end and dead leaves begin!). This must be non-re-entrant (no interior angle can exceed 180º = no angular “boomerang” shape). Record the coordinates of the four vertices for use in my image-processing program.
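
Steps 2-4 above amount to tabulating the linear color indices per category; a minimal sketch with invented RGB samples (the actual values must come from your own images):

```python
# Compute the linear color indices (G-R, G-B) for sampled pixels,
# grouped by category, as one would before plotting the clusters.
# The RGB triples below are invented placeholders, not real samples.

samples = {
    "live leaf":  [(60, 110, 40), (72, 125, 55)],
    "stem":       [(90, 85, 70)],
    "soil":       [(130, 110, 80)],
    "background": [(20, 40, 200)],
}

indices = {
    cat: [(g - r, g - b) for (r, g, b) in pixels]
    for cat, pixels in samples.items()
}

for cat, pts in indices.items():
    print(cat, pts)  # distinct clusters should emerge per category
```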

Image
analysis *per se*: the program is set
up for batch processing of any number of images in sequence (up to 1,000). Of course, you can do single images.

You can run the program in any of several ways:

1. In a DOS window, invoking the executable LA_analysis(.exe).

A. You need to install LA_analysis.exe in the same directory as your images, or, alternatively, in your path (edit C:\autoexec.bat to do this). You will also need to install the compiled C program, scanline3.exe, in the same directory (or in your path, again). I recommend using just the executable. I can give you the source code, but you need the complete TIFF libraries, and there is some customization to be done. It took us a couple of days to figure this out!

B. If you have an earlier version of Windows, with DOS windows that are not scrollable, it will be difficult to check your output. I heartily recommend getting a copy of Commander11 (cmd.exe) and installing it; it provides a scrollable DOS window.

2. From a Fortran compiler, such as Compaq Visual Fortran, which has a graphical user interface. I can supply the Fortran (and C) source code.

The basic operation is to enter line commands, including the names of the TIFF images to be processed, the pixels per cm in the image, etc. There are two ready ways to do this:

1. Manually: invoke the program with the command “LA_analysis” and then respond to each prompt from the program. I will explain what each one means.

2. From a file in which you composed all the commands (and can save, esp. saving the effort of retyping when you change processing options). Create a file, as in Notepad (a plain ASCII text file, unformatted only). In the DOS window, use an input redirect:

> LA_analysis < myfile.txt

This will read in the contents of myfile.txt as commands. You can do an output redirect, too:

> LA_analysis < myfile.txt > myresults.txt

The file myresults.txt will have all the printouts from the program; you won’t see them on the screen (I don’t know if DOS has a “tee” utility that lets output go to both the screen and a file, as does UNIX).

Here’s the deal, prompt by prompt:

1. Enter the name of the file where the RESULTS will be stored. Be sure this is not the same name as any other file you want to keep, because that file will be overwritten. In Windows, I usually give it a .txt extension, since Windows is so brain-dead that it doesn’t have the ability to open a file of arbitrary name with an editor.

The program will record the date that you did this analysis.

2. Begin a loop, to repeat as often as you want, once for each successive image:

A. Enter the name of the TIFF file (64 characters or less, in length). The program will open the file for reading, and then will invoke the C program, scanline3, to create a “raw” file (“scratch.out”) that has only the RGB values of each pixel, concatenated into one long line for each line of the image. This will show up in your directory. The program will proceed to read this new file to find the (x,y) size of the image, which it will echo to you on the screen. It will have the qualifier “including border areas.” You can blank out areas you do not want analyzed; see Appendix II on special handling of images.

B. Set the processing parameters.

If this image is the first one to be processed, you have to set each
parameter. If it is the 2^{nd}
or later, you have the option of changing none, one, or many parameters; the
program prompts explain these choices.

__The color space for live leaves__: Enter
the coordinates (G-R, G-B) of the four vertices found to delimit live
leaves. These must be entered in
clockwise order, starting from the leftmost (lower or upper, it does not
matter). The program will ask you to
repeat the entry if you don’t follow this convention. The program will report the direction cosines
as angles as it prepares its internal criteria.

__igreencut__: Enter the
minimum value of green (G) that a pixel may have and still be considered as
leaf. In deep shadow, when one uses some
films, there is a color shift, and a neutral gray can come out green by
accident. I often use 15 for this
parameter. For digital images, I use 0
(there is no bias between R, G, and B channels at low light).

__maxgreen__: Enter the maximum value of
green for a leaf. This is used to
prevent specular highlights counting as leaves, in pathological cases (mostly
they look plain white, which does not nearly qualify as leaf, but rarely this
happens). I set this to 220 (out of 255,
this is really bright!).

The program will then read all the pixel RGB values, and will use the
criteria above to identify each pixel *tentatively*
as leaf or not leaf. I will explain the
“tentatively” part shortly.

__enx1cm__: Enter the number of pixels that span 1 cm in the
image. This item is used to convert leaf
area in pixels to physical leaf area in cm^{2}. Of course, in a real image, the scale varies
from center to edge; pick the middle.
Also, the scale varies for leaves closer and farther than average from
the mean distance to the camera. Again,
pick the middle (this is why it is important to use as distant a view as is
practical, to get the least variation in leaf-to-camera distances among all the
leaves). If *enx1cm* is entered as a negative number, it is interpreted as pixels
per inch, not cm, and then it is converted to pixels per cm.
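
As I read the *enx1cm* conventions above, the conversion logic amounts to the following (the function names are mine, for illustration, not the program's):

```python
def pixels_per_cm(enx1cm):
    """Interpret the enx1cm entry: positive = pixels per cm;
    negative = pixels per inch, converted to pixels per cm."""
    if enx1cm < 0:
        return -enx1cm / 2.54  # 2.54 cm per inch
    return enx1cm

def pixel_area_to_cm2(n_leaf_pixels, enx1cm):
    """Convert a count of leaf pixels to physical leaf area in cm^2
    (each pixel covers 1/ppc by 1/ppc cm at the mid-scale distance)."""
    ppc = pixels_per_cm(enx1cm)
    return n_leaf_pixels / (ppc * ppc)

# 5000 leaf pixels at 50 pixels per cm -> 2 cm^2:
print(pixel_area_to_cm2(5000, 50))  # 2.0
```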

__leafsizepixels__ and __startingmult__: The full theory behind these parameters is
noted in the Appendix III on turbid-medium theory of image analysis. The program has to analyze the image in small
sections or “tiles,” and add them up.
The smallest tiles have to be at least several times as wide as the
average leaf, in pixel count. You
specify this smallest tile size by entering the average pixel span of leaves
(the ones viewed flat, with maximum dimension) and the multiplier of this to
use (say, 2, to get the smallest tile as twice this size).

__minside__: Enter the smallest partial tile to use in analysis. For an arbitrary image, a given tile size is
unlikely to cover the image exactly with an integral number of tiles. There are also complications when complex
areas are blocked out (see below). In
either case, one may end up with tiny “orphan” regions that cannot be analyzed
reliably. I use minside = 10, so that an
area of less than 100 pixels is not used.

__nnotgreen__: Enter the maximum number of adjacent pixels that can
be non-leaf, for a given pixel to count as leaf. Here is the explanation of “tentatively” from
above. Particularly with film images,
one can get noise pixels that are soil, stem, or the like but are green by
error. In order to discount these
isolated pixels, I introduced this parameter.
A real leaf pixel will always be connected to other leaf pixels,
commonly on all sides, but usually on 2 or 3 other sides (up, down, left,
right) even if the pixel is on the edge.
This parameter is not significant for digital images (at normal ISO), so
I set it to 1, or maybe 2.

At this time, the program will complete the analysis. It will analyze the image for leaf area index in a series of tessellations or tiles covering the whole image. In each tile, it will estimate the leaf area. It will then use the pixels per cm information to convert this to real leaf area. Note that the turbid-medium model (Appendix III) accounts for leaves hidden behind other leaves. It does so only roughly at first, because clumping of leaves (densely here, sparsely there) violates the assumptions of the theory. The program corrects for almost all of this bias by redoing the analysis at several tile sizes and then extrapolating to zero tile size (Appendix III). The program reports the estimates of leaf area at each tile size, plus the final extrapolation. It also writes the information to the file you specified at the beginning.
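
Appendix III holds the full theory; purely as a rough illustration of the two ideas named here, the sketch below inverts a simple turbid-medium (Beer's-law) relation, LAI = -ln(1 - f_green)/k, per tile, and then extrapolates the whole-image estimate linearly to zero tile size. Both the relation, the extinction coefficient k, and the linear extrapolation are my simplifying assumptions, not the program's actual algorithm, and the numbers are placeholders:

```python
import math

def lai_from_green_fraction(f_green, k=0.5):
    """Invert a simple turbid-medium relation f_green = 1 - exp(-k*LAI).
    k is a placeholder extinction coefficient, not the program's value."""
    return -math.log(1.0 - f_green) / k

def extrapolate_to_zero_tile(tile_sizes, lai_estimates):
    """Fit LAI ~ a + b*tile_size by least squares and return the
    intercept a, i.e., the estimate extrapolated to zero tile size."""
    n = len(tile_sizes)
    sx, sy = sum(tile_sizes), sum(lai_estimates)
    sxx = sum(t * t for t in tile_sizes)
    sxy = sum(t * y for t, y in zip(tile_sizes, lai_estimates))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return (sy - b * sx) / n

# Hypothetical whole-image LAI estimates at three tile sizes (pixels):
sizes = [32, 64, 128]
lais  = [1.90, 1.75, 1.45]
print(round(extrapolate_to_zero_tile(sizes, lais), 3))
```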

If this is not the 1^{st} image, you will be asked only for
changes in the parameters.

**Appendix I.**

*1. Observations that leaf colors cluster
well*

Many individual images have been analyzed. Particularly illustrative is a composite analysis of three separate images that differed considerably in exposure of the leaves, ranging from dark (underexposed) to normal to light (overexposed). Figure AI.1 shows the results, which will be commented upon more extensively in the next paragraph. This comparison was part of the inquiry into whether or not exposure affected the pattern of leaf colors, i.e., the polygon in (G-R, G-B) into which the leaf pixels fell. The comparison revealed some useful regularities (exposure hardly matters), and some pitfalls (dead and live leaves grade into each other and one must make a judgment call about the dividing line). It also revealed that the space (G-R, G-B) is very effective, more so than simpler alternatives such as (G,R) and more so than complex alternatives such as dividing G-R and G-B each by the total value, R+G+B. The latter alternative was considered for the potential of removing bias from overall exposure level. However, 2 problems arose: 1) this normalized color space actually mixed the pure live-leaf pixels with the pure pixels of other items (stem, soil,…); Fig. AI.2; 2) inherently, the new coordinates are nonlinear in R, G, and B; consequently, mixed pixels (say, half leaf, half stem) will not be recorded with proportional probability as being either element; the brighter part of the mixed pixel will dominate, for one. We therefore stuck with the use of (G-R, G-B). Live leaves are quite distinct in this space, even for Larrea plants. Live leaves are higher in G-R than dead leaves, and they are higher in G-B than are stems. The saturated blue background is far away from live leaves in this space. The color of soil pixels can overlap the colors of very stressed leaves or dying portions of live leaves, calling for special treatment noted in Appendix II.
For most other cases, such as healthy leaves of agricultural crops (pecans, in another case we treated), the contrast of leaves to all other elements is extreme. There are no problems in analyzing leaf area in such cases.

Fig.
AI.1. Clustering of color coordinates of live leaves, apart from dead leaves,
stems, and background. See text for an
explanation of terms. Images that are
sources of data are indicated by number (branches 6-2, 4-4, and 16,
respectively with high, normal, and low overall exposure). Two types of data points are not noted in the
legend, which is limited to 16 items: shaded soil (green open stars); soil
shaded by blue backdrop (red X’s)

*2. Effective
computation of whether a point lies in a polygon*

Let it be accepted that leaves have colors that lie inside a polygon in the coordinates G-R and G-B. Now we are faced with computing if any new pixel with some values of G-R and G-B lies inside the selected polygon. There are hard/slow and easy/fast ways to do this. I found a rapid way, undoubtedly known to geometers and people who do image simulations. Simply, one has to determine, for each side in turn, if the point lies on the interior side of it or the exterior. This is simple: construct for each side a direction cosine that is perpendicular to the side and that points to the interior of the polygon (Fig. AI.3). (Knowing which side is interior is easiest if we require that the sides be specified in a regular order; hence, the restrictions noted in Sec. 4.B above.) This can be done once and for all after the polygon has been specified. Then, for any given point, construct the vector that runs from the midpoint of the side to the point in question (Fig. AI.3). If the dot product of this vector and the direction cosine from the side is positive, then the two vectors are on the same side of the polygon side, and the point can be inside the polygon. I say “can be” because we must verify this for all 4 sides; if the test fails for any side, the point is outside the polygon. Only a few multiplications, additions, and if-statements are needed. Voila.
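
A sketch of this test in Python (my reconstruction of the idea, not the program's Fortran): precompute an inward normal and midpoint per side from vertices given in clockwise order, then apply sign-of-dot-product tests as in Fig. AI.3. The polygon vertices below are hypothetical.

```python
def inward_normals(vertices):
    """Precompute, for each side of a convex polygon whose vertices are
    listed in clockwise order, the side's midpoint and an inward normal.
    (Clockwise order in standard axes puts the interior to the right of
    each directed side, so the inward normal of (dx, dy) is (dy, -dx).)"""
    sides = []
    n = len(vertices)
    for i in range(n):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        dx, dy = x2 - x1, y2 - y1
        mid = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
        sides.append((mid, (dy, -dx)))
    return sides

def inside(point, sides):
    """True if, for every side, the vector from the side's midpoint to
    the point has a nonnegative dot product with the inward normal."""
    px, py = point
    for (mx, my), (nx, ny) in sides:
        if (px - mx) * nx + (py - my) * ny < 0:
            return False  # exterior of this side -> outside the polygon
    return True

# A hypothetical 4-sided polygon in (G-R, G-B) space, clockwise from
# the leftmost vertex, as the program's entry convention requires:
poly = inward_normals([(5, 80), (60, 90), (70, 20), (10, 10)])
print(inside((40, 50), poly), inside((-20, 0), poly))  # True False
```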

Fig.
AI.2. Failure of brightness-normalized color coordinates to cluster points
representing live leaves. Data are from
image of branch 4-4, same as used in Fig. AI.1.

Fig. AI.3. Construction of vectors allowing determination that a point (x,y) is or is not inside a polygon. Vector v_{dc} is the direction cosine from the midpoint of side A; vector v_{p} reaches from the midpoint of side A to the point in question. If these vectors have a component in common (dot product is positive), they lie on the same side of side A (the inside).

**Appendix II. Special handling of images
that require elaborate cropping patterns or that have leaves with dying margins
not so different in color from soil**

*1. Challenges in cropping some images to isolate one desired shrub*

With some luck and care, each shrub can be imaged separately from others and from undergrowth and any distracting artificial items. This is not always the case. Editing the image by cropping is possible, but most cropping tools offer only rectangular selections. More flexible is the use of Bezier curves, but these cannot be used directly as cropping margins in all image-processing programs (such as Photoshop). There is a work-around:

1. Use the Bezier curve tool to outline the desired shrub.

2. Make this the selection.

3. Invert the selection (so that everything outside is selected). The way of doing this varies with the program one is using.

4. Use the paint tool to turn this inverse selection (the area to be rejected) a pure white (RGB = 255, 255, 255). My LA_analysis program will ignore all the pure white areas.
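Steps 2-4 can also be scripted once the outline is available as a polygon (a list of points sampled along the Bezier curve). The sketch below is a hypothetical helper, not part of LA_analysis; it rasterizes the outline with the standard even-odd crossing rule and whitens everything outside:

```python
import numpy as np

def polygon_mask(h, w, outline):
    """Boolean H x W mask, True inside the closed polygon `outline`
    (a list of (x, y) points approximating the Bezier curve).
    Even-odd (crossing-number) rule, vectorized over all pixel centers."""
    ys, xs = np.mgrid[0:h, 0:w]
    xs, ys = xs + 0.5, ys + 0.5                  # test pixel centers
    inside = np.zeros((h, w), bool)
    pts = list(outline)
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        crosses = (y1 <= ys) != (y2 <= ys)       # edge spans this pixel row
        with np.errstate(divide="ignore", invalid="ignore"):
            xhit = x1 + (ys - y1) * (x2 - x1) / (y2 - y1)
        inside ^= crosses & (xs < xhit)          # toggle on each crossing
    return inside

def whiten_outside(rgb, outline):
    """Set every pixel outside the outline to pure white (255, 255, 255),
    which the analysis program is described as ignoring."""
    out = rgb.copy()
    out[~polygon_mask(*rgb.shape[:2], outline)] = 255
    return out
```

Any image can then be loaded into an H x W x 3 array, passed through `whiten_outside`, and saved for analysis.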

*2. Challenges in dealing with images where
live leaves have dying margins nearly the color of soil*

Let us return to the problem of the stressed leaves. Assume that we have adopted a more stringent live-leaf criterion and must correct the resulting value of f_{green} for computing leaf area index. The simplest way is to estimate the fraction, f_{dying}, of pixels on live leaves that will not meet the new, stringent criteria, and then multiply the raw value of f_{green} by the correction factor 1/(1-f_{dying}).

There are then two extra steps in image analysis for such stressed plants. First, one must define a more stringent live-leaf criterion that excludes soil pixels (within a small margin of error; no classification is perfect). One needs to examine soil areas, both well-lit and shaded, in an image-manipulation program and find perhaps 30-100 pixels. Record the RGB values (and the x,y coordinates), and compute G-R and G-B. Plot the latter values, as done in Sec. 4.B. Now redraw the polygon for live leaves so that these soil points are excluded. Second, estimate the fraction f_{dying} of apparently live leaf area that will now NOT be identified as leaf. One can try this visually, though guessing fractional area coverage is notoriously difficult in complex patterns. An alternative is to analyze a small part of the original image known to have only “live” leaves. That is, pick an area with leaves shown only against the blue backdrop, if one is using this. Crop the original image to this section. Analyze it once with the liberal polygon and once with the stringent polygon. The latter will give an estimated area that is smaller by the factor 1-f_{dying} = A_{stringent}/A_{liberal}. Solve this for f_{dying}. It is advisable to do this exercise with areas of the image that include both well-lit and shaded leaves.
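A worked example with invented numbers (the two areas here are purely illustrative):

```python
# Illustrative numbers only: areas from analyzing the same backdrop-only
# crop of the image with the two polygons.
A_liberal = 480.0    # cm^2 with the liberal live-leaf polygon
A_stringent = 408.0  # cm^2 with the stringent polygon that excludes soil colors

f_dying = 1.0 - A_stringent / A_liberal   # fraction of live-leaf pixels lost
correction = 1.0 / (1.0 - f_dying)        # multiply raw f_green by this

# f_dying ~ 0.15; correction ~ 1.176, which is simply A_liberal / A_stringent
print(f_dying, correction)
```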


*3. Other species and
growth conditions: effects of leaf angle*

In all the analyses done on creosotebush, I have assumed that the leaf orientations are globally random. Locally, by branch or height, the angle distributions may have some structure, as Cunningham and others have found (refs.). However, the only condition that matters for using our basic program is that the mean fraction of flat leaf area presented at the view angle of the image be ½. This fraction is expressed geometrically as the average of the absolute value of the cosine of the angle between the leaf normal and the view direction (camera angle). Let us call this x_{Lv} for future reference. How accurate is this assumption for some drastically restricted distributions of leaf angles? It is important to consider that, while leaf zenith angles (relative to the vertical) may be restricted in distribution, in all field studies the distribution of leaf azimuths is very close to random. (The azimuth is the angle between the horizontal projection of the leaf normal onto the ground and a reference compass direction, say, east.) Figure AII.1 shows how x_{Lv} varies with camera view angle for several extreme cases: leaves that are exactly horizontal, exactly vertical, or all at exactly 45º. The random zenith case, our standard, is also shown. Note that at the so-called magic angle near 55º the effect of leaf zenith orientation essentially disappears. This view angle is therefore highly recommended for analyzing canopies of vegetation in which zenith angle varies by position. An example we have encountered is deciduous forest in the northeastern United States, for which views near this angle give x_{Lv} very close to ½.
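The behavior shown in Fig. AII.1 can be reproduced with a short numerical sketch (the function name is mine). It averages |cos γ| over a uniformly random leaf azimuth for a fixed leaf zenith angle:

```python
import numpy as np

def x_Lv(view_zenith_deg, leaf_zenith_deg, n=100000):
    """Mean |cos(angle between leaf normal and view direction)| for leaves
    with a fixed zenith angle and a uniformly random azimuth."""
    tv = np.radians(view_zenith_deg)
    tl = np.radians(leaf_zenith_deg)
    phi = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)  # leaf azimuths
    # spherical-geometry cosine between leaf normal and view direction
    cos_gamma = np.cos(tl) * np.cos(tv) + np.sin(tl) * np.sin(tv) * np.cos(phi)
    return np.abs(cos_gamma).mean()

# near the "magic angle" of ~55 deg, the extreme zenith cases converge:
for leaf_zen in (0.0, 45.0, 90.0):         # horizontal, 45 deg, vertical leaves
    print(leaf_zen, x_Lv(55.0, leaf_zen))  # all within about 0.08 of 1/2
```

At a view zenith of 0º (looking straight down), by contrast, the same three cases give x_{Lv} of 1, about 0.7, and 0, which is why the view angle matters so much.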

For plants that nonetheless have problematic leaf angle distributions, there is a solution: choose a view angle and request that I modify the program, or generalize it, for this case. The view angle may be constrained - e.g., for large trees one cannot often get views from above at small zenith angles.

An example of a problematic case is the leafy vegetable arugula, grown as a row crop. With images such as Fig. 9 analyzed naively in my program, the results were not directly usable. The destructively measured leaf area of the 9 plants in the figure was very close to 6000 cm^{2}. In the image of the intact plants, my program estimated an extrapolated area of 15000 cm^{2}, 2.5 times larger. One factor in the inflation is that the mean cosine, x_{Lv}, is not ½ but close to 1. A second factor is that the plants were deliberately spaced, i.e., overdispersed rather than random. Even within the individual plant there is significant overdispersion. When the plants were cut and redistributed at random over a slightly larger area, the overdispersion was reduced and the naïve estimate of area fell to slightly over 10000 cm^{2}. The relationship between plant density and apparent LAI (assuming random placement) is a determinate one and could be used for correcting the areal estimates. More satisfactory for users concerned with row crops would be a restructured analysis program. With sufficient interest, I could generate such a program.

**Appendix III: brief explanation of the
turbid-medium analogy, why it must be corrected for leaf clumping, and how it
may be corrected**

Image analysis has to account for two aspects of leaf presentation: leaf angular orientation and leaves that hide each other from the line of sight. We want the total one-sided area of leaves viewed flat-on, while leaves on a shrub occur at a variety of angles. The simple solution to the orientation problem is to assume random angles, such that the average fraction of flat area presented to view, among all leaves, is ½ (Ross, 1981). This assumption is quite good even for rather biased leaf orientations that may be preferential relative to the sun, for example.

The hidden-leaf problem
is also soluble, very simply in ideal cases of leaves whose centers are
positioned randomly in 3-D space. These
leaves (or other objects) form a “turbid medium.” One can readily derive the theory for the
fraction of view occluded, f_{occ} (f_{green} in our case and
in the program) as a function of the total “leaf area index” (leaf area per
unit view area, L or LAI):

f_{green} = 1 - exp(-K*L)        (AIII.1)

Here, K is the average fraction of flat area presented along the view direction; we take it as ½, as just noted. We can derive this form by considering the leaves as being presented in thin layers, one on top of another, each having the same total leaf area per view area (same leaf area index, L). We choose very thin layers, so that this leaf area index is small, dL << 1. Now, the fraction of the view blocked by one layer is simply K*dL. The fraction not occluded is 1-K*dL. With two layers, assumed not to have leaves preferentially located relative to one another (here is the assumption of randomness, of no clumping), the fraction not occluded is the product of the independent probabilities, or (1-K*dL)^{2}. We can continue to N layers, with N large, to get the probability of non-occlusion as

f_{pen} = (1-K*dL)^{N} = (1-K*L/N)^{N}        (AIII.2)

The rightmost expression just indicates that the total leaf area index in all the layers is our original consideration, L; that is, dL = L/N. Now take the limit of this as we make many fine layers (N → ∞). There is a well-known relation:

lim_{N→∞} (1 - x/N)^{N} = exp(-x)        (AIII.3)

Identifying x with K*L, we immediately get the penetration fraction as exp(-K*L) and the occluded (green-leaf) fraction as in Eq. (AIII.1).
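The limit is easy to verify numerically; a minimal check with K = ½ and L = 2:

```python
import math

K, L = 0.5, 2.0                        # mean projection 1/2, leaf area index 2
limit = math.exp(-K * L)               # predicted penetration, exp(-1) ~ 0.3679
for N in (10, 100, 10000):
    print(N, (1.0 - K * L / N) ** N)   # approaches exp(-K*L) as layers thin
f_pen_fine = (1.0 - K * L / 10000) ** 10000
```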

This formula assumes that leaves (or other objects we are interested in) are not preferentially clumped (or dispersed, which is negative clumping). We know this is not true. Leaves are clumped onto branches, and branches onto bigger branches, etc. This clumping introduces a serious bias into our estimates of leaf area. For positive clumping, we call the error the sieve effect: the object (the shrub) is a network of dense areas with intervening holes or sparse areas. We really want to know the average LAI over this system, but we can only average the green fraction, f_{green}. The relation of f_{green} to LAI is nonlinear, so that the function (LAI) of the average (f_{green}) is not the average of the function (LAI). A simple numerical example will suffice. Suppose we have two equal areas, one with LAI = 3 and one with LAI = 0. The true average LAI is 1.5. The average f_{green} is 0.5*[(1-exp(-0.5*3)) + 0] = 0.5*[0.777 + 0] = 0.388. This corresponds to LAI = -2*ln(1-0.388) (if we invert Eq. AIII.1), or LAI = 0.98. This is a serious underestimate. You can appreciate the problem in more general cases: no matter how huge an LAI goes into the one half of the area, f_{green} will always be no more than 0.5, corresponding to an “average” LAI of 1.386.
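The numerical example above can be reproduced in a few lines, using Eq. AIII.1 and its inverse with K = ½:

```python
import math

def f_green(lai, K=0.5):
    """Eq. AIII.1: green fraction for a given leaf area index."""
    return 1.0 - math.exp(-K * lai)

def lai_from_f(f, K=0.5):
    """Inverse of Eq. AIII.1: leaf area index from green fraction."""
    return -math.log(1.0 - f) / K

# two equal areas, LAI = 3 and LAI = 0; the true average LAI is 1.5
f_avg = 0.5 * (f_green(3.0) + f_green(0.0))
lai_naive = lai_from_f(f_avg)
print(f_avg, lai_naive)    # ~0.388 and ~0.98: the sieve-effect underestimate
```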

One
putative solution to the dilemma is to determine LAI on very small regions so
that we are never averaging disparate pieces.
The smallest region would be the individual pixel. This fails for clear reasons: each pixel
shows up either as leaf (f_{green}=1) or nonleaf (f_{green}=0). At f_{green}=1, LAI is indeterminate
(infinity, in principle). We need to
pick regions large enough that f_{green} is never 1, or rarely so (in
the rare cases, we set LAI to a practical upper limit, typically in the realm
of 4 to 10). The regions have to be of a
width that covers a number of leaf lengths.
This already puts us in the position of sampling both the finest clumps
and some of their interspaces. We
require a way to estimate clumping and to correct for it.

We cannot describe this clumping *a priori* without very detailed knowledge of the growth habits (this has been done, as a tour de force we want to avoid, especially to avoid redoing it for every species and growth condition). We have a few options. First, we can disassemble a shrub and get the real leaf area, and then compare it to the turbid-medium estimate. This is the gold standard and needs to be done at least once to test the method. However, the discrepancy will be a function of the scale on which we analyzed the image (how big our tiles are), and we don’t know the inherent scale of clumping in the shrub without the incredibly tedious task of plotting in 3-D where every leaf occurs. Second, we can analyze the apparent leaf area, A(S), for a variety of tile sizes, S, and then extrapolate to zero tile size. This is the approach I have taken.

One can readily appreciate that A(S) decreases with increasing S. It also reaches an asymptote, where it does not decrease any more with increasing S. We want to extrapolate to S=0, but there are many possible functional forms one might consider. Linear extrapolation should not work, because A(S) has a flat asymptote yet curves upward at small S. A quadratic fit is more attractive but has similar problems in the end. I decided to make a theoretical model of clumped vegetation, find out how A(S) behaves, and fit an appropriate empirical form. The model should be fairly universal, so that the empirical form should have general validity. The clumping model is given in Appendix IV. I present here the best empirical fit, which is simply

A(S) = A_{∞} [1 + b*exp(-c*S)]        (AIII.4)

Here, A_{∞} is the asymptote, the apparent area at very large tile size, and b and c are constants. We wish to extrapolate to A(0). The practical way to use Eq. AIII.4 is to rewrite it as

A(S)/A_{∞} - 1 = b*exp(-c*S)        (AIII.5)

ln[A(S)/A_{∞} - 1] = ln b - c*S        (AIII.6)

A linear regression of ln[A(S)/A_{∞} - 1] versus S gives ln *b* as the intercept (and -c as the slope). From b and the asymptote A_{∞}, one can simply calculate A(0) = A_{∞}(1 + b).
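To illustrate, here is a sketch with synthetic data generated from Eq. AIII.4 itself (the parameter values are invented for the example), showing that the regression recovers b, c, and the extrapolated A(0):

```python
import numpy as np

A_inf, b, c = 1000.0, 0.8, 0.05            # illustrative "true" parameters
S = np.array([8.0, 16.0, 32.0])            # small tile sizes, geometric series
A = A_inf * (1.0 + b * np.exp(-c * S))     # apparent areas per Eq. AIII.4

y = np.log(A / A_inf - 1.0)                # Eq. AIII.6: ln b - c*S
slope, intercept = np.polyfit(S, y, 1)     # linear regression
b_fit, c_fit = np.exp(intercept), -slope
A0 = A_inf * (1.0 + b_fit)                 # extrapolation to zero tile size
print(b_fit, c_fit, A0)                    # recovers 0.8, 0.05, and 1800
```

With real data, A_{∞} comes from one analysis at a very large tile size, and the three (or more) A(S) values from the small tile sizes, exactly as described below.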

One runs
image analyses at 4 or more tile sizes, one at a very large size to give an
estimate of A_{∞} and three more at small tile sizes to use in
the regression. The large tile size
should still allow several tiles to fit across the original image, though this
is not really a restriction. The other 3
(or more) values of S are reasonably chosen in a geometric series, each
succeeding one twice as large as the preceding.
The smallest tile size must be chosen to span several leaf lengths, to
avoid the problem of obtaining tiles that are all leaf and thus of
indeterminate (potentially infinite) leaf-area index. For this reason, the analysis program
requests the user to specify the pixel size of the average leaf, and a multiple
of this to use for the smallest tile size.

This theory of clumping corrections may be useful in more general situations, such as inferring leaf area index in hemispherical photos in mixed canopies. Moreover, the ability to estimate the clumping factor will be useful in models of light interception and photosynthesis (Gutschick, 1991). Note that the parameter c obtained in the regression above is a direct measure of the physical scale on which clumping occurs (as 1/c, multiplied by the pixels per cm in the image).

**Appendix IV. A theoretical model of leaf clumping and its
effect on apparent leaf area**

There are a number of ways, more or less complicated, to model clumped vegetation for the purpose of estimating the effect of clumping on light penetration, and, for the present inquiry, estimating the correction for clumping when inverting the problem to solve for leaf area from the degree of light penetration. I examined a number of models and found that a simple one sufficed.

Consider a view area divided into square pixels. This squareness is no mathematical restriction, in the limit of small squares that can cover any area. Now consider clumps of vegetation, also square (again, not a theoretical limitation). Each clump has a size L_{w}, the length of its side in pixels. The corresponding area of a clump is simply A_{c} = L_{w}^{2}. (All the clumps are the same size; I believe the same results will be obtained with clumps of varying size, but I haven’t done the full test.) Each clump has a leaf area index L_{c}, and a corresponding probability of light penetration (non-occlusion, in the terminology used earlier), x = exp(-0.5*L_{c}). The factor 0.5 applies to randomly oriented leaves.

Now place a
large number of clumps, N_{c}, at random centers all over a large view
area, of total area A. The average leaf
area index, LAI, will be the sum of the clump areas, N_{c}A_{c},
multiplied by the leaf-area index within each clump, L_{c}, and divided
by the total area, A:

LAI = N_{c} A_{c} L_{c} / A        (A.IV.1)

I enforce periodic boundary conditions: a clump that extends over one edge of the large area reappears at the other edge. This effectively considers the large area as a repeating pattern, and it eliminates edge effects.

For a practical computation, we can choose a variety of values of average LAI, each one achieved with clumps of different density (denser clumps = greater clumping factor, at the same LAI and linear size of clumps). We choose a large area, A, to simulate, big enough to hold many clumps but small enough for fast computation. I chose an area of 1 million pixels, 1000 on a side. The size of the individual clumps was chosen to be a small fraction of this; I chose a side of 41 pixels, giving the clumps many possible degrees of overlap with each other. The odd number 41 was chosen so that each clump has its center at an integer pixel position, (21,21) within the clump.

Here is where the clumping enters: each of the N_{c} clumps has its center location chosen at random. Some clumps will be isolated from others (less and less likely as N_{c} and LAI increase). Others will overlap each other, perhaps multiply. At each pixel, I compute the probability of light penetrating as *x* to the power *n*, where *n* is the number of clumps that have some part on this pixel. This “population” of clump parts at each pixel is easily tabulated. The average penetration probability, f_{pen}, over the whole area, A, is simply the sum of the pixel penetration probabilities, divided by the number of pixels. Then, 1-f_{pen} is used as the probability of hitting a leaf, f_{green}, in the program. I use this to calculate the apparent LAI,

LAI_{app} = -2 ln(f_{pen})        (A.IV.2)

The ratio of LAI_{app} to true LAI is a measure of
the clumping effect. This ratio is the
same as A(S)/A(0) in the terminology of Appendix III.
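The whole simulation is compact. Here is a scaled-down sketch (a 400-pixel square rather than 1000, for speed; the variable names are mine, not those of the original program):

```python
import numpy as np

rng = np.random.default_rng(0)

A_side, Lw, Lc = 400, 41, 1.0       # view area 400x400 (scaled down); clump side 41
x = np.exp(-0.5 * Lc)               # penetration through a single clump
LAI_true = 2.0
Nc = round(LAI_true * A_side**2 / (Lw**2 * Lc))   # Eq. A.IV.1 solved for N_c

n_over = np.zeros((A_side, A_side), int)  # number of clumps covering each pixel
half = Lw // 2
for cx, cy in rng.integers(0, A_side, size=(Nc, 2)):
    ii = np.arange(cx - half, cx + half + 1) % A_side   # periodic wrap at edges
    jj = np.arange(cy - half, cy + half + 1) % A_side
    n_over[np.ix_(ii, jj)] += 1

f_pen = np.mean(x ** n_over)        # per-pixel penetration is x^n
LAI_app = -2.0 * np.log(f_pen)      # Eq. A.IV.2
print(LAI_app)                      # well below the true LAI of ~2: the sieve effect
```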

Some results:

1. When the tile size for analyzing such a clumped image is less than the size of the clumps, the error is linear in tile size, S. I used the clump size L_{w} = 41, as above, a clump leaf area index L_{c} = 1, and a total true LAI = 2:

Tile size          20       10       5        2        1
LAI_{apparent}     1.8602   1.9245   1.9610   1.9868   1.9987 (exact…after rounding of N_{c})

2. At tile sizes larger than the clump size, the effect reaches an asymptote, as we know:

Tile size          40       80       160
LAI_{apparent}     1.7432   1.6358   1.5949

Generalizing from these two results, I derived the exponential form of Eq. AIII.4.

3. A
separate line of inquiry, not immediately relevant to the current empirical
analysis (or so I believe), is how the clumping effect varies with clump
density and clump LAI, L_{c}. It
turns out that the clumping effect decreases as the density of clumps
increases, as long as the placement of the clumps is not itself clumped (it is
random).

I developed a closely-related model, semi-analytic in form, for clumps placed randomly in thin layers, as described in the arguments for the turbid medium model:

Each clump has area A_{c}, or, as a fraction of the large area we’re considering, it occupies a fraction A_{c}/A of total area A.

Each clump has a real leaf area index L_{c}, and thus a penetration fraction through it alone of exp(-0.5*L_{c}) = x.

Consider, as with the argument for random placement of individual leaves, the penetration through the first layer, with enough clumps to give a fractional coverage of area A_{c}/A. Light will penetrate in two regions: where there are no clumps, and where there is a single clump (the clumps are so “dilute” that they never overlap). Then,

f_{pen} = (probability of no clumps)*(pen. = 1) + (prob. of 1 clump)*(pen. = x)        (A.IV.3)

        = (1 - A_{c}/A) + (A_{c}/A) x

Now add a second layer. The probabilities for finding areas with 0, 1, or 2 clumps in the line of sight are, respectively,

f_{0} = (1 - A_{c}/A)^{2}        (A.IV.4)

f_{1} = (Prob_{clear 1})(Prob_{occluded 2}) + (Prob_{occluded 1})(Prob_{clear 2})
      = 2 (1 - A_{c}/A)(A_{c}/A), since we’ll put equal areas in each layer

f_{2} = (Prob_{occluded 1})(Prob_{occluded 2}) = (A_{c}/A)^{2}

It’s easy to show that these sum to 1.

Now add a third layer:

f_{0} = (1 - A_{c}/A)^{3}        (A.IV.5)

f_{1} = (Prob_{clear 1})(Prob_{clear 2})(Prob_{occluded 3}) + 2 other permutations
      = 3 (1 - A_{c}/A)^{2} (A_{c}/A)

f_{2} = 3 permutations with only 1 layer clear
      = 3 (1 - A_{c}/A) (A_{c}/A)^{2}

f_{3} = (A_{c}/A)^{3}

Again, these sum to 1 explicitly.

In general, with N layers:

f_{0} = (1 - A_{c}/A)^{N}        (A.IV.6)

f_{1} = N (1 - A_{c}/A)^{N-1} (A_{c}/A)

f_{2} = [N(N-1)/2] (1 - A_{c}/A)^{N-2} (A_{c}/A)^{2}

….

f_{n} = C(N,n) (1-δ)^{N-n} δ^{n}, where C(N,n) = N!/[n!(N-n)!] is the binomial coefficient and δ = A_{c}/A.        (A.IV.7)

and

f_{pen} = Σ_{n=0}^{N} f_{n} x^{n}        (A.IV.8)

There is probably an analytic form for this sum, but I found it easy to do numerically, in the program overlap.f (wombat, ~/amrita).
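In fact the sum does have an analytic form: by the binomial theorem, summing C(N,n) (1-δ)^{N-n} (δx)^{n} over n gives (1-δ+δx)^{N}. A quick sketch (the function name is mine) verifies this identity numerically and reproduces several of the tabulated results:

```python
import math

def f_pen_layered(delta, Lc, N):
    """f_pen = sum_n C(N,n) (1-delta)^(N-n) (delta*x)^n with x = exp(-0.5*Lc);
    the binomial theorem collapses the sum to ((1-delta) + delta*x)^N."""
    x = math.exp(-0.5 * Lc)
    direct = sum(math.comb(N, n) * (1 - delta) ** (N - n) * (delta * x) ** n
                 for n in range(N + 1))
    closed = ((1 - delta) + delta * x) ** N
    assert abs(direct - closed) < 1e-12     # the two forms agree
    return closed

# N = LAI_total / [(A_c/A) * L_c]; compare with the tabulated results
print(f_pen_layered(0.01, 1.0, 1))    # ~0.99607 (LAI_total = 0.01)
print(f_pen_layered(0.01, 2.0, 100))  # ~0.53040 (LAI_total = 2)
print(f_pen_layered(0.9,  2.0, 2))    # ~0.18584 (LAI_total = 3.6)
```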

Note that the number of layers, N, determines the actual leaf area index,

LAI_{total} = N (A_{c}/A) L_{c},

or, if we specify LAI_{total}, we must set N = LAI_{total}/[(A_{c}/A) L_{c}]. This is just Eq. (A.IV.1) again.

Now that we can calculate
f_{pen} for the whole scene, we can calculate the apparent LAI as -2 ln
f_{pen}. If we divide this by
LAI_{total}, we get the composite clumping factor.

Results:

A_{c}/A   L_{c}   LAI_{total}   f_{pen}    LAI_{appar}   f_{clump,indiv}   f_{clump,composite}
0.01      1.      0.01          0.99607    0.0079        0.78694           0.78849  (only n=1 contributes)
0.01      2.      2.            0.53040    1.2683        0.63212           0.63413  (contribs. to n=6)
0.2       2.      2.            0.50875    1.3516        0.63212           0.67580  (contribs. to n=5)
0.4       2.      2.            0.55284    1.1659        0.63212           0.72872  (contribs. to n=2)
0.9       2.      3.6           0.18584    3.3657        0.63212           0.93493  (only have 2 layers)

(and the same result for LAI_{total} = 7.2, with twice as many layers)

So, the limit of dispersed clumps, giving no change in clumping factor, is correct (first result). For small clumps of high LAI, assembling them effectively makes no change in clumping factor. For larger clumps of high LAI, assembling them into a high total LAI actually improves the clumping factor (makes it closer to 1, by about 6% to almost 50%!).