Accepted for publication in Weather and Forecasting, March 1998


Roger Edwards

and Richard L. Thompson

Storm Prediction Center

Norman, Oklahoma


This study tests hypothetical correspondences between size of severe hail, WSR-88D derived Vertically Integrated Liquid water (VIL), and an array of thermodynamic variables derived from computationally modified sounding analyses. In addition, these associations are documented for VIL normalized by various sounding parameters, and statistical predictive value is assigned to the various VIL-based and sounding variables. The database was gathered from Weather Surveillance Radar - 1988 Doppler (WSR-88D) units nationwide from cases identified during real-time operations, and consists of over 400 hail events -- each associated with a radar-observed VIL value and a modified observational sounding.

Some parameters are found to increase, in the mean, with larger hail size categories. Specific hail size, however, varies widely across the spectra of VIL, thermodynamic sounding variables, and combinations thereof, with only a few exceptions. No parameters of operational value in hail size prediction were discovered in the database of VIL and thermodynamic sounding data. These findings, largely contrary to our hypotheses, are compared with hail forecasting and warning techniques developed in the WSR-88D era -- few in number and mostly regionalized and informal in nature -- and with more widespread and empirical forecasting assumptions involving many of the same variables.

1. Introduction

Large hail can cause both serious bodily injury and immense economic losses. This was recently exemplified by the “Mayfest” hailstorm, the most expensive thunderstorm event in U.S. history (NOAA 1995a), where hail up to 114 mm (4 in.) in diameter contributed to over $2 billion in damage in Fort Worth, Texas, along with 109 hail-related injuries. Given its destructive potential, it is important to have the capability to forecast and issue timely warnings for severe hail -- particularly for those cases where hail is much larger than marginally severe criteria (“dime size,” 19 mm or .75 in. in diameter).

Hail forecasts and warnings have remained quite challenging, particularly with respect to hail size determination, in an era of otherwise greatly increased understanding of severe local storm processes. In our operational experience, most methods of radar-based diagnosis and prediction of hail severity have been informal and/or highly empirical -- sometimes merely “rules-of-thumb” with little or no quantitative substantiation, such as “VIL of the day,” where VIL associated with initial severe hail reports in a region is used as a threshold for warning on subsequent storms. These factors may be partly due to a paucity of formal research specifically dealing with forecasting hail size -- relative to the supply of studies on other severe local storm phenomena such as tornadoes -- since the operational advent of the Weather Surveillance Radar - 1988 Doppler (WSR-88D). One promising tool -- an algorithm incorporating velocity data to compute upper level storm outflow -- was formulated by Witt and Nelson (1991). In that study, a relatively small sample (21 days of severe thunderstorm cases) yielded large statistical correlations between algorithmic parameters and maximum reported hail size; but storm-top divergence has not yet been distributed, explicitly applied or verified through field testing on a national, operational basis.

Vertically Integrated Liquid water (VIL) has been used as an indicator of storm severity for over 25 years, since research leading to the original formal publication of a VIL calculation by Greene and Clark (1972). As they stated:

Their work, based on 10 cm wavelength Weather Surveillance Radar - 1957 (WSR-57) data, was the forerunner of the WSR-88D VIL algorithm, whose operational characteristics are described by NOAA (1991).

Sounding data were incorporated with storm traits from the WSR-57 by Wagenmaker (1992) for successful prediction of the existence of hail; however, the accompanying attempt to distinguish between severe and sub-severe hail in 67 thunderstorms yielded inconclusive results. In the same study, Wagenmaker advocated VIL normalization with respect to measures of the convective environment in order to mitigate the susceptibility of VIL to climatological variances. A comparable, but independently developed, hypothesis was a significant motivation for a large part of our work: Adjustment of VIL data, in some physically meaningful way, for environmental conditions producing and maintaining thunderstorms, should aid in diagnosing and forecasting severity of hail.

In a similar vein, Amburn and Wolf (1997, hereafter AW97) normalized VIL in 185 severe hail events and 36 non-severe ones, using radar-estimated storm top to incorporate convective character. That effort yielded “VIL density,” a ratio of VIL to radar-estimated storm top, which generally correlated well with hail size in the northeastern Oklahoma - northwestern Arkansas region. The relative value of VIL density versus VIL alone for inferring the presence of severe hail is in doubt, however, based on some illustrations in AW97. For example, from both the Figure 2 scattergram in AW97 and their Table 1, similar probabilities of detection (POD) and false-alarm rates (FAR) for severe hail may be inferred with a minimum VIL of 38 kg m⁻² as with a minimum VIL density of 3.5 g m⁻³. Further, no non-severe hail events were observed when VIL surpassed 43 kg m⁻² -- regardless of the height of the echo top.
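The VIL density ratio and the POD/FAR comparison described above can be illustrated with a minimal Python sketch. The function names and the sample events are hypothetical and are not taken from AW97; the factor of 1000 is assumed only to convert kg to g so that the result carries the g m⁻³ units quoted in the text, and FAR is computed here as the false-alarm ratio (false alarms divided by all warned events).

```python
def vil_density(vil_kg_m2, echo_top_m):
    """VIL (kg m^-2) divided by echo top (m), scaled to g m^-3."""
    if echo_top_m <= 0:
        raise ValueError("echo top must be positive")
    return vil_kg_m2 / echo_top_m * 1000.0  # 1000 g per kg


def pod_far(events, threshold):
    """Score a 'warn if value >= threshold' rule.

    events: iterable of (value, was_severe) pairs (hypothetical layout).
    Returns (POD, FAR); either is NaN when its denominator is zero.
    """
    events = list(events)
    hits = sum(1 for v, severe in events if v >= threshold and severe)
    misses = sum(1 for v, severe in events if v < threshold and severe)
    false_alarms = sum(1 for v, severe in events if v >= threshold and not severe)
    pod = hits / (hits + misses) if (hits + misses) else float("nan")
    far = false_alarms / (hits + false_alarms) if (hits + false_alarms) else float("nan")
    return pod, far


# Hypothetical events: (VIL in kg m^-2, severe hail observed?)
sample = [(40, True), (30, True), (45, False), (20, False)]
pod, far = pod_far(sample, threshold=38)
```

Scoring the same event list twice, once with VIL and once with VIL density as the value, gives the kind of side-by-side POD/FAR comparison discussed for AW97.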

The reliability of VIL density in other areas of the country is uncertain as well, because of several factors: inherent imprecision of the WSR-88D storm top estimations (NOAA 1991), differences in hail climatology, and (indirectly) large regional variances of the performance of the WSR-88D Severe Weather Probability (SWP) algorithm (Jendrowski 1988), which primarily utilizes VIL fields in individual thunderstorms. Accordingly, mixed results have been obtained from local studies on VIL density outside the Tulsa, OK, region. Roeseler and Wood (1997), for example, noted similar VIL density values for a small sample of sub-severe hail events (13 mm or .5 inch) as for those that were at the severe threshold. On the other hand, Turner and Gonsowski (1997) found patterns over northwestern Kansas similar to those in AW97, but offset to slightly larger hail sizes.

On a local scale, Billet et al. (1997) developed a logistic regression model to establish a binary (yes/no) predictor of hail severity with a combination of VIL, freezing level and low-level storm inflow. They also developed a detailed multiple regression equation to predict maximum hail size, but its results were not promising. The present study is much larger in spatial scale, and attempts to determine the reliability of several parameters, including VIL and sounding variables, as predictors of hail sizes across the spectrum of severe hail -- parameters heretofore used principally in empirical fashion among forecasters. Our multiple linear regression analysis to predict maximum hail diameter, though less thorough than that of Billet et al. (1997), corroborates their findings.

2. Data and methodology

a. Hail reports and radar data

Vertically integrated liquid (VIL) information was gathered for each of 426 severe hail events in the conterminous United States during 1996 (mapped in Figure 1). These consisted of reports collected near real-time, in the Storm Prediction Center (SPC) operational environment, as time and data availability permitted. Reports were obtained from warnings, severe weather statements or local storm reports (LSR), issued by local National Weather Service (NWS) offices. It is noted that there can be reliability deficiencies with such preliminary reports. Initial size reports may not agree in some cases with "final" Storm Data lists due to post-LSR revision. Further, and perhaps more significantly for the purpose of scientific evaluation, they are subject to several types of biases in storm-spotter observations and in NWS verification procedures -- as noted by Wagenmaker (1992), AW97 and Roeseler and Wood (1997).

Fig. 1. Map of all severe hail events used, illustrating their distribution across the conterminous U.S. [For this web version, reports are color-coded by hail size, according to the legend.]

The database was made large in geographic coverage (national) and time (coverage of all seasons through a full year) in order to incorporate a broad spectrum of hailstorm environments. This wide scope was also intended to smooth away possible regional biases in VIL analogous to those of SWP algorithms, into which VIL is input.

Generally, VIL values were obtained from the WSR-88D site nearest to the location of each severe hail report, unless:

In such cases, VIL was taken from the closest available WSR-88D that scanned the thunderstorm within a radial range of 25-230 km (the outer bound of available VIL). VIL values were obtained from the 16-level graphic VIL product in one of the following ways:

Still more refinement was necessary in order to filter unreliable data. In a few instances, no echo passed across a county containing a hail report; such events were altogether discarded. Also, approximately 30 events occurred for which the displayed VIL value could not exceed 80 kg m⁻², due to a software incompatibility between a WSR-88D radar product generator (RPG) and the principal user processor (PUP) at SPC. These were also excised from the study due to unknown maximum VIL.
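The fallback radar choice described earlier (the closest available WSR-88D within a radial range of 25-230 km) can be sketched as follows. This is an illustration only: the radar names and coordinates are hypothetical, the storm position is approximated by the report location, and range is taken as great-circle ground distance on a spherical Earth, none of which is specified in the text.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius (assumed spherical)
MIN_RANGE_KM = 25.0        # inner bound quoted in the text
MAX_RANGE_KM = 230.0       # outer bound of available VIL


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pick_radar(report_lat, report_lon, radars):
    """Return (name, range_km) of the closest radar inside the range window.

    radars: dict mapping a radar name to (lat, lon). Returns None if no
    radar lies between MIN_RANGE_KM and MAX_RANGE_KM of the report.
    """
    in_window = []
    for name, (lat, lon) in radars.items():
        rng = great_circle_km(report_lat, report_lon, lat, lon)
        if MIN_RANGE_KM <= rng <= MAX_RANGE_KM:
            in_window.append((rng, name))
    if not in_window:
        return None
    rng, name = min(in_window)
    return name, rng


# Hypothetical radar sites, roughly 11 km, 111 km and 334 km from the report
radars = {"RADAR_A": (35.1, -97.0), "RADAR_B": (36.0, -97.0), "RADAR_C": (38.0, -97.0)}
choice = pick_radar(35.0, -97.0, radars)
```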

The VIL used for each event was typically the largest value occurring in the thunderstorm producing hail over the report location, within one volume scan before or after the time of the report. In about 70 events, report times corresponded poorly with the passage of the thunderstorm echo over the report location; in some, the difference between reported hail time and echo passage was as great as an hour. In each such event, the stated hail time was revised to match echo passage; and the VIL value used was the largest from within one volume scan before or after the closest approach of the echo centroid to the report location. In the handful of cases where there were multiple hail reports within the aforementioned three volume scan time frame, the largest reported hail size was used.
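The three-volume-scan window described above can be sketched in Python. The data layout (a time-sorted list of scan times paired with the storm's maximum VIL in each scan) and the function name are assumptions made for illustration; the same function applies to the revised cases if the time of closest echo approach is passed in place of the report time.

```python
from datetime import datetime


def vil_for_report(scans, reference_time):
    """Largest VIL within one volume scan before or after reference_time.

    scans: list of (scan_time, max_vil) tuples sorted by time, where
    max_vil is the storm's largest VIL in that volume scan.
    """
    if not scans:
        raise ValueError("no volume scans supplied")
    # Index of the volume scan closest in time to the report
    nearest = min(range(len(scans)), key=lambda i: abs(scans[i][0] - reference_time))
    # Window: previous scan, nearest scan, next scan (clipped at the ends)
    window = scans[max(0, nearest - 1): nearest + 2]
    return max(vil for _, vil in window)


# Hypothetical 6-minute volume scans and VIL values (kg m^-2)
scans = [
    (datetime(1996, 5, 1, 21, 0), 45),
    (datetime(1996, 5, 1, 21, 6), 55),
    (datetime(1996, 5, 1, 21, 12), 60),
    (datetime(1996, 5, 1, 21, 18), 70),
]
selected = vil_for_report(scans, datetime(1996, 5, 1, 21, 7))
```

In this hypothetical case the 2118 scan falls outside the window, so its larger VIL is not used.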

b. Sounding Analyses

Sounding data were modified and analyzed using a new version of the SHARP Workstation program originally prepared for DOS computers by Hart and Korotky (1991), and since updated for operational use at SPC on a UNIX platform. Importantly, the UNIX version of SHARP incorporates a virtual temperature correction, ideally rendering derived stability parameters such as Convective Available Potential Energy (CAPE) more physically meaningful and representative than those based upon customary computations (Doswell and Rasmussen, 1994).
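For readers unfamiliar with the correction, the standard definition of virtual temperature is sketched below. This is the textbook formula only, not a reproduction of the SHARP code; the function name and the sample values are illustrative.

```python
EPSILON = 0.622  # ratio of the gas constants of dry air and water vapor


def virtual_temperature(temp_k, mixing_ratio):
    """Virtual temperature (K) from temperature (K) and water vapor
    mixing ratio (kg of vapor per kg of dry air)."""
    if mixing_ratio < 0:
        raise ValueError("mixing ratio cannot be negative")
    return temp_k * (1.0 + mixing_ratio / EPSILON) / (1.0 + mixing_ratio)


# Illustrative values: a 300 K parcel with 14 g/kg of water vapor
tv = virtual_temperature(300.0, 0.014)
```

Because moist air is less dense than dry air at the same temperature and pressure, the virtual temperature exceeds the actual temperature (by roughly 2.5 K in this example), which is why buoyancy-based parameters such as CAPE change when the correction is applied to both parcel and environment.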

Rawinsonde observations were selected based on data availability, spatial and temporal proximity to hail events and a subjective judgment of observational representativeness of the storm environment. The last criterion was the most critical and difficult to regulate, in light of the air mass sampling problems with "proximity" soundings near tornadic supercells discussed by Brooks et al. (1994). For example, a sounding that was relatively close in space to a hail event, but was launched either behind a dryline or in cold, stable convective outflow, was rejected as unrepresentative of the air mass available to the hail-spawning thunderstorm updraft. Instead, the next nearest sounding was chosen, located farther from the event but upstream with respect to the low level inflow layer and in a convectively unstable air mass.

Unless the hail-producing thunderstorm was apparently rooted in an elevated layer of convective instability and inflow (above a relatively stable near-surface layer), each sounding was then modified for observed surface temperature and dew point in the inflow region nearest to the storm. Occasionally, this induced physically unreasonable conditions immediately above the modified surface. These required further modification (e.g., smoothing to eliminate a dry-adiabatic thermal lapse rate in a saturated layer, or a shallow autoconvectively unstable layer). In most cases where the thunderstorm was not surface-based, modification of surface temperature and dew point yielded no change in convective instability and was not necessary. The lifted parcel used was the most unstable in the lowest 300 mb, based on recommendations by Doswell and Rasmussen (1994). While there is no way to unambiguously establish that the modified soundings adequately depicted storm-scale environmental settings, we believe these methods yielded the most representative soundings possible in an operational setting. An example of these soundings and derived output is depicted in Figure 2.
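One common way to pick a most unstable parcel in the lowest 300 mb is sketched below. It assumes that "most unstable" means the level with the highest equivalent potential temperature (theta-e), which the text does not state, and that theta-e has already been computed for each level; the function name and the sample profile are hypothetical.

```python
def most_unstable_level(levels, depth_mb=300.0):
    """Return the (pressure_mb, theta_e_k) level with the highest theta-e
    within depth_mb of the surface.

    levels: list of (pressure_mb, theta_e_k) tuples in any order; the
    surface is taken to be the highest-pressure level supplied.
    """
    if not levels:
        raise ValueError("empty sounding")
    surface_p = max(p for p, _ in levels)
    # Keep only levels inside the lowest depth_mb of the sounding
    in_layer = [(p, te) for p, te in levels if p >= surface_p - depth_mb]
    return max(in_layer, key=lambda level: level[1])


# Hypothetical profile: theta-e peaks at 925 mb within the lowest 300 mb;
# the 650 mb level has higher theta-e but lies above the search layer
profile = [(1000.0, 338.0), (925.0, 345.0), (850.0, 341.0), (650.0, 360.0)]
mu_level = most_unstable_level(profile)
```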

Fig. 2a. Sample skew-t plot of a modified sounding for one of the hail events in the database. Shaded area represents CAPE.

Fig. 2b. Derived thermodynamic output from the skew-t plot shown in Fig. 2a, including CAPE, EL, MPL, freezing (FRZ) level and WBZ level.

c. Parametric comparative methods and hypotheses

Reported hail sizes were compared with VIL, several thermodynamic sounding parameters, and VIL normalized by thermodynamic variables. In the operational warning setting, severe hail is predicted using VIL, through various local adaptations of the "VIL of the day" (Paxton and Shepherd, 1993), the VILs associated with the first few hail reports, and/or VIL density. In particular, use of VIL density values as warning criteria appears to be spreading beyond its area of origin despite aforementioned uncertainties about interregional variability, and despite several major faults in the WSR-88D storm top estimation routine (NOAA, 1991). In empirical and informal fashion, CAPE, wet-bulb zero level and freezing level have also been used for several years to forecast hail. We tested their correlation with hail size, both on average and as predictors, independent of VIL. Finally, VIL was normalized with respect to several thermodynamic variables related to depth and strength of the convective updraft, in an effort to incorporate influences of the near-storm environment into usage of the WSR-88D for evaluation of storm severity. In these regards, several hypotheses were tested on a national scale:

These hypotheses incorporated a number of hail forecasting techniques commonly employed in the National Weather Service (NWS), as well as a few of our own, none of which had yet been quantitatively tested on a national scale.

3. Analytic results

Due to the discontinuities in sample size for specific hailstone diameters across the spectrum of reported hail events, hail size ranges were used for the purpose of evaluating associations to hail severity of both VIL alone and thermodynamically adjusted VIL. For these comparisons, the hail size spectrum was arbitrarily broken down into bins by reported hail size. The hail size ranges used here were: from dime- to just under quarter-size (.75 to .99 in. or 19 to 25.1 mm), quarter- to just under golf ball-size (1 to 1.74 in. or 25.4 to 44.2 mm), golf ball- to just under baseball-size (1.75 to 2.74 in. or 44.5 to 69.6 mm), and baseball-size and larger (at or above 2.75 in. or 69.9 mm).
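The binning scheme above, and the per-bin averaging summarized in Figure 3, can be sketched as follows. The bin labels, function names and sample events are illustrative assumptions; only the bin boundaries come from the text.

```python
from collections import defaultdict
from statistics import mean

MM_PER_INCH = 25.4

# Lower bounds (in.) of the four severe hail size ranges given in the text
BIN_LOWER_BOUNDS = [
    (0.75, "dime to under quarter"),
    (1.00, "quarter to under golf ball"),
    (1.75, "golf ball to under baseball"),
    (2.75, "baseball and larger"),
]


def inches_to_mm(size_in):
    return size_in * MM_PER_INCH


def hail_bin(size_in):
    """Return the size-range label for a reported diameter in inches,
    or None for sub-severe hail (under .75 in.)."""
    label = None
    for lower, name in BIN_LOWER_BOUNDS:
        if size_in >= lower:
            label = name
    return label


def mean_by_bin(events):
    """events: iterable of (size_in, value) pairs, e.g. hail size and VIL.
    Returns {bin label: mean value}, ignoring sub-severe reports."""
    groups = defaultdict(list)
    for size_in, value in events:
        label = hail_bin(size_in)
        if label is not None:
            groups[label].append(value)
    return {label: mean(values) for label, values in groups.items()}


# Hypothetical (hail size in., VIL kg m^-2) pairs
means = mean_by_bin([(0.75, 40), (0.88, 50), (1.75, 60)])
```

As the discussion of Figure 4 emphasizes, bin means of this kind hide the spread of values within each bin, so they should not be read as predictive relationships.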

[Two caveats regarding ground truth should be mentioned here. First, hail size reports in the NWS, at the time of the data collection, were most often given in severe weather reports by comparison to common objects and not specific measurements. Subsequently in the SPC logs, they were converted to sizes customarily associated with those objects. Second, even with the proliferation of storm spotters and chasers that often increases observational resolution in the path of many severe thunderstorms, sampling may still be inadequate to consistently represent hailfall character when considering the great storm-scale variability in hail parameters (Morgan and Towery, 1975).]

Fig. 3a. Graph showing mean VIL increasing with larger hail size ranges.

Fig. 3b. Graph showing mean VIL-to-EL ratio increasing with larger hail size ranges.

Fig. 3c. Graph showing mean VIL-to-MPL ratio increasing with larger hail size ranges.

Those averages, however, do not reveal a predictive association because they do not represent the ranges of magnitudes of each variable for a given hail size, as depicted in the VIL-related scatter diagrams of Figure 4.






Fig. 4. Scatter diagram of the hail reports (plotted in black dots), comparing reported hail size with: (a) associated VIL, (b) ratio of VIL to EL, (c) ratio of VIL to MPL, (d) ratio of VIL to CCD and (e) ratio of VIL to CAPE density

Similar scatter-plot patterns appeared for thermodynamic sounding variables that were not combined with VIL (Figure 5). They illustrate a much wider variation in all variables for hail up to golf ball-size than for the largest and most potentially destructive hailstones; however, some clearer associations are apparent on the low end of most data set variables. For example, no hail size larger than two inches (50.8 mm) was accompanied by a VIL under 50 kg m⁻², VIL/EL less than 40 × 10⁻⁴ kg m⁻³, VIL/MPL below 35 × 10⁻⁴ kg m⁻³, CCD less than 10⁴ m, or freezing level below 2800 m (Figures 4 and 5). No baseball-size (2.75 in. or 69.9 mm) hailstones were reported in association with modified CAPE less than 1300 m² s⁻². Such associations indicate a decreased likelihood of the largest, most destructive hail with low-end values of those parameters, but provide no insight whatsoever about associations of high-end VIL values with hail size. Moreover, caution should be used in translating these low-end VIL and thermodynamic-variable findings to operational warning use, because they do not cover the unknown risk of any anomalous events that were not captured in our data set.






Fig. 5. Scatter diagram of the hail reports (plotted in black dots), comparing reported hail size with associated parameters derived from modified soundings: (a) CAPE, (b) CCD, (c) CAPE/CCD or CAPE density, (d) wet-bulb zero level and (e) freezing level.

In general, little can be inferred about the upper bounds of any of the VIL-based or wholly thermodynamic variables when compared to hail size, in disagreement with our hypotheses. For example, wet bulb zero and freezing levels associated with very large (baseball-size and larger) hail were clustered close to the middle of the spectra of those associated with marginally severe (dime- to quarter-size) hail (Figures 5d and 5e). Particularly notable was the close similarity between VIL-versus-hail size comparisons and those using normalized VIL, suggesting little value nationally to incorporating environmental thermodynamics, as presently available, into VIL examination. This indication contradicts not only our hypothesis about the importance of VIL normalization; but it also counters smaller-scale results, such as those associated with VIL density, that indicated its value as a warning tool on a regional basis.
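For concreteness, the normalized VIL quantities compared in Figure 4 can be computed as in the sketch below. The function name, dictionary keys and sample values are hypothetical; heights are assumed to be in meters, VIL in kg m⁻², and CAPE in m² s⁻², with CAPE density taken as CAPE/CCD as in the Figure 5 caption.

```python
def normalized_vil(vil, el_m, mpl_m, ccd_m, cape):
    """Return the four normalized-VIL ratios for one hail event.

    vil: VIL (kg m^-2); el_m, mpl_m, ccd_m: EL, MPL and CCD (m);
    cape: CAPE (m^2 s^-2, equivalent to J kg^-1).
    """
    for name, value in (("EL", el_m), ("MPL", mpl_m), ("CCD", ccd_m), ("CAPE", cape)):
        if value <= 0:
            raise ValueError(f"{name} must be positive")
    cape_density = cape / ccd_m  # CAPE per unit depth
    return {
        "VIL/EL": vil / el_m,            # kg m^-3
        "VIL/MPL": vil / mpl_m,          # kg m^-3
        "VIL/CCD": vil / ccd_m,          # kg m^-3
        "VIL/CAPE density": vil / cape_density,
    }


# Hypothetical event: VIL 50 kg m^-2, EL 12500 m, MPL 14000 m,
# CCD 10000 m, CAPE 2000 m^2 s^-2
ratios = normalized_vil(50.0, 12500.0, 14000.0, 10000.0, 2000.0)
```

With these sample numbers VIL/EL is 40 × 10⁻⁴ kg m⁻³, the same order of magnitude as the low-end values cited in the text.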

Statistical measures were then applied to the scatter-plot data. Ambiguous results appeared when any of the parameters in our set (VIL, VIL/EL, VIL/MPL, CAPE, CCD, wet bulb zero level, freezing level, CAPE density, and so forth) were tested as possible hail-size predictors. For sounding-based parameters, the results should apply best to storm-scale predictions (on the order of 10³ km²). Using a least squares best-fit, none of them was associated with a correlation coefficient higher than 0.17. Considering Figure 4a, a straight line is a poor fit to the wide range of VIL values associated with the hail sizes less than two in. (50.8 mm), causing the poor correlation coefficient. To further illustrate the weak predictive relationships of our VIL and sounding data to hail size, a multiple regression analysis yielded mean errors of 0.7 in. (17.8 mm) among the full suite (6) of variables. Again, these results corroborate the more detailed and regional-scale statistical analyses conducted by Billet et al. (1997). Contrary to our hypotheses, these results suggest that, at least on a super-regional scale, many of the parameters most commonly used to predict hail severity are practically useless.
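The single-predictor statistics mentioned above (a least squares line and its correlation coefficient) can be reproduced for any one variable with a short sketch like the following. It is not the authors' analysis code; the function names are illustrative, the error measure is assumed to be a mean absolute error, and the sample numbers are made up solely to exercise the functions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(x, y):
    """Slope and intercept of the ordinary least squares fit y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def mean_absolute_error(x, y, slope, intercept):
    """Mean absolute difference between fitted and observed values."""
    return sum(abs(slope * a + intercept - b) for a, b in zip(x, y)) / len(x)


# Made-up predictor values (e.g. VIL) and hail sizes (in.)
vil = [40.0, 55.0, 60.0, 70.0]
size = [1.75, 0.75, 1.00, 1.75]
r = pearson_r(vil, size)
slope, intercept = least_squares_line(vil, size)
mae = mean_absolute_error(vil, size, slope, intercept)
```

A correlation coefficient near zero, as reported for every parameter in this study, means the fitted line explains almost none of the variation in reported hail size.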

4. Conclusions and recommendations

A crucial finding in our work was that, on a nationwide basis, commonly used hail predictors showed little or no skill in predicting hail size. In order to effectively develop techniques for such forecasts, it is clear that regional variances in hail data quality and assimilation, as well as in the utility of VIL, must be examined thoroughly for methods of bias minimization. Although VIL performed similarly to other predictors in our data set, the poor correlation coefficients and large predictive errors indicate that it should not be used alone to warn for hail size. In the same vein, and because of the characteristic coarseness of radar-derived storm top estimates, great caution should be exercised with similar use of VIL density outside the southern plains, as well as with other VIL-based tools such as SWP or Cell Trends.

Furthermore, none of the following should be used specifically as a hail severity forecasting tool: VIL, CAPE, MPL, EL, CCD, wet bulb zero level, freezing level, or various combinations thereof -- until local or regional methods of formally demonstrated statistical significance are devised that are also shown to be unambiguously adaptable to different parts of the country. Part of such an effort must involve interregional standardization of methods for real-time gathering and subsequent verification of severe hail events, so that data are collected consistently, accurately and precisely from office to office. Once formulated and successfully tested, any such flexible hail forecasting technique -- of national utility and common physical basis -- can then be applied not only at individual NWS offices, but for SPC national convective outlooks, mesoscale forecast discussions and severe weather watches.

Based on inconclusive associations between hail severity and purely thermodynamic sounding predictors, it is clear that future research into techniques development for hail forecasting should necessarily incorporate kinematic influences to some degree, whether national or regional in scope. This could include parameters such as bulk Richardson number (BRN), BRN shear, storm-relative (SR) helicity and various layer averages of SR flow (Thompson, 1997). These parameters are each used operationally to forecast potential for supercells in general, and in the case of SR flow, tornadic supercells in particular. Because kinematically enhanced vertical pressure gradient forces lead to vertical storm-scale accelerations beyond those associated with buoyancy alone (as commonly denoted by CAPE), they may be useful on at least a regional basis in forecasting size of hail associated with rotating thunderstorms -- either alone or in combination with VIL and thermodynamic variables.

There may also be some value in testing for associations between downdraft CAPE (DCAPE, as conceived by Emanuel 1994 and computed by Gilmore and Wicker 1997 for determining supercell character) and hail size -- a capability which available software did not allow in our study. Our results with updraft CAPE indicate that DCAPE may also work poorly as a predictor on a national basis; but its regional applicability to hail forecasting is yet unknown.

These and other yet-untested or undiscovered parameters should be tested for operational reliability in forecasting large hail events, with the goal of greater advance notification of the potential for destructive hail -- further mitigating its economic and personal-safety peril. To be optimally useful, any hail size forecasting scheme must incorporate environmental thermodynamic and kinematic information accessible before an event. Future research in this area should involve not only balloon soundings, but profilers, velocity-azimuth display (VAD) wind data, satellite sounding data and surface observations. Any methods exhibiting skill in those areas may be applied to situationally adjusted numerical model forecast soundings, in order to formulate large hail forecasting techniques applicable to temporal domains as large and far in advance as SPC convective outlooks.

5. Acknowledgments

We appreciate the efforts of all those who offered insightful comments, reviews and ideas for this research, including Steve Corfidi, Jim Henderson, Dave Imy, Bob Johns, Mike Vescio and Steve Weiss. Thanks also go to others within NSSL, SPC, NCEP and several other NWS offices, as well as the formal reviewers, who proofread various revisions of this paper and provided useful suggestions.


References

Amburn, S., and P. Wolf, 1997: VIL density as a hail indicator. Wea. Forecasting, 12, 473-478.

Billet, J., M. DeLisi, and B.G. Smith, 1997: Use of regression techniques to predict hail size and the probability of large hail. Wea. Forecasting, 12, 154-164.

Brooks, H.E., C.A. Doswell III, and J. Cooper, 1994: On the environments of tornadic and nontornadic mesocyclones. Wea. Forecasting, 9, 606-618.

Doswell, C.A., III, and E.N. Rasmussen, 1994: The effect of neglecting the virtual temperature correction on CAPE calculation. Wea. Forecasting, 9, 625-629.

Emanuel, K.A., 1994: Atmospheric Convection. Oxford University Press, New York, NY, 883 pp.

Gilmore, M.S., and L.J. Wicker, 1997: The influence of midtropospheric dryness on supercell morphology and evolution. Submitted to Mon. Wea. Rev.

Greene, D.R., and R.A. Clark, 1972: Vertically integrated liquid water -- a new analysis tool. Mon. Wea. Rev., 100, 548-552.

Hart, J.A., and W.D. Korotky, 1991: The SHARP workstation - v1.50. A skew-t/hodograph analysis and research program for the IBM and compatible PC. User’s manual. NOAA/NWS Forecast Office, Charleston, WV, 62 pp. [Available from NOAA/NWS Forecast Office, Charleston, WV 25309.]

Jendrowski, P.A., 1988: Regionalization of the NEXRAD severe weather probability algorithm. Preprints, 18th Conf. on Severe Local Storms, Baltimore, MD, Amer. Meteor. Soc., 205-208.

Kitzmiller, D.H., and J.P. Briedenbach, 1993: Probabilistic nowcasts of large hail based on volumetric reflectivity and storm environmental characteristics. Preprints, 26th Conf. on Radar Meteorology, Norman, OK, Amer. Meteor. Soc., 157-159.

Morgan, G.M., Jr., and N.G. Towery, 1975: Small-scale variability of hail and its significance for hail prevention experiments. J. Appl. Meteor., 14, 763-770.

NOAA (National Oceanic and Atmospheric Administration), 1991: Federal Meteorological Handbook No. 11: Doppler Radar Observations. Part C: WSR-88D Products and Algorithms. Office of the Federal Coordinator for Meteorological Observations and Supporting Research, Rockville, MD. 2-98 through 3-36. [Available from Office of the Federal Coordinator for Meteorological Observations and Supporting Research, Rockville, MD 20852.]

_______, 1995a: Storm Data, 37, no. 5. National Climatic Data Center, Asheville, NC, 6-7. [Available from National Climatic Data Center, Asheville, NC 28801.]

Paxton, C.H., and J.M. Shepherd, 1993: Radar diagnostic parameters as indicators of severe weather in central Florida. NOAA Technical Memorandum NWS SR-149, 12 pp.

Roeseler, C.A., and L. Wood, 1997: VIL density and associated hail size along the northwest Gulf coast. Preprints, 28th Conf. on Radar Meteorology, Austin, TX, Amer. Meteor. Soc., 434-435.

Thompson, R.L., 1997: Eta model storm-relative winds associated with tornadic and non-tornadic supercells. Accepted for Wea. Forecasting.

Turner, R.J., and D.M. Gonsowski, 1997: A review of VIL density performance at NWSO Goodland, Kansas. Preprints, 28th Conf. on Radar Meteorology, Austin, TX, Amer. Meteor. Soc., 370-371.

Wagenmaker, R.B., 1992: Operational detection of hail by radar using heights of VIP-5 reflectivity echoes. Natl. Wea. Dig., 17, no. 2, 2-15.

Witt, A., 1990: A hail core aloft detection algorithm. Preprints, 16th Conf. on Severe Local Storms, Kananaskis Park, Alta., Amer. Meteor. Soc., 232-235.

_______, and S.P. Nelson, 1991: The use of single-Doppler radar for estimating maximum hailstone size. J. Appl. Meteor., 30, 425-431.