.-
help for ^violin^                                               (STB-46: gr33)
.-

Violin plots
------------

    ^violin^ varlist [weight] [^if^ exp] [^in^ range]
        [^,^ {^bi^weight|^cos^ine|^ep^an|^gau^ss|^par^zen|^rec^tangle|^tri^angle}
         ^n(^#^) w^idth^(^#^) by(^byvar^) tru^ncat^(^#,#|*^) ro^und^(^#^)^ 
         graph_options ]

^fweights^ and ^aweights^ are allowed; see ^help^ @weights@.


Description
-----------

^violin^ produces violin plots, a graphical box plot--kernel density synergism.
The violin plot combines the basic summary statistics of a box plot with the 
visual information provided by a local density estimator. The goal is to 
reveal the distributional structure in a variable.  Much like a traditional 
box plot, the violin plot displays the median as a short horizontal line, the 
first-to-third interquartile range as a narrow shaded box, and the lower-to-
upper adjacent value range as a vertical line, but it does not plot outside 
values.  Instead, it "boxes" the data with mirrored density curves and labels 
the y-axis at the minimum, median and maximum observed data values.

^violin^ also lists basic descriptive statistics about the data (i.e., the 
lower and upper adjacent values, the 25th and 75th centiles, the minimum, 
median and maximum of the data, and the sample size) and it provides 
information about the density estimation (i.e., the kernel method used, the 
number of points of estimation, and the resulting scale and width factors).  
When ^by()^ is specified, descriptive statistics are displayed for the combined
group only.  When multiple variables are included in varlist, statistics are 
displayed for the last variable only.  

^violin^ discards observations on a casewise basis as a function of 1) missing
data and 2) the ^if^ (or ^in^) specification (i.e., it ignores the entire 
observation).  This behavior may lead to unexpected results when multiple 
variables are in the varlist.
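
For instance, assuming the auto data shipped with Stata are in memory (an
assumption, as are the variable names taken from that dataset), ^rep78^ is
missing for some cars.  In the second command below those cars are dropped
from the ^price^ plot as well, so the two ^price^ plots can differ:

	. ^violin price^

	. ^violin price rep78^

Running ^violin^ separately for each variable avoids this side effect.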
  
Note: ^violin^ calls ^centile^ to compute the needed centiles but ^centile^ does
      not respond to a ^[weight]^ specification.  This conflicts with the 
      ^kdensity^ code which responds to that specification.  The implications of
      this conflict have not been explored, but ^violin^ currently allows the
      ^[weight]^ specification to be passed through to ^kdensity^.

Note: ^violin^ uses a low-level ^gph^ command which is not supported in Stata's 
      release 2 ^.gph^ format.  As a result neither ^Stage^ nor the ^gphdot^ or 
      ^gphpen^ DOS-based graphics output programs can process a saved violin-plot
      graphics file.  This limitation does not affect screen display or output
      using the ^Print Graph^ option of Stata's ^File^ menu. 


Options
-------

^biweight^, ^cosine^, ..., ^triangle^ specify the kernel.  By default, ^epan^, the
    Epanechnikov kernel, is used.

^n(^#^)^ specifies the number of points at which density estimates will be
    evaluated.  The default is 50.

^width(^#^)^ specifies the halfwidth of the kernel, the width of the density
    window around each point.  If ^width()^ is not specified, then the "optimal"
    width is used; see ^[R] kdensity^.  For multimodal and highly skewed 
    densities, the "optimal" width is usually too wide and oversmooths the
    density.
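
    As a sketch (assuming the auto data are in memory; the width value is
    arbitrary and chosen only for illustration), the default width can be
    compared with a narrower window:

        . ^violin mpg, n(100)^

        . ^violin mpg, n(100) w(1)^

    The width actually used is listed in the output and saved in S_3.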

^by(^byvar^)^ produces separate plots for the groups of observations defined by 
    byvar and displays them in a single graph having common vertical scale. 
    ^by()^ cannot be specified when there is more than one variable in the 
    varlist.

^truncat(^#^,^#|^*)^ limits the range of the density trace, either to a range 
    specified as ^(^#^,^#^)^, or to the observed data limits, specified as ^(*)^.
    Regardless of the actual ^(^#^,^#^)^ specification, the maximum range truncation
    honored is the observed data limits.  The precise truncation points will 
    be the most extreme points within the specified range where the density is
    calculated (the points of density calculation depend on ^n()^, ^width()^ 
    and the observed data).
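
    A brief illustration, again assuming the auto data are in memory and
    using arbitrary limits: the first command stops the density trace at the
    observed minimum and maximum of ^mpg^; the second requests the range 15
    to 35, which is honored only within the observed data limits:

        . ^violin mpg, truncat(*)^

        . ^violin mpg, truncat(15,35)^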

^round(^#^)^ rounds the y-axis numeric labels to the value specified.  As a result,
    the labels and their corresponding tick marks may not be placed at the true
    minimum, median, or maximum values; rather, they will be at the rounded
    values.  ^round()^ has no effect if ^ylabel^ is specified without arguments, 
    but is operative if ^ylabel^ is not specified or is specified with arguments.
    The ^round()^ option follows the rules of Stata's ^round(^x^,^y^)^ function, with 
    # being the y argument and each label value being the x argument; 
    see ^[U] 20.3.5 Special functions^.  
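
    As a small illustration (assuming the auto data are in memory; the
    rounding unit of 100 is arbitrary), the first command below labels the
    y-axis at the minimum, median and maximum rounded to the nearest 100.
    The second shows the underlying ^round(^x^,^y^)^ arithmetic for a single
    value and displays 3300:

        . ^violin price, round(100)^

        . ^display round(3291,100)^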

graph_options are any of the options allowed by ^graph, twoway^ except ^b2title()^
    (which is ignored); see ^help^ @graph@.  Some options are preset and, although
    changeable, usually should not be modified.  These include ^symbol(i)^ and 
    ^connect(l)^ for specifying the plotting symbol and point connection method
    for the density curve.  In addition, ^ylabel()^ is preset to label only the
    minimum, median and maximum points.  ^t1title(Violin Plot)^ is preset but can
    be changed--except when ^by()^ is specified; in this instance ^t1title^ is used
    for the variable name or label.  When changeable, use of ^t1title(.)^ will
    result in a blank title.  Other preset options, such as ^pen(2)^ for the
    plot pen color, are intended to be freely changed to suit user preference.
    A few options, such as the left and right titles, are set (or default to) 
    blank.  If specified, they appear beside each plot in a multi-variable
    graph.  Lastly, the ^saving()^ option differs slightly from ^graph^'s in
    that the filename extension is always ^.gph^ and must not be specified.
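
    For example, the following sketch (assuming the auto data are in memory;
    the filename ^myviolin^ is hypothetical) blanks the preset top title and
    saves the graph.  Following the rule above, the graph should be written
    to ^myviolin.gph^:

        . ^violin mpg, t1title(.) saving(myviolin)^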


Saved values
------------

    S_1   name of kernel used for density trace
    S_2   number of points of density estimation
    S_3   bandwidth for density estimation
    S_4   scale factor of density plot
    S_5   minimum
    S_6   lower adjacent value
    S_7   first quartile
    S_8   median
    S_9   third quartile
    S_10  upper adjacent value
    S_11  maximum
    S_12  n

When ^by()^ is specified: S_3 and S_4 contain the averages of the bandwidth and
scale factors used in the subgroup density estimations; S_5, S_7, S_8, S_9, 
S_11 and S_12 are statistics for the combined group; and S_6 and S_10 are set 
missing.

When multiple variables are specified, the saved values contain results for
the last variable in the varlist.
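
As a sketch of how these results might be used, assuming (as is usual for
commands of this vintage) that the S_# results are global macros, and that
the auto data are in memory, the kernel name and the interquartile range
could be displayed after a plot:

	. ^violin mpg^

	. ^display "$S_1"^

	. ^display $S_9 - $S_7^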
  

Examples
--------

	. ^violin length, t1(Auto data) l1(length of car)^

	. ^violin length weight, n(100) w(20)^

	. ^violin weight, by(foreign) parzen^


Author
------

      Thomas J. Steichen
      RJRT
      steicht@@rjrt.com


Reference
---------

Hintze, J. L. and R. D. Nelson (1998).  "Violin plots: a box plot-density trace
     synergism."  The American Statistician, 52(2):181-4.


Also see
--------

    STB:  gr33 (STB-46)
 Manual:  ^[R] kdensity^, ^[R] graph box^, ^[R] centile^ 
          ^[U] 20.3.5 Special functions^
On-line:  help for @kdensity@, @graph@, @centile@, @functions@