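The MATLAB Histogram Plotting Function histogram

Today's topic is histogram, the MATLAB function for plotting histograms. A histogram partitions the data into bins and draws each bin as a rectangle. The result is easy to read at a glance, and it is one of the most common plots in data statistics.

This article covers the common usage of the histogram function, its syntax, histograms of a vector, specifying the number of bins, changing the number of bins, specifying bin edges, plotting categorical histograms, normalization, plotting multiple histograms, adjusting histogram properties, determining the underlying probability distribution, and saving and loading histograms.

The sections below walk through the syntax of the MATLAB histogram function in detail, with examples and their results. First, here is MATLAB's help text for histogram: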

>> help histogram
 histogram  Plots a histogram.
    histogram(X) plots a histogram of X. histogram determines the bin edges 
    using an automatic binning algorithm that returns uniform bins of a width 
    that is chosen to cover the range of values in X and reveal the shape 
    of the underlying distribution. 
 
    histogram(X,M), where M is a scalar, uses M bins.
 
    histogram(X,EDGES), where EDGES is a vector, specifies the edges of 
    the bins.
 
    The value X(i) is in the kth bin if EDGES(k) <= X(i) < EDGES(k+1). The 
    last bin will also include the right edge such that it will contain X(i)
    if EDGES(end-1) <= X(i) <= EDGES(end).
 
    histogram(...,'BinWidth',BW) uses bins of width BW. To prevent from 
    accidentally creating too many bins, a limit of 65536 bins can be 
    created when specifying 'BinWidth'. If BW is too small such that more 
    than 65536 bins are needed, histogram uses wider bins instead.
 
    histogram(...,'BinLimits',[BMIN,BMAX]) plots a histogram with only 
    elements in X between BMIN and BMAX inclusive, X(X>=BMIN & X<=BMAX).
 
    histogram(...,'Normalization',NM) specifies the normalization scheme 
    of the histogram values. The normalization scheme affects the scaling 
    of the histogram along the vertical axis (or horizontal axis if 
    Orientation is 'horizontal'). NM can be:
                   'count'   The height of each bar is the number of 
                             observations in each bin, and the sum of the
                             bar heights is NUMEL(X).
             'probability'   The height of each bar is the relative 
                             number of observations (number of observations
                             in bin / total number of observations), and
                             the sum of the bar heights is 1.
            'countdensity'   The height of each bar is the number of 
                             observations in each bin / width of bin. The 
                             area (height * width) of each bar is the number
                             of observations in the bin, and the sum of
                             the bar areas is NUMEL(X).
                     'pdf'   Probability density function estimate. The height 
                             of each bar is, (number of observations in bin)
                             / (total number of observations * width of bin).
                             The area of each bar is the relative number of
                             observations, and the sum of the bar areas is 1.
                'cumcount'   The height of each bar is the cumulative 
                             number of observations in each bin and all
                             previous bins. The height of the last bar
                             is NUMEL(X).
                     'cdf'   Cumulative density function estimate. The height 
                             of each bar is the cumulative relative number
                             of observations in each bin and all previous bins.
                             The height of the last bar is 1.
 
    histogram(...,'DisplayStyle',STYLE) specifies the display style of the 
    histogram. STYLE can be:
                     'bar'   Display a histogram bar plot. This is the default.
                  'stairs'   Display a stairstep plot, which shows the 
                             outlines of the histogram without filling the 
                             interior. 
 
    histogram(...,'BinMethod',BM), uses the specified automatic binning 
    algorithm to determine the number and width of the bins. BM can be:
                    'auto'   The default 'auto' algorithm chooses a bin 
                             width to cover the data range and reveal the 
                             shape of the underlying distribution.
                   'scott'   Scott's rule is optimal if the data is close  
                             to being normally distributed, but is also 
                             appropriate for most other distributions. It 
                             uses a bin width of 
                             3.5*STD(X(:))*NUMEL(X)^(-1/3).
                      'fd'   The Freedman-Diaconis rule is less sensitive  
                             to outliers in the data, and may be more 
                             suitable for data with heavy-tailed 
                             distributions. It uses a bin width of 
                             2*IQR(X(:))*NUMEL(X)^(-1/3), where IQR is the 
                             interquartile range.
                'integers'   The integer rule is useful with integer data, 
                             as it creates a bin for each integer. It uses 
                             a bin width of 1 and places bin edges halfway 
                             between integers. To prevent from accidentally 
                             creating too many bins, a limit of 65536 bins 
                             can be created with this rule. If the data 
                             range is greater than 65536, then wider bins
                             are used instead.
                 'sturges'   Sturges' rule is a simple rule that is popular
                             due to its simplicity. It chooses the number 
                             of bins to be CEIL(1 + LOG2(NUMEL(X))).
                    'sqrt'   The Square Root rule is another simple rule 
                             widely used in other software packages. It 
                             chooses the number of bins to be
                             CEIL(SQRT(NUMEL(X))).
 
    histogram(...,NAME,VALUE) set the property NAME to VALUE. 
      
    histogram(AX,...) plots into AX instead of the current axes.
        
    H = histogram(...) also returns a histogram object. Use this to inspect 
    and adjust the properties of the histogram.
 
    Class support for inputs X, EDGES:
       float: double, single
       integers: uint8, int8, uint16, int16, uint32, int32, uint64, int64
       logical
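
The rule-based formulas quoted in the help text above can be evaluated by hand. The following is a minimal sketch that only computes the nominal values from those formulas; the sample x and the variable names are illustrative, and the bins that histogram finally draws are not checked here.

```matlab
% Nominal bin counts and widths from the formulas in the help text.
rng(0);                              % fixed seed so the sketch is repeatable
x = randn(1000,1);                   % illustrative sample
n = numel(x);

nSturges = ceil(1 + log2(n));        % Sturges' rule: number of bins
nSqrt    = ceil(sqrt(n));            % Square Root rule: number of bins
wScott   = 3.5*std(x(:))*n^(-1/3);   % Scott's rule: bin width

fprintf('sturges: %d bins, sqrt: %d bins, scott width: %.3f\n', ...
        nSturges, nSqrt, wScott);

% Ask histogram to use one of the rules by name.
h = histogram(x,'BinMethod','sturges');
```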

Common Usage

histogram(X)
histogram(X,nbins)
histogram(X,edges)
histogram('BinEdges',edges,'BinCounts',counts)
histogram(C)
histogram(C,Categories)
histogram('Categories',Categories,'BinCounts',counts)
histogram(___,Name,Value)
histogram(ax,___)
h = histogram(___)

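Syntax Description

histogram(X) creates a histogram of X. The histogram function uses an automatic binning algorithm that returns bins of uniform width, chosen to cover the range of elements in X and reveal the underlying shape of the distribution. histogram displays the bins as rectangles, so that the height of each rectangle indicates the number of elements in the bin.

histogram(X,nbins) uses the number of bins specified by the scalar nbins.

histogram(X,edges) sorts X into bins whose edges are specified by the vector edges. Each bin includes its left edge but not its right edge, except for the last bin, which includes both edges.

histogram('BinEdges',edges,'BinCounts',counts) manually specifies the bin edges and the associated bin counts. histogram plots the specified bin counts and does not bin any data.

histogram(C), where C is a categorical array, plots a histogram by drawing one bar for each category in C.

histogram(C,Categories) plots only the subset of categories specified by Categories.

histogram('Categories',Categories,'BinCounts',counts) manually specifies the categories and the associated bin counts. histogram plots the specified bin counts and does not bin any data.

histogram(___,Name,Value) specifies additional options with one or more Name,Value pair arguments, using any of the previous syntaxes. For example, you can specify 'BinWidth' and a scalar to adjust the width of the bins, or 'Normalization' and a valid option ('count', 'probability', 'countdensity', 'pdf', 'cumcount', or 'cdf') to use a different type of normalization.

histogram(ax,___) plots into the axes specified by ax instead of into the current axes (gca). The option ax can precede any of the input argument combinations in the previous syntaxes.

h = histogram(___) returns a Histogram object. Use this syntax to inspect and adjust the properties of the histogram.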
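
Two of the syntaxes above are easy to check numerically. This sketch uses histcounts, which bins data without plotting, to show the edge-inclusion rule, and then passes the precomputed counts to histogram through 'BinEdges' and 'BinCounts'. The data values are made up for illustration.

```matlab
% Edge rule: each bin is [left, right), except the last, which is [left, right].
data   = [0 1 2 2.5 3];
edges  = [0 1 2 3];
counts = histcounts(data,edges);     % the value 3 lands in the last bin

% Plot the precomputed counts; histogram does no binning of its own here.
h = histogram('BinEdges',edges,'BinCounts',counts);
disp(h.Values)
```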

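Histogram of a Vector

Generate 10,000 random numbers and create a histogram. The histogram function automatically chooses an appropriate number of bins to cover the range of values in x and show the shape of the underlying distribution.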

x = randn(10000,1);
h = histogram(x)

Output:

h = 
  Histogram with properties:
             Data: [10000x1 double]
           Values: [1x41 double]
          NumBins: 41
         BinEdges: [1x42 double]
         BinWidth: 0.2000
        BinLimits: [-4 4.2000]
    Normalization: 'count'
        FaceColor: 'auto'
        EdgeColor: [0 0 0]

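When you specify an output argument to the histogram function, it returns a Histogram object. You can use this object to inspect the properties of the histogram, such as the number of bins or the bin width.

Find the number of histogram bins.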

nbins = h.NumBins

Output:

nbins = 41

Specifying the Number of Bins

Plot a histogram of 1,000 random numbers sorted into 25 equally spaced bins.

x = randn(1000,1);
nbins = 25;
h = histogram(x,nbins)

Output:

h = 
  Histogram with properties:
             Data: [1000x1 double]
           Values: [1 1 2 11 17 26 37 61 72 101 102 106 92 106 74 65 48 31 23 11 7 1 3 0 2]
          NumBins: 25
         BinEdges: [1x26 double]
         BinWidth: 0.2700
        BinLimits: [-3.2000 3.5500]
    Normalization: 'count'
        FaceColor: 'auto'
        EdgeColor: [0 0 0]

Find the bin counts.

counts = h.Values

Output:

counts =
     1     1     2    11    17    26    37    61    72   101   102   106    92   106    74    65    48    31    23    11     7     1     3     0     2

Changing the Number of Bins

Generate 1,000 random numbers and create a histogram.

X = randn(1000,1);
h = histogram(X)

Output:

h = 
  Histogram with properties:
             Data: [1000x1 double]
           Values: [3 4 8 19 29 63 63 85 104 125 130 95 83 72 41 33 25 11 4 1 1 0 1]
          NumBins: 23
         BinEdges: [1x24 double]
         BinWidth: 0.3000
        BinLimits: [-3.0000 3.9000]
    Normalization: 'count'
        FaceColor: 'auto'
        EdgeColor: [0 0 0]

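Use the morebins function to coarsely adjust the number of bins. Each call increases the bin count by roughly 10%, so calling it twice, as below, takes this histogram from 23 to 29 bins.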

Nbins = morebins(h);
Nbins = morebins(h)

Output:

Nbins =
    29

Adjust the bins at a fine-grained level by explicitly setting the number of bins.

h.NumBins = 31;
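
As a complement, fewerbins moves in the opposite direction from morebins. A minimal sketch, assuming the same kind of random sample as above; the exact bin counts depend on the data, so none are shown.

```matlab
X = randn(1000,1);
h = histogram(X);
n0 = h.NumBins;          % bin count chosen automatically

nMore  = morebins(h);    % coarse step up; returns the new bin count
nFewer = fewerbins(h);   % coarse step back down

h.NumBins = 31;          % set the bin count exactly
fprintf('auto: %d, after morebins: %d, after fewerbins: %d, now: %d\n', ...
        n0, nMore, nFewer, h.NumBins);
```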

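Specifying Bin Edges

Generate 1,000 random numbers and create a histogram. Specify the bin edges as a vector with wide bins on the two sides of the histogram, to capture the outliers that do not satisfy |x| < 2. The first vector element is the left edge of the first bin, and the last vector element is the right edge of the last bin.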

x = randn(1000,1);
edges = [-10 -2:0.25:2 10];
h = histogram(x,edges);

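Set the Normalization property to 'countdensity' to flatten the bins that contain the outliers. Now the area of each bin, rather than its height, represents the frequency of observations in that bin.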

h.Normalization = 'countdensity';
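
The statement that bar area, not height, carries the count under 'countdensity' can be checked with histcounts, which accepts the same normalization options without drawing anything. A minimal sketch with the same edges as above:

```matlab
x = randn(1000,1);
edges = [-10 -2:0.25:2 10];

% Bar heights under 'countdensity' are count / bin width.
heights = histcounts(x,edges,'Normalization','countdensity');
widths  = diff(edges);

% Height times width recovers the raw count in each bin.
areas = heights .* widths;
fprintf('total area = %g, numel(x) = %d\n', sum(areas), numel(x));
```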

Plotting Categorical Histograms

Create a categorical vector that represents votes. The categories in the vector are 'yes', 'no', or 'undecided'.

A = [0 0 1 1 1 0 0 0 0 NaN NaN 1 0 0 0 1 0 1 0 1 0 0 0 1 1 1 1];
C = categorical(A,[1 0 NaN],{'yes','no','undecided'})

Output:

C = 
  Columns 1 through 16
     no      no      yes      yes      yes      no      no      no      no      undecided      undecided      yes      no      no      no      yes 
  Columns 17 through 27
     no      yes      no      yes      no      no      no      yes      yes      yes      yes

Plot a categorical histogram of the votes, using a relative bar width of 0.5.

h = histogram(C,'BarWidth',0.5)

Output:

h = 
  Histogram with properties:
             Data: [1x27 categorical]
           Values: [11 14 2]
       Categories: {'yes'  'no'  'undecided'}
    Normalization: 'count'
     DisplayStyle: 'bar'
        FaceColor: 'auto'
        EdgeColor: [0 0 0]
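
The bar heights can be cross-checked without plotting. This sketch uses countcats and categories on the same vector, then plots a subset of the categories with the histogram(C,Categories) syntax; it is a verification aid, not part of the original example.

```matlab
A = [0 0 1 1 1 0 0 0 0 NaN NaN 1 0 0 0 1 0 1 0 1 0 0 0 1 1 1 1];
C = categorical(A,[1 0 NaN],{'yes','no','undecided'});

names  = categories(C);   % category names, in the order they were defined
counts = countcats(C);    % number of elements in each category

for k = 1:numel(names)
    fprintf('%-10s %d\n', names{k}, counts(k));
end

% Plot only a subset of the categories.
h = histogram(C,{'yes','no'},'BarWidth',0.5);
```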

Histogram Normalization

Generate 1,000 random numbers and create a histogram using the 'probability' normalization.

x = randn(1000,1);
h = histogram(x,'Normalization','probability')

Output:

h = 
  Histogram with properties:
             Data: [1000x1 double]
           Values: [1x22 double]
          NumBins: 22
         BinEdges: [1x23 double]
         BinWidth: 0.3000
        BinLimits: [-3.0000 3.6000]
    Normalization: 'probability'
        FaceColor: 'auto'
        EdgeColor: [0 0 0]

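Compute the sum of the bar heights. With this normalization, the height of each bar equals the probability of selecting an observation within that bin interval, and the heights of all the bars sum to 1.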

S = sum(h.Values)

Output:

S =
     1
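
The normalization modes listed in the help text are all simple rescalings of the raw counts. A sketch of those relationships using histcounts with an explicit set of edges (the edges are an arbitrary choice that comfortably covers the sample):

```matlab
x = randn(1000,1);
edges = -6:0.5:6;
w = diff(edges);                               % bin widths
N = numel(x);

c    = histcounts(x,edges);                    % 'count'
p    = histcounts(x,edges,'Normalization','probability');
pdfv = histcounts(x,edges,'Normalization','pdf');
cdfv = histcounts(x,edges,'Normalization','cdf');

% Each difference should be zero up to rounding error.
disp(max(abs(p    - c/N)));
disp(max(abs(pdfv - c./(N*w))));
disp(max(abs(cdfv - cumsum(c)/N)));
```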

Plotting Multiple Histograms

Generate two vectors of random numbers and plot a histogram for each vector in the same figure.

x = randn(2000,1);
y = 1 + randn(5000,1);
h1 = histogram(x);
hold on
h2 = histogram(y);

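Because the sample sizes and bin widths of the two histograms differ, it is difficult to compare them. Normalize the histograms so that all of the bar heights sum to 1, and use a uniform bin width.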

h1.Normalization = 'probability';
h1.BinWidth = 0.25;
h2.Normalization = 'probability';
h2.BinWidth = 0.25;
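
An alternative to setting BinWidth on each object is to pass the same edge vector to both calls, which guarantees that the two sets of bars share identical bins. A minimal sketch; the edge range derived from the data and the legend labels are illustrative choices.

```matlab
x = randn(2000,1);
y = 1 + randn(5000,1);

% Shared edges that cover both samples, in steps of 0.25.
lo = floor(min([x; y]));
hi = ceil(max([x; y]));
edges = lo:0.25:hi;

h1 = histogram(x,edges,'Normalization','probability');
hold on
h2 = histogram(y,edges,'Normalization','probability');
hold off
legend('x','y')
```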

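Adjusting Histogram Properties

Generate 1,000 random numbers and create a histogram. Return the histogram object so that you can adjust the properties of the histogram without recreating the entire plot.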

x = randn(1000,1);
h = histogram(x)

Output:

h = 
  Histogram with properties:
             Data: [1000x1 double]
           Values: [1 3 7 11 23 33 55 87 76 118 125 112 94 72 63 42 34 14 20 5 3 2]
          NumBins: 22
         BinEdges: [1x23 double]
         BinWidth: 0.3000
        BinLimits: [-3.3000 3.3000]
    Normalization: 'count'
        FaceColor: 'auto'
        EdgeColor: [0 0 0]

Specify exactly how many bins to use.

h.NumBins = 15;

Specify the bin edges with a vector. The first value in the vector is the left edge of the first bin. The last value is the right edge of the last bin.

h.BinEdges = [-3:3];

Change the color of the histogram bars.

h.FaceColor = [0 0.5 0.5];
h.EdgeColor = 'r';
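
The 'DisplayStyle' option from the help text can be changed on the object in the same way. A small sketch, with the color chosen arbitrarily:

```matlab
x = randn(1000,1);
h = histogram(x);

h.DisplayStyle = 'stairs';   % stairstep outline instead of filled bars
h.EdgeColor = 'r';           % color of the outline
```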

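Determining the Underlying Probability Distribution

Generate 5,000 normally distributed random numbers with a mean of 5 and a standard deviation of 2. Plotting a histogram with Normalization set to 'pdf' produces an estimate of the probability density function.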

x = 2*randn(5000,1) + 5;
histogram(x,'Normalization','pdf')

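In this example, the underlying distribution of the normally distributed data is known. In general, though, you can use a 'pdf' histogram to identify the underlying probability distribution of a data set by comparing it against a known probability density function.

The probability density function of a normal distribution with mean μ, standard deviation σ, and variance σ² is:

$$f(x,\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right].$$

Overlay a plot of the probability density function for a normal distribution with a mean of 5 and a standard deviation of 2.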

hold on
y = -5:0.1:15;
mu = 5;
sigma = 2;
f = exp(-(y-mu).^2./(2*sigma^2))./(sigma*sqrt(2*pi));
plot(y,f,'LineWidth',1.5)
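
When the parameters are not known in advance, they can be estimated from the sample and the resulting curve overlaid in the same way. A minimal sketch that uses the sample mean and standard deviation as estimates; the names muHat and sigmaHat are illustrative.

```matlab
x = 2*randn(5000,1) + 5;
histogram(x,'Normalization','pdf')
hold on

muHat    = mean(x);                  % estimate of the mean
sigmaHat = std(x);                   % estimate of the standard deviation

y = -5:0.1:15;
f = exp(-(y-muHat).^2./(2*sigmaHat^2))./(sigmaHat*sqrt(2*pi));
plot(y,f,'LineWidth',1.5)
hold off

% A density should integrate to about 1 over a range this wide.
fprintf('integral of f: %.4f\n', trapz(y,f));
```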

Saving and Loading Histograms

Use the savefig function to save a histogram figure.

y = histogram(randn(10));
savefig('histogram.fig');

clear all
close all

Use openfig to load the histogram figure back into MATLAB. openfig also returns a handle to the figure, h.

h = openfig('histogram.fig');

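Use the findobj function to locate the correct object handle from the figure handle. This lets you continue working with the original histogram object that was used to generate the figure.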

y = findobj(h, 'type', 'histogram')

Output:

y = 
  Histogram with properties:
             Data: [10x10 double]
           Values: [1 3 11 17 14 22 14 8 10]
          NumBins: 9
         BinEdges: [-2.5000 -2 -1.5000 -1 -0.5000 0 0.5000 1 1.5000 2]
         BinWidth: 0.5000
        BinLimits: [-2.5000 2]
    Normalization: 'count'
        FaceColor: 'auto'
        EdgeColor: [0 0 0]
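
The whole round trip can also be written as one script. This sketch saves to a temporary file (the tempname-based path is an illustrative choice, not part of the original example) and checks that the reloaded object still carries its data.

```matlab
figFile = [tempname '.fig'];           % temporary file name (illustrative)

fig = figure;
histogram(randn(10));
savefig(fig,figFile);
close(fig)

fig2 = openfig(figFile);               % returns the figure handle
h = findobj(fig2,'Type','histogram');  % recover the Histogram object

disp(size(h.Data))                     % size of the stored data
h.FaceColor = [0 0.5 0.5];             % keep working with the object

delete(figFile)                        % clean up the temporary file
```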