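Today's post introduces histogram, the MATLAB function for plotting histograms. A histogram splits the data into intervals (bins) and draws one rectangle per bin. The result is easy to read at a glance, which is why it is one of the most common plots in statistics.
This article covers the common call forms of histogram and their syntax, then works through examples: a histogram of a vector, specifying the number of bins, changing the number of bins, specifying bin edges, plotting categorical data, normalization, plotting multiple histograms, adjusting histogram properties, determining the underlying probability distribution, and saving and loading a histogram.
The sections below walk through the syntax, the examples, and their results in detail. First, here is MATLAB's help text for histogram: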
>> help histogram
histogram Plots a histogram.
histogram(X) plots a histogram of X. histogram determines the bin edges
using an automatic binning algorithm that returns uniform bins of a width
that is chosen to cover the range of values in X and reveal the shape
of the underlying distribution.
histogram(X,M), where M is a scalar, uses M bins.
histogram(X,EDGES), where EDGES is a vector, specifies the edges of
the bins.
The value X(i) is in the kth bin if EDGES(k) <= X(i) < EDGES(k+1). The
last bin will also include the right edge such that it will contain X(i)
if EDGES(end-1) <= X(i) <= EDGES(end).
histogram(...,'BinWidth',BW) uses bins of width BW. To prevent from
accidentally creating too many bins, a limit of 65536 bins can be
created when specifying 'BinWidth'. If BW is too small such that more
than 65536 bins are needed, histogram uses wider bins instead.
histogram(...,'BinLimits',[BMIN,BMAX]) plots a histogram with only
elements in X between BMIN and BMAX inclusive, X(X>=BMIN & X<=BMAX).
histogram(...,'Normalization',NM) specifies the normalization scheme
of the histogram values. The normalization scheme affects the scaling
of the histogram along the vertical axis (or horizontal axis if
Orientation is 'horizontal'). NM can be:
'count' The height of each bar is the number of
observations in each bin, and the sum of the
bar heights is NUMEL(X).
'probability' The height of each bar is the relative
number of observations (number of observations
in bin / total number of observations), and
the sum of the bar heights is 1.
'countdensity' The height of each bar is the number of
observations in each bin / width of bin. The
area (height * width) of each bar is the number
of observations in the bin, and the sum of
the bar areas is NUMEL(X).
'pdf' Probability density function estimate. The height
of each bar is, (number of observations in bin)
/ (total number of observations * width of bin).
The area of each bar is the relative number of
observations, and the sum of the bar areas is 1.
'cumcount' The height of each bar is the cumulative
number of observations in each bin and all
previous bins. The height of the last bar
is NUMEL(X).
'cdf' Cumulative density function estimate. The height
of each bar is the cumulative relative number
of observations in each bin and all previous bins.
The height of the last bar is 1.
histogram(...,'DisplayStyle',STYLE) specifies the display style of the
histogram. STYLE can be:
'bar' Display a histogram bar plot. This is the default.
'stairs' Display a stairstep plot, which shows the
outlines of the histogram without filling the
interior.
histogram(...,'BinMethod',BM), uses the specified automatic binning
algorithm to determine the number and width of the bins. BM can be:
'auto' The default 'auto' algorithm chooses a bin
width to cover the data range and reveal the
shape of the underlying distribution.
'scott' Scott's rule is optimal if the data is close
to being normally distributed, but is also
appropriate for most other distributions. It
uses a bin width of
3.5*STD(X(:))*NUMEL(X)^(-1/3).
'fd' The Freedman-Diaconis rule is less sensitive
to outliers in the data, and may be more
suitable for data with heavy-tailed
distributions. It uses a bin width of
2*IQR(X(:))*NUMEL(X)^(-1/3), where IQR is the
interquartile range.
'integers' The integer rule is useful with integer data,
as it creates a bin for each integer. It uses
a bin width of 1 and places bin edges halfway
between integers. To prevent from accidentally
creating too many bins, a limit of 65536 bins
can be created with this rule. If the data
range is greater than 65536, then wider bins
are used instead.
'sturges' Sturges' rule is a simple rule that is popular
due to its simplicity. It chooses the number
of bins to be CEIL(1 + LOG2(NUMEL(X))).
'sqrt' The Square Root rule is another simple rule
widely used in other software packages. It
chooses the number of bins to be
CEIL(SQRT(NUMEL(X))).
histogram(...,NAME,VALUE) set the property NAME to VALUE.
histogram(AX,...) plots into AX instead of the current axes.
H = histogram(...) also returns a histogram object. Use this to inspect
and adjust the properties of the histogram.
Class support for inputs X, EDGES:
float: double, single
integers: uint8, int8, uint16, int16, uint32, int32, uint64, int64
logical
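The binning rules quoted in the help text are simple formulas, so they can be evaluated by hand. The following is a minimal sketch that assumes a sample of 1,000 standard normal values (the same size used in later examples); the variable names nSturges, nSqrt, and wScott are illustrative only. The NumBins that histogram reports may not match a formula exactly if the function rounds the bin edges to convenient values, so the sketch prints the reported value instead of assuming it.

```matlab
% Sketch: evaluate the rule-of-thumb formulas from the help text.
n = 1000;                          % assumed sample size
x = randn(n,1);

nSturges = ceil(1 + log2(n));      % Sturges' rule: number of bins
nSqrt    = ceil(sqrt(n));          % Square Root rule: number of bins
wScott   = 3.5*std(x(:))*n^(-1/3); % Scott's rule: bin width

fprintf('sturges: %d bins, sqrt: %d bins, scott width: %.3f\n', ...
    nSturges, nSqrt, wScott);

% Ask histogram to use one of the rules and report what it chose.
h = histogram(x,'BinMethod','sturges');
fprintf('NumBins reported by histogram: %d\n', h.NumBins);
```

For n = 1000 the formulas give 11 bins for Sturges' rule and 32 bins for the Square Root rule.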
Common usage
histogram(X)
histogram(X,nbins)
histogram(X,edges)
histogram('BinEdges',edges,'BinCounts',counts)
histogram(C)
histogram(C,Categories)
histogram('Categories',Categories,'BinCounts',counts)
histogram(___,Name,Value)
histogram(ax,___)
h = histogram(___)
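Syntax description
histogram(X) creates a histogram of X. The histogram function uses an automatic binning algorithm that returns bins of uniform width, chosen to cover the range of elements in X and reveal the underlying shape of the distribution. histogram displays the bins as rectangles, so the height of each rectangle indicates the number of elements in the bin.
histogram(X,nbins) uses the number of bins specified by the scalar nbins.
histogram(X,edges) sorts X into bins whose edges are specified by the vector edges. Each bin includes its left edge but not its right edge, except for the last bin, which includes both edges.
histogram('BinEdges',edges,'BinCounts',counts) manually specifies the bin edges and the associated bin counts. histogram plots the specified bin counts and does not do any data binning.
histogram(C), where C is a categorical array, plots a histogram with one bar for each category in C.
histogram(C,Categories) plots only the subset of categories specified by Categories.
histogram('Categories',Categories,'BinCounts',counts) manually specifies the categories and the associated bin counts. histogram plots the specified bin counts and does not do any data binning.
histogram(___,Name,Value) specifies additional options with one or more Name,Value pair arguments, using any of the previous syntaxes. For example, you can specify 'BinWidth' and a scalar to adjust the width of the bins, or 'Normalization' and a valid option ('count', 'probability', 'countdensity', 'pdf', 'cumcount', or 'cdf') to use a different type of normalization.
histogram(ax,___) plots into the axes specified by ax instead of the current axes (gca). The ax argument can precede any of the input argument combinations in the previous syntaxes.
h = histogram(___) returns a Histogram object. Use this syntax to inspect and adjust the properties of the histogram.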
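The edge rule and the 'BinEdges'/'BinCounts' syntax are easier to see on a tiny data set. The sketch below uses a made-up five-element vector and counts it with histcounts, which follows the same edge convention, before handing the precomputed counts to histogram.

```matlab
% Sketch: each bin includes its left edge; only the last bin also
% includes its right edge.
x = [1 2 2 3 4];                % made-up data for illustration
edges = [1 2 3 4];              % bins: [1,2), [2,3), [3,4]

counts = histcounts(x,edges);   % the value 4 lands in the last bin
disp(counts)                    % 1 2 2

% Plot the precomputed counts; histogram does no binning here.
h = histogram('BinEdges',edges,'BinCounts',counts);
```

This form is useful when the counts come from somewhere else, for example from a file or from a computation that is too expensive to repeat.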
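Histogram of a vector
Generate 10,000 random numbers and create a histogram. The histogram function automatically chooses an appropriate number of bins to cover the range of values in x and show the shape of the underlying distribution.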
x = randn(10000,1);
h = histogram(x)
Output:
h =
Histogram with properties:
Data: [10000x1 double]
Values: [1x41 double]
NumBins: 41
BinEdges: [1x42 double]
BinWidth: 0.2000
BinLimits: [-4 4.2000]
Normalization: 'count'
FaceColor: 'auto'
EdgeColor: [0 0 0]
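When you specify an output argument, histogram returns a Histogram object. You can use this object to inspect the properties of the histogram, such as the number of bins or the bin width.
Find the number of histogram bins.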
nbins = h.NumBins
Output:
nbins = 41
Specifying the number of bins
Plot a histogram of 1,000 random numbers sorted into 25 equally spaced bins.
x = randn(1000,1);
nbins = 25;
h = histogram(x,nbins)
Output:
h =
Histogram with properties:
Data: [1000x1 double]
Values: [1 1 2 11 17 26 37 61 72 101 102 106 92 106 74 65 48 31 23 11 7 1 3 0 2]
NumBins: 25
BinEdges: [1x26 double]
BinWidth: 0.2700
BinLimits: [-3.2000 3.5500]
Normalization: 'count'
FaceColor: 'auto'
EdgeColor: [0 0 0]
Find the bin counts.
counts = h.Values
Output:
counts =
1 1 2 11 17 26 37 61 72 101 102 106 92 106 74 65 48 31 23 11 7 1 3 0 2
Changing the number of histogram bins
Generate 1,000 random numbers and create a histogram.
X = randn(1000,1);
h = histogram(X)
Output:
h =
Histogram with properties:
Data: [1000x1 double]
Values: [3 4 8 19 29 63 63 85 104 125 130 95 83 72 41 33 25 11 4 1 1 0 1]
NumBins: 23
BinEdges: [1x24 double]
BinWidth: 0.3000
BinLimits: [-3.0000 3.9000]
Normalization: 'count'
FaceColor: 'auto'
EdgeColor: [0 0 0]
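Use the morebins function to coarsely adjust the number of bins. It is called twice here, and each call increases the bin count.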
Nbins = morebins(h);
Nbins = morebins(h)
Output:
Nbins =
29
Adjust the bins at a fine-grained level by explicitly setting the number of bins.
h.NumBins = 31;
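The counterpart of morebins is fewerbins, which coarsely reduces the number of bins. The sketch below assumes fresh random data, so the exact bin counts will differ from run to run; nAuto and nFewer are illustrative variable names.

```matlab
% Sketch: coarsely reduce the number of bins with fewerbins.
x = randn(1000,1);
h = histogram(x);

nAuto  = h.NumBins;      % bin count chosen by the automatic algorithm
nFewer = fewerbins(h);   % reduces the bin count and returns the new value

fprintf('automatic: %d bins, after fewerbins: %d bins\n', nAuto, nFewer);
```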
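Specifying histogram bin edges
Generate 1,000 random numbers and create a histogram. Specify the bin edges as a vector with wide bins on the two sides of the histogram to capture the outliers that do not satisfy |x| < 2. The first vector element is the left edge of the first bin, and the last vector element is the right edge of the last bin.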
x = randn(1000,1);
edges = [-10 -2:0.25:2 10];
h = histogram(x,edges);
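Set the Normalization property to 'countdensity' to flatten out the bins containing the outliers. Now the area of each bin (rather than its height) represents the frequency of observations in that bin.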
h.Normalization = 'countdensity';
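The claim that area, not height, carries the count under 'countdensity' can be checked numerically. This is a minimal sketch using the same edges as above; barAreas and nInRange are illustrative names, and the comparison counts only the observations that fall inside the outermost edges, because values outside them are not binned.

```matlab
% Sketch: with 'countdensity', height * width recovers the bin count.
x = randn(1000,1);
edges = [-10 -2:0.25:2 10];
h = histogram(x,edges,'Normalization','countdensity');

barAreas = h.Values .* diff(h.BinEdges);            % area of each bar
nInRange = nnz(x >= edges(1) & x <= edges(end));    % observations that were binned

fprintf('sum of bar areas: %g, observations in range: %d\n', ...
    sum(barAreas), nInRange);
```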
Plotting categorical histograms
Create a categorical vector that represents votes. The categories in the vector are 'yes', 'no', or 'undecided'.
A = [0 0 1 1 1 0 0 0 0 NaN NaN 1 0 0 0 1 0 1 0 1 0 0 0 1 1 1 1];
C = categorical(A,[1 0 NaN],{'yes','no','undecided'})
Output:
C =
Columns 1 through 16
no no yes yes yes no no no no undecided undecided yes no no no yes
Columns 17 through 27
no yes no yes no no no yes yes yes yes
Plot a categorical histogram of the votes, using a relative bar width of 0.5.
h = histogram(C,'BarWidth',0.5)
Output:
h =
Histogram with properties:
Data: [1x27 categorical]
Values: [11 14 2]
Categories: {'yes' 'no' 'undecided'}
Normalization: 'count'
DisplayStyle: 'bar'
FaceColor: 'auto'
EdgeColor: [0 0 0]
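The two other categorical syntaxes listed earlier, histogram(C,Categories) and histogram('Categories',Categories,'BinCounts',counts), can be tried on the same vote data. The sketch below is one way to do it; it uses countcats to tally the categories, and the variable names hSubset and hCounts are illustrative.

```matlab
% Sketch: plot a subset of categories, then plot precomputed counts.
A = [0 0 1 1 1 0 0 0 0 NaN NaN 1 0 0 0 1 0 1 0 1 0 0 0 1 1 1 1];
C = categorical(A,[1 0 NaN],{'yes','no','undecided'});

% Only the decided votes.
hSubset = histogram(C,{'yes','no'});

% Tally the categories once, then plot the tallies without re-binning.
counts = countcats(C);     % counts in category order: yes, no, undecided
figure
hCounts = histogram('Categories',{'yes','no','undecided'},'BinCounts',counts);
```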
Histogram normalization
Generate 1,000 random numbers and create a histogram using the 'probability' normalization.
x = randn(1000,1);
h = histogram(x,'Normalization','probability')
Output:
h =
Histogram with properties:
Data: [1000x1 double]
Values: [1x22 double]
NumBins: 22
BinEdges: [1x23 double]
BinWidth: 0.3000
BinLimits: [-3.0000 3.6000]
Normalization: 'probability'
FaceColor: 'auto'
EdgeColor: [0 0 0]
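Compute the sum of the bar heights. With this normalization, the height of each bar equals the probability of selecting an observation within that bin interval, and the heights of all the bars sum to 1.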
S = sum(h.Values)
Output:
S =
1
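The 'probability' and 'pdf' normalizations described in the help text are both simple rescalings of the raw counts. As a rough check, the sketch below recomputes them by hand from histcounts, using the bin edges that histogram selected; p, pdfEst, and maxDiff are illustrative names.

```matlab
% Sketch: reproduce the 'probability' and 'pdf' scalings from raw counts.
x = randn(1000,1);
h = histogram(x,'Normalization','probability');

counts = histcounts(x,h.BinEdges);   % raw counts on the same edges
p      = counts / numel(x);          % 'probability': count / total
pdfEst = p / h.BinWidth;             % 'pdf': count / (total * bin width)

maxDiff = max(abs(p - h.Values));    % difference from histogram's own values
fprintf('largest difference: %g\n', maxDiff);
fprintf('area under the pdf estimate: %g\n', sum(pdfEst)*h.BinWidth);
```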
Plotting multiple histograms
Generate two vectors of random numbers and plot a histogram for each vector in the same figure.
x = randn(2000,1);
y = 1 + randn(5000,1);
h1 = histogram(x);
hold on
h2 = histogram(y);
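Because the sample sizes and bin widths of the two histograms differ, they are hard to compare. Normalize the histograms so that all of the bar heights sum to 1, and use a uniform bin width.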
h1.Normalization = 'probability';
h1.BinWidth = 0.25;
h2.Normalization = 'probability';
h2.BinWidth = 0.25;
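An alternative to adjusting the two objects afterwards is to pass the same edge vector to both calls, which makes the shared binning explicit. The sketch below is one possible way to do this; it derives the edges from the combined data range so that no observation falls outside them, and the names lo, hi, and edges are illustrative.

```matlab
% Sketch: give both histograms the same explicit bin edges.
x = randn(2000,1);
y = 1 + randn(5000,1);

lo = floor(min([x; y]));     % round outward so all data is covered
hi = ceil(max([x; y]));
edges = lo:0.25:hi;

figure
h1 = histogram(x,edges,'Normalization','probability');
hold on
h2 = histogram(y,edges,'Normalization','probability');
hold off
legend('x','y')
```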
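Adjusting histogram properties
Generate 1,000 random numbers and create a histogram. Return the histogram object so that you can adjust the properties of the histogram without recreating the entire plot.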
x = randn(1000,1);
h = histogram(x)
Output:
h =
Histogram with properties:
Data: [1000x1 double]
Values: [1 3 7 11 23 33 55 87 76 118 125 112 94 72 63 42 34 14 20 5 3 2]
NumBins: 22
BinEdges: [1x23 double]
BinWidth: 0.3000
BinLimits: [-3.3000 3.3000]
Normalization: 'count'
FaceColor: 'auto'
EdgeColor: [0 0 0]
Specify exactly how many bins to use.
h.NumBins = 15;
Specify the bin edges with a vector. The first value in the vector is the left edge of the first bin. The last value is the right edge of the last bin.
h.BinEdges = [-3:3];
Change the color of the histogram bars.
h.FaceColor = [0 0.5 0.5];
h.EdgeColor = 'r';
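Determining the underlying probability distribution
Generate 5,000 normally distributed random numbers with a mean of 5 and a standard deviation of 2. Plotting the histogram with Normalization set to 'pdf' produces an estimate of the probability density function.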
x = 2*randn(5000,1) + 5;
histogram(x,'Normalization','pdf')
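In this example, the underlying distribution of the normally distributed data is known. In general, though, you can use a 'pdf' histogram to determine the underlying probability distribution of data by comparing it against a known probability density function.
The probability density function of a normal distribution with mean μ, standard deviation σ, and variance σ² is:
$$f(x,\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$
Overlay a plot of the probability density function for a normal distribution with a mean of 5 and a standard deviation of 2.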
hold on
y = -5:0.1:15;
mu = 5;
sigma = 2;
f = exp(-(y-mu).^2./(2*sigma^2))./(sigma*sqrt(2*pi));
plot(y,f,'LineWidth',1.5)
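When the parameters are not known in advance, a simple variation is to estimate them from the sample and overlay the resulting curve. The sketch below assumes the data really is normal and uses the sample mean and sample standard deviation as estimates; muHat, sigmaHat, and area are illustrative names. It also confirms that the bar areas of a 'pdf' histogram sum to 1.

```matlab
% Sketch: overlay a normal pdf using parameters estimated from the data.
x = 2*randn(5000,1) + 5;
figure
h = histogram(x,'Normalization','pdf');

area = sum(h.Values .* h.BinWidth);   % bar areas of a 'pdf' histogram sum to 1

muHat    = mean(x);                   % estimated mean
sigmaHat = std(x);                    % estimated standard deviation

hold on
t = linspace(h.BinLimits(1),h.BinLimits(2),200);
fHat = exp(-(t-muHat).^2./(2*sigmaHat^2))./(sigmaHat*sqrt(2*pi));
plot(t,fHat,'LineWidth',1.5)
hold off
```

If the curve follows the tops of the bars closely, the normal model is a plausible description of the data; a visual match is a sanity check, not a formal goodness-of-fit test.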
Saving and loading histograms
Use the savefig function to save a histogram.
y = histogram(randn(10));
savefig('histogram.fig');
clear all
close all
Use openfig to load the histogram back into MATLAB. openfig also returns a handle to the figure, h.
h = openfig('histogram.fig');
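Use the findobj function to locate the correct object handle from the figure handle. This allows you to continue manipulating the original histogram object used to generate the figure.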
y = findobj(h, 'type', 'histogram')
Output:
y =
Histogram with properties:
Data: [10x10 double]
Values: [1 3 11 17 14 22 14 8 10]
NumBins: 9
BinEdges: [-2.5000 -2 -1.5000 -1 -0.5000 0 0.5000 1 1.5000 2]
BinWidth: 0.5000
BinLimits: [-2.5000 2]
Normalization: 'count'
FaceColor: 'auto'
EdgeColor: [0 0 0]
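Reposted article. Original source: the MathWorks website; compiled and published by 古哥.
If you repost this article, please credit the source: https://iymark.com/articles/668.html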