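In MATLAB, the histcounts function returns the histogram bin counts for the elements of an array. This article covers histcounts in three parts: common syntax, a description of each syntax, and examples. The examples cover bin counts and bin edges, specifying the number of bins, specifying bin edges, normalized bin counts, determining bin placement, and categorical bin counts.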
The help text for histcounts is as follows:
>> help histcounts
histcounts Histogram Bin Counts.
[N,EDGES] = histcounts(X) partitions the values in X into bins, and
returns the count in each bin, as well as the bin edges. histcounts
determines the bin edges using an automatic binning algorithm that
returns uniform bins of a width that is chosen to cover the range of
values in X and reveal the shape of the underlying distribution.
N(k) will count the value X(i) if EDGES(k) <= X(i) < EDGES(k+1). The
last bin will also include the right edge such that N(end) will count
X(i) if EDGES(end-1) <= X(i) <= EDGES(end).
[N,EDGES] = histcounts(X,M), where M is a scalar, uses M bins.
[N,EDGES] = histcounts(X,EDGES), where EDGES is a vector, specifies the
edges of the bins.
[N,EDGES] = histcounts(...,'BinWidth',BW) uses bins of width BW. To
prevent from accidentally creating too many bins, a limit of 65536 bins
can be created when specifying 'BinWidth'. If BW is too small such that
more than 65536 bins are needed, histcounts uses wider bins instead.
[N,EDGES] = histcounts(...,'BinLimits',[BMIN,BMAX]) bins only elements
in X between BMIN and BMAX inclusive, X(X>=BMIN & X<=BMAX).
[N,EDGES] = histcounts(...,'Normalization',NM) specifies the
normalization scheme of the histogram values returned in N. NM can be:
    'count'         Each N value is the number of observations in
                    each bin, and SUM(N) is equal to NUMEL(X).
                    This is the default.
    'probability'   Each N value is the relative number of
                    observations (number of observations in bin /
                    total number of observations), and SUM(N) is
                    equal to 1.
    'countdensity'  Each N value is the number of observations in
                    each bin divided by the width of the bin.
    'pdf'           Probability density function estimate. Each N
                    value is, (number of observations in bin) /
                    (total number of observations * width of bin).
    'cumcount'      Each N value is the cumulative number of
                    observations in each bin and all previous bins.
                    N(end) is equal to NUMEL(X).
    'cdf'           Cumulative density function estimate. Each N
                    value is the cumulative relative number of
                    observations in each bin and all previous bins.
                    N(end) is equal to 1.
[N,EDGES] = histcounts(...,'BinMethod',BM), uses the specified automatic
binning algorithm to determine the number and width of the bins. BM can be:
    'auto'          The default 'auto' algorithm chooses a bin
                    width to cover the data range and reveal the
                    shape of the underlying distribution.
    'scott'         Scott's rule is optimal if X is close to being
                    normally distributed, but is also appropriate
                    for most other distributions. It uses a bin width
                    of 3.5*STD(X(:))*NUMEL(X)^(-1/3).
    'fd'            The Freedman-Diaconis rule is less sensitive to
                    outliers in the data, and may be more suitable
                    for data with heavy-tailed distributions. It
                    uses a bin width of 2*IQR(X(:))*NUMEL(X)^(-1/3),
                    where IQR is the interquartile range.
    'integers'      The integer rule is useful with integer data,
                    as it creates a bin for each integer. It uses
                    a bin width of 1 and places bin edges halfway
                    between integers. To prevent from accidentally
                    creating too many bins, a limit of 65536 bins
                    can be created with this rule. If the data
                    range is greater than 65536, then wider bins
                    are used instead.
    'sturges'       Sturges' rule is a simple rule that is popular
                    due to its simplicity. It chooses the number of
                    bins to be CEIL(1 + LOG2(NUMEL(X))).
    'sqrt'          The Square Root rule is another simple rule
                    widely used in other software packages. It
                    chooses the number of bins to be
                    CEIL(SQRT(NUMEL(X))).
[N,EDGES,BIN] = histcounts(...) also returns an index array BIN, using
any of the previous syntaxes. BIN is an array of the same size as X
whose elements are the bin indices for the corresponding elements in X.
The number of elements in the kth bin is NNZ(BIN==k), which is the same
as N(k). A value of 0 in BIN indicates an element which does not belong
to any of the bins (for example, a NaN value).
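The rule-based 'BinMethod' options in the help text are closed-form formulas, so you can evaluate them yourself. The sketch below, using an illustrative sample X = (1:64)', computes Scott's bin width and the Sturges and square-root bin counts, then prints what histcounts actually chose for comparison. histcounts may adjust the raw values (for example, to place edges at convenient positions), so its edges need not match the formulas exactly. The variable names are hypothetical.

```matlab
% Illustrative sample: the integers 1 through 64 as a column vector
X = (1:64)';
n = numel(X);

% Scott's rule: bin width = 3.5 * std(X) * n^(-1/3)
bwScott = 3.5 * std(X(:)) * n^(-1/3);

% Sturges' rule: number of bins = ceil(1 + log2(n))
nbSturges = ceil(1 + log2(n));   % 7 for n = 64

% Square-root rule: number of bins = ceil(sqrt(n))
nbSqrt = ceil(sqrt(n));          % 8 for n = 64

% Compare with what histcounts chooses for the same rules
[NScott, edgesScott] = histcounts(X, 'BinMethod', 'scott');
[NSturges, edgesSturges] = histcounts(X, 'BinMethod', 'sturges');
fprintf('Scott width: formula %.3f, histcounts %.3f\n', ...
        bwScott, edgesScott(2) - edgesScott(1));
fprintf('Sturges bins: formula %d, histcounts %d\n', ...
        nbSturges, numel(NSturges));
```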
Common syntax of histcounts
[N,edges] = histcounts(X)
[N,edges] = histcounts(X,nbins)
[N,edges] = histcounts(X,edges)
[N,edges,bin] = histcounts(___)
N = histcounts(C)
N = histcounts(C,Categories)
[N,Categories] = histcounts(___)
[___] = histcounts(___,Name,Value)
Description of the histcounts syntax
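[N,edges] = histcounts(X) partitions the values in X into bins and returns the count in each bin, as well as the bin edges. histcounts uses an automatic binning algorithm that returns uniform bins of a width chosen to cover the range of elements in X and reveal the shape of the underlying distribution.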
[N,edges] = histcounts(X,nbins) uses the number of bins specified by the scalar nbins.
[N,edges] = histcounts(X,edges) sorts X into bins whose edges are specified by the vector edges. The value X(i) is in the kth bin if edges(k) ≤ X(i) < edges(k+1). The last bin also includes its right edge, so it contains X(i) if edges(end-1) ≤ X(i) ≤ edges(end).
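A small case makes the half-open rule and the closed last bin concrete. In this sketch the data are chosen (illustratively) so that some values sit exactly on edges and some fall outside them.

```matlab
% Edges [0 1 2 3] define three bins: [0,1), [1,2) and [2,3]
edges = [0 1 2 3];

% 1 and 2 sit exactly on interior edges, 3 sits on the right edge of the
% last bin, and -1 and 4 lie outside the edges entirely
X = [-1 0 1 2 3 4];

N = histcounts(X, edges)
% N = [1 1 2]: 0 -> bin 1, 1 -> bin 2, 2 and 3 -> bin 3 (the last bin is closed).
% -1 and 4 are not counted, so sum(N) is 4 rather than numel(X).
```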
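[N,edges,bin] = histcounts(___) also returns an index array bin, using any of the previous syntaxes. bin is an array of the same size as X whose elements are the bin indices for the corresponding elements in X. The number of elements in the kth bin is nnz(bin==k), which is the same as N(k).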
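The relationship between bin and N can be checked directly. This sketch uses illustrative data that include a NaN; per the help text above, an element that belongs to no bin gets index 0.

```matlab
X = [1 2 2 3 NaN];
edges = [0.5 1.5 2.5 3.5];          % one bin centered on each of 1, 2, 3

[N, ~, bin] = histcounts(X, edges);
% bin has the same size as X: bin = [1 2 2 3 0] (the NaN is in no bin)
% N = [1 2 1]

% N(k) equals nnz(bin == k) for every bin k
Ncheck = arrayfun(@(k) nnz(bin == k), 1:numel(N));
isequal(N, Ncheck)                  % true
```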
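N = histcounts(C), where C is a categorical array, returns a vector N that gives the number of elements in C whose value equals each category of C. N has one element for each category in C.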
N = histcounts(C,Categories) counts only the elements in C whose value equals one of the subset of categories specified by Categories.
[N,Categories] = histcounts(___) also returns the category that corresponds to each count in N, using any of the previous syntaxes for categorical arrays.
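The categorical syntaxes can be tried on a toy array. In this sketch the colors are illustrative data. The default categories of a categorical built from a cell array of character vectors are the sorted unique values. The check at the end pairs each count with its returned category rather than assuming a particular order for the subset.

```matlab
C = categorical({'red','blue','red','green','red'});

% Count every category; the default categories are 'blue','green','red'
[Nall, catsAll] = histcounts(C)     % Nall = [1 1 3]

% Count only a subset of categories and get the matching category names
[Nsub, catsSub] = histcounts(C, {'red','green'})

% Each returned count matches the number of elements in its category
ok = all(arrayfun(@(k) Nsub(k) == nnz(C == string(catsSub(k))), 1:numel(Nsub)))
```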
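[___] = histcounts(___,Name,Value) specifies additional options with one or more Name,Value pair arguments, using any combination of input and output arguments from the previous syntaxes. For example, you can specify 'BinWidth' and a scalar to adjust the width of the bins for numeric data. For categorical data, you can specify 'Normalization' and one of 'count', 'countdensity', 'probability', 'pdf', 'cumcount', or 'cdf'.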
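Two commonly combined options are 'BinWidth' and 'BinLimits'. This sketch uses illustrative data. It fixes the bin width in the first call, and restricts counting to [0, 3] with three bins in the second. Values outside the limits are simply not counted.

```matlab
X = [0.2 0.7 1.1 1.9 2.5 3.7 -0.5];

% 'BinWidth': every bin is 0.5 wide and the bins cover all of X
[Nw, edgesW] = histcounts(X, 'BinWidth', 0.5);

% 'BinLimits' with 3 bins: only values in [0, 3] are binned
[Nl, edgesL] = histcounts(X, 3, 'BinLimits', [0 3])
% edgesL = [0 1 2 3], Nl = [2 2 1]; 3.7 and -0.5 are excluded
```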
histcounts examples
Bin counts and bin edges
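Distribute 100 random values among bins. histcounts automatically chooses an appropriate bin width to reveal the underlying distribution of the data. Because the data come from randn, your counts and edges may differ from the output below.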
>> X = randn(100,1);
>> [N,edges] = histcounts(X)
N =
2 17 28 32 16 3 2
edges =
-3 -2 -1 0 1 2 3 4
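To make a randn example repeatable, you can seed the generator with rng before drawing the data. This sketch also checks two properties that hold for any such run: every finite value lands in some bin, and the automatically chosen bins share one width.

```matlab
rng('default')                 % fix the random seed so the result is repeatable
X = randn(100,1);
[N, edges] = histcounts(X);

% Every value falls in some bin, so the counts add up to numel(X)
total = sum(N)                 % 100

% The automatically chosen bins have uniform width
widths = diff(edges);
uniform = all(abs(widths - widths(1)) < 1e-10)
```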
Specify the number of bins
Distribute 10 numbers (the first 10 primes) into 6 equally spaced bins.
>> X = [2 3 5 7 11 13 17 19 23 29];
>> [N,edges] = histcounts(X,6)
N =
2 2 2 2 1 1
edges =
0 4.9000 9.8000 14.7000 19.6000 24.5000 29.4000
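You can reproduce these counts by applying the documented edge rule by hand. In this sketch, bin k holds values with edges(k) ≤ x < edges(k+1), and the last bin also includes edges(end). The helper variable Nmanual is illustrative.

```matlab
X = [2 3 5 7 11 13 17 19 23 29];
[N, edges] = histcounts(X, 6);

% Recount each bin with logical indexing
Nmanual = zeros(1, numel(N));
for k = 1:numel(N)
    if k < numel(N)
        inBin = X >= edges(k) & X < edges(k+1);    % half-open bin
    else
        inBin = X >= edges(k) & X <= edges(k+1);   % last bin is closed
    end
    Nmanual(k) = nnz(inBin);
end
isequal(N, Nmanual)            % true
```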
Specify bin edges
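Distribute 1,000 random numbers among bins. Define the bin edges with a vector, where the first element is the left edge of the first bin and the last element is the right edge of the last bin.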
>> X = randn(1000,1);
>> edges = [-5 -4 -2 -1 -0.5 0 0.5 1 2 4 5];
>> N = histcounts(X,edges)
N =
0 24 145 145 194 204 154 110 24 0
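With explicit edges, values outside [edges(1), edges(end)] are dropped, so sum(N) can be smaller than numel(X). This sketch reuses the same edges with a few hand-picked values. It shows the out-of-range values being ignored and a value exactly at the right edge being kept.

```matlab
edges = [-5 -4 -2 -1 -0.5 0 0.5 1 2 4 5];
X = [-6 -4.5 0 0.2 5 7];

N = histcounts(X, edges)
% N = [1 0 0 0 0 2 0 0 0 1]
% -6 and 7 lie outside [-5, 5] and are dropped;
% 5 is kept because the last bin includes its right edge
sum(N)                         % 4
```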
Normalized bin counts
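Distribute all prime numbers less than 100 among bins. Specify 'Normalization' as 'probability' to normalize the bin counts so that sum(N) is 1. That is, each bin count represents the probability that an observation falls within that bin.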
>> X = primes(100);
>> [N,edges] = histcounts(X, 'Normalization', 'probability')
N =
0.4000 0.2800 0.2800 0.0400
edges =
0 30 60 90 120
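The normalization schemes listed in the help text are simple transformations of the raw counts. This sketch fixes the edges to 0:30:120 so that every call uses the same bins. It then checks the 'probability', 'pdf', and 'cdf' results against the formulas computed from the 'count' result.

```matlab
X = primes(100);
edges = 0:30:120;

Ncount = histcounts(X, edges);                          % raw counts
Nprob  = histcounts(X, edges, 'Normalization', 'probability');
Npdf   = histcounts(X, edges, 'Normalization', 'pdf');
Ncdf   = histcounts(X, edges, 'Normalization', 'cdf');

n = numel(X);
w = diff(edges);
tol = 1e-12;
% probability = count / n
assert(all(abs(Nprob - Ncount / n) < tol))
% pdf = count / (n * bin width)
assert(all(abs(Npdf - Ncount ./ (n * w)) < tol))
% cdf = cumulative sum of the probabilities, ending at 1
assert(all(abs(Ncdf - cumsum(Ncount) / n) < tol))
```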
Determine bin placement
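Distribute 100 random integers between -5 and 5 among bins, and specify 'BinMethod' as 'integers' to use unit-width bins centered on integers. Specify a third output for histcounts to return a vector of the bin indices of the data. Find the bin count for the third bin by counting the occurrences of 3 in the bin index vector bin. The result is the same as N(3).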
>> X = randi([-5,5],100,1);
>> [N,edges,bin] = histcounts(X,'BinMethod','integers');
>> count = nnz(bin==3)
count =
7
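Because the 'integers' rule places edges halfway between integers, each bin center is an integer. With small, deterministic illustrative data you can confirm that N(k) counts exactly the elements equal to the kth center.

```matlab
X = [-2 -1 -1 0 3];
[N, edges, bin] = histcounts(X, 'BinMethod', 'integers');

% Edges lie halfway between integers, so each bin is centered on one integer
centers = edges(1:end-1) + 0.5;

% N(k) is the number of elements equal to centers(k)
Ncheck = arrayfun(@(c) nnz(X == c), centers);
isequal(N, Ncheck)             % true
```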
Categorical bin counts
Create a categorical vector that represents votes. The categories in the vector are 'yes', 'no', and 'undecided'.
>> A = [0 0 1 1 1 0 0 0 0 NaN NaN 1 0 0 0 1 0 1 0 1 0 0 0 1 1 1 1];
>> C = categorical(A,[1 0 NaN],{'yes','no','undecided'})
C =
Columns 1 through 13
no no yes yes yes no no no no undecided undecided yes no
Columns 14 through 27
no no yes no yes no yes no no no yes yes yes yes
Determine the number of elements in each category.
>> [N,Categories] = histcounts(C)
N =
11 14 2
Categories =
'yes' 'no' 'undecided'
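The 'Normalization' option also applies to categorical data, and the raw counts can be cross-checked with countcats. This sketch rebuilds the same vote vector and returns each category's share of the votes instead of its count. The variable names P and cats are illustrative.

```matlab
A = [0 0 1 1 1 0 0 0 0 NaN NaN 1 0 0 0 1 0 1 0 1 0 0 0 1 1 1 1];
C = categorical(A, [1 0 NaN], {'yes','no','undecided'});

% Share of each category instead of the raw count
[P, cats] = histcounts(C, 'Normalization', 'probability');

% Raw counts agree with countcats (compared as column vectors)
N = histcounts(C);
isequal(N(:), countcats(C(:)))     % true
```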
Original article by 古哥. Reproduction requires the author's permission and a link to the original article: https://iymark.com/articles/3947.html