MATLAB Data Processing: Histogram Bin Counts with the histcounts Function


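In MATLAB, the histcounts function returns the histogram bin counts of the elements of an array. This article covers histcounts in three parts: its common syntaxes, a description of each syntax, and worked examples. The examples cover bin counts and bin edges, specifying the number of bins, specifying bin edges, normalized bin counts, determining bin placement, and categorical bin counts.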


The built-in help text for histcounts reads as follows:

>> help histcounts
 histcounts  Histogram Bin Counts.
    [N,EDGES] = histcounts(X) partitions the values in X into bins, and 
    returns the count in each bin, as well as the bin edges. histcounts
    determines the bin edges using an automatic binning algorithm that 
    returns uniform bins of a width that is chosen to cover the range of 
    values in X and reveal the shape of the underlying distribution. 
 
    N(k) will count the value X(i) if EDGES(k) <= X(i) < EDGES(k+1). The 
    last bin will also include the right edge such that N(end) will count
    X(i) if EDGES(end-1) <= X(i) <= EDGES(end).
 
    [N,EDGES] = histcounts(X,M), where M is a scalar, uses M bins.
 
    [N,EDGES] = histcounts(X,EDGES), where EDGES is a vector, specifies the 
    edges of the bins.
 
    [N,EDGES] = histcounts(...,'BinWidth',BW) uses bins of width BW. To 
    prevent from accidentally creating too many bins, a limit of 65536 bins
    can be created when specifying 'BinWidth'. If BW is too small such that 
    more than 65536 bins are needed, histcounts uses wider bins instead.
 
    [N,EDGES] = histcounts(...,'BinLimits',[BMIN,BMAX]) bins only elements 
    in X between BMIN and BMAX inclusive, X(X>=BMIN & X<=BMAX).
 
    [N,EDGES] = histcounts(...,'Normalization',NM) specifies the
    normalization scheme of the histogram values returned in N. NM can be:
                   'count'   Each N value is the number of observations in 
                             each bin, and SUM(N) is equal to NUMEL(X).
                             This is the default.
             'probability'   Each N value is the relative number of 
                             observations (number of observations in bin / 
                             total number of observations), and SUM(N) is  
                             equal to 1.
            'countdensity'   Each N value is the number of observations in 
                             each bin divided by the width of the bin. 
                     'pdf'   Probability density function estimate. Each N 
                             value is, (number of observations in bin) / 
                             (total number of observations * width of bin).
                'cumcount'   Each N value is the cumulative number of 
                             observations in each bin and all previous bins. 
                             N(end) is equal to NUMEL(X).
                     'cdf'   Cumulative density function estimate. Each N 
                             value is the cumulative relative number of 
                             observations in each bin and all previous bins. 
                             N(end) is equal to 1.
 
    [N,EDGES] = histcounts(...,'BinMethod',BM), uses the specified automatic 
    binning algorithm to determine the number and width of the bins.  BM can be:
                    'auto'   The default 'auto' algorithm chooses a bin 
                             width to cover the data range and reveal the 
                             shape of the underlying distribution.
                   'scott'   Scott's rule is optimal if X is close to being 
                             normally distributed, but is also appropriate
                             for most other distributions. It uses a bin width
                             of 3.5*STD(X(:))*NUMEL(X)^(-1/3).
                      'fd'   The Freedman-Diaconis rule is less sensitive to 
                             outliers in the data, and may be more suitable 
                             for data with heavy-tailed distributions. It 
                             uses a bin width of 2*IQR(X(:))*NUMEL(X)^(-1/3), 
                             where IQR is the interquartile range.
                'integers'   The integer rule is useful with integer data, 
                             as it creates a bin for each integer. It uses 
                             a bin width of 1 and places bin edges halfway 
                             between integers. To prevent from accidentally 
                             creating too many bins, a limit of 65536 bins 
                             can be created with this rule. If the data 
                             range is greater than 65536, then wider bins 
                             are used instead.
                 'sturges'   Sturges' rule is a simple rule that is popular
                             due to its simplicity. It chooses the number of
                             bins to be CEIL(1 + LOG2(NUMEL(X))).
                    'sqrt'   The Square Root rule is another simple rule 
                             widely used in other software packages. It 
                             chooses the number of bins to be 
                             CEIL(SQRT(NUMEL(X))).
 
    [N,EDGES,BIN] = histcounts(...) also returns an index array BIN, using 
    any of the previous syntaxes. BIN is an array of the same size as X 
    whose elements are the bin indices for the corresponding elements in X. 
    The number of elements in the kth bin is NNZ(BIN==k), which is the same 
    as N(k). A value of 0 in BIN indicates an element which does not belong 
    to any of the bins (for example, a NaN value).
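
The bin-count and bin-width formulas quoted in the help text can be evaluated by hand. The following is a minimal sketch, assuming 100 normally distributed samples and an arbitrarily chosen seed; the variable names are illustrative only. Note that, as far as the official documentation describes it, histcounts may adjust the result slightly so that bin edges fall on "nice" numbers, so the number of bins it actually returns need not match these formulas exactly.

```matlab
% Evaluate the bin-selection formulas from the help text by hand.
rng(0);                 % arbitrary fixed seed (an assumption, for repeatability)
X = randn(100,1);

nSturges = ceil(1 + log2(numel(X)));       % Sturges' rule: number of bins
nSqrt    = ceil(sqrt(numel(X)));           % square-root rule: number of bins
wScott   = 3.5*std(X(:))*numel(X)^(-1/3);  % Scott's rule: bin width

fprintf('Sturges: %d bins, sqrt: %d bins, Scott width: %.3f\n', ...
        nSturges, nSqrt, wScott);

% Let histcounts apply the same rules; the bin counts it uses can then be
% compared with the hand-computed values above (they may differ slightly).
nUsedSturges = numel(histcounts(X,'BinMethod','sturges'));
nUsedSqrt    = numel(histcounts(X,'BinMethod','sqrt'));
fprintf('histcounts used %d (sturges) and %d (sqrt) bins\n', ...
        nUsedSturges, nUsedSqrt);
```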

Common syntaxes of histcounts

[N,edges] = histcounts(X)
[N,edges] = histcounts(X,nbins)
[N,edges] = histcounts(X,edges)
[N,edges,bin] = histcounts(___)
N = histcounts(C)
N = histcounts(C,Categories)
[N,Categories] = histcounts(___)
[___] = histcounts(___,Name,Value)

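Syntax descriptions

[N,edges] = histcounts(X) partitions the values in X into bins and returns the count in each bin together with the bin edges. histcounts uses an automatic binning algorithm that returns bins of uniform width, chosen to cover the range of elements in X and reveal the shape of the underlying distribution.

[N,edges] = histcounts(X,nbins) uses the number of bins given by the scalar nbins.

[N,edges] = histcounts(X,edges) sorts X into bins whose edges are given by the vector edges. The value X(i) falls in the kth bin if edges(k) ≤ X(i) < edges(k+1). The last bin also includes its right edge, so it contains X(i) if edges(end-1) ≤ X(i) ≤ edges(end).

[N,edges,bin] = histcounts(___) also returns an index array bin, using any of the previous syntaxes. bin is an array of the same size as X whose elements are the bin indices of the corresponding elements in X. The number of elements in the kth bin is nnz(bin==k), which is the same as N(k).

N = histcounts(C), where C is a categorical array, returns a vector N that gives the number of elements in C whose value equals each of the categories of C. N has one element for each category in C.

N = histcounts(C,Categories) counts only the elements in C whose value belongs to the subset of categories specified by Categories.

[N,Categories] = histcounts(___) also returns the categories that correspond to each count in N, using either of the previous syntaxes for categorical arrays.

[___] = histcounts(___,Name,Value) uses additional options specified by one or more Name,Value pair arguments, with any of the input or output argument combinations in the previous syntaxes. For example, you can specify 'BinWidth' and a scalar to adjust the width of the bins for numeric data. For categorical data, you can specify 'Normalization' and one of the following: 'count', 'countdensity', 'probability', 'pdf', 'cumcount', or 'cdf'.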
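
As an illustration of a Name,Value option, the sketch below combines a bin count with 'BinLimits', which the help text describes as binning only the elements between BMIN and BMAX inclusive. The data and the limits [0 4] are made up for this example; the point is only that elements outside the limits do not contribute to N.

```matlab
% Hypothetical data: three values inside [0,4] and two outside it.
X = [1 2 3 10 20];

% Two bins, restricted to the range [0,4] via the 'BinLimits' option.
[N,edges] = histcounts(X,2,'BinLimits',[0 4]);

% Number of elements that lie inside the limits, computed independently.
inRange = nnz(X >= 0 & X <= 4);

disp(N)
disp(edges)
fprintf('%d of %d elements were binned\n', sum(N), numel(X));
```

Because of the limits, sum(N) equals the number of in-range elements rather than numel(X).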

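histcounts examples

Bin counts and bin edges

Distribute 100 random values into bins. histcounts automatically chooses an appropriate bin width to reveal the underlying distribution of the data.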

>> X = randn(100,1);
>> [N,edges] = histcounts(X)

N =

     2    17    28    32    16     3     2


edges =

    -3    -2    -1     0     1     2     3     4

Specify the number of bins

Distribute 10 numbers into 6 equally spaced bins.

>> X = [2 3 5 7 11 13 17 19 23 29];
>> [N,edges] = histcounts(X,6)

N =

     2     2     2     2     1     1


edges =

         0    4.9000    9.8000   14.7000   19.6000   24.5000   29.4000

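Specify bin edges

Distribute 1,000 random numbers into bins. Define the bin edges with a vector, where the first element is the left edge of the first bin and the last element is the right edge of the last bin.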

>> X = randn(1000,1);
>> edges = [-5 -4 -2 -1 -0.5 0 0.5 1 2 4 5];
>> N = histcounts(X,edges)

N =

     0    24   145   145   194   204   154   110    24     0
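
Because the example above uses random data, the edge rule is easier to see with a few fixed values. The following minimal sketch uses made-up data whose values sit exactly on the bin edges: each bin includes its left edge, only the last bin also includes its right edge, and values outside the edges are not counted.

```matlab
% Values that coincide with the bin edges.
edges = [1 2 3];           % bin 1 = [1,2), bin 2 = [2,3]
X1 = [1 2 3];
N1 = histcounts(X1,edges);
disp(N1)                   % 1 falls in bin 1; 2 and 3 fall in bin 2

% Values outside [edges(1), edges(end)] are left out of the counts.
X2 = [0 1 2 3 4];
N2 = histcounts(X2,edges);
disp(N2)
fprintf('%d of %d elements were counted\n', sum(N2), numel(X2));
```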

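Normalized bin counts

Distribute all of the prime numbers less than 100 into bins. Specify 'Normalization' as 'probability' to normalize the bin counts so that sum(N) is 1. That is, each bin count represents the probability that an observation falls within that bin.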

>> X = primes(100);
>> [N,edges] = histcounts(X, 'Normalization', 'probability')

N =

    0.4000    0.2800    0.2800    0.0400


edges =

     0    30    60    90   120
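
The normalization schemes listed in the help text are all simple transformations of the raw counts. The sketch below recomputes three of them by hand for the same primes, assuming the bin edges 0:30:120 shown in the output above are passed explicitly, and compares the results with what histcounts returns. The variable names are illustrative.

```matlab
X = primes(100);
edges = 0:30:120;                 % same edges as in the output above

Ncount = histcounts(X,edges);     % raw counts per bin
w      = diff(edges);             % bin widths

Nprob = Ncount / numel(X);        % 'probability': count / total
Npdf  = Ncount ./ (numel(X)*w);   % 'pdf': count / (total * bin width)
Ncdf  = cumsum(Nprob);            % 'cdf': cumulative probability

% Compare with the built-in normalizations (up to rounding error).
dProb = max(abs(Nprob - histcounts(X,edges,'Normalization','probability')));
dPdf  = max(abs(Npdf  - histcounts(X,edges,'Normalization','pdf')));
dCdf  = max(abs(Ncdf  - histcounts(X,edges,'Normalization','cdf')));
fprintf('max differences: %g (probability), %g (pdf), %g (cdf)\n', ...
        dProb, dPdf, dCdf);
```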

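Determine bin placement

Distribute 100 random integers between -5 and 5 into bins, and specify 'BinMethod' as 'integers' to use unit-width bins centered on integers. Specify a third output for histcounts to return a vector of the bin indices of the data. Find the bin count of the third bin by counting the occurrences of the number 3 in the bin index vector bin. The result is the same as N(3).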

>> X = randi([-5,5],100,1);
>> [N,edges,bin] = histcounts(X,'BinMethod','integers');
>> count = nnz(bin==3)

count =

     7
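
Since the count above depends on the random draw, a small fixed data set makes the bin placement easier to inspect. This sketch uses made-up integers plus one NaN: the 'integers' rule is expected to place edges halfway between integers, and, as the help text notes, a NaN receives bin index 0 and is not counted.

```matlab
% Hypothetical data: a few integers and one NaN.
X = [-2 -1 -1 0 3 NaN];
[N,edges,bin] = histcounts(X,'BinMethod','integers');

centers = edges(1:end-1) + 0.5;   % each unit-width bin is centered on an integer

disp(edges)     % edges lie halfway between integers
disp(centers)   % the integer that each bin represents
disp(N)         % count per integer
disp(bin)       % bin index of each element; 0 marks the NaN

% The count of any bin can be recovered from the index vector.
k = 2;
fprintf('nnz(bin==%d) = %d, N(%d) = %d\n', k, nnz(bin==k), k, N(k));
```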

Categorical bin counts

Create a categorical vector that represents votes. The categories in the vector are 'yes', 'no', or 'undecided'.

>> A = [0 0 1 1 1 0 0 0 0 NaN NaN 1 0 0 0 1 0 1 0 1 0 0 0 1 1 1 1];
>> C = categorical(A,[1 0 NaN],{'yes','no','undecided'})

C = 

  Columns 1 through 13

     no      no      yes      yes      yes      no      no      no      no      undecided      undecided      yes      no 

  Columns 14 through 27

     no      no      yes      no      yes      no      yes      no      no      no      yes      yes      yes      yes 

Determine the number of elements in each category.

>> [N,Categories] = histcounts(C)

N =

    11    14     2


Categories = 

    'yes'    'no'    'undecided'
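
The N = histcounts(C,Categories) syntax described earlier can be tried on the same vote data. The sketch below rebuilds C and counts only the 'yes' and 'no' categories; choosing that particular subset is just an assumption for illustration.

```matlab
% Rebuild the vote vector from the example above.
A = [0 0 1 1 1 0 0 0 0 NaN NaN 1 0 0 0 1 0 1 0 1 0 0 0 1 1 1 1];
C = categorical(A,[1 0 NaN],{'yes','no','undecided'});

% Count only a subset of the categories; 'undecided' votes are left out.
[N,Categories] = histcounts(C,{'yes','no'});

disp(Categories)
disp(N)
fprintf('%d of %d votes were counted\n', sum(N), numel(C));
```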


Original article by 古哥. Reproduction requires the author's permission and must include a link to the original: https://iymark.com/articles/3947.html
