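Today's post covers the wordcloud function in MATLAB, which creates word cloud charts from text data. The function was introduced in MATLAB R2017b; since I am on MATLAB 2016, I have not run the code here and am sharing the official examples directly for reference. This post covers the common usage of wordcloud, its syntax, creating a word cloud from a table, preparing text data for a word cloud, specifying word sizes, specifying word colors, and creating a word cloud with Text Analytics Toolbox.
Because I am on MATLAB 2016, the help text is not reproduced here. If your release includes wordcloud, you can run help wordcloud to read it yourself.
Common usage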
wordcloud(tbl,wordVar,sizeVar)
wordcloud(words,sizeData)
wordcloud(C)
wordcloud(___,Name,Value)
wordcloud(parent,___)
wc = wordcloud(___)
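Syntax description
wordcloud(tbl,wordVar,sizeVar) creates a word cloud chart from the table tbl. The table variables wordVar and sizeVar specify the words and the word sizes, respectively.
wordcloud(words,sizeData) creates a word cloud chart from the elements of words, with word sizes specified by sizeData.
wordcloud(C) creates a word cloud chart from the unique elements of the categorical array C, with sizes corresponding to their frequency counts. If you have Text Analytics Toolbox™, C can be a string array, character vector, or cell array of character vectors.
wordcloud(___,Name,Value) specifies additional WordCloudChart properties using one or more name-value pair arguments.
wordcloud(parent,___) creates the word cloud in the figure, panel, or tab specified by parent.
wc = wordcloud(___) returns the WordCloudChart object. Use wc to modify the properties of the word cloud after creating it.
Note
Text Analytics Toolbox extends the functionality of the wordcloud (MATLAB®) function. It adds support for creating word clouds directly from string arrays, and for creating word clouds from bag-of-words models, bag-of-n-grams models, and LDA topics.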
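As a minimal sketch of how the parent, name-value, and return-value syntaxes combine, the snippet below uses a made-up five-word list (the words and weights are hypothetical, not from the original examples). MaxDisplayWords, Shape, Title, and HighlightColor are WordCloudChart properties; the values chosen here are arbitrary.

```matlab
% Hypothetical toy data: five words with made-up weights.
words = ["matlab" "cloud" "word" "chart" "text"];
sizes = [10 7 5 3 2];

% Create the chart in an explicit parent figure and set two
% properties up front with name-value pairs.
fig = figure;
wc = wordcloud(fig, words, sizes, ...
    'MaxDisplayWords', 3, ...     % show only the 3 largest words
    'Shape', 'rectangle');        % default shape is 'oval'

% The returned WordCloudChart object can be modified afterwards.
wc.Title = "Toy Word Cloud";
wc.HighlightColor = [0.85 0.33 0.10];   % color used for the largest words
```

Keeping the handle wc is what makes later adjustments possible without recreating the chart.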
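Creating a word cloud from a table
Load the example data sonnetsTable. The table tbl contains a list of words in the variable Word and the corresponding frequency counts in the variable Count.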
load sonnetsTable
head(tbl)
Output:
ans=8×2 table
       Word        Count
    ___________    _____
    {'''tis'   }      1
    {''Amen''  }      1
    {''Fair'   }      2
    {''Gainst' }      1
    {''Since'  }      1
    {''This'   }      2
    {''Thou'   }      1
    {''Thus'   }      1
Plot the table data using wordcloud. Specify the words and the corresponding word sizes as the Word and Count variables, respectively.
figure
wordcloud(tbl,'Word','Count');
title("Sonnets Word Cloud")
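Before plotting, it can help to look at which words dominate the table. The following sketch sorts tbl by Count and plots only the most frequent rows; the cutoff of 50 and the variable names sorted and topWords are arbitrary choices for illustration, not part of the official example.

```matlab
load sonnetsTable                             % provides the table tbl

% Most frequent words first.
sorted = sortrows(tbl, 'Count', 'descend');

% Keep at most 50 rows (arbitrary cutoff for illustration).
n = min(50, height(sorted));
topWords = sorted(1:n, :);
head(topWords)                                % inspect the leading rows

figure
wordcloud(topWords, 'Word', 'Count');
title("Top Sonnet Words")
```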
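Preparing text data for a word cloud
If you have Text Analytics Toolbox™ installed, you can create a word cloud directly from a string array. For details, see wordcloud (Text Analytics Toolbox). If you do not have Text Analytics Toolbox, you must preprocess the text data manually.
This example shows how to create a word cloud from plain text by reading it into a string array, preprocessing it, and passing it to the wordcloud function.
Use the fileread function to read the text of Shakespeare's sonnets and convert it to a string.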
sonnets = string(fileread('sonnets.txt'));
extractBefore(sonnets,"II")
Output:
ans =
"THE SONNETS
by William Shakespeare
I
From fairest creatures we desire increase,
That thereby beauty's rose might never die,
But as the riper should by time decease,
His tender heir might bear his memory:
But thou, contracted to thine own bright eyes,
Feed'st thy light's flame with self-substantial fuel,
Making a famine where abundance lies,
Thy self thy foe, to thy sweet self too cruel:
Thou that art now the world's fresh ornament,
And only herald to the gaudy spring,
Within thine own bud buriest thy content,
And tender churl mak'st waste in niggarding:
Pity the world, or else this glutton be,
To eat the world's due, by the grave and thee.
"
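Split sonnets into a string array whose elements are individual words. To do this, replace the punctuation characters with spaces, join all string elements into a single 1-by-1 string, and split it at whitespace. Then remove words with fewer than five characters and convert the remaining words to lowercase.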
punctuationCharacters = ["." "?" "!" "," ";" ":"];
sonnets = replace(sonnets,punctuationCharacters," ");
words = split(join(sonnets));
words(strlength(words)<5) = [];
words = lower(words);
words(1:10)
Output:
ans = 10x1 string
"sonnets"
"william"
"shakespeare"
"fairest"
"creatures"
"desire"
"increase"
"thereby"
"beauty's"
"might"
Convert words to a categorical array, then plot it using wordcloud. The function plots the unique elements of C, with sizes corresponding to their frequency counts.
C = categorical(words);
figure
wordcloud(C);
title("Sonnets Word Cloud")
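To see what each preprocessing step does, the same pipeline can be run on a short string. The sentence below is made up for illustration; the steps mirror the ones above, and countcats shows the frequency counts that wordcloud(C) would use for sizing.

```matlab
% Made-up sample text (not from sonnets.txt).
txt = "Lovely roses, lovely gardens; roses fade. Gardens remain!";

% Replace punctuation with spaces, then split at whitespace.
punctuationCharacters = ["." "?" "!" "," ";" ":"];
txt = replace(txt, punctuationCharacters, " ");
words = split(join(txt));

% Drop short words (this also removes any empty strings left by
% consecutive spaces) and normalize case.
words(strlength(words) < 5) = [];
words = lower(words);

% Unique words and their counts, as used by wordcloud(C).
C = categorical(words);
disp(table(categories(C), countcats(C), 'VariableNames', {'Word','Count'}))
```

Here "fade" is dropped because it has fewer than five characters, and "Gardens" and "gardens" are counted together after lowercasing.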
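Specifying word sizes
You can create a word cloud from plain text by reading it into a string array, preprocessing it, and passing it to the wordcloud function.
Use the fileread function to read the text of Shakespeare's sonnets and convert it to a string.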
sonnets = string(fileread('sonnets.txt'));
extractBefore(sonnets,"II")
Output:
ans =
"THE SONNETS
by William Shakespeare
I
From fairest creatures we desire increase,
That thereby beauty's rose might never die,
But as the riper should by time decease,
His tender heir might bear his memory:
But thou, contracted to thine own bright eyes,
Feed'st thy light's flame with self-substantial fuel,
Making a famine where abundance lies,
Thy self thy foe, to thy sweet self too cruel:
Thou that art now the world's fresh ornament,
And only herald to the gaudy spring,
Within thine own bud buriest thy content,
And tender churl mak'st waste in niggarding:
Pity the world, or else this glutton be,
To eat the world's due, by the grave and thee.
"
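Split sonnets into a string array whose elements are individual words. To do this, replace the punctuation characters with spaces, join all string elements into a single 1-by-1 string, and split it at whitespace. Then remove words with fewer than five characters and convert the remaining words to lowercase.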
punctuationCharacters = ["." "?" "!" "," ";" ":"];
sonnets = replace(sonnets,punctuationCharacters," ");
words = split(join(sonnets));
words(strlength(words)<5) = [];
words = lower(words);
words(1:10)
Output:
ans = 10x1 string
"sonnets"
"william"
"shakespeare"
"fairest"
"creatures"
"desire"
"increase"
"thereby"
"beauty's"
"might"
Find the unique words in sonnets and count how often each occurs. Create a word cloud using the frequency counts as the size data.
[numOccurrences,uniqueWords] = histcounts(categorical(words));
figure
wordcloud(uniqueWords,numOccurrences);
title("Sonnets Word Cloud")
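The two outputs of histcounts can also be packed into a table, which connects this approach to the table syntax shown earlier. This is a minimal sketch on a made-up word list; the variable names Word and Count are chosen here to match sonnetsTable and are otherwise arbitrary.

```matlab
% Made-up word list standing in for the preprocessed words array.
words = ["sweet" "heart" "sweet" "beauty" "heart" "sweet"]';

% For categorical input, histcounts returns the counts and the
% category names (a cell array of character vectors).
[numOccurrences, uniqueWords] = histcounts(categorical(words));

% Pack both into a table; (:) forces column vectors.
wordTable = table(uniqueWords(:), numOccurrences(:), ...
    'VariableNames', {'Word', 'Count'});
disp(wordTable)

figure
wordcloud(wordTable, 'Word', 'Count');
```

A table is convenient when the counts need to be sorted, filtered, or saved before plotting.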
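Specifying word colors
Load the example data sonnetsTable. The table tbl contains a list of words in the variable Word and the corresponding frequency counts in the variable Count.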
load sonnetsTable
head(tbl)
Output:
ans=8×2 table
       Word        Count
    ___________    _____
    {'''tis'   }      1
    {''Amen''  }      1
    {''Fair'   }      2
    {''Gainst' }      1
    {''Since'  }      1
    {''This'   }      2
    {''Thou'   }      1
    {''Thus'   }      1
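Plot the table data using wordcloud. Specify the words and the corresponding word sizes as the Word and Count variables, respectively. To set the word colors to random values, set 'Color' to a random matrix of RGB triplets with one row per word.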
numWords = size(tbl,1);
colors = rand(numWords,3);
figure
wordcloud(tbl,'Word','Count','Color',colors);
title("Sonnets Word Cloud")
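The color matrix does not have to be random. As a sketch, the rows can be derived from the data, for example one color for frequent words and gray for the rest. The threshold of 10 and both RGB values are arbitrary assumptions for illustration.

```matlab
load sonnetsTable                      % provides the table tbl
numWords = size(tbl, 1);

% Arbitrary threshold: words occurring at least 10 times count as frequent.
isFrequent = tbl.Count >= 10;

% One RGB row per word: gray by default, orange for frequent words.
colors = repmat([0.6 0.6 0.6], numWords, 1);
colors(isFrequent, :) = repmat([0.85 0.33 0.10], nnz(isFrequent), 1);

figure
wordcloud(tbl, 'Word', 'Count', 'Color', colors);
title("Sonnets Word Cloud")
```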
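Creating a word cloud with Text Analytics Toolbox
If you have Text Analytics Toolbox installed, you can create a word cloud directly from a string array. If you do not have Text Analytics Toolbox, you must preprocess the text data manually.
Use extractFileText to extract the text from sonnets.txt.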
str = extractFileText("sonnets.txt");
extractBefore(str,"II")
Output:
ans =
"THE SONNETS
by William Shakespeare
I
From fairest creatures we desire increase,
That thereby beauty's rose might never die,
But as the riper should by time decease,
His tender heir might bear his memory:
But thou, contracted to thine own bright eyes,
Feed'st thy light's flame with self-substantial fuel,
Making a famine where abundance lies,
Thy self thy foe, to thy sweet self too cruel:
Thou that art now the world's fresh ornament,
And only herald to the gaudy spring,
Within thine own bud buriest thy content,
And tender churl mak'st waste in niggarding:
Pity the world, or else this glutton be,
To eat the world's due, by the grave and thee.
"
Display the words from the sonnets in a word cloud.
figure
wordcloud(str);
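The note near the top mentions that Text Analytics Toolbox also accepts bag-of-words models. A minimal sketch of that route follows; it assumes the toolbox is installed, and the particular cleanup steps (lowercasing, erasing punctuation, removing stop words) are one reasonable choice rather than the steps of the official example.

```matlab
% Requires Text Analytics Toolbox.
str = extractFileText("sonnets.txt");

% Tokenize and clean the text.
documents = tokenizedDocument(str);
documents = lower(documents);
documents = erasePunctuation(documents);
documents = removeStopWords(documents);   % drops words such as "the", "and"

% Build a bag-of-words model and plot it.
bag = bagOfWords(documents);
figure
wordcloud(bag);
title("Sonnets Bag-of-Words Cloud")
```

Compared with wordcloud(str), this route keeps the word counts in bag, where they can be inspected or filtered before plotting.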
Reposted article. Original source: the MathWorks website; compiled and published by 古哥.
If you repost this, please credit the source: https://iymark.com/articles/868.html