NLP | Chunking using Corpus Reader

Original: https://www.geeksforgeeks.org/nlp-chunking-using-corpus-reader/

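What is chunking? Chunks are groups of words, and the kinds of words they contain are defined using part-of-speech tags. One can also define a pattern or words that must not be part of a chunk; such words are known as chinks. The ChunkRule class specifies which words or patterns to include in and exclude from a chunk.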
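The chunk and chink idea can be sketched with NLTK's RegexpParser, which accepts chunk rules written as {...} and chink rules written as }...{. This is a minimal illustration and not code from the original article: the tagged sentence and the grammar are assumptions made up for the example, and the sentence is tagged by hand so that no tagger model is needed.

```python
from nltk.chunk import RegexpParser

# A hand-tagged sentence (illustrative data), so no tagger or download is needed.
tagged = [
    ("the", "DT"), ("spokesman", "NN"), ("said", "VBD"),
    ("the", "DT"), ("jobs", "NNS"),
]

# The first rule chunks every token into one NP.
# The second rule chinks verbs, cutting them back out of the chunk.
grammar = r"""
NP:
    {<.*>+}     # chunk every sequence of tags
    }<VB.*>+{   # chink verbs out of the chunk
"""

parser = RegexpParser(grammar)
tree = parser.parse(tagged)

# Collect the words of each NP chunk that remains after chinking.
noun_phrases = [
    [word for word, tag in subtree.leaves()]
    for subtree in tree.subtrees(lambda t: t.label() == "NP")
]
print(noun_phrases)
```

The verb is left outside any chunk, so the single all-inclusive chunk is split into two noun phrases around it.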

How it works:

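The ChunkedCorpusReader class works like TaggedCorpusReader for getting tagged tokens, and it also provides three new methods for getting chunks.
Each chunk is represented by an instance of nltk.tree.Tree.
Noun-phrase trees look like Tree('NP', [...]), while sentence-level trees look like Tree('S', [...]).
chunked_sents() returns a list of sentence trees, with each noun phrase as a subtree of its sentence.
chunked_words() returns a list of noun-phrase trees alongside the tagged tokens of words that are not in a chunk.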
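The tree shapes described above can be built by hand to see what the reader returns. The following is a small sketch; the words and tags are illustrative, and the variable names are chosen only for this example.

```python
from nltk.tree import Tree

# A noun-phrase chunk: label 'NP', children are (word, tag) tuples.
np_chunk = Tree("NP", [("300", "CD"), ("jobs", "NNS")])

# A sentence-level tree: label 'S', children are chunks or unchunked tokens.
sentence = Tree("S", [
    Tree("NP", [("the", "DT"), ("spokesman", "NN")]),
    ("said", "VBD"),
    np_chunk,
    (".", "."),
])

print(np_chunk.label())    # label of the chunk
print(sentence.leaves())   # every (word, tag) tuple, with chunk structure flattened

# Direct children that are Trees are chunks; plain tuples are unchunked tokens.
chunks = [child for child in sentence if isinstance(child, Tree)]
loose_tokens = [child for child in sentence if not isinstance(child, Tree)]
print(len(chunks), loose_tokens)
```

Checking isinstance(child, Tree) is one simple way to separate chunks from loose tokens when walking the results of chunked_words() or chunked_sents().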

[Figure: diagram listing the major methods]

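Code #1: Creating a ChunkedCorpusReader for words

# Using ChunkedCorpusReader
from nltk.corpus.reader import ChunkedCorpusReader

# Initialize the reader over every .chunk file in the current directory
x = ChunkedCorpusReader('.', r'.*\.chunk')

# Noun-phrase trees mixed with the tagged tokens that are not in a chunk
words = x.chunked_words()
print("Words : \n", words)

Output: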

Words : 
[Tree('NP', [('Earlier', 'JJR'), ('staff-reduction', 'NN'), 
('moves', 'NNS')]), ('have', 'VBP'), ...]
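Code #1 assumes that a .chunk file already exists in the current directory. The sketch below shows one way to try the reader without such a file: it writes a sample sentence to a temporary directory and reads it back. The file name sample.chunk is hypothetical, and the bracketed word/TAG layout is an assumption based on the reader's default chunk-string parser; the sentence itself mirrors the output shown above.

```python
import os
import tempfile

from nltk.corpus.reader import ChunkedCorpusReader
from nltk.tree import Tree

# Sample content: chunks in square brackets, tokens written as word/TAG.
SAMPLE = (
    "[Earlier/JJR staff-reduction/NN moves/NNS] have/VBP trimmed/VBN about/IN "
    "[300/CD jobs/NNS] ,/, [the/DT spokesman/NN] said/VBD ./.\n"
)

with tempfile.TemporaryDirectory() as root:
    # 'sample.chunk' is a hypothetical name chosen to match the pattern below.
    with open(os.path.join(root, "sample.chunk"), "w", encoding="utf8") as fh:
        fh.write(SAMPLE)

    reader = ChunkedCorpusReader(root, r".*\.chunk")
    # Materialize the lazy corpus views before the directory is removed.
    chunked = list(reader.chunked_words())
    tagged = list(reader.tagged_words())

print(chunked[:2])   # an NP tree followed by an unchunked tagged token
print(tagged[:4])    # plain (word, tag) tuples, as with TaggedCorpusReader
```

Converting the results to lists inside the with block matters here because the reader returns lazy views that read from disk on demand.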

Code #2: For sentences

# x is the ChunkedCorpusReader created in Code #1
chunked_sents = x.chunked_sents()
print("Chunked Sentence : \n", chunked_sents)

Output:

Chunked Sentence : 
[Tree('S', [Tree('NP', [('Earlier', 'JJR'), ('staff-reduction', 'NN'), 
('moves', 'NNS')]), ('have', 'VBP'), ('trimmed', 'VBN'), ('about', 'IN'), 
Tree('NP', [('300', 'CD'), ('jobs', 'NNS')]), (', ', ', '),
Tree('NP', [('the', 'DT'), ('spokesman', 'NN')]), ('said', 'VBD'), ('.', '.')])]

Code #3: For paragraphs

# x is the ChunkedCorpusReader created in Code #1
para = x.chunked_paras()
print("para : \n", para)

Output:

[[Tree('S', [Tree('NP', [('Earlier', 'JJR'), ('staff-reduction',
'NN'), ('moves', 'NNS')]), ('have', 'VBP'), ('trimmed', 'VBN'),
('about', 'IN'), 
Tree('NP', [('300', 'CD'), ('jobs', 'NNS')]), (', ', ', '), 
Tree('NP', [('the', 'DT'), ('spokesman', 'NN')]), ('said', 'VBD'), ('.', '.')])]]
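The nesting in the paragraph output (a list of paragraphs, each a list of sentence trees) is easier to see with more than one paragraph. The sketch below assumes the reader's default settings, under which a blank line separates paragraphs and each line is one sentence. The sample text and the file name two_paras.chunk are made up for illustration.

```python
import os
import tempfile

from nltk.corpus.reader import ChunkedCorpusReader
from nltk.tree import Tree

# Two paragraphs separated by a blank line; the first holds two sentences.
SAMPLE = (
    "[The/DT spokesman/NN] spoke/VBD ./.\n"
    "[He/PRP] left/VBD ./.\n"
    "\n"
    "[The/DT jobs/NNS] remain/VBP ./.\n"
)

with tempfile.TemporaryDirectory() as root:
    # 'two_paras.chunk' is a hypothetical file name.
    with open(os.path.join(root, "two_paras.chunk"), "w", encoding="utf8") as fh:
        fh.write(SAMPLE)

    reader = ChunkedCorpusReader(root, r".*\.chunk")
    # Materialize the lazy view before the directory is removed.
    paras = list(reader.chunked_paras())

# Outer list: paragraphs. Inner lists: sentence trees.
sentences_per_para = [len(para) for para in paras]
print(sentences_per_para)

for para in paras:
    for sent in para:
        # Labels of the chunks found directly under each sentence tree.
        print(sent.label(), [c.label() for c in sent if isinstance(c, Tree)])
```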

Copyright: 月萌API (www.moonapi.com). Please credit the source when reposting.

Article link: https://www.moonapi.com/news/13052.html

GeeksForGeeks Artificial Intelligence Chinese Tutorial, 2022-06-21