
Forming word bigrams from a list of sentences with Python

Posted on 2021-01-29 15:06:44

I have a list of sentences:

text = ['cant railway station','citadel hotel',' police stn'].

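I need to form bigrams (pairs of adjacent words) and store them in a variable. The problem is that when I do this, I get pairs of sentences instead of pairs of words. Here is what I did: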

text2 = [[word for word in line.split()] for line in text]
bigrams = nltk.bigrams(text2)
print(bigrams)

which produces

[(['cant', 'railway', 'station'], ['citadel', 'hotel']), (['citadel', 'hotel'], ['police', 'stn'])]

'cant railway station' and 'citadel hotel' should not be paired as a single bigram. What I want is

[([cant],[railway]),([railway],[station]),([citadel,hotel]), and so on...

The last word of one sentence should not be paired with the first word of the next sentence. What should I do to make this work?

1 answer

面试哥 2021-01-29


Use a list comprehension and zip:

>>> text = ["this is a sentence", "so is this one"]
>>> bigrams = [b for l in text for b in zip(l.split(" ")[:-1], l.split(" ")[1:])]
>>> print(bigrams)
[('this', 'is'), ('is', 'a'), ('a', 'sentence'), ('so', 'is'), ('is', 'this'), ('this', 'one')]
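
As a sketch of how the same idea could be applied to the list from the question: that list contains ' police stn' with a leading space, and split(" ") would turn the leading space into an empty first "word", giving a spurious ('', 'police') pair. Calling split() with no argument avoids this. The helper name sentence_bigrams below is only illustrative and is not part of NLTK or of the original answer.

```python
def sentence_bigrams(sentences):
    """Return word bigrams that never cross a sentence boundary.

    Hypothetical helper for illustration; not part of NLTK.
    """
    pairs = []
    for sentence in sentences:
        # split() with no argument splits on any whitespace run and
        # ignores leading/trailing spaces, so ' police stn' is handled.
        words = sentence.split()
        # Pair each word with its successor within the same sentence only.
        pairs.extend(zip(words, words[1:]))
    return pairs


text = ['cant railway station', 'citadel hotel', ' police stn']
bigrams = sentence_bigrams(text)
print(bigrams)
# [('cant', 'railway'), ('railway', 'station'), ('citadel', 'hotel'), ('police', 'stn')]
```

Because the pairing happens inside the per-sentence loop, 'station' is never paired with 'citadel'. A sentence with a single word simply contributes no bigrams.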
