This tutorial will discuss the methods to split a sentence into a list of words in Python.
str.split() Function in Python
The str.split() function in Python takes a separator as an input parameter and splits the calling string into multiple strings based on that separator. If we don’t specify a separator, str.split() splits the string on runs of whitespace. The following code snippet shows how to split a sentence into a list of words with the str.split() function.
sentence = "This is a sentence"
words = sentence.split()
print(words)
Output:
['This', 'is', 'a', 'sentence']
We declared a string variable sentence that contains some data. We then split the sentence variable into a list of strings with the sentence.split() function and stored the results in the words list. The str.split() function is the easiest way to convert a sentence into a list of words in Python.
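As a small illustration of the separator behaviour described above, the sketch below contrasts the default call with an explicit separator and with the optional maxsplit argument. The sample strings and variable names are made up for this example.

```python
# Default call: splits on any run of whitespace and drops empty strings.
messy = "  This   is\ta sentence\n"
default_words = messy.split()
print(default_words)   # ['This', 'is', 'a', 'sentence']

# Explicit separator: splits at every occurrence, so empty strings can appear.
comma_line = "This,is,,a,sentence"
comma_parts = comma_line.split(",")
print(comma_parts)     # ['This', 'is', '', 'a', 'sentence']

# maxsplit limits how many splits are performed.
limited = "This is a sentence".split(" ", 1)
print(limited)         # ['This', 'is a sentence']
```

Note that the default call and an explicit " " separator are not equivalent: only the default call collapses repeated whitespace.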
We can also use list comprehensions to split a sentence into a list of words. This approach isn’t as straightforward as calling str.split() on its own, but it lets us perform some operation on each obtained word, such as appending something to it or removing something from it. The following code snippet shows how to split a sentence into words with a list comprehension and the str.split() function.
sentence = "This is a sentence"
words = [word for word in sentence.split()]
print(words)
Output:
['This', 'is', 'a', 'sentence']
We declared a string variable sentence that contains some data. We then split the sentence variable into a list of strings with a list comprehension and stored the results in the words list. This method is useful when we want to modify each obtained word before storing it in the words list.
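To make the idea of modifying each word concrete, here is a minimal sketch that removes surrounding punctuation from each word and, separately, appends a suffix to each word. The sample sentence and the "_w" suffix are arbitrary choices for illustration.

```python
import string

sentence = "Hello, world! This is a sentence."

# Remove something from each word: strip leading/trailing punctuation, then lowercase.
cleaned = [word.strip(string.punctuation).lower() for word in sentence.split()]
print(cleaned)  # ['hello', 'world', 'this', 'is', 'a', 'sentence']

# Append something to each word: add an arbitrary suffix.
tagged = [word + "_w" for word in "This is a sentence".split()]
print(tagged)   # ['This_w', 'is_w', 'a_w', 'sentence_w']
```

str.strip() only removes characters at the two ends of each word, so punctuation inside a word (for example an apostrophe) is left in place.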
nltk Library in Python
The nltk, or Natural Language Toolkit, library is used for text processing in Python. Because it is an external library, we have to install it before using it. The command to install it is given below.
pip install nltk
Once installed, we have to download the punkt package using the nltk.download() function. This step is shown in the following code snippet.
import nltk
nltk.download('punkt')
Output:
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
The word_tokenize() function inside the nltk library can be used to solve our specific problem. This function takes a string as an input parameter and splits it into multiple substrings. The following code snippet shows how to split a sentence into a list of words with the nltk.word_tokenize() function.
import nltk

sentence = "This is a sentence"
words = nltk.word_tokenize(sentence)
print(words)
Output:
['This', 'is', 'a', 'sentence']
We split our sentence string into a list of words with the nltk.word_tokenize(sentence) function and stored the results in the words list. In the end, we displayed the elements inside the words list.
The str.split() method is the simplest way to solve this specific problem, but it does nothing beyond returning the list of words. The other two methods are useful when we want to perform some additional manipulation on the obtained words.