How to Remove Punctuation From a List in Python

This tutorial introduces the string constant string.punctuation and discusses some methods to remove punctuation signs from a list of strings in Python.

The string.punctuation Constant in Python

The string.punctuation constant is a pre-initialized string in Python that contains the ASCII punctuation characters. To use this string, we have to import the string module. The string.punctuation constant is shown in the following coding example.

import string

print(string.punctuation)

Output:

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ 

We imported the string module and displayed the value of the string.punctuation constant. The output shows the 32 ASCII punctuation characters. Non-ASCII punctuation, such as curly quotes, is not part of this constant.
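Because string.punctuation is an ordinary string, the in operator is enough to test whether a character counts as punctuation. The following is a minimal sketch of that check; the variable names are illustrative only.

```python
import string

punct = string.punctuation

count = len(punct)                    # number of characters in the constant
has_comma = "," in punct              # an ASCII punctuation mark
has_letter = "a" in punct             # letters are not included
has_curly_quote = "\u201c" in punct   # the curly quote is not ASCII

print(count, has_comma, has_letter, has_curly_quote)
# 32 True False False
```

If the input may contain non-ASCII punctuation, this constant alone will not catch it, so the methods in this tutorial leave such characters untouched.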

Remove Punctuation From a List With for Loops in Python

We can remove all the punctuation marks from a list of strings by using string.punctuation with for loops in Python. The following code example demonstrates this approach.

import string

words = ["hell'o", "Hi,", "bye bye", "good bye", ""]
new_words = []
for word in words:
    # The inner loop walks over the characters of the original string.
    for letter in word:
        if letter in string.punctuation:
            # replace() removes every occurrence of this character.
            word = word.replace(letter, "")
    new_words.append(word)

print(new_words)

Output:

['hello', 'Hi', 'bye bye', 'good bye', ''] 

We initialized a list of strings, words, that contains punctuation signs. We then created a nested loop: the outer for loop iterates through each string in the list, and the inner for loop iterates through each letter of that string. The if statement checks whether the letter is in the string.punctuation constant. If it is, we remove it by replacing it with an empty string; note that replace() removes every occurrence of that character at once. After removing all punctuation signs from a string, we append that string to our new_words list. In the end, we printed the new_words list.

The only problem with this implementation is that it allows empty strings to remain inside the final list. Depending upon our requirements, we can also leave the empty strings out of the result by placing an additional check inside our loop. The following code snippet shows how to skip empty strings as well.

import string

words = ["hell'o", "Hi,", "bye bye", "good bye", ""]
new_words = []
for word in words:
    if word == "":
        # Skip empty strings; removing items from a list while
        # iterating over it can cause elements to be skipped.
        continue
    for letter in word:
        if letter in string.punctuation:
            word = word.replace(letter, "")
    new_words.append(word)

print(new_words)

Output:

['hello', 'Hi', 'bye bye', 'good bye'] 

This time around, our code also left the empty strings out of the resulting list.
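A string that contains nothing but punctuation, such as "!!!", is not empty before cleaning but becomes empty afterwards. The following sketch wraps the loop in a hypothetical helper, strip_punctuation_loop(), and moves the emptiness check to after the cleaning step. The helper name and the sample input are assumptions made for illustration.

```python
import string


def strip_punctuation_loop(words):
    """Return a new list with punctuation removed and empty results dropped."""
    cleaned = []
    for word in words:
        for letter in word:
            if letter in string.punctuation:
                word = word.replace(letter, "")
        # Check after cleaning so punctuation-only strings are dropped too.
        if word:
            cleaned.append(word)
    return cleaned


result = strip_punctuation_loop(["hell'o", "Hi,", "!!!", ""])
print(result)
# ['hello', 'Hi']
```

The input list is left unchanged; the function only builds and returns a new one.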

Remove Punctuation From a List With List Comprehensions in Python

The main drawback of the previous approach is that it takes a lot of code for the simple task of removing punctuation from a list of strings. List comprehensions build a new list by applying an expression to each element of an iterable. We can use for loops and if statements inside list comprehensions. Their main advantage is that they require less code and are often faster than an equivalent for loop. We can use list comprehensions with the string.punctuation string constant to remove punctuation signs from a list of strings in Python. The following code example shows us how to remove punctuation from a list with list comprehensions.

import string

words = ["hell'o", "Hi,", "bye bye", "good bye", ""]
words = [
    "".join(letter for letter in word if letter not in string.punctuation)
    for word in words
]
print(words)

Output:

['hello', 'Hi', 'bye bye', 'good bye', ''] 

The above code looks dense at first, but it isn’t complex. It is a generator expression nested inside a list comprehension. The inner part checks whether each letter inside a single word is present in the string.punctuation constant and only yields those letters not in string.punctuation. The str.join() function enclosing this part of the code joins all the yielded letters with an empty string and gives us a complete word without any punctuation signs. The outer list comprehension runs this inner expression for each word inside our words list. We store the words returned by the outer list comprehension into the words list. In the end, we display all the elements of the words list.

Note that the list comprehension still builds a new list; assigning the result back to words only rebinds the name, so we don't need a separate variable such as new_words. We can also leave out empty strings by placing an extra if statement in the outer list comprehension.

import string

words = ["hell'o", "Hi,", "bye bye", "good bye", ""]
words = [
    "".join(letter for letter in word if letter not in string.punctuation)
    for word in words
    if word
]
print(words)

Output:

['hello', 'Hi', 'bye bye', 'good bye'] 

This time, our outer list comprehension skips any word that is an empty string, so we don’t get an empty string in the resultant list of strings. Because the check runs before the punctuation is removed, a word made up only of punctuation would still produce an empty string.
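If empty results should be dropped regardless of whether the word was empty before or after cleaning, the filter can be applied to the cleaned strings instead. The following is a minimal sketch under that assumption; the helper clean(), the PUNCT set, and the sample list are hypothetical names introduced for illustration.

```python
import string

# A set gives constant-time membership tests on average.
PUNCT = set(string.punctuation)


def clean(word):
    """Return word with all ASCII punctuation characters removed."""
    return "".join(letter for letter in word if letter not in PUNCT)


words = ["hell'o", "Hi,", "...", "good bye", ""]

# Clean every word first, then keep only the non-empty results.
cleaned = [c for c in map(clean, words) if c]
print(cleaned)
# ['hello', 'Hi', 'good bye']
```

Splitting the cleaning step into a small function also keeps the comprehension itself short and easier to read.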

Remove Punctuation From a List With the str.translate() Function in Python

Our previous implementation is compact, but the nested expression takes some effort to read. A shorter and typically faster way to remove punctuation from a list of strings in Python is the str.translate() function. It maps each character inside a string according to a translation table. The table has to map character codes to their replacements, so we cannot pass string.punctuation to it directly. Instead, we build the table with str.maketrans(), whose third argument lists the characters to delete. The following code example shows us how to remove punctuation signs from a list with the str.translate() function.

import string

words = ["hell'o", "Hi,", "bye bye", "good bye", ""]
# The third argument of str.maketrans() lists the characters to delete.
table = str.maketrans("", "", string.punctuation)
words = [word.translate(table) for word in words]
print(words)

Output:

['hello', 'Hi', 'bye bye', 'good bye', '']

We used the str.translate() function with a table built from the string.punctuation constant, together with a list comprehension, to remove punctuation signs from our words list. The table maps each character in string.punctuation to None, which tells word.translate(table) to delete it. The list comprehension runs this code for each string in the words list and returns the results. We assign all the returned strings to the words list and display the output.

The output shows an empty string in the results. To remove this empty string as well, we have to place an additional condition inside our list comprehension.

import string

words = ["hell'o", "Hi,", "bye bye", "good bye", ""]
# The third argument of str.maketrans() lists the characters to delete.
table = str.maketrans("", "", string.punctuation)
words = [word.translate(table) for word in words if word]
print(words)

Output:

['hello', 'Hi', 'bye bye', 'good bye']

We removed the empty string from the previous result with just one more condition.
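It can help to look at what str.translate() actually receives. It expects a mapping from character codes to replacements, and str.maketrans() is the standard way to build one. The sketch below inspects such a table and also shows that passing string.punctuation itself leaves the text unchanged. The names table and sample are illustrative only.

```python
import string

# Build a table whose third argument lists the characters to delete.
table = str.maketrans("", "", string.punctuation)

# Each punctuation character's code point maps to None, meaning "delete".
comma_target = table[ord(",")]
print(comma_target)  # None

sample = "Hi, there!"
cleaned = sample.translate(table)
print(cleaned)  # Hi there

# string.punctuation is not a valid table for this text, so nothing is removed.
unchanged = sample.translate(string.punctuation)
print(unchanged)  # Hi, there!
```

Building the table once and reusing it for every string avoids repeating that work inside the list comprehension.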

The string.punctuation constant is a pre-defined string that contains the ASCII punctuation characters. All of the methods above use this constant to remove punctuation from a list of strings, but the shortest to write and typically the fastest implementation is to use the str.translate() function with list comprehensions.
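Speed claims depend on the Python version and the input, so it is worth measuring them on your own data. The following is a rough sketch that first checks that the three approaches agree and then times them with timeit. The function names, the WORDS sample, and the repetition count are assumptions chosen for illustration, and the printed timings will vary between machines.

```python
import string
import timeit

WORDS = ["hell'o", "Hi,", "bye bye", "good bye", ""] * 200
TABLE = str.maketrans("", "", string.punctuation)


def with_loop(words):
    result = []
    for word in words:
        for letter in word:
            if letter in string.punctuation:
                word = word.replace(letter, "")
        result.append(word)
    return result


def with_comprehension(words):
    return [
        "".join(letter for letter in word if letter not in string.punctuation)
        for word in words
    ]


def with_translate(words):
    return [word.translate(TABLE) for word in words]


# All three approaches should agree before their speed is compared.
assert with_loop(WORDS) == with_comprehension(WORDS) == with_translate(WORDS)

for func in (with_loop, with_comprehension, with_translate):
    seconds = timeit.timeit(lambda: func(WORDS), number=20)
    print(f"{func.__name__}: {seconds:.4f} s")
```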
