Ways to Remove \xa0 From a String in Python

This article introduces different methods to remove \xa0 from a string in Python.

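The \xa0 character (Unicode code point U+00A0) is a no-break space, also called a hard space. It looks like an ordinary space, but it prevents a line break at its position. In HTML, it is written as the entity &nbsp;.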
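The standard library offers a quick way to see what \xa0 is, and why code that only looks for " " misses it. This minimal sketch uses html.unescape() and unicodedata.name(). The variable name nbsp is only illustrative.

```python
import html
import unicodedata

# Decoding the HTML entity &nbsp; yields the \xa0 character
nbsp = html.unescape("&nbsp;")
print(repr(nbsp))              # '\xa0'
print(unicodedata.name(nbsp))  # NO-BREAK SPACE
print(hex(ord(nbsp)))          # 0xa0

# It prints like a space, but it is a different character...
print(nbsp == " ")             # False
# ...which Python still classifies as whitespace
print(nbsp.isspace())          # True
```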

The following Python functions can remove \xa0 from a string:

  • The normalize() function of the unicodedata module
  • The string’s replace() function
  • The BeautifulSoup library’s get_text() function with strip=True

Use the unicodedata Module’s normalize() Function to Remove \xa0 From a String in Python

You can use the normalize() function from the unicodedata module of Python’s standard library to remove \xa0 from a string.

The normalize() function is used as follows.

unicodedata.normalize("NFKD", string_to_normalize) 

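Here, "NFKD" stands for Normalization Form KD (compatibility decomposition). It replaces compatibility characters with their standard equivalents; in particular, the no-break space \xa0 becomes an ordinary space (U+0020).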

The example program below illustrates this.

import unicodedata
str_hard_space = '17\xa0kg on 23rd\xa0June 2021'
print(str_hard_space)
xa = '\xa0'
# Check for the no-break space before normalization
if xa in str_hard_space:
    print("xa0 is Found!")
else:
    print("xa0 is not Found!")
# NFKD maps \xa0 to an ordinary space
new_str = unicodedata.normalize("NFKD", str_hard_space)
print(new_str)
# Check again after normalization
if xa in new_str:
    print("xa0 is Found!")
else:
    print("xa0 is not Found!")

Output:

17 kg on 23rd June 2021
xa0 is Found!
17 kg on 23rd June 2021
xa0 is not Found!
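NFKD changes more than \xa0: it applies every compatibility mapping and also splits accented letters into a base letter plus a combining mark. If you only want the compatibility replacements, the composed form NFKC may be a better fit. The sketch below compares the two on a made-up sample string. ascii() is used so the escape sequences are visible in the output.

```python
import unicodedata

# Made-up sample: accented "é", a no-break space, and the "fi" ligature (U+FB01)
text = "caf\u00e9\u00a0\ufb01le"

nfkd = unicodedata.normalize("NFKD", text)
nfkc = unicodedata.normalize("NFKC", text)

# Both forms turn \xa0 into " " and expand the ligature to "fi"
print(ascii(nfkd))           # 'cafe\u0301 file'  (é split into e + combining accent)
print(ascii(nfkc))           # 'caf\xe9 file'     (é kept as one code point)
print(len(nfkd), len(nfkc))  # 10 9
```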

Use the String’s replace() Function to Remove \xa0 From a String in Python

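You can use the string’s replace() function to replace every \xa0 with an ordinary space, or with an empty string to delete it. Because Python strings are immutable, replace() returns a new string and leaves the original unchanged.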

The replace() function is used as follows.

str_hard_space.replace(u'\xa0', u' ') 

The example below illustrates this.

str_hard_space = '16\xa0kg on 24th\xa0June 2021'
print(str_hard_space)
xa = '\xa0'
# Check for the no-break space before replacing it
if xa in str_hard_space:
    print("xa0 Found!")
else:
    print("xa0 not Found!")
# Swap every \xa0 for an ordinary space
new_str = str_hard_space.replace('\xa0', ' ')
print(new_str)
# Check again after the replacement
if xa in new_str:
    print("xa0 Found!")
else:
    print("xa0 not Found!")

Output:

16 kg on 24th June 2021
xa0 Found!
16 kg on 24th June 2021
xa0 not Found!
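If you need this in several places, you can wrap replace() in a small function that either substitutes or deletes \xa0. This is a minimal sketch; remove_nbsp is a hypothetical helper name, not part of any library.

```python
def remove_nbsp(text: str, replacement: str = " ") -> str:
    """Hypothetical helper: return a copy of text with every no-break
    space (U+00A0) replaced by `replacement`.

    replace() builds a new string; the argument itself is not modified.
    """
    return text.replace("\u00a0", replacement)


original = "16\xa0kg on 24th\xa0June 2021"

spaced = remove_nbsp(original)       # swap \xa0 for an ordinary space
deleted = remove_nbsp(original, "")  # drop \xa0 entirely

print(ascii(spaced))    # '16 kg on 24th June 2021'
print(ascii(deleted))   # '16kg on 24thJune 2021'
print(ascii(original))  # '16\xa0kg on 24th\xa0June 2021' (unchanged)
```

Substituting a space is usually the safer default, because deleting \xa0 can glue neighbouring words together, as "24thJune" shows.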

Use the BeautifulSoup Library’s get_text() Function With strip Set to True to Remove \xa0 From a String in Python

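You can also use the get_text() function of BeautifulSoup, a third-party library (installed with pip install beautifulsoup4 and imported from bs4), with strip=True. With this option, whitespace, including \xa0, is stripped from the beginning and end of each piece of text in the document. A \xa0 in the middle of a text fragment is left in place. The "lxml" parser used below is a separate package (pip install lxml).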
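As far as its behaviour goes, strip=True appears to call Python’s str.strip() on each text fragment. The stdlib-only sketch below is an assumption about that behaviour, not BeautifulSoup’s own code. It shows what such stripping does and does not remove. The fragments list stands in for the text a parser might extract from &nbsp;-padded markup.

```python
# Hypothetical text fragments, as a parser might extract them from HTML
# where &nbsp; has already been decoded to \xa0
fragments = ["\xa0Hello,\xa0", "\xa0this is a\xa0test\xa0"]

# str.strip() treats \xa0 as whitespace, but only trims it at the ends
stripped = [fragment.strip() for fragment in fragments]
for s in stripped:
    print(ascii(s))
# 'Hello,'
# 'this is a\xa0test'

# The interior \xa0 survives, so follow up with replace() when needed
joined = " ".join(stripped).replace("\xa0", " ")
print(ascii(joined))  # 'Hello, this is a test'
```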

The get_text() function is used as follows.

clean_html = BeautifulSoup(input_html, "lxml").get_text(strip=True) 

The example below illustrates this.

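from bs4 import BeautifulSoup
# &nbsp; in HTML is decoded to \xa0; here \xa0 pads the text of each paragraph
html = '<p>\xa0Hello,\xa0</p><p>\xa0This is a test message.\xa0</p>'
print(ascii(html))
# strip=True trims whitespace (including \xa0) from both ends of each text fragment;
# " " is used as the separator between fragments
clean_text = BeautifulSoup(html, "lxml").get_text(" ", strip=True)
print(ascii(clean_text))

Output:

'<p>\xa0Hello,\xa0</p><p>\xa0This is a test message.\xa0</p>'
'Hello, This is a test message.'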
