python - Get protocol + host name from URL

ID : 20202

Tags : python, django

Top 5 Answers for python - Get protocol + host name from URL


90

You should be able to do it with urlparse (docs: python2, python3):

    from urllib.parse import urlparse
    # from urlparse import urlparse  # Python 2

    parsed_uri = urlparse('http://stackoverflow.com/questions/1234567/blah-blah-blah-blah')
    result = '{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri)
    print(result)  # gives 'http://stackoverflow.com/'
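
One caveat with the snippet above: if the input has no scheme (for example 'stackoverflow.com/questions'), urlparse puts everything into the path and both scheme and netloc come back empty. A minimal sketch of a guard follows; the function name get_base_url is made up for illustration and is not part of urllib.

```python
from urllib.parse import urlparse


def get_base_url(url):
    """Return 'scheme://netloc/' for url; raise ValueError if either part is missing."""
    parsed = urlparse(url)
    if not parsed.scheme or not parsed.netloc:
        # 'stackoverflow.com/questions' parses with an empty scheme and netloc,
        # so formatting it blindly would produce ':///'.
        raise ValueError("not an absolute URL: {!r}".format(url))
    return "{0.scheme}://{0.netloc}/".format(parsed)


# netloc keeps the port, so it survives in the result.
print(get_base_url("https://stackoverflow.com:8080/questions/1234567?x=1"))
# https://stackoverflow.com:8080/
```

Whether to raise, return None, or assume a default scheme is a design choice that depends on where the URLs come from.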

82

https://github.com/john-kurkowski/tldextract

This goes a step further than urlparse: it splits the host name into subdomain, domain, and suffix for you, including multi-part suffixes such as co.uk.

From their documentation:

    >>> import tldextract
    >>> tldextract.extract('http://forums.news.cnn.com/')
    ExtractResult(subdomain='forums.news', domain='cnn', suffix='com')
    >>> tldextract.extract('http://forums.bbc.co.uk/')  # United Kingdom
    ExtractResult(subdomain='forums', domain='bbc', suffix='co.uk')
    >>> tldextract.extract('http://www.worldbank.org.kg/')  # Kyrgyzstan
    ExtractResult(subdomain='www', domain='worldbank', suffix='org.kg')

ExtractResult is a namedtuple, so it's simple to access the parts you want.

    >>> ext = tldextract.extract('http://forums.bbc.co.uk')
    >>> ext.domain
    'bbc'
    >>> '.'.join(ext[:2])  # rejoin subdomain and domain
    'forums.bbc'

70

Python 3, using urlsplit:

    from urllib.parse import urlsplit

    url = "http://stackoverflow.com/questions/9626535/get-domain-name-from-url"
    base_url = "{0.scheme}://{0.netloc}/".format(urlsplit(url))
    print(base_url)  # http://stackoverflow.com/
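
For comparison, urlsplit returns a 5-field result where urlparse returns 6 (urlparse additionally splits out path parameters), and the base URL can also be rebuilt with urlunsplit instead of string formatting. A small sketch, with the port, query, and fragment added to the example URL purely for illustration:

```python
from urllib.parse import urlparse, urlsplit, urlunsplit

url = "http://stackoverflow.com:8080/questions/9626535/get-domain-name-from-url?a=1#top"

parts = urlsplit(url)
print(len(parts), len(urlparse(url)))  # 5 6

# Keep scheme and netloc, reset the path to "/", drop query and fragment.
base_url = urlunsplit((parts.scheme, parts.netloc, "/", "", ""))
print(base_url)  # http://stackoverflow.com:8080/
```

Either form gives the same string here; urlunsplit just leaves the reassembly rules to the standard library.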

61

    >>> import urlparse  # Python 2 module; on Python 3 use urllib.parse instead
    >>> url = 'http://stackoverflow.com/questions/1234567/blah-blah-blah-blah'
    >>> urlparse.urljoin(url, '/')
    'http://stackoverflow.com/'
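
The session above imports the Python 2 module urlparse. On Python 3 the same function lives in urllib.parse. A short sketch of the equivalent call, with a query string and fragment added to the example URL to show that they are dropped as well:

```python
from urllib.parse import urljoin

url = "http://stackoverflow.com/questions/1234567/blah-blah-blah-blah?page=2#answers"

# "/" is a relative reference with an absolute path, so it replaces the
# path, query, and fragment while keeping the scheme and host.
root = urljoin(url, "/")
print(root)  # http://stackoverflow.com/

# The same mechanism resolves any site-root path.
print(urljoin(url, "/robots.txt"))  # http://stackoverflow.com/robots.txt
```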

54

Pure string operations :):

    >>> url = "http://stackoverflow.com/questions/9626535/get-domain-name-from-url"
    >>> url.split("//")[-1].split("/")[0].split('?')[0]
    'stackoverflow.com'
    >>> url = "stackoverflow.com/questions/9626535/get-domain-name-from-url"
    >>> url.split("//")[-1].split("/")[0].split('?')[0]
    'stackoverflow.com'
    >>> url = "http://foo.bar?haha/whatever"
    >>> url.split("//")[-1].split("/")[0].split('?')[0]
    'foo.bar'
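
Note that the splitting above returns only the host part, not the protocol, and it leaves any credentials or port attached. The sketch below compares it with urlsplit(...).hostname; both helper names are made up for illustration, and prepending "//" to scheme-less input is an assumption about how such input should be read.

```python
from urllib.parse import urlsplit


def host_by_split(url):
    # The string-only approach from the session above.
    return url.split("//")[-1].split("/")[0].split("?")[0]


def host_by_urlsplit(url):
    # Assumption: input without "//" is a bare host plus path, so prepend
    # "//" to make urlsplit treat the first segment as the netloc.
    if "//" not in url:
        url = "//" + url
    return urlsplit(url).hostname


for url in [
    "http://stackoverflow.com/questions/9626535",
    "stackoverflow.com/questions/9626535",
    "http://user:secret@stackoverflow.com:8080/questions",
]:
    print(host_by_split(url), "|", host_by_urlsplit(url))
```

For the last URL the string version keeps 'user:secret@' and ':8080', while hostname returns just 'stackoverflow.com'.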

That's all, folks.
