python - Why is it string.join(list) instead of list.join(string)?

ID : 467

viewed : 145

Tags : pythonstringlistpython





Top 5 Answer for python - Why is it string.join(list) instead of list.join(string)?

vote vote

99

It's because any iterable can be joined (e.g, list, tuple, dict, set), but its contents and the "joiner" must be strings.

For example:

'_'.join(['welcome', 'to', 'stack', 'overflow']) '_'.join(('welcome', 'to', 'stack', 'overflow')) 
'welcome_to_stack_overflow' 

Using something other than strings will raise the following error:

TypeError: sequence item 0: expected str instance, int found 
vote vote

83

This was discussed in the String methods... finally thread in the Python-Dev achive, and was accepted by Guido. This thread began in Jun 1999, and str.join was included in Python 1.6 which was released in Sep 2000 (and supported Unicode). Python 2.0 (supported str methods including join) was released in Oct 2000.

  • There were four options proposed in this thread:
    • str.join(seq)
    • seq.join(str)
    • seq.reduce(str)
    • join as a built-in function
  • Guido wanted to support not only lists and tuples, but all sequences/iterables.
  • seq.reduce(str) is difficult for newcomers.
  • seq.join(str) introduces unexpected dependency from sequences to str/unicode.
  • join() as a built-in function would support only specific data types. So using a built-in namespace is not good. If join() supports many datatypes, creating an optimized implementation would be difficult, if implemented using the __add__ method then it would ve O(n²).
  • The separator string (sep) should not be omitted. Explicit is better than implicit.

Here are some additional thoughts (my own, and my friend's):

  • Unicode support was coming, but it was not final. At that time UTF-8 was the most likely about to replace UCS2/4. To calculate total buffer length of UTF-8 strings it needs to know character coding rule.
  • At that time, Python had already decided on a common sequence interface rule where a user could create a sequence-like (iterable) class. But Python didn't support extending built-in types until 2.2. At that time it was difficult to provide basic iterable class (which is mentioned in another comment).

Guido's decision is recorded in a historical mail, deciding on str.join(seq):

Funny, but it does seem right! Barry, go for it...
Guido van Rossum

vote vote

76

Because the join() method is in the string class, instead of the list class?

I agree it looks funny.

See http://www.faqs.org/docs/diveintopython/odbchelper_join.html:

Historical note. When I first learned Python, I expected join to be a method of a list, which would take the delimiter as an argument. Lots of people feel the same way, and there’s a story behind the join method. Prior to Python 1.6, strings didn’t have all these useful methods. There was a separate string module which contained all the string functions; each function took a string as its first argument. The functions were deemed important enough to put onto the strings themselves, which made sense for functions like lower, upper, and split. But many hard-core Python programmers objected to the new join method, arguing that it should be a method of the list instead, or that it shouldn’t move at all but simply stay a part of the old string module (which still has lots of useful stuff in it). I use the new join method exclusively, but you will see code written either way, and if it really bothers you, you can use the old string.join function instead.

--- Mark Pilgrim, Dive into Python

vote vote

64

I agree that it's counterintuitive at first, but there's a good reason. Join can't be a method of a list because:

  • it must work for different iterables too (tuples, generators, etc.)
  • it must have different behavior between different types of strings.

There are actually two join methods (Python 3.0):

>>> b"".join <built-in method join of bytes object at 0x00A46800> >>> "".join <built-in method join of str object at 0x00A28D40> 

If join was a method of a list, then it would have to inspect its arguments to decide which one of them to call. And you can't join byte and str together, so the way they have it now makes sense.

vote vote

53

Why is it string.join(list) instead of list.join(string)?

This is because join is a "string" method! It creates a string from any iterable. If we stuck the method on lists, what about when we have iterables that aren't lists?

What if you have a tuple of strings? If this were a list method, you would have to cast every such iterator of strings as a list before you could join the elements into a single string! For example:

some_strings = ('foo', 'bar', 'baz') 

Let's roll our own list join method:

class OurList(list):      def join(self, s):         return s.join(self) 

And to use it, note that we have to first create a list from each iterable to join the strings in that iterable, wasting both memory and processing power:

>>> l = OurList(some_strings) # step 1, create our list >>> l.join(', ') # step 2, use our list join method! 'foo, bar, baz' 

So we see we have to add an extra step to use our list method, instead of just using the builtin string method:

>>> ' | '.join(some_strings) # a single step! 'foo | bar | baz' 

Performance Caveat for Generators

The algorithm Python uses to create the final string with str.join actually has to pass over the iterable twice, so if you provide it a generator expression, it has to materialize it into a list first before it can create the final string.

Thus, while passing around generators is usually better than list comprehensions, str.join is an exception:

>>> import timeit >>> min(timeit.repeat(lambda: ''.join(str(i) for i in range(10) if i))) 3.839168446022086 >>> min(timeit.repeat(lambda: ''.join([str(i) for i in range(10) if i]))) 3.339879313018173 

Nevertheless, the str.join operation is still semantically a "string" operation, so it still makes sense to have it on the str object than on miscellaneous iterables.

Top 3 video Explaining python - Why is it string.join(list) instead of list.join(string)?







Related QUESTION?