Monday, August 6, 2007

Weird set behavior

I have been playing with set operations lately, and came across a kind of surprising result, given that it is not mentioned in the standard Python tutorial:

With Python sets, intersections and unions are supposed to be done like this:

In [7]:set('casa') & set('porca')
Out[7]:set(['a', 'c'])

In [8]:set('casa') | set('porca')
Out[8]:set(['a', 'c', 'o', 'p', 's', 'r'])

and they work correctly. What is confusing is what happens if you do:

In [5]:set('casa') and set('porca')
Out[5]:set(['a', 'p', 'c', 'r', 'o'])

In [6]:set('casa') or set('porca')
Out[6]:set(['a', 'c', 's'])

The results are not what you would expect from an AND or OR operation from the mathematical point of view! Apparently the "and" operation is returning the second set, and the "or" operation is returning the first.
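A minimal sketch of what seems to be going on, using identity checks rather than printed output: `and` and `or` hand back one of their operands unchanged instead of building a new set. The variable names are only for illustration.

```python
# `and` / `or` return one of their operands, never a freshly computed set.
a = set('casa')
b = set('porca')

and_result = a and b   # a is non-empty (truthy), so the second operand comes back
or_result = a or b     # a is non-empty (truthy), so the first operand comes back

print(and_result is b)   # True
print(or_result is a)    # True

# With an empty first operand the roles flip.
empty = set()
print((empty and b) is empty)   # True: `and` stops at the first false operand
print((empty or b) is b)        # True: `or` moves on to the second operand
```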

If Python developers wanted these operations to reflect the traditional (Python) truth value for data structures (False for empty data structures and True otherwise), why not simply return True or False?

So my question is: why has this been implemented in this way? My answer, as many readers have also pointed out, is that sets are implemented like this so that they can be used in the "and/or trick" that emulates a ternary operator. But this is very confusing for users who are thinking about sets in a mathematical way, where "AND" means intersection and "OR" means union. I can see this confusing many newbies...

7 comments: said...

This is how most 'and' and 'or' operations work. A lot of people use this side effect to emulate the ternary operator in 2.4: <condition> and <true result> or <false result>.

Basically, if you 'and' two things together and both are true, you get the last one. If either of them is false, you get the first false one.

Similarly, with 'or' you get the first true value or, if neither is true, the last value (which is false).
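The idiom described above can be sketched as follows. The helper names `and_or_trick` and `conditional` are made up for this illustration. The second helper uses the conditional expression added in Python 2.5, included here only for comparison. One known catch is that the trick falls through to the "false" result whenever the "true" result is itself false, such as an empty set.

```python
def and_or_trick(condition, if_true, if_false):
    # Hypothetical helper: the and/or idiom used as a ternary.
    return condition and if_true or if_false

def conditional(condition, if_true, if_false):
    # Hypothetical helper: the conditional expression (Python 2.5 and later).
    return if_true if condition else if_false

print(and_or_trick(True, 'yes', 'no'))    # yes
print(and_or_trick(False, 'yes', 'no'))   # no

# Pitfall: an empty (false) "true result" is skipped by the trick.
fallback = set('porca')
print(and_or_trick(True, set(), fallback) is fallback)   # True
print(conditional(True, set(), fallback) == set())       # True
```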

usagi said...

Yes, I know; this is also known as the and/or trick. I think that is fine for other data types, but for sets, where boolean operations do have a traditional meaning (from math), I think it only confuses things...

Anonymous said...

set overrides the & and | operators so that they mean set intersection and union.
'and' and 'or' are kept as logical operators and work the same as:
['first'] or ['second'] -> ['first']
['first'] and ['second'] -> ['second']

Anonymous said...

'and' and 'or' need not convert to True or False because when the end result of these statements is used in a boolean context it will automatically get treated correctly in the traditional Python way. That is exactly why there's no need to actually return True or False.

Consider both your and/or examples. If you were to use either of those in an if statement, they would both evaluate to True, no? However, if you had an empty set in the 'and' statement and used that in an if statement, it would evaluate to 'false'.
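A small sketch of that point: the value returned by `and` is a set, but an `if` only looks at whether it is empty. The function name `branch_taken` is made up for this illustration.

```python
def branch_taken(value):
    # Hypothetical helper: reports which branch an `if` would take for value.
    if value:
        return 'if branch'
    return 'else branch'

both_full = set('casa') and set('porca')   # evaluates to the second, non-empty set
one_empty = set() and set('porca')         # evaluates to the empty first set

print(branch_taken(both_full))   # if branch
print(branch_taken(one_empty))   # else branch
```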

In addition to using this as a form of the ternary, the 'and' is sometimes used as a shortened form of an if statement:

expression and do_something()

will have the same effect as:

if expression: do_something()
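As a rough check of that equivalence, the sketch below records each call to a stand-in `do_something` (the recording list `calls` and the sample values are assumptions added for illustration). The call happens only when the left-hand side is true, in both spellings.

```python
calls = []

def do_something():
    # Stand-in for a side-effecting function; it just records that it ran.
    calls.append('called')
    return 'done'

expression = set('casa')                   # non-empty, so true
result = expression and do_something()     # do_something() runs

empty = set()
skipped = empty and do_something()         # short-circuits: nothing runs

if expression:                             # the long form runs it once more
    do_something()

print(calls)   # ['called', 'called']
```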

Anonymous said...

I guess the rationale is to allow both boolean and set semantics in something like:

if x and y: print 'both sets have data'
if x&y: print 'both sets have common letters'
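The difference between the two tests shows up most clearly with two non-empty sets that share no letters. This is a minimal sketch with sample values chosen for illustration. It uses single-argument `print(...)` so it reads the same on newer Python versions.

```python
x = set('casa')   # letters c, a, s
y = set('tom')    # letters t, o, m: nothing in common with x

if x and y:
    print('both sets have data')             # printed: neither set is empty
if x & y:
    print('both sets have common letters')   # not printed: intersection is empty

print(bool(x and y))   # True
print(bool(x & y))     # False
```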

Floris Bruynooghe said...

Think of it as a container just like a list. It's True if it's non-empty, and that's logical:

if set() or do_it_anyway_flag:

jenan said...

In addition to what everyone else has said, it's worth pointing out that the "and" and "or" operators *cannot* be overridden directly like the & and | operators can. a & b calls a.__and__(b), and if that doesn't work out calls b.__rand__(a), I believe. The same goes for | and __or__. But there is no such special method for "and" or "or".

The only way to change the behavior of these is to change the __nonzero__ or __len__ methods, which are obviously more general-purpose.
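A sketch of that distinction, using a made-up subclass `LoudSet` that logs which hooks get invoked. It assumes that newer Python versions spell the truth hook `__bool__`, so the example defines that name and aliases `__nonzero__` to it.

```python
calls = []

class LoudSet(set):
    """Hypothetical subclass, for illustration only."""

    def __and__(self, other):
        # Invoked for `a & b`.
        calls.append('__and__')
        return set.__and__(self, other)

    def __bool__(self):
        # Invoked when the object is truth-tested, e.g. by `and` / `or` / `if`.
        calls.append('truth test')
        return len(self) > 0

    __nonzero__ = __bool__   # older spelling of the same hook

a = LoudSet('casa')
b = LoudSet('porca')

intersection = a & b   # goes through __and__
picked = a and b       # only truth-tests a, then returns b untouched

print(calls)           # ['__and__', 'truth test']
print(picked is b)     # True
```

The only influence the class has over `a and b` is the answer its truth test gives; which operand comes back is decided by the language.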