Python vs C#

I thought I’d post about something different for a change, just to prove I have interests outside of gardening. And since I was debating this topic with a colleague on Friday, it seemed like a good place to start.

Static vs Dynamic Typing

I should explain that I work in the digital department of a medium-sized engineering company. My department is divided between people who are more focused on developing algorithms to solve interesting problems (‘modellers’) and people whose focus is building production IT systems (‘software engineers’). Obviously these two sides tend to favour different languages, hence the Python vs C# debate.

The debate at the time focused on whether static typing is a major plus in choosing a language or not. I personally don’t think that static types should be the deciding factor in choosing a language – not because I find it hard to use a statically typed language, but because in my experience static types typically only catch the easy errors early.

Of course, it’s useful to be immediately told if you’re trying to add a string to a number. But such errors are generally identified even without static checking, as long as your testing is thorough enough to cover your code base properly. Where static types don’t help is with the hard-to-find errors, where you’re doing the algorithmically wrong thing with the right types, and those are the ones that cost you a lot of time. The only way to find those is thorough testing, which tends to catch your type errors in any case.
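To make that concrete, here’s a hypothetical Python sketch (the function and its bug are mine, purely for illustration). The type annotations are all correct, so a static checker has nothing to complain about, but the algorithm is wrong and only a test against a known answer will expose it:

def mean(values: list[float]) -> float:
    # Bug: should divide by len(values), not len(values) - 1.
    # The types all line up, so static checking passes; only a test
    # such as assert mean([2.0, 4.0]) == 3.0 will catch the mistake.
    return sum(values) / (len(values) - 1)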

Of course, a better type system can make static typing slightly more useful, both by making the types of functions more descriptive of what they should do and by ruling out more incorrect behaviour at compile time.

For example, in the past I’ve written code outside of work in Haskell, which has a very strict but powerful type system. In Haskell, the type of a function to get the length of a list is:

length :: [a] -> Int

Here, ‘a’ represents any type at all. The function can take a list of values of any type and return a number. The type signature also encodes that length does no IO or anything else non-deterministic.

Given that ‘length’ is a deterministic function that does no IO, knows nothing about the type of the values contained in the list it takes as input, and returns a number, there’s a very limited number of things it could do, and a lot of things the compiler can object to up front.

Broadening the Argument

Of course, my debating partner didn’t agree with my argument. But thinking about it afterwards, that debate was only the tip of the iceberg. Here are some other criteria where I’d disagree with common practice:

  1. Benefits of OO (Object Orientation)
  2. Reference vs value programming

I think from our discussion that my colleague would disagree with me on OO, but agree with me on the importance of restricting shared references.

Object Orientation

Object orientation is probably the most popular programming paradigm around right now. And for good reason, since it directly encodes two human tendencies / common ways of thinking:

  1. Organising things into hierarchies
  2. Ascribing processes to entities

Now, I don’t deny that OO can be a helpful way to structure your thinking for some problems, although helpful isn’t the same as necessary. What I have a bigger problem with is what you might call the OO fundamentalism that’s taken over much of the field.

Depending on the language, you might find things like:

  • The inability for a function to just be a function without an owner

Do Cos and Sin really need to belong to a class rather than just a library of functions? Do you really need to use a Visitor pattern instead of just passing a first-class function to another function (there’s a small sketch of this after the list)? Does a commutative function like add really ‘belong’ to either of the things being added?

  • The inability to separate shared interfaces from inheritance

In some languages, interfaces as a concept don’t exist. In others, they do exist but are under-utilised in the standard library, meaning that in practice you’re often forced to build a subclass when all you really want is to implement a specific interface required by the function you want to call.

In Python, this issue is solved by duck typing. In at least one non-OO statically typed language (Haskell), it’s solved by type classes. In C# it’s inconsistently solved by interfaces.

  • WORSE: interfaces only by sub-classing, and single inheritance so classes can effectively only implement one ‘interface’ at a time

Who decided to make it so hard to specify the actual interfacing standards in a generic way?

  • The tendency for OO languages to encourage in-place mutation of values and reliance on identity rather than value in computation as the default, rather than as a limited performance-enhancing measure

It’s now widely agreed that too much global state is a bad thing in programming. But the badness of global state is really just an extreme case of the badness of directly mutating the same memory from many different locations. The more you do this without clear controls, the harder the resulting program is to debug. And yet the most common programming paradigm around encourages, in almost all cases, in-place mutation and reference sharing as the default.

This annoys me so much I’m going to write a small section about it.
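Before getting to that, here’s a minimal Python sketch of the first-class-function point from the list above (the names are my own, not from any particular library). Rather than wrapping the behaviour in a Visitor class, you simply pass the function as a value:

import math

def apply_to_all(func, items):
    # No Visitor class needed: a function is just a value you can pass around.
    return [func(item) for item in items]

apply_to_all(math.sin, [0.0, math.pi / 2])   # [0.0, 1.0]
apply_to_all(lambda x: x * 2, [1, 2, 3])     # [2, 4, 6]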

References vs Values

Let’s illustrate the problem with a simple Python example, shall we? In Python you can multiply a list by a number to get multiple copies of the same values, for example:

[1] * 3 = [1,1,1]

Now, let’s imagine we want a list of 3 lists:

[[1]] * 3 = [[1], [1], [1]]

Let’s say we take our list of lists and add something to the first list.

x = [[1]] * 3

x[0].append(2)

What do you think the value of x is now? Do you think it’s [[1,2], [1], [1]]? If so, you’re wrong. In fact it’s [[1,2], [1,2], [1,2]], because all elements of the list refer to exactly the same object.
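You can confirm the sharing directly:

x = [[1]] * 3
x[0] is x[1]     # True - one inner list, referenced three times
x[0].append(2)
x                # [[1, 2], [1, 2], [1, 2]]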

How dumb is this? And to make it even worse, like many OO languages Python has a largely hidden distinction between types that behave like values (immutable ones, such as numbers) and types that are shared by reference, so the following does work as expected:

x = [1] * 3

x[0] += 1

You get the expected x = [2,1,1] as a result.

So you have a pervasive tendency for the language to promote object sharing and mutability, which together mean you have to be incredibly careful to copy things explicitly, otherwise you end up corrupting data that other parts of your program are using. And unlike in C, where for the most part it’s clear what’s a pointer and what’s not, there’s also an unmarked inconsistency between types which behave this way and types where operations are by value.
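For completeness, the standard workarounds (nothing library-specific, just plain Python) are to construct independent values or copy explicitly:

# Build three independent inner lists rather than three references to one:
x = [[1] for _ in range(3)]
x[0].append(2)   # x is now [[1, 2], [1], [1]], as you'd expect

# Or copy defensively before handing mutable data to other code:
import copy
y = copy.deepcopy(x)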

Similar issues occur in most object orientated languages, creating brittle programs with vast amounts of hidden shared state for no obvious benefit in most cases. It would be better to have special syntax for the limited cases where shared state is important for performance, but that’s not the way most of the world went. And now we’re paying the price, since the reference model breaks completely in a massively-parallel world.

So – Python vs C#?

How do Python and C# stack up on all three criteria?

  • Typing

Both are strongly typed, but C# has static typing while Python doesn’t.

As I said, for me C#’s type system isn’t clever enough to catch most of the hard bugs, so I think it only earns a few correctness points while losing flexibility points.

No overall winner.

  • Object orientation

Both languages are mostly object orientated. Python is less insistent on your own code being OO than C#, and will happily let you write procedural or semi-functional code as long as you don’t mind the standard library being mostly composed of objects.

C# has interfaces as a separate concept, but for added inconsistency also uses sub-classes for shared interfaces. Python mostly does shared interfaces by duck typing, which is of course ultra-flexible but relies on thorough testing as the only way to check compliance with the required interface.

Since I don’t think OO is always the best way to structure a problem, I’d give the points to Python on this.

  • References vs values, aka hidden pointers galore

Both Python and C# have the same disturbing tendency to make you work hard to limit shared state, and promote bugs by choosing the dangerous option as the default.

Both languages lose here.

So in terms of a good experience writing code, I’m inclined to give Python the advantage, but to be honest there isn’t much in it. For me, other factors are much more important, such as the availability of functionality required for a project, the availability of others in the team with the right skillset for ongoing maintenance, and very occasionally performance (Python is not fast, but this doesn’t matter for most projects).

Now if a good functional language like Haskell or OCaml would just become common enough to solve the library and available-personnel issue, we’d at least have the opposite extreme as an option (value over reference, less or no OO). Then maybe in another decade we could find a compromise somewhere in the middle…