Why is computing with media so rarely supported in programming languages?

June 17, 2011 at 8:44 am 27 comments

Our publisher has asked Barb and me to explore making a 3rd edition of our Python Media Computation book, and in particular, they would like us to talk about and use Python 3.0 features.  Our book isn’t a generic Python book — we can only use a language with our Media Computation approach if we can manipulate the pixels in the images and the samples in the recorded sounds.  Can I do that in Python 3.0?

The trick of our Java and Python books is that we can manipulate pixels and samples in Java.  I wrote the original libraries, which did work, but then Barbara saw my code, eventually stopped laughing, and re-wrote them as a professional programmer would.  Our Python Media Computation book doesn’t use normal C-based Python.  We use Jython, a Python interpreter written in Java, so that we could use those same classes.  We solved the problem of accessing pixels and samples only once, but used the solution with two languages.  We can’t use that approach for the Python 3.0 request, because Jython is several versions behind CPython: Jython is only at Python 2.5 right now, and there won’t be a Jython 3.0 for some time yet.

We used our Java-only media solution because it was just so hard to access pixels and samples in Python, especially in a cross-platform manner.  Very few multimedia libraries support lower levels of access, even in other languages.  Sure, we can play sounds and show pictures, but changing sounds and pictures is much rarer.  I know how to do it in Squeak (where it’s easy and fast), and I’ve seen it done in C (particularly in Jennifer Burg’s work).

I have so far struck out in finding any way to manipulate pixels and samples in CPython.  (I don’t have the cycles to build my own cross-platform C libraries and link them into CPython.)  My biggest disappointment is Pygame, which I tried to use last summer.  The API documentation suggests that everything is there!  For sound, it just doesn’t work.  Pixels work fine in Pygame, but every sound I opened with Pygame reported a sampling rate of 44100, even when I knew it wasn’t.  The exact same code manipulating sounds worked differently on Mac and Windows.  I just checked, and Pygame hasn’t come out with a new version since 2009, so the bugs I found last summer are probably still there.

What I don’t get is why libraries don’t support this level of manipulation as a given, as something simply obvious.  Manipulating pixels and samples is fun and easy; we’ve shown that it’s a CS1-level activity.  If the facilities are available to play sounds and show pictures, then the pixels and samples are already there, in memory, somewhere.  Just provide access!  Why is computing with media so rarely supported in programming languages?  Why don’t computer scientists argue for more than just playing and showing from our libraries?  Are there other languages where it’s better?  I have a book on multimedia in Haskell, but it doesn’t do pixels and samples either.  I heard Donald Knuth once say that the hallmark of a computer scientist is that we shift our work across levels of abstraction, all the way down to bytes when necessary.  Don’t we want that for media, too?

So, no, I still have no idea how to do media computation with Python 3.0.  If anyone has a suggestion of where to look, I’d appreciate it!



27 Comments

  • 1. Ben Bederson  |  June 17, 2011 at 8:51 am

    How about using the web for your renderer? Not quite ideal, but you can use the built-in web server of native Python 2.5 (or 3.0) and generate PNGs, which you can display in a browser. And more interestingly, use a browser Canvas to display full 2D animated graphics.

    It is slightly, but not much, more complicated to deploy, and it solves a lot of modern problems. It also enables mashups with other web sources, etc., which I’d argue is just as important today as media for motivation.
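
    The web-renderer idea can be sketched with nothing but the standard library. The block below is a minimal illustration, not Ben's implementation: `encode_png`, `make_gradient`, `negate`, and `serve` are hypothetical names, the picture is a plain list of rows of (r, g, b) tuples, and the module names (`http.server`) are the Python 3 ones (Python 2.5 calls the same module `BaseHTTPServer`).

```python
import struct
import zlib
from http.server import BaseHTTPRequestHandler, HTTPServer


def png_chunk(tag, data):
    """Build one PNG chunk: length, tag, data, then a CRC over tag + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def encode_png(pixels):
    """Encode rows of (r, g, b) tuples (each 0-255) as an 8-bit RGB PNG."""
    height = len(pixels)
    width = len(pixels[0])
    # IHDR: width, height, bit depth 8, colour type 2 (RGB),
    # default compression, default filter method, no interlacing.
    header = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    raw = bytearray()
    for row in pixels:
        raw.append(0)  # every scanline starts with filter type 0 (none)
        for r, g, b in row:
            raw.extend((r, g, b))
    return (b"\x89PNG\r\n\x1a\n"
            + png_chunk(b"IHDR", header)
            + png_chunk(b"IDAT", zlib.compress(bytes(raw)))
            + png_chunk(b"IEND", b""))


def make_gradient(width, height):
    """Hypothetical test picture: red grows left to right, blue top to bottom."""
    return [[(255 * x // max(width - 1, 1), 0, 255 * y // max(height - 1, 1))
             for x in range(width)]
            for y in range(height)]


def negate(pixels):
    """A typical per-pixel manipulation: invert every colour channel."""
    return [[(255 - r, 255 - g, 255 - b) for r, g, b in row] for row in pixels]


def serve(pixels, port=8000):
    """Serve the picture as a PNG at http://localhost:<port>/ until interrupted."""
    data = encode_png(pixels)

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "image/png")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    HTTPServer(("localhost", port), Handler).serve_forever()
```

    Calling `serve(negate(make_gradient(64, 64)))` and opening the address in a browser should show the negated gradient. The point of the sketch is that the pixels stay an ordinary Python list that student code can loop over; the browser is only the display.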

    Contact me if you’d like to discuss: @bederson

    • 2. Mark Guzdial  |  June 17, 2011 at 8:55 am

      Really interesting solution, Ben — thanks! But that’s only for pixels. What about sounds? Browsers can’t render sounds from samples, as far as I can tell. I have to be able to do both pixels in pictures and samples in sounds. Like in my Pygame example, I find it easier to solve the pixel problem than a sample problem.

  • 3. pythoneer  |  June 17, 2011 at 9:13 am

    It does feel a bit like Python has become *less* media-friendly in recent years.

    If numpy was installed alongside Pygame, perhaps you could use Sndarray to manipulate samples? Even if this works, I guess that a simplified API would still need to be implemented on top of this for a CS1 audience.

    Pygame is being rewritten, but I suspect version 2.0 (‘pgreloaded’) won’t address (m)any of the issues you describe.

    Personally, I’m wondering about switching to pyglet, which feels a bit cleaner than Pygame and reputedly now works with Python 3. Unfortunately, the docs suggest that audio support is limited to playback, with no access to audio samples – so I guess it wouldn’t solve your problems!

  • 4. Darrin Thompson  |  June 17, 2011 at 9:59 am

    Most of those media libraries are for making games and general purpose applications. It’s not a matter of how “easy” it is or isn’t. It’s a matter of what the library authors want to support. Usually playing with pixels and samples is not part of the abstraction they are trying to create.

    Even with game engines, it’s all pipelines, shaders, sprites, and maybe a little audio dsp. I doubt most of that is CS1 material, but maybe I’m wrong about that.

    For instance, SFML is really popular for game design right now. And here’s a sample application:


    So while your gripe is reasonable, people trying to get work done aren’t coding pixel and sample manipulation. They don’t demand it.

  • 5. Shrikrishna Shrin  |  June 17, 2011 at 11:15 am

    I’d like to understand more about what you mean by “render sounds from samples” but when it comes to manipulating pixels, I feel the following course at Stanford is a great direction:


    • 6. Mark Guzdial  |  June 17, 2011 at 12:49 pm

      Recorded sounds are stored as an array of pressure samples taken at a specific rate. Each sample is 16 bits long (typically). CD quality sound is 44,100 samples per second. In our media computation classes, we teach array manipulation by having students splice sounds, reverse sounds, and do volume and pitch manipulations. Being able to go from sample data, into a manipulable array, then back into a playable sound is what I’m referring to as “rendering.” By generating samples as sine waves, we can also “render” musical sounds through various synthesis techniques. It’s computationally easy, with lots of room for creative flexibility. But support for that level of audio manipulation is almost non-existent in languages and libraries. (It was do-able but not easy in Java: samples are available as a long byte array, and the parsing into individual samples and left/right stereo channels was all up to us.)
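
      To make those manipulations concrete, here is a rough sketch that treats a sound as a plain Python list of signed 16-bit integers. The function names are made up for illustration and are not the Media Computation library's API; getting the list into and out of a playable file is a separate step.

```python
import math

MAX_SAMPLE = 32767    # largest signed 16-bit value
MIN_SAMPLE = -32768   # smallest signed 16-bit value


def sine_wave(frequency, seconds, rate=44100, amplitude=10000):
    """Generate samples for a pure tone at the given frequency in Hz."""
    count = int(seconds * rate)
    return [int(round(amplitude * math.sin(2 * math.pi * frequency * i / float(rate))))
            for i in range(count)]


def change_volume(samples, factor):
    """Scale every sample, clipping to the 16-bit range."""
    return [max(MIN_SAMPLE, min(MAX_SAMPLE, int(s * factor))) for s in samples]


def reverse(samples):
    """Play the sound backwards."""
    return samples[::-1]


def splice(first, second):
    """Join two sounds end to end."""
    return first + second


def double_pitch(samples):
    """Keep every other sample: half as long, an octave higher at the same rate."""
    return samples[::2]
```

      For example, `splice(sine_wave(440, 0.5), reverse(sine_wave(880, 0.5)))` builds a one-second list of 44,100 samples.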

  • 7. Steve Thomas  |  June 17, 2011 at 12:47 pm

    Why not do a version in Etoys? We could then make it available through OLPC.
    I re-rendered part of your Chapter 1 slides in Etoys: http://www.squeakland.org/showcase/project.jsp?id=11011

    There is also an Etoys way to do the image manipulation. Karl Ramberg did a version, Color Reading And Writing (http://www.squeakland.org/showcase/project.jsp?id=7044).

    Modifying sound would be a challenge and they would have to get into Squeak or someone would need to develop new scripting tiles.

    • 8. Mark Guzdial  |  June 17, 2011 at 12:51 pm

      Absolutely, Steve — this is easy to do. In fact, Alan had a great etoys demo of rendering a sample array that was one of my inspirations for Media Computation. Squeak can do anything. I’m whining about everyone else NOT supporting this.

  • 9. gasstationwithoutpumps  |  June 18, 2011 at 12:05 am

    Creating a .wav file from an array of samples is only a few lines of code, and results in an OS-independent way of playing sounds, if you don’t mind a huge delay. Playing sounds in real time tends to be very OS-dependent. I’ve done it a few times and the solutions generally only last a couple of years. Not worth it to do it again.
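
    In Python, the "few lines of code" might look like the sketch below, which uses the standard-library `wave` and `struct` modules (both present in Python 3). It assumes mono, signed 16-bit samples held in a list; `write_wav` and `read_wav` are illustrative names, and playing the resulting file is left to whatever player the OS provides.

```python
import struct
import wave
from contextlib import closing


def write_wav(filename, samples, rate=44100):
    """Write a list of signed 16-bit samples as a mono WAV file."""
    with closing(wave.open(filename, "wb")) as out:
        out.setnchannels(1)     # mono
        out.setsampwidth(2)     # 2 bytes = 16 bits per sample
        out.setframerate(rate)
        # "<h" packs each sample as a little-endian signed 16-bit integer.
        out.writeframes(struct.pack("<%dh" % len(samples), *samples))


def read_wav(filename):
    """Return (samples, rate) for a mono, 16-bit WAV file."""
    with closing(wave.open(filename, "rb")) as src:
        if src.getnchannels() != 1 or src.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit files")
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    return list(struct.unpack("<%dh" % (len(frames) // 2), frames)), rate
```

    Because `read_wav` returns the rate stored in the file header, it also gives a quick check on what sampling rate a file really has.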

    • 10. Mark Guzdial  |  June 18, 2011 at 12:35 pm

      Could you send me a pointer on how to do that, please? I tried once to generate a WAV file from the samples, reverse-engineering from the Squeak code, but never got it working. We never do anything real-time. Hard disks are fast enough to write out a WAV file, then open it and play it.

      We’ve had some trouble with incompatible versions of WAV files. For a while, Windows Media Player couldn’t play WAV files generated from Squeak or Java. If I read the WAV file into another tool (like QuickTime) then saved it out, Windows Media Player could read it. Later versions of Windows Media Player are more flexible, but I never figured out what the problem was.

      • 11. gasstationwithoutpumps  |  June 18, 2011 at 6:37 pm

        Here are a .h and .c file that worked for me on my Mac:

        /* make_wav.h
         * Fri Jun 18 17:06:02 PDT 2010 Kevin Karplus
         */

        #ifndef MAKE_WAV_H
        #define MAKE_WAV_H

        void write_wav(char * filename, unsigned long num_samples, short int * data, int s_rate);
            /* open a file named filename, write signed 16-bit values as a
             * monoaural WAV file at the specified sampling rate
             * and close the file
             */

        #endif


        /* make_wav.c
         * Creates a WAV file from an array of ints.
         * Output is monophonic, signed 16-bit samples
         * Fri Jun 18 16:36:23 PDT 2010 Kevin Karplus
         */

        #include <stdio.h>
        #include <assert.h>

        #include "make_wav.h"

        /* write the low num_bytes bytes of word, least-significant byte first */
        void write_little_endian(unsigned int word, int num_bytes, FILE *wav_file)
        {
            unsigned char buf;   /* a single byte, so the result does not depend on host byte order */
            while (num_bytes > 0)
            {   buf = word & 0xff;
                fwrite(&buf, 1, 1, wav_file);
                num_bytes--;
                word >>= 8;
            }
        }

        /* information about the WAV file format from [link missing] */

        void write_wav(char * filename, unsigned long num_samples, short int * data, int s_rate)
        {
            FILE* wav_file;
            unsigned int sample_rate;
            unsigned int num_channels;
            unsigned int bytes_per_sample;
            unsigned int byte_rate;
            unsigned long i; /* counter for samples */

            num_channels = 1; /* monoaural */
            bytes_per_sample = 2;

            if (s_rate<=0) sample_rate = 44100;
            else sample_rate = (unsigned int) s_rate;

            byte_rate = sample_rate*num_channels*bytes_per_sample;

            wav_file = fopen(filename, "wb"); /* binary mode, so Windows does not translate newline bytes */
            assert(wav_file); /* make sure it opened */

            /* write RIFF header */
            fwrite("RIFF", 1, 4, wav_file);
            write_little_endian(36 + bytes_per_sample* num_samples*num_channels, 4, wav_file);
            fwrite("WAVE", 1, 4, wav_file);

            /* write fmt subchunk */
            fwrite("fmt ", 1, 4, wav_file);
            write_little_endian(16, 4, wav_file); /* SubChunk1Size is 16 */
            write_little_endian(1, 2, wav_file); /* PCM is format 1 */
            write_little_endian(num_channels, 2, wav_file);
            write_little_endian(sample_rate, 4, wav_file);
            write_little_endian(byte_rate, 4, wav_file);
            write_little_endian(num_channels*bytes_per_sample, 2, wav_file); /* block align */
            write_little_endian(8*bytes_per_sample, 2, wav_file); /* bits/sample */

            /* write data subchunk */
            fwrite("data", 1, 4, wav_file);
            write_little_endian(bytes_per_sample* num_samples*num_channels, 4, wav_file);
            for (i=0; i< num_samples; i++)
            {   write_little_endian((unsigned int)(data[i]),bytes_per_sample, wav_file);
            }

            fclose(wav_file);
        }

        • 12. Mark Guzdial  |  June 20, 2011 at 9:06 am

          Wow! This is great, Kevin! Many thanks!

        • 13. gasstationwithoutpumps  |  June 20, 2011 at 12:09 pm

          Incidentally, the code was properly indented on my machine, but the “code” tags in the comment mechanism on wordpress seem to throw away indenting.

          I’ve been looking for a way to discuss code on one of my wordpress.com blogs, but they provide no way to conveniently share code, particularly in multi-file chunks. Have you found a good way to share code for discussion on a blog?

          Right now, I’m trying to decide between sticking them on some free file-sharing service (which one? and will it still exist a year from now?) and paying for a service that supports both wordpress blogs and real web pages (where I could put up .h and .c files, and even .tar.gz files). For stuff for work, I can use my university account, but there must be a decent other solution.


          • 14. Mark Guzdial  |  June 20, 2011 at 1:20 pm

            No, I don’t know a good solution, but agree that it’s an important and interesting problem. I found some tips on putting code into WordPress. I discovered Highlight a while back, and have been looking for a good use for it. I wonder if this could be it?

          • 15. gasstationwithoutpumps  |  June 20, 2011 at 11:31 pm

            The code was more damaged by wordpress than I realized—even the #includes are messed up.

            I tried using the “pre” tag on my blog, but WordPress.com STILL messes up the code. It looks like WordPress.com is determined that no one should ever talk about computer programs on their blogs!

            I’ve put the files in http://users.soe.ucsc.edu/~karplus/Digitar/
            with some examples of their use, so that you don’t have to guess at how to undo the damage that WordPress did!

          • 16. Peter Boothe  |  June 21, 2011 at 11:13 am

            http://pastie.org for snippets and http://gist.github.com for whole files are the best ways I have found of sharing code online.

          • 17. Gilbert  |  June 26, 2011 at 9:03 pm

            Sorry for this reply going in the wrong place…

            I’ve used Highlight pretty successfully to get code on my (mostly defunct) blog. Here’s an example: http://gilbazoid.wordpress.com/2010/01/09/hello-gl-in-python/

            You can also dump some of the stylesheet stuff into your blog style sheet (it costs an annual fee to get editing access). Then you can have very nicely formatted code with some syntax highlighting, alternating line watermarks, the works.

  • 18. gasstationwithoutpumps  |  June 18, 2011 at 12:07 am

    Look at http://www.zak.co.il/a/stuff/gpl/misc/eng/pythonsound
    for a review of some of the Python sound-processing packages.

  • 19. Bert  |  June 18, 2011 at 6:54 am

    Seems like no-one wants to take a shot at answering the “why” question?

    • 20. Gilbert  |  June 26, 2011 at 9:18 pm

      So, here’s one possible answer to the why question: performance. I’m going to give a quick example from the evolution of the OpenGL API, and then I should be able to explain the reasoning.

      In earlier versions of OpenGL, if all you wanted to do is draw a triangle, it was relatively simple and straightforward:

      glBegin(GL_TRIANGLES); glVertex3f(x0,y0,z0); glVertex3f(x1,y1,z1); glVertex3f(x2,y2,z2); glEnd();

      (where (x0,y0,z0) is the first point, etc.)

      That kind of interface has since been deprecated in favor of an approach where you have to declare a full list of all of your vertices and then hand that over to the API. (this is especially the case for the mobile device versions of the OpenGL spec.) One side-effect of this change is that it’s now more awkward to modify individual vertex positions, since those modifications won’t take effect until you’ve “re-loaded” the vertex buffer.

      Ok, explanation:
      The reason for this is that the vertex data has to be shipped over the memory bus from main RAM to the GPU RAM. Pushing all of the graphics data over the bus every frame can quickly lead to bandwidth becoming the system bottleneck. Additionally, since the card has often started rendering the next frame before the last one is complete/displayed, “re-loading” the vertex data can introduce extra non-determinism/dependencies that prevent GPU-side optimizations. On phones/mobile devices, this performance loss isn’t just a matter of speed, but also of battery consumption.

      While the above is very specific to 3d graphics APIs, the general reasoning is probably true for most media on computers. Sound and 2d graphics also (a) require large amounts of data, (b) tend to have specialized hardware support and (c) have real-time constraints.

      • 21. gasstationwithoutpumps  |  June 28, 2011 at 1:37 am

        First-year programming students do not need high performance—look at how popular Scratch is despite massive inefficiencies. You can’t use the need for high performance in top-quality applications as a reason for not teaching media computation to beginning programmers.

        If your reasoning was that the people who’ve been working on media applications have gotten so tied up in optimizing their own code that they’ve neglected providing an easy entry for new programmers, then I might agree with you.

  • 22. Doug Blank  |  June 18, 2011 at 9:05 am

    Mark, I’ve been beating my head against a wall for the last month trying to answer this question myself. My goal: play a single tone on Windows, Linux, and Mac. This isn’t a language issue, per se, but a library issue. Whose job is it to keep an audio library working on all three major platforms? The language people? The problem is that you need to know a lot about the low-level details on all of the platforms.

    My solution? I believe that educators need to become part of the development team for the underlying libraries. This might only be as a person who joins the community of a project (like PyGame) and highlights those bugs that we are really interested in. Or it might require that we actually help write and maintain these libraries.

    In any event, it looks like we will have a cross-platform solution for Python (and other languages) using the same underlying library as PyGame (SDL). It will require some expenditure at this lower-level, but we gain complete control of the entire “stack” of software, from language to audio wave and pixel. This has a huge payoff, too: we can do things that were previously impossible (such as writing a wave function in Python, and passing that as the sound generator). I know Squeak has had that for years, but now so does Python, Ruby, and Scheme!
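
    The idea of "writing a wave function in Python, and passing that as the sound generator" can be illustrated without any audio library at all. The sketch below is not Calico's API; `render`, `sine`, and `square` are hypothetical names, and the result is only a list of 16-bit samples that some other layer would still have to play or save.

```python
import math


def render(wave_function, seconds, rate=44100, amplitude=32767):
    """Sample wave_function(t), expected to return -1.0..1.0, into 16-bit integers."""
    samples = []
    for i in range(int(seconds * rate)):
        value = wave_function(i / float(rate))   # t is the time in seconds
        value = max(-1.0, min(1.0, value))       # clip anything out of range
        samples.append(int(round(value * amplitude)))
    return samples


def sine(frequency):
    """Return a wave function for a pure tone."""
    return lambda t: math.sin(2 * math.pi * frequency * t)


def square(frequency):
    """Return a wave function that alternates between +1 and -1."""
    return lambda t: 1.0 if (t * frequency) % 1.0 < 0.5 else -1.0
```

    A student could then write `render(lambda t: 0.5 * sine(440)(t) + 0.5 * square(220)(t), 2)` to mix two waveforms, which is the kind of experiment a playback-only API rules out.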

    Perhaps Calico will work for Mediacomp after all.



    • 23. Mark Guzdial  |  June 20, 2011 at 9:23 am

      I’m definitely interested in exploring Calico — thanks, Doug!

    • 24. Alan Kay  |  June 25, 2011 at 7:55 am

      As Mark said “Squeak can do ‘anything’ …” so why not see how it does all this stuff not just OS and Machine Independently, but also bit-identically?

      Then ponder when Smalltalk was originally done, and wonder how things could have gotten so much worse for no good reason of any kind.

      P.S. This is *not* an ad for Squeak — we need something much better today than the best we could do in the 70s — but we — and the students — definitely should not have to wallow in stuff that is horrendously worse.



  • 25. gasstationwithoutpumps  |  June 21, 2011 at 12:30 pm


    Thanks for “http://pastie.org for snippets and http://gist.github.com for whole files”

    The pastie looks useful for web pages, but they use the “script” tag, which most blog software does not permit for obvious security problems.

    gist.github.com may be a decent choice. They at least are used to dealing with software files and are more likely to be around in 4 years than most file-sharing sites.


  • […] complained about this problem in my blog in 2011 (see post here). The situation is better in other languages, but not yet in […]

