How to Build a Raspberry Pi Photobooth

This post describes how to make a simple photobooth (for a wedding or other event) using a Raspberry Pi, a DSLR camera and some simple Python scripts.

The Photobooth


I got married earlier this year. As part of the ‘big day’ I wanted to have a photobooth to try and capture some images of our guests away from the glare of the Official Wedding Photographer© and from the constant selfies of smartphones.

Hiring a photobooth is a) expensive, b) sure to give you the same “wedding photobooth photos” as everyone else, and c) no fun. So I decided to make one myself instead :)

The Basic Idea

The guests would sit in an armchair and press a button. The photobooth would then take four photos and arrange them in a strip.

What you will need to build a photobooth

The bill of materials for the project included:

  • Raspberry Pi Model B (not the latest one but I had one handy and it fit the break-out board below)
  • Raspberry Pi GPIO break-out board
  • Digital camera (I used my Nikon D7000)
  • Lomography 55mm wide-angle lens
  • A DSLR Lomo lens adaptor
  • An old camera tripod
  • Old, 1970s Betamax recorder remote control
  • 8-pin DIN socket
  • 6x 10mm super-bright, white LEDs
  • Perfboard, wire, various resistors
  • Cheap glass picture frame
  • Coloured cellophane, tin foil, thick cardboard and some duct tape
  • Some kind of case to put it all in (I used an old wine box I bought on eBay)

I wanted the photos to have a ‘vintage’ look without just ‘Insta-filtering’ them so I used a lo-fi lomography lens on my digital camera. This isn’t essential though and a ‘proper’ Nikon lens would produce sharper images.

You don’t have to use a Nikon D7000 either, you just need a digital camera that’s compatible with the gphoto2 library.

Setting up the Raspberry Pi

To set up the Pi so that it could fire the DSLR, I used the ‘Step 2’ section of this excellent guide from a similar project.

My photobooth was slightly simpler in that it didn’t have any way to instantly reproduce the shots once they had been taken. Similar projects can have a printer installed as part of the set-up. I wanted to keep this project as simple (and cheap) as possible though. Having no way of seeing what the photos looked like also had the effect of making people less self-conscious in front of the camera.

The python script

The logic-flow of the python script is simple:

1) Use a ‘while True’ loop to wait for the user to press the button

2) When a button press is detected, take four photos with the digital camera

3) Crop and re-size the four images, convert them to grayscale and join them together in a classic ‘photobooth style’

The code is available in full on my github page. In the meantime, below are the two key scripts.

The main script:
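In outline, the main loop looks something like the sketch below. This isn't the script from my GitHub repo verbatim – the pin number, output paths and session timing are placeholders – but it shows the three-step logic described above, using gphoto2's command-line interface to fire the camera:

```python
import subprocess
import time

BUTTON_PIN = 17          # hypothetical GPIO pin wired to the remote's button
SHOTS_PER_SESSION = 4

def capture_command(index, outdir="/home/pi/photobooth"):
    """Build the gphoto2 call that captures one frame and downloads it."""
    filename = "%s/shot_%02d.jpg" % (outdir, index)
    return ["gphoto2", "--capture-image-and-download",
            "--filename", filename, "--force-overwrite"]

def take_session(shots=SHOTS_PER_SESSION, pause=2.0, runner=subprocess.call):
    """Fire the camera `shots` times, pausing between frames so the
    guests can change pose."""
    for i in range(shots):
        runner(capture_command(i))
        time.sleep(pause)

def main_loop():
    """Step 1 of the logic-flow: block until the button is pressed.
    Needs the RPi.GPIO module, so this only runs on the Pi itself."""
    import RPi.GPIO as GPIO
    GPIO.setmode(GPIO.BCM)
    GPIO.setup(BUTTON_PIN, GPIO.IN, pull_up_down=GPIO.PUD_UP)
    while True:
        if GPIO.input(BUTTON_PIN) == GPIO.LOW:  # button pulls the pin low
            take_session()
        time.sleep(0.05)
```

Passing `runner` in as a parameter makes the session logic easy to test without a camera attached.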

The script to manipulate the images once taken:
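The image-manipulation step can be sketched with the Pillow imaging library (the strip dimensions below are illustrative, not the ones my script actually uses):

```python
from PIL import Image

STRIP_W, FRAME_H = 400, 300  # hypothetical frame dimensions for the strip

def make_strip(images, width=STRIP_W, height=FRAME_H):
    """Convert each shot to grayscale, resize it, and stack the
    frames vertically into a classic photobooth strip."""
    strip = Image.new("L", (width, height * len(images)), "white")
    for i, img in enumerate(images):
        frame = img.convert("L").resize((width, height))
        strip.paste(frame, (0, i * height))
    return strip
```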

The woodwork

This project involved more wood cutting than my usual projects. The case of the photobooth was a wine box that I picked up cheap on eBay. The camera was held in place by the top of an old camera tripod that I screwed to some 2″x4″ that I had lying around. I used a circular drill bit to cut the holes for the camera lens as well as the holes for the indicator lights.
The tripod bracket

The lighting

I wanted the indicator lights to have the feel of a vintage pinball machine rather than look like modern LEDs. To get this I used the glass from a cheap picture frame (which luckily only just fit the inside of the box). This was fixed in place with some strong tape.

To get the ‘vintage pinball machine’ look I used some cellophane (for colour), greaseproof paper (to diffuse the light and give it a more ‘natural’ look) and some tin-foil shaped into a cup shape to isolate each LED. The LEDs were mounted onto some thick cardboard which was mounted to the case with some strong tape.

The tin-foil cups

The picture frame


The remote

To trigger the script I used an old wired remote control from a 1970s Betamax machine. The buttons in the remote were simple on/off contact switches, so I connected them directly to the 8-pin DIN output and removed all of the other components in the remote to prevent any stray connections. I picked up a cheap socket on eBay and drilled a hole in the side of the case to plug the remote into. For the python script, this page was useful in making the remote's buttons control the Pi itself.

The breakout board

The breakout board was used to connect the remote and the LEDs to the GPIO pins on the Raspberry Pi. The pin numbers used are specified in the Reference.py file on my github site. You’ll need to update these if you use different GPIO pins on your project (there’s a handy pin map here). This guide was useful in understanding how to turn the LEDs on and off using the GPIO pins and Python.
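As a rough illustration of the LED side of things (the pin numbers below are placeholders – the real ones live in Reference.py), a countdown that switches one LED off per tick might look like this:

```python
LED_PINS = [5, 6, 13, 19, 26, 21]  # placeholders; see Reference.py for the real pins

def lit_pins(step):
    """Countdown helper: which LEDs stay on `step` ticks into the countdown."""
    return LED_PINS[: len(LED_PINS) - step]

def run_countdown(delay=1.0):
    """Drive the LEDs via the GPIO pins. Needs RPi.GPIO, so Pi-only."""
    import time
    import RPi.GPIO as GPIO
    GPIO.setmode(GPIO.BCM)
    for pin in LED_PINS:
        GPIO.setup(pin, GPIO.OUT)
    for step in range(len(LED_PINS) + 1):
        on = set(lit_pins(step))
        for pin in LED_PINS:
            GPIO.output(pin, pin in on)  # high = LED lit
        time.sleep(delay)
```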

Things I’d Do Differently

The photobooth was a great success and the photos from the photobooth are among my favourite from the day. There are a couple of things I’d do differently next time around:

Better black and white photos

The way that the script converts the images to grayscale was always designed to be ‘quick and dirty’. There are dozens of online tutorials on how to create good-looking black and white images from colour shots. The script would benefit from having a more sophisticated technique for converting the images to black-and-white.

Find some way to share the images

While a printer would have been expensive (both to buy and to supply with ink and paper), it could have been fun to auto-upload the photo strips to Flickr or some other photo-hosting site so that people could view and share them that way. The downside of this approach is that it would have encouraged people to look at their smartphones and Facebook while they were at a party, which can be pretty anti-social.

 

How to photograph the curvature of the Earth

This post describes how to photograph the curvature of the Earth using a space balloon (also called a ‘high-altitude weather balloon’, or ‘HAB’), some Canon cameras running CHDK and a lot of internet know-how.

e3SpaceProgram

A few months ago, some friends and I set up the #e3SpaceProgram, a project to explore space from the east end of London.

We recently launched our first mission (“Hermes I”) and successfully recovered our payload. The journey to our first launch was a long and occasionally difficult one. The end results were absolutely worth it though. I wanted to put together a brief ‘how we did it’ guide to help anyone else thinking of making the same trip.

What you need to build and launch a Space Balloon

The expertise of others

Nothing helps more than the help and guidance of people who have already done what you’re trying to do.

In particular, anyone thinking of launching a space balloon should read this post on Dave Akerman’s site. It is comprehensive and draws on a great deal of his (extensive) personal experience.

Below are some links to more pages we continue to find informative and inspirational:

  • The Brooklyn Space Program was the initial inspiration for the e3SpaceProgram.
  • The Global Space Balloon Challenge was a catalyst for getting the project over the line and a great way to connect with similar projects around the world.
  • The Stratodean Project – A ‘journey into new space, from the Forest of Dean’. Lots of useful detail on how they’ve constructed their payload.
  • The Joshing Talk project caught the attention of Richard Branson.
  • The guys at Cambridge University Space Flight certainly know their beans. Check out their website for some impressive footage of rocket launches.
  • The UKHAS site has a wealth of information on weather balloon launches. A lot of their articles are quite technical, but required reading especially if you plan to build your own GPS tracker.
  • The #highaltitude IRC channel is full of very knowledgeable people who are more than willing to help newcomers.

A playlist

Handy for playing in the car when you’re chasing your balloon, or for the long hours of getting your payload to work. You can find our Official Spotify Hermes I playlist here*.

 

*Questionable song choices, courtesy of Sarah Day.

A (reliable) GPS tracker

Getting a tracker to work and being able to listen and decode the signal reliably was the hardest part of our first launch. As you’ll need one if you ever want to see your payload again, it’s also one of the most important pieces of the puzzle.

Having first looked at building our own, in the end we decided to let the pros handle it and picked up a Pi in the Sky (‘PITS’) tracker board for the popular Raspberry Pi. It’s not cheap but it’s reliable and as novices we’d rather spend the money than risk losing our payload.

Pi in the Sky Tracker
The Pi in the Sky tracker from www.pi-in-the-sky.com

The PITS board comes with the antenna and battery clip. You’ll need some Energizer Lithium batteries to power it (this type keeps working in the very low temperatures of near space; other types won’t).


Energizer Lithium AA batteries

The PITS board attaches to a Raspberry Pi A+ (recommended for the flight as it is more power efficient) or a B+ (much less efficient but easier to connect to the internet which you’ll need to do in order to set up the PITS). In the end, we bought one of each.

Raspberry Pi model A+

Note that the older Raspberry Pi A and B models will not work with the PITS board: the number of pins on the GPIO header (the black bit at the top of the photo above) is different.

A back-up tracker

Given its crucial role in the mission, it is always a good idea to have a back-up tracker. We used a cheap LG-E400 Android phone with an app that texted you the phone’s GPS coordinates when you texted it a specific keyword.

Its battery survived the cold of near space and made it back to Earth alive. During our launch we lost radio contact with Hermes I so the back-up android phone was the only reason we managed to recover our payload.

A radio receiver

Once your balloon is in the air you’ll need a way to receive the radio signals transmitted by the PITS tracker. Professional radio receivers are expensive so instead we used Software Defined Radio (SDR) which you can run on your laptop.

We used this guide on the UKHAS website. It’s a long post and it’s worth reading through it two or three times before getting started. Some of the ‘important’ steps are buried between the various screenshots; we missed them a few times, so read carefully.

There is an additional guide on how to use the dl-fldigi package here (more on what dl-fldigi is below).

Hardware

You still need some equipment:

  • An antenna to receive the signal. The Watson WSM-270 is the right wavelength and gave us a good signal.
  • A (good quality) tuner dongle. We bought a cheap dongle and it failed almost immediately. We picked up a NooElec RTL-SDR dongle on Amazon – it took a couple of weeks to arrive from America but NooElec specialise in SDR dongles and it worked reliably.

  • A filtered pre-amp to boost the signal. The HABSupplies site has a selection including this one which is the correct wavelength and comes in a handy enclosure.
  • Some connecting cables to join it all together. An SMA male to SMA male pigtail will connect your pre-amp to your SDR dongle. You will also need an adaptor to make it fit the smaller socket on the dongle itself. Finally, an SMA-to-BNC adaptor connects the pre-amp to the antenna.
The e3SpaceProgram SDR dongle
The e3SpaceProgram SDR set-up. The black SDR dongle on the right was the cheap one that broke. We used it instead to provide 5v power to the pre-amp via the USB plug.

Software

You will need to install the following pieces of software. They are all for Windows and all free of charge (these are the packages mentioned in the guides above).

Some cameras to capture the journey

For our first payload, we used two Canon A560 cameras that we picked up on eBay, both powered with Energizer Lithium AA batteries (again, because they work in low temperatures).

(A lot of people use GoPro cameras which are well built and reliable. The heavy fish-eye distortion of the lens isn’t to everyone’s taste though, and they can be expensive, especially as you may never see the camera again.)

Canon A560

The A560s were running the Canon Hackers Development Kit (CHDK) firmware. This is a non-permanent, non-destructive program that runs on the camera itself, giving you a wide range of user options not normally available. We installed CHDK on 4GB SD cards using the Simple Tool for Installing CHDK (‘STICK’).

A key feature of CHDK is the ability to load and run scripts. We used the excellent KAP UAV Exposure Control Script from the CHDK site. It is designed with this type of activity in mind and works well on the Canon A560.

With this type of firmware hack, different cameras respond in different ways and each have their own little issues. We posted whatever small bugs we found on the dedicated CHDK forum page for the script and always got a helpful response.

The camera settings

We wanted to take as many high-quality photos as possible during the flight. Your own experimentation is important but in terms of our first steps, to configure the A560s:

  1. Set the wheel on the top to AUTO (makes sure the CHDK script behaves responsibly)
  2. To save battery, in the standard (non-CHDK) shooting menu:
    • Red-Eye: off
    • AF-assist Beam: off
    • Review: off
  3. Turn off the camera flash (this is important and you’ll need to do it manually every time you turn the camera on)

Configuration of the CHDK script on some of the key attributes:

‘Side’ camera

e3SpaceProgram - Side Camera

Shot interval: 10 seconds
Timeout: 0
Total Shots: 0 (infinite)
Tv Min: 1/60
Target TV: 1/200*
Tv Max: 1/2000
Zoom position: 0%
Focus @ Infinity: True
Video Interleave (shots): 10
Video Duration (sec): 10
Backlight Off?: True

The image quality was set as Large/Superfine (the highest quality available)

*The payload twists and moves around quite a bit during the flight. A fast target shutter speed is essential to ensuring your photos look sharp.

‘Up’ camera


Shot interval: 2 seconds
Timeout: 0
Total Shots: 0 (infinite)
Tv Min: 1/60
Target TV: 1/200
Tv Max: 1/2000
Zoom position: 0%
Focus @ Infinity: False**
Video Interleave (shots): Off
Backlight Off?: True

The image quality was set to Medium 2/Fine***

**I set the focus length using the Override Subject Difference setting in the Enhanced Photo Operations CHDK menu to the total length of the cord between the payload and the balloon.

***A lower quality than the side camera. The reason for this is I wanted to capture the moment the balloon burst so set the shot interval to be as low as possible. More shots = more memory so I needed to make each image smaller. The first shot of the balloon bursting was worth it though (top of the post).

The payload itself

The payload for Hermes I was a 2.7L Polystyrene Box we picked up from the Random Aerospace site. It’s important that nothing moves around inside the payload as the launch and balloon burst can be pretty violent events (the ‘rule of thumb’ is that you should be able to drop your payload down some stairs).

We created partitions using 25mm Styrofoam sheets to make sure everything stayed put during the mission.


Miscellaneous items

Various small items that turned out to be essential included:

  • A black sharpie marker
  • Gaffer tape (lots of it)
  • Cable ties (lots of them)
  • A Stanley knife (very sharp so please be careful and get adult supervision if you need it)
  • A safety ruler (to help with the cutting)
  • Thick cardboard (we used cardboard boxes)
  • Bright orange card/paper (to make the payload easy to spot in a field)
  • Drinking straws (used when making the antenna out of the cable you get with the PITS for example)
  • A cheap pair of black socks (for the camera holes in the payload to minimise glare from the payload)

A space balloon, a parachute and some nylon cord

Because the size and type of balloon and parachute you need will depend on the weight of your payload, it’s actually best to buy them last. As with the payload, we got ours from Random Aerospace whose site also includes a handy calculator for determining what size you need. Finally, you’ll need some nylon cord to attach everything together.

Things we’d do differently next time

1) Take your time on launch day

We ended up rushing around on the launch day more than I would have liked (and this tends to be how mistakes are made). We initially forgot to pack the back-up tracker; when we realised, we re-packed the payload but, in our haste, possibly broke the primary tracker.

Lesson: Make a list of specific tasks to do, take your time following it, periodically take a ‘step back’ and reflect on whether you’re overlooking anything.

2) Trigger the camera scripts externally

The cameras were packed tightly into the payload so we had to ‘guesstimate’ a delay time on the CHDK script. The ‘hurry up and wait’ nature of the launch meant my concerns about battery life and the capacity of the SD cards caused me to rush (see point 1).

Being able to trigger the scripts once everything was packed and ready for launch would have mitigated this risk.

Lesson: Safely recovering your payload is more important than how many photos you take.

3) When in doubt, buy the bigger pack

Whenever faced with the decision of what size of something to buy for your first space balloon, buy a size bigger than you think you need. This applies particularly to gaffer tape and cable ties.

Lesson: No roll of gaffer tape is ‘too big’ for space exploration.

The results

The curvature of the Earth :)

e3SpaceProgram – Taken from Hermes I on 24/05/2015

Go for launch?

Hopefully the above guide helps you get started on your first mission. Please feel to get in touch if you have any questions – we’re more than happy to help anyone take their first steps into space :)

@AdamDynamic

@e3SpaceProgram

 

War In Pieces: Leo Tolstoy’s classic, 140 characters at a time.

This post describes my project to recite Leo Tolstoy’s classic War And Peace, 140 characters at a time, using Twitter. You can read along at @_War_And_Peace.

Leo Tolstoy, author of War And Peace

War in Pieces

A while back I started thinking about a new twitter project having recently made a twitter bot and some infographics that I enjoyed doing. Specifically, I thought about what the ‘opposite of Twitter’ would be – an antithesis to the 140 character format that has made Twitter so popular.

What came to mind was the modern metaphor for anything that lacks brevity: Leo Tolstoy’s ‘War and Peace’. At about 580,000 words (depending on the translation) it’s to the length of your manager’s powerpoint on ‘Inter-Departmental Performance Measurement Metrics‘ what a Double-Decker Bus is to the height of Nelson’s Column – a catch-all unit of measurement, universally understood as being “very, very long”.

So, naturally I thought it would be fun to tweet it, line by line.

War and Peace e-Book

To chop it up into 140 character chunks, I first needed an electronic copy of the text. Project Gutenberg is a digital library of books in the public domain, run and maintained by volunteers. They’ve been producing e-versions of copyright-free works since the 1970s when they used to digitise books by typing them manually (the Declaration of Independence was their first) – their *.txt version of War and Peace meant that I didn’t have to do the same.

140 Characters at a time

To make the feed as readable as possible I wanted to be careful how I broke the original text into segments, avoiding dividing paragraphs mid-word or mid-sentence. There are also special cases that I wanted to catch, such as book-ending segments of characters’ speech with quotation marks (where the speech is divided across several tweets).

To divide the text I used a hierarchy of steps. At each step, the paragraph is divided into segments and each segment is tested to see whether it is longer than 140 characters. If it is, the next method in the list is applied to divide that segment into smaller ones (segments that are already short enough are left alone), and so on.

The order of the hierarchy follows some grammatical rules but was mainly based on what I thought would work best based on the way the book is written.

The hierarchy went as follows:

1) Full-stops, question-marks, exclamation marks etc

Dividing a sentence into segments (or “tokens” as they are referred to in natural language processing) is harder than it sounds – simply splitting paragraphs on full-stops doesn’t work when the paragraph contains words like “St. Petersburg” (which, being about Russia, the paragraphs in War and Peace do. A lot.)

The Natural Language Toolkit (NLTK) is a Python library designed to do the heavy lifting in cases like this and is excellent at capturing these corner cases. Because the library is much more sophisticated than anything I would be able to design, it was used as the first method for dividing up the text.
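A quick illustration of why the naive approach fails (the NLTK call is shown commented out because it requires the pre-trained ‘punkt’ model to be downloaded first):

```python
def naive_split(paragraph):
    """The obvious approach - split on '. ' - stumbles on abbreviations."""
    return [s for s in paragraph.split(". ") if s]

SAMPLE = "He arrived in St. Petersburg. The prince was waiting."
# naive_split(SAMPLE) wrongly yields three pieces, cutting 'St.' adrift.
#
# NLTK's sentence tokenizer knows about abbreviations:
# from nltk.tokenize import sent_tokenize
# sent_tokenize(SAMPLE)  # two sentences, 'St. Petersburg' kept intact
```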

2) Other Punctuation

Where the NLTK failed to break the text into a segment of less than 140 characters in length, the sentence was split by (in order) semi-colons, colons, hyphens, brackets and finally commas.

Because I wanted to make each tweet as long as possible (and because commas for example are, sometimes, quite, short), I ran a consolidation process afterwards to try and re-combine the segments into longer (sub-140 character) ones.
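The split-then-consolidate idea can be sketched like this (simplified: only semi-colons, colons and commas are shown, and fragments are rejoined with a plain comma rather than the original separator):

```python
LIMIT = 140

def consolidate(parts, limit=LIMIT):
    """Greedily re-combine fragments into the longest sub-140 pieces."""
    out = [parts[0]]
    for part in parts[1:]:
        if len(out[-1]) + len(part) + 2 <= limit:
            out[-1] = out[-1] + ", " + part
        else:
            out.append(part)
    return out

def split_long(segment, seps=(";", ":", ",")):
    """Split an over-long segment on each separator in turn."""
    if len(segment) <= LIMIT:
        return [segment]
    for sep in seps:
        if sep in segment:
            parts = [p.strip() for p in segment.split(sep)]
            return consolidate(parts)
    return [segment]  # no punctuation left - these are handled by hand
```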

3) Doing it by hand

For about 1,000 of the ~36,000 sentence segments that the script produced, the segment was still longer than 140 characters. These segments were usually missed because the punctuation wasn’t completely correct (e.g. a space was missing before an opening quotation mark) or the segment didn’t contain any punctuation at all.

Rather than fine-tune the script to capture more special cases I went through and divided the final segments by hand.

The Output

The script split the e-book (including the epilogue) into a total of 36,403 tweets (a total of 559,231 words). The story is tweeted using a Raspberry Pi at a rate of one tweet an hour (I didn’t want to do it too fast and spam followers’ timelines – it’s a really long book) and it should finish in March 2019.

You can read along by following the account on twitter at @_War_And_Peace; if you have any thoughts on the project then I’d love to hear them – message me @AdamDynamic.

The Code

Below is the code I used to divide up the text file into segments and you can download my segmented version of Project Gutenberg’s War and Peace here.

 

Sebastian Q. Canard: How to make friends and influence people

In my previous post I described how I created a cigar-loving twitter bot called Sebastian using some Python code and a Raspberry Pi. Now that Sebastian has started tweeting, it was time for him to start making some friends.

The fastest way to make friends is to buy them

Friends on Twitter (as in life) can be bought for the right price. In Sebastian’s case, I used the online marketplace fiverr (where people offer all kinds of questionable services for $5) to purchase 13,000 followers (these are bots like Sebastian, though with less panache). This gave Sebastian the kind of cachet normally reserved for D-list celebrities, branded bathroom products and regional radio stations and (I hoped) would make new additions more likely to follow him back.

Sebastian Q Canard - before followers

Sebastian Q Canard - after followers

Follow users who like to follow back

Having bought the first 13,000 friends, I wanted Sebastian to make some real ones. There are various projects online with sophisticated network-orientated, pattern-searching algorithms for following users. Sebastian’s tastes aren’t that complex, however – all he wants to do is meet interesting people as efficiently as possible and build his list of followers by choosing people who are likely to follow back:

  • They have followers, but not too many: Real people don’t tend to have 50,000 friends and I wanted Sebastian to make friends with real people.
  • They have similar interests to Sebastian: The previous post in this series describes how Sebastian ‘creates’ his tweets by copying them from other people. The list of people he copies from forms the pool of potential users to befriend.
  • They like to follow other users: The objective is to build Sebastian’s social network, so a good potential friend is one who likes to follow others. Analysing the friends and followers list of every candidate would quickly exhaust my rate limit so instead the script looks for users with a roughly 1:1 ratio of friends vs. followers.
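A rough screen along those lines might look like the function below (the thresholds are illustrative, not the values in my actual script):

```python
def looks_followable(user, max_followers=5000, ratio_band=(0.7, 1.3)):
    """Is this user likely to follow back? `user` is a dict of the
    follower/friend counts returned by the Twitter API."""
    followers = user.get("followers_count", 0)
    friends = user.get("friends_count", 0)
    if followers == 0 or followers > max_followers:
        return False  # accounts with huge followings rarely follow back
    ratio = friends / float(followers)
    return ratio_band[0] <= ratio <= ratio_band[1]  # roughly 1:1
```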

Add twitter followers, one at a time

The python script uses a random number generator to decide when a friend request is made. As with the tweeting, a daily profile is used so that Sebastian is more active at some times than others. A daily limit is in place to stop Sebastian making too many friends at once.

How many twitter followers is ‘too many’?

Time will tell, but for now Sebastian is busy tweeting away and should be for some time. I’ll revisit the project in a couple of months and see how successful the project has been. In the meantime, the code is available on GitHub. If you have any questions feel free to say hello.

Sebastian Q. Canard: The cigar-loving twitter bot

This post describes how I attempted to build a social twitter bot. He runs on a Raspberry Pi and tweets automatically about his decadent life of cigars and moustaches. His name is Sebastian Q. Canard, he’s a myth and you can follow him on Twitter here.

Sebastian Q. Canard - Twitter Bot
Sebastian Q. Canard Esq.

Creating a Twitter bot

A couple of months back I found a story from the Wired 2013 conference about a guy called Kevin Ashton. Kevin built a reasonably quick-and-simple twitter bot that had managed to quickly generate a Kred score of 754 (which is quite high, apparently). I’ve been playing around with the Twitter api for a while so I thought I’d see if I could beat Kevin’s high score.

Introducing Sebastian Q. Canard

“A fascinating investigation project idea: The journeys of chess players after the 8th Chess Olympiad, Buenos Aires, 1 Sept 1939.”

@SebQCanard first tweet – 2.32 PM, 5 June 2014

I chose the name ‘Sebastian Q. Canard’ pretty much at random, with the only criterion being that I wanted him to have a Victorian moustache.

The basic strategy of creating (or rather, “creating”) Sebastian’s tweets is to copy other people’s tweets and pass them off as if Sebastian had created them. The script scans twitter periodically for a list of keywords that represent Sebastian’s ‘interests’.

What these interests are I tweaked a couple of times but broadly speaking they are cigars, fine scotch and other things that interest the gentleman-rogue-about-town that I wanted Sebastian to be.

Filtering the searched tweets

In order to make Sebastian Q. Canard a welcome addition to people’s Twitter feeds, I wanted to make sure that the tweets selected were of a high calibre. I also wanted to make the tweets reasonably consistent in tone.

To achieve this, all tweets returned by the search go through a pretty strict filtering process. It means that 99% of tweets get removed, but with 500 million tweets produced each day, there should still be enough to go around:

  • Banned Words: Certain words are banned and any tweet containing them is automatically excluded. These include most swear words but also terms like ‘lol’ and ‘OMG’ which seemed out of character for someone named ‘Sebastian’ (“OMG! This cigar is da bomb #Cohiba #LeatherBoundBooks”)
  • Gender-specific terms: Sebastian’s ‘character’ is male, so it might look odd if he tweeted about being pregnant
  • Links or hashtags: There’s no easy way to control the nature of the links that were retweeted so rather than worry about filtering them for anything unsuitable I just banned them altogether
  • Direct tweets: So as to not spam users by repeating other people’s tweets back at them (and avoiding detection by having a user get an identical tweet from two different users)

Once the tweets have passed the filter, I tidy them up by capitalising the first letter, removing multiple spaces and so on (Sebastian Q. Canard is someone who cares about correct punctuation).
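The filtering and tidying steps can be sketched as follows (the banned-word list here is a tiny stand-in for the real one, and mentions are used as a crude proxy for direct tweets):

```python
import re

BANNED = {"lol", "omg"}  # stand-in; the real list also covers swear words

def passes_filter(tweet):
    """Reject tweets that would be out of character for Sebastian."""
    text = tweet["text"]
    words = set(re.findall(r"[a-z']+", text.lower()))
    if words & BANNED:
        return False
    if "http" in text.lower() or "#" in text or "@" in text:
        return False  # no links, hashtags or direct tweets
    return True

def tidy(text):
    """Capitalise the first letter and collapse repeated spaces."""
    text = re.sub(r"\s+", " ", text).strip()
    return text[:1].upper() + text[1:]
```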

Selecting the tweets to re-tweet

The tweets that survive the filter are ranked according to criteria designed to determine the ‘quality’ of the tweet:

  • Retweeted or favourited tweets: A good sign that a tweet is a good one is if other users have already promoted it. The number of times an eligible tweet has been retweeted is capped to prevent Sebastian accidentally retweeting a tweet that had already gone viral
  • The popularity of the user: Popular users are assumed to be popular for a reason so the algorithm favours users who have lots (but not ‘too many’) followers
  • The length and content of the tweet: Longer tweets are favoured over shorter ones (someone called ‘Sebastian’ should be able to fill 140 characters) and the percentage of words that match a list of 1,000 common words is included as a factor.
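As a sketch of how those criteria might combine into a single score (the weights and the tiny common-word set below are invented for illustration, not the values my script uses):

```python
COMMON_WORDS = {"the", "a", "of", "and", "to", "in", "is", "with"}  # stand-in for the 1,000-word list

def score(tweet, max_retweets=50):
    """Rank an eligible tweet; a higher score means more 'Sebastian'."""
    retweets = min(tweet.get("retweet_count", 0), max_retweets)  # cap viral tweets
    followers = tweet.get("user_followers", 0)
    words = tweet["text"].lower().split()
    common = sum(1 for w in words if w in COMMON_WORDS) / float(len(words))
    length = len(tweet["text"]) / 140.0  # favour tweets that fill the space
    return retweets + followers / 1000.0 + 10 * common + 5 * length
```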

Deciding when to tweet

Like most people, I assume that Sebastian has a job (maybe as a Haberdasher) so it’s unlikely that he’s going to be tweeting around the clock. Equally, it’s going to be obvious that he’s a bot if he only tweets ‘on the hour’ or at the same time every day.

To handle this, the script runs every 19 minutes and uses a random number generator to decide whether to tweet or not. The script includes a profile that makes it more likely that Sebastian will tweet at some times (in the evenings) than others (first thing in the morning).
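Sketched out, the timing decision might look like this (the hourly weights and base chance are invented for illustration):

```python
import random

# Hypothetical hourly weights: quiet overnight and first thing,
# chattier through the day, most active in the evening.
HOURLY_WEIGHT = [0.1] * 8 + [0.3] * 9 + [0.8] * 6 + [0.2]  # 24 entries, one per hour

def should_tweet(hour, base_chance=0.25, rng=random.random):
    """Called every 19 minutes; tweet with a profile-weighted probability."""
    return rng() < base_chance * HOURLY_WEIGHT[hour]
```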

The next step

Sebastian is up and running and producing tweets, the next step is to have him go out into the world and try to meet people. More on that in the near future…

The Twitter bot code

The twitter bot is created using python and the python-twitter library and runs automatically on a Raspberry Pi configured as a web server. All the code for the Sebastian Q. Canard project can be found on GitHub.

Other cool Twitter bot projects

The Sebastian Q. Canard project was inspired by a few of the many projects that have done this before. Below are some links to the best ones.

  • Horse_eBooks became so big it even got its own Wikipedia page
  • TofuBot takes tweets from a user’s timeline and uses them to reply to direct messages
  • RealBoy did some cool things with social graphs that I hope to use when Sebastian Q. Canard searches for followers
  • TrackGirl was a social Twitter bot that people started to actually care about

Panini Stickers – How many packs to complete the album?

This post uses a Monte Carlo simulation written in Python to estimate how many packs of stickers it takes to complete the Panini sticker album for the 2014 Fifa World Cup.

Football Panini Stickers
From http://worldsoccertalk.com/2012/01/10/a-trip-down-memory-lane-with-1990-panini-football-stickers-photos/

How many packs of stickers do you need to buy to complete a Panini sticker album?

Got, got, need, got got.

To a generation of once-pre-teen football fans, this was the sound of a new football season. With the 2014 FIFA World Cup comes the Panini 2014 World Cup Sticker Album and with it a new wave of nostalgic football fans reliving their sticker-swapping youth.

This time around there seems to be a fair bit of interest in the question that every parent (subconsciously) asks when shelling out pocket money: ‘how many stickers do I need to buy to complete a Panini sticker album?‘ (and more importantly, ‘how much will it cost?‘)

An article in the Independent that discusses a blog post caught my attention. In it, some probability theory is used to determine the expected number of stickers you'd have to buy in order to collect them all. The number of packs (and from it, the total cost) is then found by dividing the total number of stickers needed by the number of stickers in each pack.

The problem with this analysis and others like it is that they assume that the stickers are collected one by one and that the probability of you needing a random sticker is independent of whether you needed the previous random sticker. The stickers in each pack are guaranteed to be unique from each other however, even if they’re not unique from the stickers you’ve already got*. The statistical dependence that this feature of the sticker packs introduces means that some more esoteric probability theory is required.

It’s been a while since I’ve spent any time with probability theory (esoteric or otherwise) so I thought a reasonably-quick-and-dirty Monte Carlo simulation with a Python script would suggest a solution that accounts for this ‘sticker pack independence’, without having to fully understand the maths behind it.

* Assume you have, say, 100 stickers in your album and you open a new pack of stickers. The probability that you already have the first sticker in your new pack is 100/639 (or 15.6%), as there are 100 stickers you own and 639 stickers in total (assuming the distribution is completely random). Given you have the first sticker, the probability that you also have the second is now 99/638 (or 15.5%). This is because you know this sticker is different from the first one you tested (the stickers in each pack are unique from each other, and you know you already have that one in the album), so you can remove it from both the numerator and the denominator. The difference doesn't seem big (only 0.1% in this example), but it's large enough to skew your calculations if you assume it isn't there.
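The footnote's arithmetic is easy to verify directly:

```python
# Verify the footnote's numbers: 100 of 639 stickers collected,
# then open a new pack.
owned, total = 100, 639

p_first = owned / total                           # 100/639
p_second_given_first = (owned - 1) / (total - 1)  # 99/638

print(round(p_first * 100, 1))               # 15.6
print(round(p_second_given_first * 100, 1))  # 15.5
```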

The Assumptions

The Monte Carlo simulation makes the following assumptions:

  • There are 639 stickers in the album
  • You get 6 (unique) stickers free when you buy the album
  • Each purchased pack contains 5 stickers
  • The stickers in each pack are assumed to be unique from each other
  • The distribution of stickers is assumed to be even, with no teams or players more popular than any other*

As Panini allow you to buy the last 50 stickers (and it’s assumed that this is cheaper than trying to find the last 50 by buying packs) the objective is to find the number of packs needed to collect the first 639 – 50 = 589 stickers. The first 6 of these are free with the album so we’re interested in how long it takes to collect the next 583.

* Anyone who collected stickers as a kid and found the same faces peering out of every new pack would dispute this (for me: Dion Dublin, Man Utd, ’92-’93 Merlin Premier League sticker album. Every. Single. Time.); these researchers seem to suggest that the assumption is accurate however.

The Monte Carlo Simulation

The simulation works as follows:

  1. A new album is created pre-populated with 6 stickers (in the script, stickers in the album are represented by the numbers 0 to 638)
  2. A random ‘pack’ of 5 stickers is generated (e.g. [606, 124, 185, 499, 318])
  3. The script checks the new pack, ignoring any numbers already in the album (“got”) and adding to the album any that are missing (“need”)
  4. The process of generating packs and adding missing numbers is repeated until the number of unique stickers in the album reaches 589
  5. Once this is done, the total number of packs opened is recorded and the process starts again

I repeated the process 250,000 times; a graph showing the number of packs required each time is below:

The Results

The mean number of packs needed (based on a simple weighted average) to complete the first 589 stickers is 292 packs (well, 291.7898 to be more precise, but we assume that you can’t buy 0.7898 of a pack).

The Cost

In order to calculate the cost of buying the packs we consider the following:

  • The album itself costs £2.99
  • The album comes with 6 free stickers and 3 free packs of stickers
  • Each additional pack of 5 stickers costs £0.50, or you can buy multi-packs of 30 stickers (6 packs) for £2.75
  • Stickers direct from Panini cost £0.25 each plus postage & handling of £1
  • Assume that collectors will buy the packs in the most efficient way*

Based on this, the cheapest way to buy the stickers is:

  • 1 x album (includes 6 free stickers) = £2.99
  • 3 free packs (included with the album)
  • 48 multi-packs x £2.75 = £132.00
  • 1 single pack x £0.50 = £0.50
  • 50 stickers from Panini x £0.25 = £12.50
  • Postage & handling for the 50 stickers = £1
  • Total: £148.99

* There’s some data-snooping here as it assumes that the collector will know in advance that they will collect the 589th sticker in the next 5 packs, otherwise it’s possible that they would buy the multi-pack instead. This type of ‘probabilistic decision making with incomplete information’ is beyond the scope of this analysis but might be fun for a future project.

Injury Time

To wrap up, it’s important to note that buying 292 packs doesn’t ‘guarantee’ that you’ll complete the album; 292 is merely the number where completing the album becomes ‘more likely than not’. In fact, as each pack is random no number of packs guarantees success, so it’s possible you’d have gaps in your album and find nothing but swaps in each new pack.

Speaking of swaps, the simulation ignores the ability to swap stickers with others (got, got, need, got) which I would expect to significantly reduce the number of packs needed – though by how much I expect would depend on the number of friends you had and whether they were willing to swap Lionel Messi for your Sokratis Papastathopoulos duplicate on a one-for-one basis (from experience, not always guaranteed).

The Python code I used is below; in the meantime, feel free to say hello on Twitter with any comments.

The Code

#TwitterMetrics: Happiness Heat Map :)

The Twitter Happiness Heat Map attempts to find the happiest place on earth by searching Twitter for the ‘smiley face emoticon’ – :) – and creating a heatmap of where the returned tweets originated.

 

The Data

Getting the data out of Twitter uses the Twitter API, accessed using the python-twitter library. It’s cobbled together using some MySQL databases, formatted into JSON files and displayed using the d3.js graphical library.

The python script runs on a Raspberry Pi and searches Twitter for the :) emoticon every 5 minutes, returning the first 200 results each time.

In order to plot the geographic position, the Twitter user must have geographic location activated for their account – only ~2% of users seem to have this, so a lot of the tweets are simply discarded. In addition, a user's location is set to where their account is registered rather than the location they tweeted from – this means that if I had my (London) location registered and tweeted “#OMG these Kangaroos are awesome! :) #Kangaroo #AussieRules” while on holiday in Australia, the increase in happiness would be measured in the UK, not ‘down under’.
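The fetch-and-filter step can be sketched as below. The dict fields are toy stand-ins for the objects python-twitter returns, and the commented-out API call uses placeholder credentials (note the search API caps results at 100 per request, so pulling 200 takes two pages):

```python
def geotagged(statuses):
    """Keep only results carrying a usable location; only ~2% of
    accounts expose one, so most tweets are discarded."""
    return [s for s in statuses if s.get("coordinates")]

# With real credentials, the fetch itself would look something like:
#   import twitter
#   api = twitter.Api(consumer_key=..., consumer_secret=...,
#                     access_token_key=..., access_token_secret=...)
#   statuses = api.GetSearch(term=":)", count=100)

# Toy stand-ins for the returned statuses
sample = [
    {"text": "these Kangaroos are awesome! :)", "coordinates": (-33.9, 151.2)},
    {"text": "morning :)", "coordinates": None},
]
print(len(geotagged(sample)))  # 1
```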

The other thing to consider is that when you search for :) on Twitter it also returns tweets that contain characters such as :D and :-). They all seem pretty happy though so the script doesn’t filter them out.

I started the script at the start of February 2014 – so far it’s collected 25,074 tweets :)

The Projection

The Twitter data returns the location of each tweet as latitude and longitude, which need to be plotted onto a plane using a map projection.

There is a dizzying array of different projections available, from the Winkel Tripel projection to the more esoteric Hammer retroazimuthal projection. The graphic uses the classic Mercator projection as it's the one most viewers will be familiar with (important where the map has no outlines and instead emerges from the data).

It’s critics say the Mercator projection distorts the landmass around the poles – unless there are a disproportionate number of Twitter users in Nuuk though, it shouldn’t distort the graphic too much.

#TwitterMetrics: Daily Twitter Sentiment

The #TwitterMetrics project is about creating stories from everyday Twitter data. In this example I measure the sentiment of trending Twitter topics every 15 minutes using a Python script and plot the results using the d3.js library. You can follow the project on Twitter to get regular updates.

The Data

Getting the data out of Twitter uses the Twitter API, accessed using the python-twitter library. It’s cobbled together using some MySQL databases, formatted into JSON files and displayed using the d3.js graphical library (which takes some time and skill to get the most out of it but is certainly worth it).

The Python script I've written runs automatically via a cron job on my Raspberry Pi. It scans Twitter every 15 minutes and the web data is updated once a day.

Sentiment Wordlists

The TwitterMetrics project uses a dictionary of positive and negative keywords developed by the American academics Tim Loughran and Bill McDonald, which is in turn an extension and refinement of the Harvard IV-4 Psychosocial Dictionary. The list is extensive but doesn't include some terms I thought would be relevant (I don't think they encourage the use of 'lol' or certain expletives in academic literature) so I added some 'Twitter-specific' terms of my own to the list.

The approach isn’t completely bullet-proof, it doesn’t work well with sarcasm or some slang (“That new NeYo song is the bomb, yo!”* would probably be misinterpreted for example). It’s good enough to make a fun infographic though.

Loughran & McDonald’s wordlist is available here.

*I’ve no idea whether or not this is something ‘the kids’ would actually say.

The Twitter Sentiment Index

To generate the Twitter Sentiment Index, the number of positive words is counted and the number of negative words subtracted. The total is then divided by the total number of words in the returned tweets to measure the relative 'positive-ness' of the tweets returned.

SentimentIndex_{t} = \left ( \left ( \frac{PosWords_{t} - NegWords_{t}}{TotalWords_{t}}\right ) - AvgSentimentIndex \right ) * 10000

The data is normalised by subtracting the average sentiment since the data gathering began, then multiplied by 10,000 (for no other reason than to remove the decimal places and make the numbers more readable).
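Using a toy word list in place of Loughran & McDonald's, the index calculation can be sketched as:

```python
# Tiny stand-ins for the Loughran & McDonald word lists
POSITIVE = {"great", "win", "lol", "happy"}
NEGATIVE = {"bad", "lose", "awful"}

def sentiment_index(tweets, avg_sentiment=0.0):
    """Positive minus negative word counts, relative to total words,
    normalised against the long-run average and scaled by 10,000."""
    words = [w for tweet in tweets for w in tweet.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    raw = (pos - neg) / len(words)
    return (raw - avg_sentiment) * 10000

tweets = ["What a great win lol", "awful day"]
print(round(sentiment_index(tweets), 1))  # 2857.1
```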

The data is displayed with a daily and weekly simple moving average so that the change in sentiment can be visualised over time.

The Code

You can download the Python code and an importable SQL file for the database from GitHub.