7 May 13

Storing images from scraped websites for online use

I use a Vagrant server for development and testing. It has all my Python modules, database connections and sundry bits 'n pieces I need to do my job. Yesterday I was sat in my telly chair watching the snooker working with the Google Places API. I needed to take a bunch of their data and images and import them into Nymbol, my mobile CMS.

Problem is, Google Places image requests are tied to API requests, which in turn are tied to an IP address, so although I could import the data, the Nymbol server wouldn't be able to grab the images. So, Dropbox to the rescue!

This couldn't have been much easier, in fact. All I had to do was add my Dropbox Public directory as a shared folder so that my Vagrant server could see it. Once done, my Python script could download each image, pop it in /dropbox/scraped-photos (or whatever) and use my public URL stub in my generated XML file.
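As a rough sketch of that last step (not the actual script), the idea is just "write the bytes into the shared folder, then build the URL from the stub". The function name, the folder and the URL stub below are all assumptions for illustration.

```python
import os
from urllib.parse import quote


def store_image(data, filename, public_dir, url_stub):
    """Write image bytes into the shared Dropbox folder and return a public URL.

    public_dir and url_stub are placeholders: use whatever path your Vagrant
    box mounts the Dropbox Public folder at, and your own public URL prefix.
    """
    os.makedirs(public_dir, exist_ok=True)
    path = os.path.join(public_dir, filename)
    with open(path, "wb") as handle:
        handle.write(data)
    # The public URL is just the stub plus the URL-quoted file name.
    return url_stub.rstrip("/") + "/" + quote(filename)
```

The bytes themselves would come from the image request made on the development box (the one whose IP address the API calls are tied to), and the returned URL is what goes into the generated XML.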

Really simple solution should you ever have the need to deal with images that are for whatever reason restricted (but which, naturally, you have the right to use!)

Share and enjoy.

27 Mar 13

Presenting "formatrules"

In The designer blog post, I wrote about updating the blogging app in my toolset to allow easy offline creation of blog posts. For standard pages I've gone a different direction, with a library I've started, called formatrules.

With this Django app - which, for the uninitiated, is the Django community's word for what most people might call a plugin - I've created the ability to define multi-column layouts in Markdown, without writing any complex HTML. Or any HTML at all, for that matter. Here's an example of the text of a page:

Donec id elit non mi porta gravida at eget metus.
Donec sed odio dui. Nullam id dolor id nibh
ultricies vehicula ut id elit. Praesent commodo
cursus magna, vel scelerisque nisl consectetur et.

// Block of three

Aenean lacinia bibendum nulla sed consectetur. Maecenas sed diam eget risus varius blandit sit amet non magna. Cras mattis consectetur purus sit amet fermentum. Curabitur blandit tempus porttitor.

// Block of three

Cras mattis consectetur purus sit amet fermentum. Donec ullamcorper nulla non metus auctor fringilla. Donec id elit non mi porta gravida at eget metus. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus.

// Block of three

Cras justo odio, dapibus ac facilisis in, egestas eget quam. Nulla vitae elit libero, a pharetra augue. Donec id elit non mi porta gravida at eget metus.

The text is formatted so that it can be put through the Markdown filter. But where it gets fun is in those double-slashes. They're not just comments, but instructions to a filter which reads them and then wraps the content that follows in Bootstrap columns. "Block of three" basically means "one third of a page". I could equally say "block of two", "four", "six" or "twelve". I can even get cleverer with "two-thirds block" and "half-block". So here's the process the code runs through:
  1. Use a regular expression to look for new lines starting with a double-slash and an instruction.
  2. Check whether that instruction matches a given list of regular expressions
  3. Parse the text, taking everything from just past that // line, to the next // line (or the end of the text if there are no more instructions)
  4. Pass that parsed text to the function we matched up in the second step
  5. Replace the parsed text with the result of that function
  6. Look for the next set of double-slashes
Step three involves a third-party function. Well, it's actually a class, and it can do a couple of nice things. It can parse the text given to it, and also clean up after itself. I'll explain.

The // comments aren't nested; one instruction is processed after another, so there's no need for an explicit "end block" instruction. However, with Bootstrap you have to create columns inside a "row", so my class knows when its parsing function is being called for the first time, and it opens a <div> with a class of row. The formatrules filter runs the cleanup function on any class that's been used during the parsing of the text, so the cleanup function is run on my class and the "row" element is closed.
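To make the process concrete, here's a minimal sketch of the idea. It is not the formatrules code itself: the names (DIRECTIVE, BlockParser, apply_rules) and the Bootstrap "span" class names are assumptions for illustration, and it skips the Markdown step entirely.

```python
import re

# A directive is a line that starts with a double-slash, e.g. "// Block of three".
DIRECTIVE = re.compile(r"^//[ \t]*(.+?)[ \t]*$", re.MULTILINE)

# Assumed mapping onto a twelve-column grid: "three" means a third of the row.
SPANS = {"two": 6, "three": 4, "four": 3, "six": 2, "twelve": 1}


class BlockParser:
    """Hypothetical parser that wraps text in columns inside a single row."""

    pattern = re.compile(r"^block of (two|three|four|six|twelve)$", re.IGNORECASE)

    def __init__(self):
        self.row_open = False

    def parse(self, match, text):
        html = ""
        if not self.row_open:
            # First call: open the row that all the columns will sit in.
            html = '<div class="row">\n'
            self.row_open = True
        span = SPANS[match.group(1).lower()]
        return html + '<div class="span%d">%s</div>\n' % (span, text.strip())

    def cleanup(self):
        # Close the row if this parser ever opened one.
        if self.row_open:
            self.row_open = False
            return "</div>\n"
        return ""


def apply_rules(text, parsers):
    found = list(DIRECTIVE.finditer(text))
    if not found:
        return text
    out = [text[:found[0].start()]]
    used = []
    for i, directive in enumerate(found):
        # The body runs from just past this // line to the next one (or the end).
        end = found[i + 1].start() if i + 1 < len(found) else len(text)
        body = text[directive.end():end]
        for parser in parsers:
            match = parser.pattern.match(directive.group(1))
            if match:
                out.append(parser.parse(match, body))
                if parser not in used:
                    used.append(parser)
                break
        else:
            # Unknown instruction: leave the original text alone.
            out.append(text[directive.start():end])
    # Run cleanup on every parser that was used.
    out.extend(parser.cleanup() for parser in used)
    return "".join(out)
```

Calling apply_rules(page_text, [BlockParser()]) on the example page above would give one row containing three columns, with the intro paragraph left untouched.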

The real-world example - being the only parser I've developed for the formatrules filter so far - is probably a bit overcomplicated, so let's simplify.

What if I wanted a whole block of text to be bold? Rather than surrounding it in double asterisks in the Markdown way, I could have an instruction like so:

// Bold

All of the rest of this text will be bold.

I'd create a class that responds to the regular expression ^Bold$, and add a function that wraps the text that follows in a <div> tag with a style or class attribute. I wouldn't, as that would be ghastly and antisemantic, but you get the idea.

Any instructions that followed would override the bold instruction, because I figure simplicity is better than flexibility when you're dealing with a web-based text editor.

As I mentioned, the "block" parser is the only one I've written so far as that's all I wanted to do, but you get an idea of how useful it is when you see the layout it produces, with very simplistic - and more-importantly, human readable - instructions.

[Screenshot of the resulting layout]

I love the uncluttered simplicity that Markdown provides, so I wanted to develop something that echoed that approach. There are loads of ways this can be extended and improved and made more flexible for developers - allowing the classes the parsers produce to be overridden for example - and I've made developing new parsers pretty simple. However, the biggest limitation I've come across so far is that, because you're wrapping Markdown text in HTML elements, the Markdown parser - at least the Python one - ignores the paragraphs, as it assumes that whatever is in that box is "raw" HTML. So I'm having to parse the text inside each "block" with Markdown, then parse the whole lot through again (obviously the parser then ignores the bits inside HTML tags, so it's not exactly doing the same thing twice). This is inefficient but hey, it's a start.

If you like the idea, bambu-tools is a set of Django reusable apps that I've built and use in production environments. It's not well documented right now, but it's up on PyPI for your perusal, judgement, comments and suggestions. You'll also find the code on Bitbucket (without some of the changes in the PyPI version. There's a reason for this, it's just not a good one).

If you like the idea, feel free to steal it and build it into your next project. Just maybe gimme a credit and get in touch if you have any questions.

Photo by Jason Dean

7 Mar 13

The designer blog post

It's been around for a while, but the concept of stylised blog posts - where each post is uniquely laid out - is increasingly popular, and attractive. I'm implementing a little of that over on the Nymblog (the blog for my mobile CMS), but I've now just made the process of building and uploading stylised blog posts much easier. It's only in Django at the moment, but I'd like to port this over to WordPress. Here's how it works.

Every site is different

You download a boilerplate HTML file from the Django admin. This is generated from a template. In Django - a little like in WordPress - you can override templates, so the boilerplate file can come from my generic blog app (an app in Django is like a plugin in WordPress) or from the actual site itself. So I've created a boilerplate file specifically for the Nymbol blog.

The idea is that, when downloaded, you get an HTML page that you can edit in a text editor and preview in a browser. All the references to stylesheets and JavaScript files are absolute, so as long as you're connected to the Internet your page will look and function pretty much like a normal blog post.

So what's cool is that you're getting a boilerplate file tailored to that specific site. The same principle would work with a WordPress blog. WordPress would generate a fake blog post then export the HTML for the author to download.

Writing the post

There are a few HTML elements with special attributes, which the system uses to read your blog post. Here's a snippet:
<h1>
    <a href="#" data-bpfield="title">This is my post</a>
    <small>Posted <span data-bpfield="date">March 8, 2013</span> by mark</small>
</h1>
The data-bpfield attributes map to the title and date of the post. I can use lots of different date formats, and my app will convert that into a date that can be stored in the database. Then I look for a snippet like this:
<div data-bpfield="body">...</div>
I put the HTML of my blog post where the ellipses go.
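Here's a hedged sketch of how the plain-text fields could be read back out, using only the standard library. It isn't the code from my blog app: FieldReader, parse_date and the list of date formats are made up for illustration, and it only handles simple text fields like the title and date (not the body, which needs its inner HTML kept).

```python
from datetime import datetime
from html.parser import HTMLParser

# Assumed list of accepted formats; the real app may accept more.
DATE_FORMATS = ("%B %d, %Y", "%d %B %Y", "%Y-%m-%d")


class FieldReader(HTMLParser):
    """Collects the text directly inside elements with a data-bpfield attribute."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self.stack = []  # one entry per open tag: the field name, or None

    def handle_starttag(self, tag, attrs):
        self.stack.append(dict(attrs).get("data-bpfield"))

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Only keep text whose immediate parent element is a field.
        if self.stack and self.stack[-1]:
            name = self.stack[-1]
            self.fields[name] = self.fields.get(name, "") + data


def parse_date(value):
    """Try each known format in turn and return a date object."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError("Unrecognised date: %r" % value)
```

Feeding the <h1> snippet above to a FieldReader leaves the title and date text in reader.fields, and parse_date turns the date string into something the database can store. Note that month names are matched using the current locale, so this assumes an English one.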

Styling it up and adding some spice

Of course the whole point of this exercise is to allow custom styling, so to do that I look for an HTML element like this:
<style data-bpfield="css">...</style>
I put the CSS for my blog post in here, which is stored in the database in a separate field, not embedded in the HTML. Usually all the CSS rules would have to be prefixed with a class that is only applied to single blog posts.

Now here's where it gets cool. I can add images and other files to my blog post. I start by putting them in the same directory as my HTML file, then just reference them using a relative URL, like this:

<img src="kitten.jpg" />

Zip it and upload!

Once I'm happy that my blog post looks great in the browser, I zip up the files I've created and upload them via the admin area. The blog app then unpacks the Zip file, extracts the HTML and looks for any files referenced (basically anything with an src attribute). If it finds a file with that name inside the zip, it extracts it, adds it as an attachment to the blog post (which naturally changes its URL), then replaces that URL within the HTML and CSS.
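The find-and-replace step can be sketched like this. Again, it's an illustration rather than the app's real code: attach_files and the save_attachment callback are hypothetical names, and this version only rewrites src attributes in the HTML, not references in the CSS.

```python
import io
import re
import zipfile

# Matches src="..." and keeps the quotes so they can be put back unchanged.
SRC = re.compile(r'(\bsrc=")([^"]+)(")')


def attach_files(zip_bytes, html, save_attachment):
    """Swap relative src references for the URLs of saved attachments.

    save_attachment(name, data) is a hypothetical callback that stores the
    file against the blog post and returns its new URL.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = set(archive.namelist())
        urls = {}

        def swap(match):
            name = match.group(2)
            if name not in names:
                return match.group(0)  # not in the zip: leave it untouched
            if name not in urls:
                # Save each file once, even if it's referenced several times.
                urls[name] = save_attachment(name, archive.read(name))
            return match.group(1) + urls[name] + match.group(3)

        return SRC.sub(swap, html)
```

Absolute URLs (the stylesheets and scripts from the boilerplate, say) don't match a file name in the zip, so they pass through unchanged.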

I've added an option in my app which allows me to convert the HTML of the post body to Markdown (the syntax used by default within my blog app). The nice thing about Markdown is that it does allow HTML to be added to it, so if I untick that option, rather than converting the HTML to Markdown, it leaves it as it is. The first option is useful if you want to edit the post later on; the second is useful if it's very stylised, with lots of classes and other attributes which don't have a place within the Markdown syntax.

Limitations

Probably the biggest limitation so far is that, if you reference an image within your CSS but don't include it in your HTML, the find-and-replace thing won't work. That's an easy problem to fix; I just haven't yet.

You can't provide styling for a blog post within a list, only for the single post page. This is because you don't know the ID of the blog post you're targeting when you write the HTML locally, so you can't target that specific element within a list. The way to get over this is to set an ID in the boilerplate HTML which you can use in your CSS, then replace that ID with the correct ID of the blog post when published.

There are probably other limitations, but they don't spring to mind just yet.

Porting

If this doesn't exist already, I think it'd make a really nice WordPress plugin. Sometimes it's useful to have the designed approach alongside the might of the WordPress engine, to handle comments, trackbacks, RSS, that sort of thing. Adding custom CSS for each blog post is as simple as creating a hidden custom field, and using a plugin to spit out the CSS when needed.

If it does already exist for WordPress, even better 'cos that means I don't have to write it! But I wanted it for my Django toolset, so now I have it.

Photo by Thalita Carvalho

9 Feb 13

How I integrated Twitter in 30 seconds

Months ago I built a jQuery plugin called jQuery.tweetspan. It's a really simple, currently very basic way to integrate your Twitter account into a website. It came about after Twitter hobbled their API, and was built in an afternoon. Here's how I got it working on a blog I set up yesterday:

Firstly, I downloaded https://raw.github.com/substrakt/jquery.tweetspan/master/jquery.tweetspan.min.js to my site's /media directory.

I added the <script> tag to the template where I needed it (this is in a Django site, but that makes no difference as it's a totally client-side thing). That's the only <script> tag you need; there's no other JavaScript to call.

I then popped the following code where I needed it (this was based on sample code; I'm quick, but I couldn't have written this in 30 seconds):

<h3>Follow @<a href="http://twitter.com/nymbol">nymbol</a></h3>
<div class="tweets" data-account="nymbol" data-count="1">
        <div class="tweet">
                <p>
                        <span data-field="text" data-format="tweet"></span>
                        <small>
                                <br />
                                <span data-field="created_at" data-format="timesince capfirst"></span>
                        </small>
                </p>
        </div>
</div>
Then I hit Refresh.

It uses the Twitter Bootstrap mode of thinking, where HTML naming conventions are used in place of JavaScript calls. The jquery.tweetspan.js file looks for an element with a class of tweets and a data-account attribute. It uses the data-count attribute to determine how many tweets to display, then takes the inner .tweet element as a template, cloning that element for each tweet it retrieves. All the field names you see in data-field correspond to data that comes from Twitter's search API, and can be formatted via the data-format attribute (I've written a few basic filter functions, but there's scope for more).

But because it uses the search API, it has a downside: it won't show tweets that are very old, so if your Twitter account is seldom updated, you might find you have an empty box (in which case, the box just won't display). But because it's client-side, you don't have to worry so much about API throttling, as calls aren't bound to your account, but to the visitor's browser.

Within that GitHub repo there's also a WordPress plugin which you can integrate quickly; all it does is add the <script> tag for you, so you can put the tweet boxes in your theme's HTML. Maybe later I'll work on a widget.

So that's it really. Feel free to shout if you've any questions or you think there's a better solution. It's been out for ages, but I only integrated it for my own site last night - having made it available for colleagues to use within WordPress - so I thought I'd give it a bump.

Photo by Lily

20 Dec 12

Help me improve Buffer for WordPress

A while back I wrote a WordPress plugin which allowed Buffer users to post to Twitter, Facebook, LinkedIn and most recently App.net. It works in most cases, but there are a few edge cases that are causing people grief.

Unfortunately I don't really have time to test each case and figure out a solution, but I don't want the plugin to fall by the wayside. The team at Buffer have been brilliant, forwarding issues raised by their users (who usually assume that Buffer developed this plugin themselves), and for the most part they're going unheard because I just can't dedicate the time.

If you can help, either by helping me respond to these users and figure out their issues, or by giving me a hand in updating the plugin, I'd really appreciate it, and of course you'll get all the plaudits and credit I can offer. If you can't help but you know someone who might be able to, please share.

Thanks a lot! -Mark

1 Oct 12

Swiss army knives are fine until the hinges rust

I have a combination washing machine and tumble dryer. It washes clothes fine, but I could dry them faster by breathing on them. But it's a convenience, as in my flat space is a premium. But two, three or even four is not better than one as I've found to my cost when using Coda 2.

I was quite excited about the release of this web development toolkit, and bought both the Mac and iPad versions. It combines a decent code editor, an FTP client borrowed largely from Transmit (which the company also makes), a basic user-interface to Subversion and a terminal. It also has a preview option, but why anyone would use it I don't know, as very few sites are built without content management systems in place. But without this you're left using separate apps. My personal choice is TextMate for code editing (the old warhorse that still works brilliantly and whose sequel will never see the light of day), Transmit for file transfer, Versions for SVN and iTerm as my terminal (although the Mac's built-in one is fine).

All of these tools work fine, but it means a lot of switching, so an app that combined it all seemed to make a lot of sense. Apart, that is, from the litany of bugs that have plagued it since release, and which haven't been fixed. Things like its sudden inability to connect to SFTP or SVN servers, its weird and uncustomisable key combinations, its regular crashes, inability to tab between files or handle any concept other than a website (I often work on other things like Python packages or WordPress plugins, and don't think I should have to use a different text editor because it doesn't fit Coda's paradigm).

Coda isn't a tool for novices or those trapped in WYSIWYG hell (like Dreamweaver or FrontPage), but it seems to treat me with kid gloves. Crashes I can deal with, but they seem to hold themselves up to such a high standard with Transmit that you'd think Coda would be as good as Panic, their creators think it is. (That's a dreadful sentence, but you get what I mean!)

So beware the all-purpose multitool. It may be more economical and space-saving, and it may save you time to begin with, but stick with it and you'll start to see the benefits of buying the stack separately. Which reminds me: whatever happened to stacker systems? Are they still a thing?

28 Sep 12

Animating the world - Part 1: Staging

For the last couple of weeks I've been involved in a job which involves rendering animations onto a map of the world. I can't talk about the specifics of the outcome, but I thought it would be useful to chronicle the process as most of the work involved using technologies I was pretty unfamiliar with.

Rather than write it up in one massive post, I thought I'd split things up a little. So let's start at the end: the finished product.

Staging

The project had to run on a touchscreen device, originally as some sort of app, but essentially an HTML5 page. It had to run offline which meant making the data available locally but also happily gave me the ability to specify the browser that would be needed to show the work. I went for Chrome or Safari (using the WebKit rendering engine) as they're reasonably interchangeable and friendly both to Macs (which I use) and PCs (which the project would be run from).

I first thought the simplest way to make the data available locally would be to use an in-browser database, which is powered by SQLite. Modern, standards-based browsers allow developers to work with a local database connection provided by the browser. It's really easy to set up and quite easy to work with via JavaScript. But as I soon found out to my cost, SQLite has its limits, and I wasn't quite prepared for the sheer volume of data I'd be working with.

As I was on an HTML5 tip, I thought Canvas would be the best starting point. But as I was animating multiple objects which I needed to keep track of, I quickly discovered this was the wrong move. Canvas works by drawing objects on the screen, but animating them involves redrawing the entire canvas all over again, plus the changes you wanted to make. So I switched to an animation framework that used SVG, which is basically a way of specifying shapes, fills and strokes in a standard, machine- (and vaguely human-) readable format. More on that a little later.

6 Aug 12

Study period

I've been moving my stuff to a new server over the weekend, and enjoying a little time with the family, so the top-and-tail videos I recorded for this week's Stac videos were shot in low light. MacBook cameras do really badly in low light, so the videos look terrible. Plus I haven't shaved, and no-one needs to see that much hair on that many chins.

The actual lesson is written and recorded, but I felt it was more important to have a higher-quality video than to adhere to my self-imposed schedule. So it'll be up, hopefully no more than a day or two late. I can't record the video Tuesday morning for reasons I won't go into, otherwise you'd have had it that day. But suffice to say it'll be out this week. And it's all about databases. Wheeeeeee :)

Anyway, stay tuned. Oh, and I'm still going to be making all the videos available with no need to sign up, but if you want to see the homework and tick the tasks off when you're done, you'll still be able to sign up.

Cheers!

6 Aug 12

The end of an Amazon adventure

I spent much of this weekend moving my websites from the two Amazon EC2 servers I had, over to a hosting company I've used before and trust immensely, Bytemark. I thought it might be useful to share some of the things I've learned from using Amazon for the last 18 months.

Amazon is slow.

EC2 (the Elastic Compute Cloud) is where you host the files that make your site run. Back-end code, templates; stuff that is generated dynamically. You usually run a few "instances" of the same server, so that if one instance goes down, another is ready to take its place. I just kept the one copy of my two servers (one server for Django, another for PHP), safe in the knowledge I could ramp up when needed.

RDS (the Relational Database Service) is where you stick the data for your site. There has to be constant communication between your EC2 instances and the data store, which may be on different physical pieces of hardware, and either way that incurs network traffic costs (both in time and money). It's a really good system, because it means you effectively keep one cloud-distributed copy of your data and share that between the various instances of your servers.

S3 (their Simple Storage Service) is where you'd put stuff like CSS, JavaScript, images, and files uploaded by you or your users via a browser. Unless you take the time to write the right code, you'll probably end up having to upload the files to your EC2 server which then sends them on to S3. Lost yet? I am. One of its really nice benefits is CloudFront, which is a content delivery network. It distributes copies of your files to servers all around the world (or within the certain geographical boundaries that Amazon has set up), and serves them in standard downloadable form, or streams them via the Flash streaming protocol RTMP. So it's great for streaming audio and video. And because you pay only for the space you use, you never hit a storage or bandwidth limit.

But basically all this network traffic, coupled with what seems to be an inherent slowness in EC2, leads to a noticeable lag. It's probably the conversation between the web server (EC2) and the database server (RDS), but I'm not one for analysing graphs and numbers. Still, that was far from the main reason I decided to jump ship.

Amazon is expensive.

When you use Amazon's services in the way they're intended, you really really rack up the cost. My last hosting bill came to $700. This is for around a dozen websites, with two of them streaming a little bit of video. The biggest cost seemed to be RDS, and all the traffic that's necessary to make my sites work. If I were running some high-profile, high-traffic sites I could justify the cost, but for my piddling lot of nonsense it really is just ridiculous to pay that much.

You can work out all the costs for Amazon's services via their calculator, but only if you can predict the future. I can guess at how much data I'm going to store, but how am I to guess how many times a file is going to be streamed or downloaded? What if something goes viral?

This is where cloud hosting falls down for me. The scalability is wonderful, but you can't budget for it because you can't predict how your data is going to be used. Most of the cost for data comes when it's downloaded. Storing and uploading files is relatively cheap, but downloading and streaming them is expensive. So, just something to be aware of.

Amazon is reliable.

I mentioned I had two servers: Colin and Blomkvist. Colin went down a lot, but I'll put that down to my inability to configure Apache to deal with WordPress' various holes and inefficiencies (and the blemishes found in third-party plugins). Blomkvist however, which had a much harder job, running several Django sites with lots of different processes going on (including encoding video via ffmpeg) never went down once. Not once. It was an absolute trouper of a machine, and it was only a level or two up from Colin, in terms of power.

I never had a problem with RDS. I had many of my DNS records hosted with them and they were fine. All my files stayed intact and were always available. Amazon's Control Panel was also there when I needed it, so I really couldn't complain.

Amazon is simple.

If you're a developer with experience managing a VPS, you'll have no problem getting your head round Amazon's setup. Once you know what their various names and acronyms mean, you're pretty much set.

Amazon is not available for comment.

Unless you pay for a certain type of account, you don't get any type of support. I could argue that for $700 I should've had someone sitting with me at all times, checking that everything was still working, but that's the bargain you enter into. I don't even think you get email support; you're left with the "community" option which, if you're a fan of flaming and lols, is probably great fun. I had one major support issue - which was a problem I caused - and had no-one at Amazon available to help me. That's a problem if, like me, you like a host that you can call up and say "Hi, can I speak to John; he's been dealing with my server". I don't want to mis-sell Bytemark's services, but that's a bit more of the feel you get with them... certainly over email anyway (I don't think I've ever needed to call them).

Amazon is no longer my host.

It was basically the price that drove me away. I knew exactly who I wanted to go back to, and that I could save my business about £400 a month in the process. The difference in speed was noticeable when I got my first site up, and stayed that way when I loaded all the rest on. Let's see how it copes with WordPress and Django bunking up together though :)

If you want to dip your toe in cloud hosting, there may be other providers better suited, but Amazon really does the complete package, and makes it all manageable. I don't think all cloud providers have got their head around the user interface, in the same way that UNIX people think that all developers like to see yellow fixed-width text on black boxes, and think that any display of data that isn't in a table is just a "pretty picture". Amazon makes everything manageable through its web-based control panel, and once you've got a machine booted up, then you're into a terminal window and on familiar ground.

After I tweeted my thoughts on Bytemark last week, I got a reply from their MD asking if I wanted to check out their cloud solution. I'll definitely be checking it out once the dust has settled on my new server. Cloud hosting works, but you do have to keep an eye on the money.

1 Aug 12

Lifting the veil on Stac.ly

My first module on Stac.ly is in its third week (this week we look at uploading a website to some free hosting space, via FTP) and I've decided to make some changes. The site needs some user-experience loving anyway, but the first task is to remove the requirement to have an account to access the lesson.

Why? 'Cos barriers aren't helpful. You'll still need an account to access the homework and to get notified about new lessons as they're released, but it'll no longer be a requirement for the lessons themselves.

So give me a couple of days to get that straight, and by the release of the next lesson (Tuesday 7th August) you'll be able to learn a little about databases with no need to register. Registration will still be available, and it'll still be free.

If you want to check out the three lessons that are already there (HTML, CSS and FTP), check out the Web fundamentals module.