Tagging images by path in Shotwell

I've finally decided to use an image manager, and since it comes with Ubuntu this week I've gone with Shotwell. I've got a directory hierarchy containing most of my images which is sort-of sorted already, and I'm probably going to keep adding to it, if for no other reason than force of habit.

I know that one of the wonderful features of these photo managers is that you can tag photos, and obviously a photo can be in more than one tag rather more easily than it can be in several directories. That said, all photo managers seem to have decided that an easy, fast way to tag photos isn't what's needed.

Additionally, Shotwell's got this weird thing for hiding the fact that there's a filesystem from you, and I can't find any way to tag files by directory. So I've poked around the database and written a script to do it, which is here and pasted below in case I change my mind about file hierarchies later.

The oddest bit is the way the filenames are linked to the tags. The TagTable table has a field `photo_id_list` which contains a list of photo IDs in a format that I've not found anywhere else in the (admittedly not very extensive) db.

They're created by taking the id of the image (its value in the `id` field of the PhotoTable table), converting it to a hex value, padding it out to 16 characters with leading zeroes, and then concatenating it onto the string 'thumb':

  my $hexPhotoId = sprintf("%x", $photoId);
  my $thumbString = "thumb".sprintf('%016s', $hexPhotoId);
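
For anyone poking at the database from something other than Perl, the same construction can be sketched in Python. This is only an illustration of the format described above; the function name `thumb_string` is made up for the example.

```python
def thumb_string(photo_id: int) -> str:
    """Build the ID string stored in TagTable.photo_id_list for one photo."""
    # Lower-case hex, zero-padded to 16 characters, prefixed with 'thumb'.
    return "thumb{:016x}".format(photo_id)


if __name__ == "__main__":
    print(thumb_string(1))    # thumb0000000000000001
    print(thumb_string(255))  # thumb00000000000000ff
```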

Anyway, the script's a bit simple because bash is quite good at handling loads of files; usage is like this to tag the contents of ~/Pictures/2011-france/ with the tag 'morzine':

avi@brilliant:~$ find ~/Pictures/2011-france/ -type f -exec ./shotwell-tag {} morzine \;
Creating tag morzine
tagged /home/avi/Pictures/2011-france/R0012810.JPG with morzine
tagged /home/avi/Pictures/2011-france/R0012850.JPG with morzine
tagged /home/avi/Pictures/2011-france/R0012911.JPG with morzine
tagged /home/avi/Pictures/2011-france/R0012931.JPG with morzine
tagged /home/avi/Pictures/2011-france/R0012921.JPG with morzine
tagged /home/avi/Pictures/2011-france/R0012794.JPG with morzine
tagged /home/avi/Pictures/2011-france/R0012883.JPG with morzine
tagged /home/avi/Pictures/2011-france/R0012881.JPG with morzine

I've no idea if it breaks anything - I wrote it about an hour ago, have tagged ~500 photos with it since, and Shotwell doesn't seem to be annoyed. YMMV. Here's the script:

  #! /usr/bin/perl

  # shotwell-tag
  #
  # Tags files specified by filename in shotwell. Handy for
  # getting round shotwell's attempts at hiding the filesystem.
  #
  # Avi 2011

  use strict;
  use DBI;

  my $file = shift;
  my $tag = shift;

  if (!defined $tag || $tag !~ /.+/){
    print "Usage:\n\n\tshotwell-tag [file] [tag]\n\n";
    print "Tags [file] with [tag] in shotwell's db\n";
    exit 1;
  }

  my $dbfile = $ENV{'HOME'}."/.shotwell/data/photo.db";
  my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile","","");

  # Each tag has a string of photo 'ids'. These are generated
  # by taking the ID of the photo from PhotoTable, representing
  # it in hex, padding that out to 16 characters with leading
  # zeroes and then appending it to the string 'thumb'
  #
  # Placeholders (?) stop quotes in filenames or tags breaking the SQL.
  my $sth = $dbh->prepare("select id from PhotoTable where filename=?");
  $sth->execute($file);
  my $row = $sth->fetch;
  my $photoId = $row ? $row->[0] : undef;
  unless(defined $photoId && $photoId =~ /\d+/){print "$file is not in shotwell library\n"; exit 0;}
  my $hexPhotoId = sprintf("%x", $photoId);
  my $thumbString = "thumb".sprintf('%016s', $hexPhotoId);

  $sth = $dbh->prepare("select id from TagTable where name=?");
  $sth->execute($tag);
  $row = $sth->fetch;
  my $tagId = $row ? $row->[0] : undef;
  unless(defined $tagId && $tagId =~ /\d+/){
    print "Creating tag $tag\n";
    my $sth = $dbh->prepare("insert into TagTable (name) values(?)");
    $sth->execute($tag);
  }

  $sth = $dbh->prepare("select photo_id_list from TagTable where name=?");
  $sth->execute($tag);
  $row = $sth->fetch;
  # A freshly-created tag has no list yet, so treat that as empty
  my $photoList = ($row && defined $row->[0]) ? $row->[0] : '';
  # Make sure a non-empty list ends in a comma before adding to it
  if($photoList !~ /,$/ && $photoList =~ /.+/){
    $photoList.=',';
  }
  if($photoList =~ /$thumbString/){
    print "$file is already tagged with $tag\n";
    exit 0;
  }else{
    $photoList.=$thumbString.',';
    $sth = $dbh->prepare("update TagTable set photo_id_list = ? where name=?");
    $sth->execute($photoList, $tag);
    print "tagged $file with $tag\n";
    exit 0;
  }
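
The same lookup-and-append logic can be tried without going anywhere near a real library. What follows is a rough Python sketch against an in-memory SQLite database; the two tables are stand-ins holding only the columns the script touches (an assumption, the real Shotwell tables have more), and `tag_photo` is a name invented for the example.

```python
import sqlite3


def tag_photo(conn, filename, tag):
    """Add the photo stored as `filename` to `tag`; return True if the list changed."""
    row = conn.execute(
        "SELECT id FROM PhotoTable WHERE filename = ?", (filename,)
    ).fetchone()
    if row is None:
        return False  # file is not in the library
    thumb = "thumb{:016x}".format(row[0])

    row = conn.execute(
        "SELECT photo_id_list FROM TagTable WHERE name = ?", (tag,)
    ).fetchone()
    if row is None:
        # Unknown tag: create it with an empty list
        conn.execute("INSERT INTO TagTable (name) VALUES (?)", (tag,))
        current = ""
    else:
        current = row[0] or ""

    ids = [entry for entry in current.split(",") if entry]
    if thumb in ids:
        return False  # already tagged
    ids.append(thumb)
    # Same layout as the Perl script writes: every entry followed by a comma
    conn.execute(
        "UPDATE TagTable SET photo_id_list = ? WHERE name = ?",
        (",".join(ids) + ",", tag),
    )
    conn.commit()
    return True


if __name__ == "__main__":
    # Hypothetical stand-in schema, not Shotwell's full one
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PhotoTable (id INTEGER PRIMARY KEY, filename TEXT)")
    conn.execute(
        "CREATE TABLE TagTable (id INTEGER PRIMARY KEY, name TEXT, photo_id_list TEXT)"
    )
    conn.execute("INSERT INTO PhotoTable (filename) VALUES ('/tmp/a.jpg')")
    print(tag_photo(conn, "/tmp/a.jpg", "morzine"))
    print(conn.execute("SELECT photo_id_list FROM TagTable").fetchone()[0])
```

Binding values with `?` rather than pasting them into the SQL string matters here, since filenames with apostrophes in them are not exactly rare.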

Postfixadmin with clear-text passwords

One of my projects at the minute is converting vpopmail mail servers to postfixadmin. One _really_ handy thing about some of these vpopmail machines is that they store a cleartext copy of all the users' passwords, so I can feed them straight into the new system.

So, I've now got a postfixadmin system that stores cleartext passwords, and in case you want to do it, too, I've put a patch up. It gives you an extra couple of options in the config.inc.php file, which I hope are well enough explained by the comments:

  // cleartext
  // Do you want to store cleartext passwords for email accounts?
  // true = store cleartext passwords (need to have a password_clear column in the mailbox table)
  // false = don't store cleartext passwords
  $CONF['cleartext'] = false;
  // and the same for admins:
  $CONF['cleartext_admin'] = false;

If you do want this (and are aware of the problems with storing cleartext passwords) it's quite easy to do. First, add a couple of columns to the MySQL db:

  ALTER TABLE mailbox ADD `password_clear` varchar(255);
  ALTER TABLE admin ADD `password_clear` varchar(255);

Next, apply my patch:

avi@amazing:/var/www/postfixadmin$ wget -q http://avi.co/stuff/postfixadmin_plaintext-passwords.txt
avi@amazing:/var/www/postfixadmin$ patch < postfixadmin_plaintext-passwords.txt

Lastly, configure it; my patch sets both the config variables to 'false' because I like safeguards like that :)

It's worth noting that if you're using cleartext passwords, and then turn it off, the cleartext columns won't be affected - you'll need to update them with nulls or something if you want to get rid of the data in them.

Fail2Ban and date formats

Fail2Ban is utterly daft in at least one respect. Here's me testing a regex on a date format it doesn't recognise:

# fail2ban-regex '2010-12-14 15:12:31 -' ' - <HOST>$'
Found a match but no valid date/time found for 2010-12-14 15:12:31 - Please contact the author in order to get support for this format
Sorry, no match

And on one that it does:

# fail2ban-regex '2010/12/14 15:12:31 -' ' - <HOST>$'

Success, the following data were found:
Date: Tue Dec 14 15:12:31 2010
IP  :

Date template hits:
0 hit: Month Day Hour:Minute:Second
0 hit: Weekday Month Day Hour:Minute:Second Year
1 hit: Year/Month/Day Hour:Minute:Second
0 hit: Day/Month/Year:Hour:Minute:Second
0 hit: TAI64N
0 hit: Epoch

Benchmark. Executing 1000...
Avg: 0.10257935523986816 ms
Max: 0.125885009765625 ms (Run 8)
Min: 0.10085105895996094 ms (Run 780)

Ignoring for the moment the fact that it doesn't recognise 2010-12-14 15:12:31 (Seriously?)[1], the only way to get that list of date formats is by happening to pick a correct one. As soon as you no longer need a list of date formats you may use, it presents you with one.


So, as an attempted fix for this situation, see above for a list of compatible date formats.
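
If you control whatever writes the log, one way round the problem is to emit timestamps in a format from that list in the first place. A minimal Python sketch, assuming the input is always `YYYY-MM-DD HH:MM:SS` and the target is the 'Year/Month/Day Hour:Minute:Second' template from the output above; the function name is made up, and none of this is part of Fail2Ban itself.

```python
from datetime import datetime


def to_fail2ban_date(stamp: str) -> str:
    """Rewrite '2010-12-14 15:12:31' as '2010/12/14 15:12:31'."""
    # strptime raises ValueError on anything that isn't the expected format
    parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y/%m/%d %H:%M:%S")


if __name__ == "__main__":
    print(to_fail2ban_date("2010-12-14 15:12:31"))  # 2010/12/14 15:12:31
```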

  1. It's worth noting, too, that the author is of the opinion that specifying your own date format is too much like hard work, so if you want support for any date format other than those already supported, you've to patch it yourself. Which is obviously way easier than just having a date regex in the config file.

Why I won’t be mirroring Wikileaks

I have a fair amount of 'spare' server space, and some very understanding service providers, and so it makes sense for me to mirror things in general, which I do. So when Wikileaks went down, mirroring it seemed quite a natural response. They need mirrors, and I have a mirror. I've been looking for something to do with my youcanstickitupyourarse.com domain for a while, and this seemed like a good bet.

Also in favour is the fact that Wikileaks is being a bit of a pain to a few institutions (well, governments) that annoy the crap out of me; I'd not mind being part of that. In addition, the huge majority of the released cables appear to be of no interest whatsoever, and the large governmental opposition to them has only served to increase the perception of their importance. I'd like as many people as possible to be able to read them such that they can judge for themselves how interesting they are. The point appears to be less what's been found out and more that anything has at all.

But I have concerns, too. Firstly, these cables were all sent on the basis they were confidential, so they naturally contain the sort of information that neither end wants made public. I'm already livid at the apparent acceptance of just anybody being able to subject me to surveillance, and I don't see why embassy staff should necessarily be treated differently. The argument that they work for the government is moot - millions of privately-employed people do work for the government, and they also should have a right to an expectation of privacy. I honestly have no problem at all with governments talking to each other in privacy, it seems to be quite a natural way of working and is not at all contrary to the idea of an open government.

Second, and of more concern than that, is the sort of things these people are likely to be sticking down encrypted tunnels. I don't want to inadvertently find myself hosting a document that results in an informant being tortured or killed. I don't really want to be party to releasing information that only serves to embarrass or otherwise compromise someone. I don't want *anyone* to do that, but I've only got control over my servers.

That's all well and good, you say. Wikileaks are sifting through these and specifically redacting anything they deem not fit for release. That's some hubris right there.

And here's the difference. I trust the Debian project, and Canonical, the Perl foundation, Zend and the like, to not put things I disagree with on my server. I do not trust Wikileaks in this respect at all.

The whole 'Collateral Murder' release is a great example of Wikileaks not releasing information for the sake of it being free, but releasing specifically compromising information, with a decidedly skewed context, in order to further some particular viewpoint. That video, or perhaps its commentary, removed the bulk of my respect for Wikileaks. Why on earth would I assume they're not going to similarly skew the releases here also? Wikileaks does have a stated aim they're pursuing with all the leaking; it's not just because they feel information should be free.

So, it's not that I've got some opposition to the leaking, or feel that it shouldn't be mirrored. It's just that I don't feel I can trust Wikileaks to only publish what I think should be published, and picking-and-choosing which bits to host is not how a mirror works.

Getting root on a UK T-Mobile Galaxy S

It's a bit weird. The process was really easy, but none of the tutorials I found worked; each stopped working at one point or another. So, assuming other people will hit the same barriers and want a Just Works way to get root, I've gone through my terminal history for the bits that worked. Obviously, this is just what worked for me; I can't guarantee it'll work anywhere else, though if you're at all familiar with the process it'll probably look right. The only real stumbling blocks I hit were getting a recovery menu (unfamiliarity with adb) and picking a ROM that I could trust. There was nowhere near enough diligence in that bit of the process, though. Bad Avi. Also, for those still expecting disclaimers, this voids warranties.

I can't find a reboot-and-hold-down-X-key method of rebooting into recovery mode (the boot menu) that works, and it seems to be different for each variant of this device in any case. Using adb, the Android debugger, does work, and contrary to several scare stories doesn't require proprietary Samsung drivers.

adb's really easy to make work. First, install a jdk. You might do this differently if you're not running Debian:

root@debian:~# apt-get install sun-java6-jdk

While we're here, if you want to suggest a more interesting hostname, go for it. I'm having a bit of an imagination failure in that department.

Next, you need to grab the tarball of the Android SDK, extract it somewhere and make its tools subdir part of your $PATH. You probably should do most of this as some user that isn't root, but I was rather excited at the time:

root@debian:~# mkdir adb && cd adb
root@debian:~/adb# wget -q http://dl.google.com/android/android-sdk_r07-linux_x86.tgz
root@debian:~/adb# tar -xzf android-sdk_r07-linux_x86.tgz
root@debian:~/adb# ls android-sdk-linux_x86
add-ons  platforms  SDK Readme.txt  tools
root@debian:~/adb# export PATH=${PATH}:/root/adb/android-sdk-linux_x86/tools

Now (or perhaps while you're waiting for the tarball to arrive), enable USB debugging on your phone. It's under Settings -> Applications -> Development for some reason. Check the box next to "USB debugging". You'll need to have the USB cable unplugged. Plug it back in again, then, returning to your shell with adb in its path, check for the presence of your device:

root@debian:~/adb# adb devices
List of devices attached 
90006e8ba84e    device

If yours doesn't show up, I'm not really sure what to do. Google?
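
If you end up scripting any of this, the `adb devices` output is easy enough to check for an attached device. A small Python sketch that parses text in the shape shown above; the function name is invented, and skipping lines that start with `*` (adb's daemon start-up chatter) is an assumption about what else might turn up in the output.

```python
def parse_adb_devices(output: str) -> dict:
    """Map serial number -> state from the text printed by `adb devices`."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header line and any '* daemon ...' messages
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


if __name__ == "__main__":
    sample = "List of devices attached \n90006e8ba84e    device\n"
    print(parse_adb_devices(sample))  # {'90006e8ba84e': 'device'}
```

An empty dictionary means adb can't see the phone, which is the cue to go and re-check the USB debugging box.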

Now, you need to have the update.zip somewhere. I've uploaded the one I used here, but this'll work with any of them.

root@debian:~/adb# wget -q http://aviswebsite.co.uk/stuff/galaxys_root/update.zip

If UMS works on yours, copy update.zip with your favourite file copying method. Mine didn't, so I used adb. There are two sdcards in the Galaxy, an internal one and an external (removable) one. The internal one is mounted at /sdcard, the traditional location of removable ones, and the external one at /sdcard/sd. You want to put update.zip in /sdcard, not /sdcard/sd.

root@debian:~/adb# adb push update.zip /sdcard/update.zip

You then use adb to reboot into the recovery menu:

root@debian:~/adb# adb reboot recovery

Select "Apply sdcard:update.zip" and wait while it installs it, then "reboot system now". You now have root. The quickest way I can think to test it is to download and install the 'superuser' application from the market, then test it with adb:

root@debian:~/adb# adb shell
$ su

You'll get prompted (on the phone) to allow an unknown application root access, and then you'll have root. Congratulations, your phone is now yours. :)

Now, I'm off to follow the rest of How to make the vibrant software not suck, 'cause it's shocking out of the box.

Windows Browser Ballot

Mildly controversially, MS have found themselves compelled to offer Windows users in the EU a 'browser ballot' screen in an effort to make IE a less default choice, which is fairly understandable (if perhaps not understandably fair).

But MS have decided for some reason that the best way to do this is to render it as a web page in IE8. So before making an unbiased decision with no leading questions, you've already had to configure IE8.

[Image: MS Windows Browser Chooser]

I was going to post about the crapness of IE8's multiple configuration dialogues, but this is more amusing.

Windows’ find command

It would appear that, not-entirely-contrary to my common rant that Windows offers no text processing tools at all, Windows does offer a find command which is like a severely crippled grep.

It has five options:

U:\>find /?
Searches for a text string in a file or files.

FIND [/V] [/C] [/N] [/I] [/OFF[LINE]] "string" [[drive:][path]filename[ ...]]

  /V         Displays all lines NOT containing the specified string.
  /C         Displays only the count of lines containing the string.
  /N         Displays line numbers with the displayed lines.
  /I         Ignores the case of characters when searching for the string.
  /OFF[LINE] Do not skip files with offline attribute set.
  "string"   Specifies the text string to find.
  [drive:][path]filename
             Specifies a file or files to search.

If a path is not specified, FIND searches the text typed at the prompt
or piped from another command.

So it kind of does what I want about 70% of the time I use grep, and is probably a reasonable stand-in afterwards. Certainly beats firing up notepad and Ctrl-F-ing.
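
For comparison, the four matching switches in that help text boil down to something like the Python below. It's a rough emulation for illustration only: `find_lines` and its keyword arguments are made-up names, it works on a list of lines rather than files, and it makes no attempt to copy FIND's output formatting.

```python
def find_lines(needle, lines, invert=False, count=False, number=False,
               ignore_case=False):
    """Emulate FIND's /V (invert), /C (count), /N (number) and /I (ignore case)."""
    if ignore_case:
        needle = needle.lower()
    hits = []
    for lineno, line in enumerate(lines, start=1):
        haystack = line.lower() if ignore_case else line
        # Plain substring match, like FIND; no regular expressions here
        if (needle in haystack) != invert:
            hits.append((lineno, line) if number else line)
    return len(hits) if count else hits


if __name__ == "__main__":
    text = ["Hello world", "goodbye", "HELLO again"]
    print(find_lines("hello", text, ignore_case=True))
    print(find_lines("hello", text, ignore_case=True, invert=True))
    print(find_lines("hello", text, ignore_case=True, count=True))
```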

Now to find an awk and a sed...

UI Fail: scanpst.exe’s incompatibility

Sometimes, on trying to scan a PST with MS Office's bundled scanpst.exe, you get the below error:

"An error has occurred which caused the scan to be stopped"

And a log that ends:

Fatal Error: 80040818

What MS meant to say was:

You're scanning an Office 2003 PST file with the scanpst tool that shipped with Office 2007. For some reason, we decided that while Outlook 2007 can cope with both, scanpst can't.

In an attempt at usefulness:
On my WinXP/Office 2007 box, scanpst is at C:\Program Files (x86)\Microsoft Office\Office12\SCANPST.EXE and downloadable here.

On our Server03/Office03 box, it's at C:\Program Files\Common Files\System\MSMAPI\1033\SCANPST.EXE[1] and downloadable here.

I've no idea if these downloads are of any real use. Try them and see.

  1. I'm told the '1033' pertains to geographic location, but I've no real idea. Browse if it's not there.

Joining the Canonical =~ Microsoft fray

I've had this knocking about for a while in various forms. Following TheOpenSourcerer's post, I figured I'd get it in while he's getting the flak.

About a year ago, I remember there being some rejoicing at the prospect of Canonical open-sourcing Launchpad, their bug/issue/ticket tracking web application. I also remember being a mite confused by it. Canonical is the company behind Ubuntu Linux, the popular open source operating system. Surely they, of all people, had opened the source from the start? What does it say when the company most loudly and successfully pushing open source as an efficient means of software development to your average computer user, develops its in-house software behind closed doors? And, accepting that, why is opening the source cause for rejoicing? It is surely the belated Right Thing To Do. If anything, the response should have been along the lines of "Why so long?"

More recently, I decided that a hodge-podge of scripts to keep my files in sync between PCs wasn't a good idea, not least because it didn't actually work, and since my home PC and my laptop were both Ubuntu, and Ubuntu One seemed easy enough to install, that'd do the trick. So I installed it and started using it. Then I decided to get my work PC in on the game. And found this message:

Requirements: Because we want to give everyone using Ubuntu One the very best experience, we require that you run Ubuntu 9.04 (Jaunty Jackalope) or higher.

Which is something I don't think I've come across before - a Free Software company producing software and inventing restrictions. Why shouldn't Ubuntu One work on my Debian desktop?
This incompatibility for the sake of it is something I remember from Windows, and it's not a good memory. I know it's possible to write a client for it - the client is at least open source - but the message that I am required to use Ubuntu to use it? What good does that do anyone?

Most recently came the news that on the netbook edition Canonical have decided to drop OpenOffice.org (which *is* undeniably bloated) and use Google Docs in its place. Google Docs is completely proprietary. It's about as closed source as software can get, since you can't even study its behavior, only those interfaces you're permitted with it.
Why wasn't AbiWord used, with its online service, for example? Or a pared down OpenOffice, perhaps? Canonical has shown in the past that it has the developer hours to make fantastic, awesome, changes to software. Why not do that now?

Ubuntu is the most popular desktop Linux distro. I'm sure there are ways of counting such that Fedora wins, but if something's packaged for Linux, it's available in a Ubuntu-pointed deb. And so it occupies a unique position for free software - it's an opportunity to be a fantastic demonstration of what is possible with free software. It is possible to make commercial progress without restricting user freedom, and it is possible to make a wonderfully usable operating system under these conditions.

Except Ubuntu's not demonstrating that. It's showing that using a billionaire benefactor and a bunch of closed source software we can turn a free operating system into a mostly-freeish wonderful one.

And I'd rather like Canonical to stop doing that, and get back to making free software look good.

Splitting massive MySQL dumps

As I posted yesterday, I have a massive MySQL dump to import. I tried BigDump, but one of the tables kept producing errors and so BigDump would exit. I don't need the whole db imported, so I wrote this to split it by table. It produces a new sql file for every table it finds, numbered sequentially so if you process them in alphabetical order it's the equivalent of the whole dump. USE statements get their own files in the same sequence.

  #! /usr/bin/perl

  use strict;
  use warnings;
  use 5.010;

  my $dump_file = $ARGV[0];
  usage() if !$dump_file;

  say "using ".$dump_file;

  my ($table, @query);
  my $line_number = 0;
  my $find_count = 0;

  open(my $dump_in, '<', $dump_file) or die "Can't open $dump_file: $!";
  while(my $line = <$dump_in>){
    $line_number++;
    if ($line =~ /^USE\s.(\w+)./){
      my $db = $1;
      say "changing db: ".$db;
      ## Anything collected so far belongs to the previous database, so
      ## write it out before the USE statement takes the next number:
      flush_table();
      write_file(make_file_name("USE_$db", $find_count), $line);
      $find_count++;
      next;
    }elsif ($line =~ /^-- Table structure for table .(.+)./){
      my $name = $1;
      ## If the current line is the beginning of a table definition
      ## and @query is defined, then @query must be full of the previous
      ## table, so we want to process it now:
      flush_table();
      $table = $name;
    }
    next unless $table;
    push @query, $line;
  }
  close $dump_in;
  ## The last table in the dump has nothing after it to trigger a write:
  flush_table();
  say $line_number;

  ## Subroutines!
  sub flush_table {
    return unless @query;
    write_file(make_file_name($table, $find_count), @query);
    $find_count++;
    undef $table;
    @query = ();
  }

  sub write_file {
    my ($filename, @lines) = @_;
    open(my $output, '>', $filename) or die "Can't write $filename: $!";
    print $output @lines;
    close $output;
  }

  sub make_file_name {
    my ($type, $number) = @_;
    return sprintf("%05d", $number)."_".$type.".sql";
  }

  sub usage {
    say "Error: missing arguments.";
    say "Usage:";
    say "$0 [MYSQL_DUMP]";
    exit 1;
  }

A small downside is that this replaces my 2.5GB file with about 1800 smaller ones. A scripted importer is to follow.
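
The zero-padded prefix is what makes "process them in alphabetical order" safe: a plain string sort then matches the order the statements appeared in the dump. A short Python sketch of picking out the split files and ordering them, assuming the five-digit `NNNNN_name.sql` naming used above; the function name and the example file names are invented.

```python
import re

# Five digits, an underscore, a name, then .sql
SPLIT_FILE = re.compile(r"^\d{5}_.+\.sql$")


def import_order(file_names):
    """Return the split files in the order they should be imported."""
    # Lexical sort equals numeric sort because the prefix is fixed-width
    return sorted(name for name in file_names if SPLIT_FILE.match(name))


if __name__ == "__main__":
    names = ["00010_posts.sql", "00002_users.sql", "00001_USE_blog.sql", "notes.txt"]
    print(import_order(names))
    # Without the padding, '10_posts.sql' would sort before '2_users.sql'
    print(sorted(["10_posts.sql", "2_users.sql"]))
```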