Sunday, August 19, 2007

iRediculous

So as I count down to my iPhone1, I try and make do as best I can with my iPod. One thing I have yet to do is figure out how to convert MythTV's default-format AVI files to the iPod video format. I've done that once before, but I didn't really figure out a quick and easy way to do it repeatably. Thirty minutes experimenting with transcode and a little research on Google and I'll have it.

In the meantime, I figured it'd be quick and easy to put some eBooks on the iPod. Sure enough, when I look on my iPod in the "Extras" menu, I see a "Notes" section, with the following text:
To view text files here, enable iPod for disk use, then drag text files to the Notes folder on iPod. See the iPod Features Guide or go to www.apple.com/support/ipod for more information.


Simple enough.

Connecting my iPod to my PC, I activated "Enable disk use" for the device. It then appeared in Explorer as the E: drive, with a "Notes" folder under the root. Doing some research, I learned that the iPod will allow up to 1000 files, each with up to 4012 characters. That's a fair bit of text, though I would have expected more, given the disk space on the gadget. But then, I guess Apple is all about multimedia these days (sound and video, not text).
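For a rough feel of what those limits add up to, here is a minimal Perl sketch of the arithmetic. The sub name notes_needed is made up for illustration, and the two limits are simply the figures quoted above, not values read from the device.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(ceil);

# Limits as quoted in the post (assumed, not queried from the iPod).
my $MAX_NOTES      = 1000;
my $MAX_NOTE_CHARS = 4012;

# Hypothetical helper: the fewest notes a text of $chars characters can fit in.
sub notes_needed {
    my ($chars) = @_;
    return ceil($chars / $MAX_NOTE_CHARS);
}

# Total text the Notes folder can hold under these limits.
my $capacity = $MAX_NOTES * $MAX_NOTE_CHARS;
print "Capacity: $capacity characters\n";
print "A 600000-character book needs at least ",
      notes_needed(600_000), " notes\n";
```

This is a lower bound only: a splitter that breaks on chapter and line boundaries and reserves room for tags will produce more files than this.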

Project Gutenberg has lots of great books, so there's plenty of free content to choose from. So, I wrote up a simple Perl script to hack a large text file into smaller pieces that fit within the iPod's inane requirements.

Here's what I came up with.

#!/usr/bin/perl
use strict;
use warnings;

my $filename = $ARGV[0] or die "Usage: $0 book.txt\n";

# Slurp the whole book into memory.
open(my $in, '<', $filename) or die "Cannot open $filename: $!\n";
my $content = do { local $/; <$in> };
close $in;

# Leave headroom under the per-note limit.
my $ipod_max = 4012;
$ipod_max -= 17;    # title tags
$ipod_max -= 34;    # tail link tags
$ipod_max -= length $filename;
$ipod_max -= 7;

print "Slurped " . length($content) . "\n";

# Break it up by chapters.
my $delimiter = 'CHAPTER ';
my @chapters = split(/$delimiter/, $content);
print "Found " . $#chapters . " chapters.\n";

(my $base = $filename) =~ s/\.txt$//;
my $count = "000";
foreach my $chapter (@chapters) {
    print "Working on chapter $count, ";
    my $part    = "001";
    my $outfile = "$base$count.$part";
    open(my $out, '>', $outfile) or die "Cannot write $outfile: $!\n";

    # split() ate the delimiter, so put it back (piece 000 is the front matter).
    $content = $count eq "000" ? "" : $delimiter;
    my @lines = split("\n", $chapter);
    print "found " . $#lines . " lines (" . length($chapter) . " bytes)\n";

    foreach my $line (@lines) {
        # Start a new part when this line would overflow the current one.
        if (length($content) + length($line) + 1 > $ipod_max) {
            print $out $content;
            close $out;
            $part++;
            $outfile = "$base$count.$part";
            open($out, '>', $outfile) or die "Cannot write $outfile: $!\n";
            $content = "";
        }
        if ($line =~ /^\s*$/) {
            $content .= "<b><p>";    # blank line: paragraph break
        } else {
            $content .= "$line ";    # the space keeps words at line ends apart
        }
    }
    $count++;
    print "Writing to $outfile...\n";
    print $out $content;
    print "done.\n";
    close $out;
}


Running the script on Huck Finn from Project Gutenberg looks like this:
earnoth@twinstar[07:32 PM|508]$ bin/splice_book.pl books/huckfinn.txt 
Slurped 597298
Found 43 chapters.
Working on chapter 000, found 72 lines (1748 bytes)
Writing to books/huckfinn000.001...
done.
Working on chapter 001, found 116 lines (7207 bytes)
Writing to books/huckfinn001.002...
done.
Working on chapter 002, found 234 lines (12337 bytes)
Writing to books/huckfinn002.004...
done.
<SNIP>
Working on chapter 043, found 433 lines (22487 bytes)
Writing to books/huckfinn043.006...
done.


The result is 172 text files, all less than 4012 characters long (avg 3400). A nice fit on the iPod.
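To double-check numbers like these, a short sketch along the following lines could tally the pieces. The sub name note_stats and the file name note_stats.pl are made up for illustration, and the glob in the usage comment is only a guess at a pattern that matches the output names shown above. It measures bytes on disk, which equals the character count only for plain ASCII text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given a list of file paths, return
# (number of files, size of the largest in bytes, average size in bytes).
sub note_stats {
    my @files = @_;
    my ($total, $largest) = (0, 0);
    foreach my $file (@files) {
        my $size = (-s $file) || 0;    # missing or empty files count as 0
        $total += $size;
        $largest = $size if $size > $largest;
    }
    my $count   = scalar @files;
    my $average = $count ? int($total / $count) : 0;
    return ($count, $largest, $average);
}

# Example invocation (the pattern is an assumption):
#   perl note_stats.pl books/huckfinn[0-9]*.[0-9]*
if (@ARGV) {
    my ($count, $largest, $average) = note_stats(@ARGV);
    print "$count files, largest $largest bytes, average $average bytes\n";
}
```

If the largest size it reports is above the limit, the headroom in the splitter needs to grow.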

