Making a PDF grayscale with ghostscript

A request from a friend made me face the problem of converting a color PDF into a grayscale one. Searching the web turned up some ways of doing so with Adobe Acrobat, via some obscure menu item.

However, the very same operation can be done with free tools, such as ghostscript. I found a way to do it in the YANUB blog, and I will copy it here, with a small modification.

Assuming we have a file called color.pdf, and we want to convert it into grayscale.pdf, we can run the following command (the trailing "\" marks are shell line continuations, so it can be pasted as is, or typed on a single line without them):

% gs -sOutputFile=grayscale.pdf -sDEVICE=pdfwrite \
-sColorConversionStrategy=Gray -dProcessColorModel=/DeviceGray \
-dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH color.pdf

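I prefer the above to YANUB's version below (the original post highlighted the differences in color, which is lost here), because an option of the command itself (-dBATCH, which makes gs exit after processing its input files instead of waiting for interactive commands) does the job of a shell operation (the trailing "< /dev/null", which feeds gs an empty input so that it quits):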

% gs -sOutputFile=grayscale.pdf -sDEVICE=pdfwrite \
-sColorConversionStrategy=Gray -dProcessColorModel=/DeviceGray \
-dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH color.pdf < /dev/null

A sample Perl script to spare us the tedious typing above:

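#!/usr/bin/perl -w
use strict;
# Input file from the command line, e.g. "input.pdf":
my $infile = $ARGV[0] or die "Usage: $0 file.pdf\n";
# Output name: "input.pdf" -> "input_gray.pdf"
my $outfile = $infile;
$outfile =~ s/\.pdf$//;
$outfile = $outfile."_gray.pdf";
# List form of system(): no shell involved, so file names with spaces are safe
system("gs", "-sOutputFile=$outfile", "-sDEVICE=pdfwrite",
  "-sColorConversionStrategy=Gray", "-dProcessColorModel=/DeviceGray",
  "-dCompatibilityLevel=1.4", "-dNOPAUSE", "-dBATCH", $infile) == 0
  or die "gs failed on $infile\n";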

Assuming we call the Perl script "togray.pl" (made executable with chmod +x and placed somewhere in our PATH), and that we have a color file "input.pdf", we can just issue the command:

% togray.pl input.pdf

and we would get a grayscale version of it, named "input_gray.pdf".
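When there are several color PDFs to convert, the same idea extends to a loop over all the files given on the command line. A minimal sketch, assuming gs is in the PATH; the script name "togray_all.pl" and the gray_name() helper are hypothetical names of mine, while the gs options are exactly the ones used above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: "input.pdf" -> "input_gray.pdf" (same rule as togray.pl)
sub gray_name {
    my ($infile) = @_;
    (my $base = $infile) =~ s/\.pdf$//;
    return $base . "_gray.pdf";
}

# Convert every file named on the command line, e.g. "togray_all.pl *.pdf"
for my $infile (@ARGV) {
    my $outfile = gray_name($infile);
    # List form of system(): no shell involved, so spaces in names are safe
    my $rc = system("gs", "-sOutputFile=$outfile", "-sDEVICE=pdfwrite",
                    "-sColorConversionStrategy=Gray",
                    "-dProcessColorModel=/DeviceGray",
                    "-dCompatibilityLevel=1.4", "-dNOPAUSE", "-dBATCH",
                    $infile);
    warn "gs failed on $infile\n" if $rc != 0;
}
```

Beware that a glob such as *.pdf also matches any _gray.pdf files left by an earlier run, which would then be converted again.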

My backups with rsync

In previous posts I introduced the use of rsync for making incremental backups, and later mentioned an occasion on which I had to use those backups. However, I have realized that I never actually explained my backup scheme! Let's go for it:

Backup plan

I make a backup of my $home directory, say /home/isilanes. Each "backup" will be a set of 18 directories:

  • Current (last day)
  • 7 daily
  • 4 weekly
  • 6 monthly

Each of these directories appears to hold a complete copy of what /home/isilanes looked like at the moment the backup was made. However, thanks to hard links, only new or changed files are actually written. Files that did not change are stored once on disk and linked from every directory that refers to them.
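To see what a hard link is, here is a small Perl sketch (the file names are made up) that creates a file, gives it a second name with Perl's built-in link(), and checks with stat() that both names point to the same inode, whose link count is now 2:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory, removed automatically at exit
my $dir = tempdir(CLEANUP => 1);
my ($orig, $copy) = ("$dir/file.txt", "$dir/file_link.txt");

open(my $fh, '>', $orig) or die "Cannot write $orig: $!\n";
print $fh "some data\n";
close($fh);

# A hard link is a second name for the same data on disk
link($orig, $copy) or die "link failed: $!\n";

# stat() fields: [1] is the inode number, [3] the link count
my @s1 = stat($orig);
my @s2 = stat($copy);
printf "inodes: %d %d, link count: %d\n", $s1[1], $s2[1], $s1[3];
```

Deleting either name only lowers the link count; the data is freed when the count reaches 0. That is why a file that stays unchanged across all 18 directories costs its size only once. rsync, for its part, writes a changed file to a temporary file and renames it into place (unless --inplace is used), which breaks the link for that file only, so the older backups keep the old version.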

Result: 18 copies of a 3.8 GB $home take up a total of 8.7 GB on disk (14% of their apparent size of 63 GB, and 13% of 18 times the data size, 68.4 GB).

Perl script for making the backup

Update (Jun 5, 2008): You can find a much more refined version of the script here. It no longer requires an auxiliary script to be installed on the remote machine, and is "better" in general (or it should be!).

Below is the commented Perl script I use. Machine names, directories and IPs are invented. Bart is the name of my computer.


#!/usr/bin/perl -w

use strict;

my $rsync = "rsync -a -e ssh --delete --delete-excluded";
my $home = "/home/isilanes";
my $logfile = "$home/.LOGs/backup_log";

#
# $where -> where to make the backup
#
# $often -> whether this is a daily, weekly or monthly backup
#
my $where = $ARGV[0] || 'none';
my $often = $ARGV[1] || 'none';

my ($source,$remote,$destdir,$excluded,$to,$from);

# Possible "$where"s:
my @wheres = qw /machine1 machine2/;

# Possible "$often"s:
my @oftens = qw /daily weekly monthly/;

# Check remote machine:
my $pass = 0;
foreach my $w (@wheres) { $pass = 1 if ($where eq $w) };
die "$where is an incorrect option for \"where\"!\n" unless $pass;

# Check how-often:
$pass = 0;
foreach my $o (@oftens) { $pass = 1 if ($often eq $o) };
die "$often is an incorrect option for \"often\"!\n" unless $pass;

# Set variables:
if ($where eq 'machine1')
{
# Defaults:
$source = $home;
$remote = '0.0.0.1';
$destdir = '/disk2/backup/isilanes/bart.home.current';
$excluded = "--exclude-from $home/.LOGs/excludes_backup.dat";
$to = 'machine1';
$from = 'bart';
}
elsif ($where eq 'machine2')
{
# Defaults:
$source = $home;
$remote = '0.0.0.2';
$destdir = '/scratch/backup/isilanes/bart.home.current';
$excluded = "--exclude-from $home/.LOGs/excludes_backup.dat";
$to = 'machine2';
$from = 'bart';
}

# Do the job:
unless ($where eq 'none')
{
unless ($often eq 'none')
{
# Connect to the remote machine, and run ANOTHER script there, making a rotation
# of the backup dirs:
system "ssh $remote \"/home/isilanes/MyTools/rotate_backups.pl $often\"";

# Actually make the backup:
system "$rsync $excluded $source/ $remote:$destdir/";

# "touch" the backup dir, to give it present timestamp:
system "ssh $remote \"touch $destdir\"";

# Enter a line in the log file defined above ($logfile):
&writelog($from,$often,$to);
};
};

sub writelog
{
my $from = ucfirst($_[0]);
my $often = $_[1];
my $to = uc($_[2]);
my $date = `date`;

open(LOG,">>$logfile") || die "Cannot open $logfile: $!\n";
printf LOG "home\@%-10s %-7s backup at %-10s on %1s",$from,$often,$to,$date;
close(LOG);
};

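As can be seen, this script relies on the remote machine having a Perl script called rotate_backups.pl in /home/isilanes/MyTools/. That script rotates the backups of the given type (daily, weekly or monthly): the oldest copy is set aside, every other copy moves one step back (for the daily set: 2-days-ago to 3-days-ago, yesterday to 2-days-ago and so on), current becomes yesterday, and the set-aside copy is recycled as the new current, refreshed with hard links (cp -al) to the files of yesterday's copy before rsync updates it. The code for that: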


#!/usr/bin/perl -w

use strict;

# Whether daily, weekly or monthly:
my $type = $ARGV[0] || 'daily';

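# Backup directory (prefix of the rotated dirs; must match $destdir in the
# backup script, without its ".current" suffix):
my $bdir = '/disk4/backup/isilanes/bart.home';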

# Max number of copies:
my %nmax = ( 'daily' => 7,
'weekly' => 4,
'monthly' => 6 );

# Choose one of the above:
my $nmax = $nmax{$type} || 7;

# Rotate N->tmp, N-1->N, ..., 1->2, current->1:
system "mv $bdir.$type.$nmax $bdir.tmp" if (-d "$bdir.$type.$nmax");

my $i;
for ($i=$nmax-1;$i>0;$i--)
{
my $j = $i+1;
system "mv $bdir.$type.$i $bdir.$type.$j" if (-d "$bdir.$type.$i");
};

system "mv $bdir.current $bdir.$type.1" if (-d "$bdir.current");

# Restore last (tmp) backup, and then refresh it:
system "mv $bdir.tmp $bdir.current" if (-d "$bdir.tmp");
# (--reply=yes dropped: current GNU cp no longer accepts it, and -f already overwrites)
system "cp -alf $bdir.$type.1/. $bdir.current/" if (-d "$bdir.$type.1");
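The renaming part of the rotation can also be done with Perl's built-in rename() instead of spawning mv for every directory. A minimal sketch, assuming all the backup directories live on the same filesystem (rename() cannot move across filesystems); rotate() is my own name, and the cp -al refresh of the new current is deliberately left out:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rename-only version of the rotation in rotate_backups.pl:
# N -> tmp, N-1 -> N, ..., 1 -> 2, current -> 1, tmp -> current.
sub rotate {
    my ($bdir, $type, $nmax) = @_;
    my $oldest = "$bdir.$type.$nmax";
    # Set the oldest copy aside
    if (-d $oldest) { rename($oldest, "$bdir.tmp") or die "rename $oldest: $!\n" }
    # Shift the others one step back: N-1 -> N, ..., 1 -> 2
    for (my $i = $nmax - 1; $i > 0; $i--) {
        my ($from, $to) = ("$bdir.$type.$i", "$bdir.$type." . ($i + 1));
        if (-d $from) { rename($from, $to) or die "rename $from: $!\n" }
    }
    # The current copy becomes number 1
    if (-d "$bdir.current") {
        rename("$bdir.current", "$bdir.$type.1") or die "rename current: $!\n";
    }
    # Recycle the set-aside copy as the new current (to be refreshed later)
    if (-d "$bdir.tmp") {
        rename("$bdir.tmp", "$bdir.current") or die "rename tmp: $!\n";
    }
}
```

With the paths above, a daily rotation would be rotate('/disk4/backup/isilanes/bart.home', 'daily', 7), followed by the same cp -al refresh as in the original script.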

Editing Wikipedia with mvs

I am currently doing some link disambiguation work for Wikipedia, and as such I have to find and replace the same strings many times, in many articles. Editing on-line through the browser is fine in general, but for a task like this one would love to be able to use vim. To do so, one can use mvs.

The mvs program allows us to download a Wikipedia article, save it as a file, then upload it again, after manipulating the file the way we want.

To log in to our Wikipedia account:

mvs login -d wikipedia.org -u username -p password

To download article "X" (note the .wiki extension):

mvs update X.wiki

We can then edit X.wiki:

vim X.wiki
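Since the job is the same replacement over and over, the editing step can also be scripted instead of done by hand in vim. A minimal sketch, assuming the links to fix are plain [[Title]] or [[Title|label]] wiki links; disambig.pl and disambiguate() are hypothetical names, and the sketch ignores variants that MediaWiki treats as equivalent (a lower-case first letter, underscores instead of spaces):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Point links to an ambiguous title at a specific article,
# keeping the visible text unchanged:
#   [[Old]]       -> [[New|Old]]
#   [[Old|label]] -> [[New|label]]
sub disambiguate {
    my ($text, $old, $new) = @_;
    my $o = quotemeta($old);
    $text =~ s/\[\[$o\]\]/[[$new|$old]]/g;
    $text =~ s/\[\[$o\|/[[$new|/g;
    return $text;
}

# Usage: disambig.pl X.wiki 'Old title' 'New title'  (edits X.wiki in place)
if (@ARGV == 3) {
    my ($file, $old, $new) = @ARGV;
    open(my $in, '<', $file) or die "Cannot read $file: $!\n";
    my $text = do { local $/; <$in> };
    close($in);
    open(my $out, '>', $file) or die "Cannot write $file: $!\n";
    print {$out} disambiguate($text, $old, $new);
    close($out);
}
```

The result can then be checked with mvs preview and uploaded with mvs commit, exactly as with a file edited in vim.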

Then check it:

mvs preview X.wiki
firefox preview.html

And finally upload it:

mvs commit -m 'Your comment goes here' X.wiki

For more info, read the Wikipedia text editor support page.

Dynamic file read with Perl

GNU/Linux command-line users, programmers and hackers worldwide have probably come to know and love the wonderful tail shell command which, together with cat, head, grep, awk and sed, is easily one of the most useful commands around.

A killer feature of tail is the -f (--follow) argument, which outputs the last lines of a file and then keeps waiting for new lines that might appear in the file, showing them on the screen when they do. This is invaluable for keeping track of, e.g., logfiles to which new entries are added all the time, where one does not want to keep re-running tail by hand.

Since I am a great fan of Perl, and use its scripts for anything short of cooking dinner (but wait...), I have found myself in situations where I had to print the last lines of a file from a Perl script. This can be done in several ways:

system "tail $file";

or

my $str = `tail $file`;
print $str;

or with an open() statement, then reading the whole file (or a part of it) and printing it. The first example, with system, is the most "direct" one, but reading the file (or a part of it) into a variable is very handy for doing with it all the nifty things Perl does so well with text strings (substituting, deleting, including, reordering, comparing...).
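For the open() route, a pure-Perl equivalent of tail can keep a sliding window of the last lines read. A minimal sketch; last_lines() is my own name, and since it reads the whole file from the start, seeking backwards from the end would be faster for very large files:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the last $n lines of $file, like "tail -n $n", keeping only
# a sliding window of $n lines in memory while reading.
sub last_lines {
    my ($file, $n) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
    my @window;
    while (my $line = <$fh>) {
        push @window, $line;
        shift @window if @window > $n;   # drop the oldest line
    }
    close($fh);
    return @window;
}

# Example: print the last 10 lines of the file named on the command line
print last_lines($ARGV[0], 10) if @ARGV;
```

Since the lines end up in an array, they can be substituted, filtered or reordered before printing, which is the whole point of avoiding the system call.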

However, when tail -f was needed (i.e., keep on tracking the file and operate on the output as it appears), I kept using system calls, and all the formatting had to be done in the shell spawned by the system call, not by Perl. This was sad.

So I was very happy when I discovered a simple trick to make open() read dynamically. There are better ways of doing it, more efficient and more correct, but this one works, and is quite simple. If efficiency is vital for you, this is probably not the place to learn about it. Actually, if you are after efficiency, you shouldn't be using Perl at all :^)

Example of Perl code that dynamically reads a file "$in":

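# -n 0: print none of the existing lines; -f: keep following the file
open(INFILE,"tail -n 0 -f $in |") || die "Failed!\n";
while(my $line = <INFILE>)
{
  # do whatever you want with $line here; for instance:
  print $line;
};
close(INFILE);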

Update: Explanation to the code above:

The open() call pipes the output of the tail command (notice the -f flag; run man tail to learn more) into the filehandle INFILE. The "||" sign is an OR, and means "do the thing on my right side if the thing on my left didn't end successfully (but ONLY in that case!)".

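Next, we perform a while loop over the lines coming through the pipe. The "<INFILE>" construct reads from INFILE each time it is evaluated; in scalar context, as here, it returns one line at a time. Each line is assigned to a new variable $line, and the loop continues as long as there are lines to read (Perl implicitly tests defined($line) here, so a line that happens to be false, such as a final "0" with no newline, does not end the loop early). Since tail -f never reaches the end of its input, in practice the loop only ends when tail itself is stopped.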

The line inside the curly braces is just a placeholder; put your own code there. And, for tidiness, once we exit the loop (INFILE has run out of lines), we close it.
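One of the "better ways" hinted at above avoids spawning tail altogether: perlfaq5 ("How do I do a tail -f in perl?") suggests reading up to the end of the file, sleeping, and then calling seek() with a zero offset to clear the end-of-file condition before reading again. A minimal sketch of that idea; follow() and its $max_idle and $interval arguments are my own additions ($max_idle only lets the loop stop, e.g. for testing), and the sketch does not notice if the file is truncated or rotated, for which the File::Tail module on CPAN is worth a look:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Follow $file like "tail -n 0 -f", calling $cb->($line) for every new line.
# $max_idle: stop after that many consecutive polls with no new data
# (undef = follow forever). $interval: seconds to sleep between polls.
sub follow {
    my ($file, $cb, $max_idle, $interval) = @_;
    $interval = 1 unless defined $interval;
    open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
    seek($fh, 0, 2);                  # start at the end, skipping old lines
    my $idle = 0;
    while (!defined $max_idle || $idle < $max_idle) {
        my $got = 0;
        while (my $line = <$fh>) {    # read everything appended so far
            $cb->($line);
            $got = 1;
        }
        $idle = $got ? 0 : $idle + 1;
        sleep $interval;
        seek($fh, 0, 1);              # zero-offset seek clears the EOF flag
    }
    close($fh);
}

# Usage: print new lines of the file named on the command line, forever
follow($ARGV[0], sub { print $_[0] }) if @ARGV;
```

As with tail, a line that is still being written when we poll may be delivered in two pieces; callers that care can buffer until they see a newline.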