Hardware compatibility is better with Windows… not

One of the (few, but legitimate) reasons some Windows users give for not switching to Linux is that many pieces of hardware are not recognized by the latter. Sure enough, 99.9%, if not all, of the devices sold in shops are “Windows compatible”. Device manufacturers make damn sure their product, be it a pendrive or a printer, a computer screen or a keyboard, will work on any PC running Windows. They will even ship a CD with the drivers in the same package, so that installation of the device is as smooth as possible on Microsoft’s platform. Linux compatibility? Well, they usually just don’t care. Those hackers will make it work anyway, so why bother? And their market share is too small to take them into account.

Now, let’s move on to some personal experience with a webcam. I bought a webcam for my girlfriend’s laptop, which doesn’t have one integrated. The webcam was a cheap Logitech USB one, with “Designed for Skype” and “Windows compatible” written all over the box. It even came with a CD, marked prominently as “Windows drivers”. My girlfriend’s laptop runs Windows Vista, so I decided to give it a chance and plugged the webcam in without further consideration. A message from our beloved OS informed me that a new device had been plugged in (brilliant!) but that Windows lacked the necessary drivers to make it work (bummer!). OK, no problem. We had the drivers, right? I unplugged the camera, inserted the CD, and followed the instructions to get the drivers installed. Everything went fine, except that the progress bar for the installation went on for more than 12 minutes (I checked my watch) before reaching 100%. After installation, Windows informed me that a system reboot was necessary, so I rebooted. After the reboot, the camera worked.

As I had my Asus Eee at hand, I decided to try the webcam on it. I plugged it in, and nothing happened. I just saw the green light on the camera turn on. Well, maybe it worked… I opened Cheese, a Linux program that shows the output of webcams. I was a bit wary, because the Eee has an integrated webcam, so maybe there would be some interference or something. Not so. Cheese immediately showed me the output of the webcam I had just plugged in, and offered me a menu with two entries (the USB webcam and the integrated one), so I could choose. That’s it. No CD with drivers, no 12-minute installation, no reboot, no nothing. Just plug and play.

Perhaps it is worth mentioning that the next time I tried to use the webcam on the Vista laptop, it would ask me for driver installation again! I don’t know why… I must have done something wrong in the first installation… With Windows, who knows?


ChopZip: a parallel implementation of arbitrary compression algorithms

Remember plzma.py? I made a wrapper script for running [[LZMA]] in parallel. The script could be readily generalized to use any compression algorithm, following the principle of breaking the file into parts (one per CPU), compressing the parts, then [[tar (file format)|tarring]] them together. In other words: chop the file, zip the parts. Hence the name of the program that evolved from plzma.py: ChopZip.

Introduction

Currently ChopZip supports [[LZMA|lzma]], [[XZ Utils|xz]], [[gzip]] and lzip. Of them, lzip deserves a brief comment. It was brought to my attention by a reader of this blog. It is based on the LZMA algorithm, as are lzma and xz. Apparently, unlike them, multiple files compressed with lzip can be concatenated to form a single valid lzip-compressed file. Decompressing the latter generates the concatenation of the former.

To illustrate the point, check the following shell session:

% echo hello > head
% echo bye > tail
% lzip head
% lzip tail
% cat head.lz tail.lz > all.lz
% lzip -d all.lz
% cat all
hello
bye

However, I just discovered that gzip, bzip2 and xz all do that already! It seems that lzma is advertised as capable of doing it, but it doesn’t work for me. Sometimes it will decompress the concatenated file to the original file just fine, other times it will decompress only the first chunk of the set, and yet other times it will complain that the “data is corrupt” and refuse to decompress at all. For that reason, ChopZip will accept two working modes: simple concatenation (gzip, lzip, xz) and tarring (lzma). The relevant mode will be used transparently to the user.

Also, if you use Ubuntu, this bug will apply to you, making it impossible to have xz-utils, lzma and lzip installed at the same time.

The really nice thing about concatenability is that it allows for trivial parallelization of the compression, while maintaining compatibility with the serial compression tool, which can still decompress the product of a parallel compression. Unfortunately, for non-concatenatable compression formats, the output of ChopZip will be a tar file of the compressed chunks, making it impossible to decompress with the original compressor alone (first an untar would be needed, then decompression, then concatenation of the chunks; or just use ChopZip to decompress).
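To make the idea concrete, here is a hand-rolled sketch of the concatenation mode. This is not ChopZip’s actual code; the file names are made up, and split -n requires a reasonably recent GNU coreutils. It uses xz on a 4-core machine:

% split -n 4 -d bigfile chunk.                  # cut the file into 4 pieces: chunk.00 .. chunk.03
% for f in chunk.0?; do xz "$f" & done; wait    # compress all pieces in parallel
% cat chunk.0?.xz > bigfile.xz && rm chunk.0?.xz
% xz -dc bigfile.xz | cmp - bigfile             # plain, serial xz reads the result back identically

ChopZip automates exactly this kind of dance, choosing concatenation or tarring depending on the compressor.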

The rationale behind plzma/ChopZip is simple: multi-core computers are commonplace nowadays, but the most common compression programs still do not take advantage of this fact. At least the ones that I know and use don’t. There are at least two initiatives that tackle the issue, but I still think ChopZip has a niche to exploit. The most consolidated one is pbzip2 (which I mention in my plzma post). pbzip2 is great, if you want to use bzip2. It scales really nicely (almost linearly), and pbzipped files are valid bzip2 files. The main drawback is that it uses bzip2 as its compression method. bzip2 has always been the “extreme” brother of gzip: it compresses more, but it’s so slow that you would only resort to it if compression size is vital. LZMA-based programs (lzma, xz, lzip) are both faster and compress even more, so for me bzip2 is out of the equation.

A second contender in parallel compression is pxz. As its name suggests, it compresses using xz. Drawbacks? It’s not in the official repositories yet, and I couldn’t manage to compile it, even though it comprises a single C file and a Makefile. It also lacks the ability to use different encoders (which is not necessarily bad), and it’s a compiled program, versus ChopZip, which is a much more portable script.

Scalability benchmark

Anyway, let’s get into ChopZip. I have run a simple test with a moderately large file (a 374 MB tar file of the whole /usr/bin dir). The table below shows the speedups for running ChopZip on that file with various numbers of chunks (and, consequently, threads). The tests were conducted on a computer with 4 GB of RAM and an Intel Core 2 Quad Q8200 CPU. The speedup for N chunks is how many times faster the run with N chunks was with respect to the run with a single chunk, i.e. t(1 chunk)/t(N chunks). It is noteworthy that in every case running ChopZip with a single chunk is virtually identical in performance to running the original compressor directly. Decompression times (not shown) were also identical, irrespective of the number of chunks. The ChopZip version used was r18.

#chunks    xz       gzip     lzma     lzip
1          1.000    1.000    1.000    1.000
2          1.862    1.771    1.907    1.906
4          3.265    1.910    3.262    3.430
8          3.321    1.680    3.247    3.373
16         3.248    1.764    3.312    3.451

Note how increasing the number of chunks beyond the number of actual cores (4 in this case) can still bring a small benefit. This happens because N equal-sized chunks of a file will not all be compressed at the same speed, so the more chunks there are, the smaller the overall effect of the slowest-compressing ones.

Conclusion

ChopZip quite noticeably speeds up the compression of arbitrary files, with arbitrary compressors. In the case of concatenatable compressors (see above), the resulting compressed file is an ordinary compressed file, apt to be decompressed with the regular compressor (xz, lzip, gzip) as well as with ChopZip. This makes ChopZip a valid alternative to them, with the added advantage of parallelization.


LWD – December 2009

This is a continuation post for my Linux World Domination project, started in this May 2008 post. You can read the previous post in the series here.

In the following data, T2D means “time to domination” (the expected time for the Windows and Linux shares to cross, counting from the present date), DT2D means the difference (increase/decrease) in T2D with respect to the last report, CLP means “current Linux percent” as given by the last logged data, DD means domination day (in YYYY-MM-DD format), and DCLP means the difference in CLP with respect to the last logged data. I have dropped the “Confidence” column, for it gave little or no info.

Project          T2D              DT2D  DD              CLP    DCLP
Einstein         already crossed  -     September 2009  51.35  +4.24
MalariaControl   >10 years        -     -               11.95  -0.32
POEM             83.4 months      -     2016-10-08      11.52  +0.69
PrimeGrid        >10 years        -     -               10.31  +0.46
Rosetta          >10 years        -     -               8.60   +0.10
QMC              >10 years        -     -               8.23   +0.15
SETI             >10 years        -     -               8.07   +0.05
Spinhenge        >10 years        -     -               4.37   +0.15

Except for the good news that Einstein@home has succumbed to the Linux hordes, the numbers (again) seem quite discouraging, but the data is what it is. All CLPs but MalariaControl’s have gone up (and that one went down less than in the previous report). The Linux tide seems unstoppable; however, its forward speed is not necessarily high.

As promised, today I’m showing the plots for Rosetta@home, in next issue Spinhenge@home.

Number of hosts percent evolution for Rosetta@home

Accumulated credit percent evolution for Rosetta@home


Trivial use of md5sum

I just made use of the [[md5sum]] command in a rather simple situation which could have been more troublesome to handle by other means. The following scenario highlights, IMHO, how the [[command line]] greatly simplifies some tasks.

I have a file file.txt, and a collection of files file.txt.N, where N = 1, 2, 3… I know that the former is a copy of one of the latter, but I don’t know which. I could have run [[diff]] on all the possible matches, but I would have had to run it for every N until a match was found. However, md5sum comes to the rescue. I can just run:

% md5sum file.txt*

And check which file.txt.N has an MD5 signature equal to that of file.txt; that one would be the match. This solution is still a bit annoying, because I have to visually search for matches of a long string. Not to worry! Unix is our friend once again. Recover the above command with a single press of the “up” arrow, then extend it a tiny bit:

% md5sum file.txt* | sort

Now, since the lines are sorted by MD5 signature, the match for our file.txt (if there is any) will appear right after the line for file.txt.
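If even that is too much eyeballing, the comparison can be delegated to grep entirely. This is a small refinement of my own, using only standard tools, not something from the original session:

% md5sum file.txt.* | grep "$(md5sum file.txt | cut -d' ' -f1)"

It prints only the file.txt.N line(s) whose signature matches that of file.txt, or nothing if there is no match.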

I challenge the reader to accomplish the same task as readily, comfortably and successfully in Windows or Mac, or in Linux without the command line.


First impressions with Arch Linux

I have been considering for some time trying some Linux distro that would be a little faster than [[Ubuntu (operating system)|Ubuntu]]. I made the switch from [[Debian]] to Ubuntu some time ago, and I must say that I am very pleased with it, despite it being a bit bloated and slow. Ubuntu is really user-friendly. This term is often despised among geeks, but it does have a huge value. Oftentimes a distro will disguise poor dependency handling, lack of package tuning and absence of wise defaults as not having “fallen” for user-friendliness and “allowing the user to do whatever she feels like”.

However comfortable Ubuntu might be, my inner geek wanted to get his hands a little bit dirtier with configurations, and obtain a more responsive OS in return. And that’s where [[Arch Linux]] fits in. Arch Linux is regarded as one of the fastest Linux distros, at least among the ones based on [[executable|binary]] [[Software package (installation)|packages]] rather than [[source code]]. Is this fame deserved? Well, in my short experience, it seems to be.

First off, let us clarify what one means by a “faster” Linux distro. There are, as I see it, broadly speaking, three things that can be faster or slower in the user’s interaction with a computer. The first one, and a very often cited one, is the [[booting|boot]] (and shutdown) time. Any period of time between a user deciding to use the computer and being able to do so is wasted time (from the user’s point of view). Many computers stay on for long periods of time, but for a home user, short booting times are a must. A second speed-related item would be the startup time of applications. Booting would be a sub-section of this, if we consider the OS/[[kernel (computing)|kernel]] an “app”, but I refer here to user apps such as an e-mail client or a text editor. Granted, most start within seconds at most, many below one second or apparently “instantly”, but some others are renowned for their sluggishness ([[OpenOffice.org]], [[Firefox]] and [[Amarok (software)|Amarok]] come to mind). Even the not-very-slow apps that take a few seconds can become irritating if used with some frequency. The third speed-related item would be the execution of long-running, CPU-intensive software, such as audio/video encoding or scientific computation.

Of the three issues mentioned, it should be made clear that the third one (execution of CPU-intensive tasks) is seldom affected at all by the “speed” of the OS. Or it shouldn’t be. Of course, having the latest versions of the libraries used by the CPU-intensive software should make a difference, but I doubt that encoding a video with [[MEncoder]] is any faster in [[Gentoo Linux|Gentoo]] than in Ubuntu (for the same version of mencoder and libraries). However, the first two (booting and app startup) do differ from OS to OS.

Booting

I did some timings in Ubuntu and Arch, both on the same (dual boot) machine. I measured the time from [[GRUB]] to [[GNOME Display Manager|GDM]], and then the time from GDM to a working [[desktop environment]] ([[GNOME]] in both). The exact data might not be that meaningful, as some details could differ from one installation to the other (a different choice of firewall, or (minimally) different autostarted apps in the DE). But the big numbers are significant: where Ubuntu takes slightly below 1 minute to GDM, and around half a minute from there to GNOME, Arch takes below 20 seconds and 10 seconds, respectively.

App start up

Of the three applications mentioned, OpenOffice.org and Firefox start faster in Arch than in Ubuntu. I wrote down the numbers, but don’t have them at hand now. Amarok, on the other hand, took equally long to start (some infamous 35 seconds) in both OSs. It is worth mentioning that all of them start up faster the second and successive times, and that the Ubuntu/Arch difference between second starts is correspondingly smaller (because both are fast). Still, Arch is a bit faster (except for Amarok).

ABS, or custom compilation

But the benefits of Arch don’t end with a faster boot or a more responsive desktop (which it has). Arch Linux makes it really easy to compile and install any custom package the user wants, and I decided to take advantage of that. With Debian/Ubuntu, you can download the source code of a package quite easily, but the compilation is more or less left to you, and the installation is different from that of an “official” package. With Arch, generating a package from source is quite easy, and installing it with [[Pacman (package manager)|Pacman]] afterwards is trivial. For more info, refer to the Arch wiki entry for ABS.
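For the record, the ABS workflow I followed was roughly the one below. I am reconstructing it from memory, so take the exact paths, the repository the PKGBUILD lives in and the resulting package file name as approximations rather than gospel:

# abs                                      # sync the ABS tree (as root; needs the abs package)
% cp -r /var/abs/extra/mplayer ~/builds/   # copy the build recipe (PKGBUILD et al.) somewhere writable
% cd ~/builds/mplayer
% $EDITOR PKGBUILD                         # optionally tweak CFLAGS, configure options, etc.
% makepkg -s                               # build a pacman package, pulling in build dependencies
# pacman -U mplayer-*.pkg.tar.gz           # install it just like an official binary package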

I first compiled MEncoder (inside the mplayer package), and found out that the compiled version made no difference with respect to the stock binary package. I should have known that, because I say so in this very post, don’t I? However, one always thinks that he can compile a package “better”, so I tried it (and failed to get any improvement).

On the other hand, when I recompiled Amarok, I did get a huge boost in speed. A simple custom compilation produced an Amarok that took only 15 seconds to start up, less than half of the vanilla binary distributed with Arch (I measured the 15 seconds even after rebooting, which rules out any “second time is faster” effect).

Is it hard to use?

Leaving the speed issue aside, one of the possible drawbacks of a geekier Linux distro is that it could be harder to use. Arch is, indeed, but not by much. A seasoned Linux user should have little difficulty installing and configuring Arch. It is certainly not for beginners, but it is not super-hard either.

One of the few gripes I have with it regards the installation of a graphical environment. As it turns out, installing a DE such as GNOME does not trigger the installation of any [[X Window System]], such as [[X.org Server]], as dependencies are set only for really vital things. Well, that’s not too bad: Arch is not assuming I want something until I tell it I do. Fine. Then, when I do install Xorg, the tools for configuring it are a bit lacking. I am so spoiled by the automagic configuration in Ubuntu, where you are presented with a full-fledged desktop with almost no decision on your side, that I miss a magic script that will make X “just work”. Anyway, I can live with that. But something that made me feel like giving up was that, after following all the instructions in the really nice Arch Wiki, I was unable to start X (it would start as a black screen, then freeze, and I could only get out by rebooting the computer). The problem was that I have an [[Nvidia]] graphics card, and I needed the (proprietary) drivers. OK, of course I need them, but the default vesa driver should work as well! In Ubuntu one can get a lower-resolution desktop without 3D effects using the default vesa driver, and the proprietary Nvidia drivers then allow for more eye-candy and fanciness. But not in Arch. When I decided to skip the test with vesa and download the proprietary drivers, the X server started without any problem.

Conclusions

I am quite happy with Arch so far. Yes, one has to work around some rough edges, but that is a nice experience as well, because one learns more than with other, overly user-friendly distros. I think Arch Linux is a very nice distro that is worth using, and I recommend it to any Linux user willing to learn and “get their hands dirty”.


No market for Linux games? The Koonsolo case

I’ve read via Phoronix about the case of the indie PC game producer Koonsolo, which sells a game for Windows, Mac and Linux. The interesting thing is that, as you can read on Koonsolo’s blog, the Linux version is selling in larger numbers than the Windows one!

Apparently, 40% of the visitors of the Koonsolo site use Windows, vs. less than 23% for Linux. However, despite Windows being the most common OS among visitors (there are even more Mac visitors than Linux ones), the Linux version accounts for 34% of total sales, whereas Windows sales are only 23%. Visit the site for some more numbers and comments.


LWD – September 2009

This is a continuation post for my Linux World Domination project, started in this May 2008 post. You can read the previous post in the series here.

In the following data T2D means “time to domination” (the expected time for Windows/Linux shares to cross, counting from the present date). DT2D means difference (increase/decrease) in T2D, with respect to last report. CLP means “current Linux Percent”, as given by last logged data, and DD means domination day (in YYYY-MM-DD format).

Project          T2D        DT2D      DD          CLP            Confidence %
Einstein         38.6 days  -55 days  2009-10-10  47.11 (+2.60)  16.1
MalariaControl   >10 years  -         -           12.27 (-0.37)  -
POEM             >10 years  -         -           10.83 (+0.17)  -
PrimeGrid        >10 years  -         -           9.85 (+0.24)   -
Rosetta          >10 years  -         -           8.50 (+0.13)   -
QMC              >10 years  -         -           8.07 (+0.15)   -
SETI             >10 years  -         -           8.02 (+0.02)   -
Spinhenge        >10 years  -         -           4.22 (+0.37)   -

The numbers (again) seem quite discouraging, but the data is what it is. All CLPs but MalariaControl’s have gone up, with Spinhenge’s going up by almost 0.4% in 3 months. The Linux tide seems unstoppable; however, its forward speed is not necessarily high.

As promised, today I’m showing the plots for QMC@home, in next issue Rosetta@home.

Number of hosts percent evolution for QMC@home

Accumulated credit percent evolution for QMC@home


plzma.py: a wrapper for parallel implementation of LZMA compression

Update: this script has been superseded by ChopZip

Introduction

I discovered the [[Lempel-Ziv-Markov chain algorithm|LZMA]] compression algorithm some time ago, and have been thrilled by its capabilities ever since. It achieves higher compression ratios than even [[bzip2]], with a faster decompression time. However, although decompressing is fast, compressing is not: LZMA is even slower than bzip2. On the other hand, [[gzip]] remains blazing fast in comparison, while providing a decent level of compression.

More recently I have discovered the interesting pbzip2, which is a parallel implementation of bzip2. With the increasing popularity of multi-core processors (I have a quad-core at home myself), parallelizing the compression tools is a very good idea. pbzip2 performs really well, producing bzip2-compatible files with near-linear scaling with the number of CPUs.

LZMA being such a high-performance compressor, I wondered if its speed could be boosted by running it in parallel. Although the [[Lempel-Ziv-Markov chain algorithm|Wikipedia article]] states that the algorithm can be parallelized, I found no such implementation in Ubuntu 9.04, where the utility provided by the lzma package is exclusively serial. Not finding one, I set out to produce it myself.

About plzma.py

Any compression can be parallelized as follows:

  1. Split the original file into as many pieces as CPU cores available
  2. Compress (simultaneously) all the pieces
  3. Create a single file by joining all the compressed pieces, and call the result “the compressed file”

In a Linux environment, these three tasks can be carried out easily by split, lzma itself, and tar, respectively. I just made a [[Python (programming language)|Python]] script to automate them, called it plzma.py, and put it on my web site for anyone to download (it’s GPLed). Please notice that plzma.py has been superseded by ChopZip, which starts at revision 12, whereas the latest plzma is revision 6.
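Done by hand on a 4-core box, those three steps would look more or less like this. This is just a sketch: the chunk names are made up, and plzma.py’s actual naming is slightly different, as the decompression example at the end of this post shows:

% split -n 4 -d bigfile piece.                    # 1. one piece per core: piece.00 .. piece.03
% for f in piece.0?; do lzma "$f" & done; wait    # 2. compress all the pieces simultaneously
% tar -cf bigfile.plz piece.0?.lzma               # 3. join the compressed pieces into a single file
% rm piece.0?.lzma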

I must remark that, while pbzip2 generates bzip2-compatible compressed files, that is not the case with plzma. The products of plzma compression must be decompressed with plzma as well. The actual format of a plzma file is just a TAR file containing as many LZMA-compressed chunks as CPUs used for compression. These chunks, once decompressed individually, can be concatenated (with the cat command) to form the original file.

Benchmarks

What review of compression tools lacks benchmarks? No matter how inaccurate or silly, none of them do. And neither does mine :^)

I used three (single) files as reference:

  • molekel.tar – a 108 MB tar file of the (GPL) [[Molekel]] 5.0 source code
  • usr.bin.tar – a 309 MB tar file of the contents of my /usr/bin/ dir
  • hackable.tar – a 782 MB tar file of the hackable:1 [[Debian]]-based distro for the [[Neo FreeRunner]]

The second case is intended as an example of binary file compression, whereas the other two are more of a “real-life” example. I didn’t test text-only files… I might in the future, but don’t expect the conclusions to change much. The testbed was my Frink desktop PC (Intel Q8200 quad-core).

The options for each tool were:

  • gzip/bzip2/pbzip2: compression level 6
  • lzma/plzma: compression level 3
  • pbzip2/plzma: 4 CPUs

Compressed size

The most important feature of a compressor is the size of the resulting file. After all, we use it in the first place to save space. No matter how fast an algorithm is, if the resulting file were bigger than the original I wouldn’t use it. Would you?

The graph below shows the compressed size ratio for compression of the three test files with each of the five tools considered. The compressed size ratio is defined as the compressed size divided by the original size for each file.

This test doesn’t hold many surprises: gzip is the least effective, and LZMA the most. The point to make here is that the parallel implementations perform as well (or as badly) as their serial counterparts.

If you are unimpressed by the supposedly higher performance of bzip2 and LZMA over gzip, because in the picture all final sizes do not look very different, recall that gzip compressed molekel.tar ~3 times (to a 0.329 ratio), whereas LZMA compressed it ~4.3 times (to a 0.233 ratio). You could stuff 13 LZMAed files where only 9 gzipped ones fit (and just 3 uncompressed ones).
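In case the arithmetic behind that claim is not obvious: 13 × 0.233 ≈ 3.03 and 9 × 0.329 ≈ 2.96, i.e. both piles occupy roughly the space of 3 uncompressed files.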

Compression time

However important the compressed size is, compression time is also an important subject. Actually, that’s the very issue I try to address by parallelizing LZMA: making it faster while keeping its high compression ratio.

The graph below shows the normalized times for compression of the three test files with each of the five tools considered. The normalized time is taken as the total time divided by the time it took gzip to finish (an arbitrary scale with t(gzip)=1.0).

Roughly speaking, we could say that in my setting pbzip2 makes bzip2 as fast as gzip, and plzma makes LZMA as fast as serial bzip2.

The speedups for bzip2/pbzip2 and LZMA/plzma are given in the following table:

File           pbzip2  plzma
molekel.tar    4.00    2.72
usr.bin.tar    3.61    3.38
hackable.tar   3.80    3.04

The performance of plzma is nowhere near pbzip2’s, but I’d call it acceptable (wouldn’t I? I’m the author!). There are two reasons I can think of to explain the lower-than-linear scalability. The first one is the overhead imposed by cutting the file into pieces and then assembling them back. The second one, maybe more important, is disk performance. Each core can compress its chunk independently, but the disk I/O for reading the chunks and writing them back compressed happens simultaneously on the same disk, which the four processes share.

Update: I think a good deal of the under-linearity comes from the fact that files of equal size will not be compressed in equal time. Each chunk compression takes a slightly different time to complete, because some chunks are easier to compress than others. The program waits for the last compression to finish, so it’s as slow as the slowest one. It is also true that pieces of 1/N size might take more than 1/N of the time to complete, so the more chunks, the slower the compression in total (although the opposite could also be true).
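A toy example of that first effect, with made-up numbers: if the 4 chunks take 10, 10, 10 and 13 seconds to compress, the parallel run lasts 13 seconds while a serial run would last about 43, giving a speedup of 43/13 ≈ 3.3 instead of the ideal 4.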

Decompression times

Usually we pay less attention to decompression time, because it is much faster (and because we often compress things never to open them again, in which case we had better have deleted them in the first place… but I digress).

The following graph shows the decompression data equivalent to the compression times graph above.

The most noteworthy point is that pbzip2 decompresses pbzip2-compressed files faster than bzip2 does bzip2-compressed files. That is, both compression and decompression benefit from the parallelization. For plzma, however, that is not the case: decompression is slower than with serial LZMA. This is due to two effects: first, the decompression part is still not parallelized in my script (it will be soon), which by itself would give decompression speeds close to serial LZMA’s; second, the overhead caused by splitting and then joining makes it slower still.

Another result worth noting is that, although LZMA is much slower than even bzip2 to compress, the decompression is actually faster. This is not random. LZMA was designed with fast uncompression time in mind, so that it could be used in, e.g. software distribution, where a single person compresses the original data (however painstakingly), then the users can download the result (the smaller, the faster), and uncompress it to use it.

Conclusions

While there is room for improvement, plzma seems like a viable option to speed up general compression tasks where a high compression ratio (LZMA level) is desired.

I would like to stress the point that plzma files cannot be decompressed with just LZMA. If you don’t use plzma to decompress them, you can follow these steps:

% tar -xf file.plz
% lzma -d file.0[1-4].lz
% cat file.0[1-4] > file
% rm file.0[1-4] file.plz


Accessing Linux ext2/ext3 partitions from MS Windows

Accessing both Windows [[File Allocation Table|FAT]] and [[NTFS]] file systems from Linux is quite easy, with tools like [[NTFS-3G]]. However (following the [[shit|MS]] tradition of making itself incompatible with everything else, to thwart competition), doing the opposite (accessing Linux file systems from Windows) is more complicated. One has to wonder why (and how!) [[closed source software|closed]], [[proprietary software|proprietary]] and technically inferior file systems can be read by free software tools, whereas proprietary software with such a big corporation behind it is incapable of (or unwilling to) interacting with superior, [[free software]] file systems. Why should Windows users be deprived of the choice of [[JFS (file system)|JFS]], [[XFS]] or [[ReiserFS]], when they are free? Are MS techs too dumb to implement them? Or too evil to give their users the choice? Or, maybe, too scared that if the choice were possible, their users would dump NTFS? None of these explanations makes one feel much love for MS, does it?

This stupid inability of Windows to read any of the many formats Linux can use creates problems not only for Windows users, but also for Linux users. For example, when I format my external hard disks or pendrives, I end up wondering whether I should reserve some space for a FAT partition, so I could put data there to share with hypothetical Windows users I might lend the disk to. And, seriously, I abhor wasting my hardware on such lousy file systems when I could use Linux ones.

Anyway, there are some third-party tools to help us with such a task. I found at least two:

  • Ext2IFS
  • Ext2Fsd

I have used the first one, but as some blogs point out (e.g. BloggUccio), Ext2Fsd is required if the [[inode]] size is bigger than 128 B (it is 256 B in some modern Linux distros).
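By the way, checking the inode size of an existing ext2/ext3 partition from Linux is straightforward (replace /dev/sdb1 with your actual device):

# tune2fs -l /dev/sdb1 | grep "Inode size"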

Getting Ext2IFS

It is a simple exe file you can download from fs-driver.org. Installing it consists of the typical Windows next-next-finish click-dance. In principle the defaults are OK. It will ask you about activating “read-only” (which I declined; it’s less safe, but I would like to be able to write too), and something about large file support (which I accepted, because it’s only an issue with Linux kernels older than 2.2… Middle Ages stuff).

Formatting the hard drive

In principle, Ext2IFS can read ext2/ext3 partitions with no problem. In practice, if the partition was created with an [[inode]] size of more than 128 bytes, Ext2IFS won’t read it. To create a “compatible” partition, you can mkfs it with the -I flag, as follows:

# mkfs.ext3 -I 128 /dev/whatever

I found out about the 128 B inode thing from this forum thread [es].

Practical use

What I have done, and tested, is the following: I format my external drives with almost all of the space as ext3, as described above, leaving a couple of gigabytes (you could cut that down to a couple of megabytes if you really wanted to) for a FAT partition. Then I copy the Ext2IFS_1_11a.exe executable to that partition.
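For reference, the formatting step could look like the following, assuming the drive shows up as /dev/sdb with the big Linux partition first and the small FAT one second (device names, labels and mount point are just examples; adapt them to your case):

# mkfs.ext3 -I 128 -L data /dev/sdb1       # ext3 with 128 B inodes, so Ext2IFS can read it
# mkfs.vfat -n SHARED /dev/sdb2            # small FAT partition, visible from any Windows
# mount /dev/sdb2 /mnt && cp Ext2IFS_1_11a.exe /mnt/ && umount /mnt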

Whenever you want to use the drive, Linux will see two partitions (the ext3 one and the FAT one), the second of which you can ignore. From Windows, you will only see a 2 GB FAT partition. However, you will be able to open it, find the exe, double-click, and install Ext2IFS. After that, you can unplug the drive and plug it in again… et voilà, you will see the ext3 partition just fine.


Ubuntu error: the installer needs to remove operating system files

I started installing [[Ubuntu Netbook Remix]] 9.04 in my [[ASUS Eee PC]], and after the partitioning step, I stumbled upon the following error:

The installer needs to remove operating system files from the install target, but was unable to do so. The install cannot continue

I was installing Ubuntu on top of a previous eeebuntu install, smashing the / partition, while reusing the /home. After minimal googling, I found this bug report at Launchpad, with the same problem (and one year old).

As it turns out, the problem was not with the root partition, as I had assumed from the error message, but with the home one. Apparently, Ubuntu didn’t like the idea that my home partition was [[JFS (file system)|JFS]] (maybe it couldn’t mount it, because jfsutils are not available by default). The solution: install the OS ignoring (not using) the home partition, and mount it afterwards.
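For completeness, mounting the reused /home once the system is up amounts to something like this (the device name is hypothetical, check yours with blkid; I am also assuming jfsutils is not pulled in by default):

# apt-get install jfsutils              # the JFS userspace tools
# mount -t jfs /dev/sda3 /home          # mount the untouched home partition

plus a line in /etc/fstab so that it gets mounted on every boot, along the lines of:

/dev/sda3  /home  jfs  defaults  0  2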

Shame on you, Ubuntu, this solution is lame!

