11.2.07: LogFS runs on the OLPC. It will die a horrible death upon encountering the first bad block, but that should not take much to solve. For the curious, here is the patch to play with it.

19.2.07: Yet again I've had to deal with the clone problem. Nasty stuff, but this time I have a solution that I don't consider to be a hack.

22.2.07: Well, the clone idea was interesting but useless. It turned out that it is currently impossible to do garbage collection inside a filesystem under Linux. Fixing the problem required some changes to the way inode->i_state is handled, plus a few more details. *Fingers crossed*

26.2.07: Inodes now occupy a full block on the filesystem. Deleted inodes don't occupy any space.

3.5.07: Compression works. Getting there was far from easy, and I still suspect some bugs in the accounting when old data is overwritten with new data that compresses to a different size.
A very interesting effect is that object validation can be _very_ slow. Having to decompress an indirect block for every single pointer in it can be expensive. Maybe I should disable compression of indirect blocks for now.

21.5.07: Patches were sent to lkml for review. As an unexpected side-effect, the press smelled a story and started writing about LogFS. Maybe a bit too early, as I wouldn't call the code production-ready yet. Close, but not ready.

4.6.07: Block device support seems to work. I simply did what block2mtd would have done. Not going through MTD lifts the 4GiB limit. Testers welcome.
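A block device has no erase operation, so it has to be emulated. block2mtd does this by filling the erased range with 0xFF, the value of erased flash. The C sketch below mimics that on an in-memory buffer. The names (`sketch_bdev`, `sketch_erase`, `sketch_is_erased`) and the 4KiB erase size are hypothetical, not LogFS code. Offsets are 64-bit, so nothing in the sketch is bound to 4GiB.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical erase-block size for the sketch; real devices differ. */
#define SKETCH_ERASE_SIZE 4096u

/* A toy "block device": a byte array plus its size, with 64-bit offsets. */
struct sketch_bdev {
	uint8_t *data;
	uint64_t size;
};

/* Emulate a flash erase the way block2mtd does: fill the range with
 * 0xFF, the value of erased flash. Returns 0 on success, -1 on error. */
static int sketch_erase(struct sketch_bdev *dev, uint64_t ofs, uint64_t len)
{
	if (ofs % SKETCH_ERASE_SIZE || len % SKETCH_ERASE_SIZE)
		return -1;              /* erases must be block-aligned */
	if (ofs > dev->size || len > dev->size - ofs)
		return -1;              /* range extends past the device */
	memset(dev->data + ofs, 0xff, (size_t)len);
	return 0;
}

/* Return 1 if the whole range is still erased (all 0xFF), else 0. */
static int sketch_is_erased(const struct sketch_bdev *dev, uint64_t ofs, uint64_t len)
{
	assert(ofs <= dev->size && len <= dev->size - ofs);
	for (uint64_t i = 0; i < len; i++)
		if (dev->data[ofs + i] != 0xff)
			return 0;
	return 1;
}
```

A caller would erase a segment with `sketch_erase()` before writing to it and could use `sketch_is_erased()` to check that a range has not been written since.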

9.7.07: Made a final format change. Existing users who upgrade need to recreate their filesystems with the newly created mklogfs tool.
I also started using LogFS on a USB stick. Well, I tried. The performance is absolutely horrible, between 10x and 100x slower than ext3. In a way this was expected. One of the tricks LogFS performs is to write to several blocks alternately, which works quite well on raw flash. A USB stick tries to hide its flashiness by means of a Flash Translation Layer (FTL). And most FTLs operate in a way that causes _horrible_ performance for the LogFS access pattern, even though that pattern is quite reasonable on raw flash.
Another reason for LogFS to be slow is that it updates its tree too soon. It should lazily wait for more updates to come in and write those changes as late as possible. That would give a 2x speedup for files between 80KiB and 1GiB and a 3x speedup for files beyond 1GiB.
Some other reasons may exist as well. Having such a slow filesystem is nothing to be proud of. Expect me to improve things in the future.
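Why alternating between blocks hurts on a USB stick can be shown with a deliberately simplified FTL model. The sketch below assumes an FTL that keeps only one erase block open for updates and has to merge (copy) that block whenever a write lands in a different erase block. Real FTLs differ in detail and usually keep a few open blocks, and `toy_ftl` is a hypothetical name, but the effect is the same once the number of alternating segments exceeds what the FTL can keep open.

```c
#include <assert.h>
#include <stddef.h>

/* Toy FTL model (an assumption, not any real stick's firmware): exactly
 * one erase block is "open" for updates. Writes to the open block are
 * cheap; a write to any other erase block forces a merge, i.e. an
 * expensive copy of a whole erase block. */
struct toy_ftl {
	long open_block;        /* -1 means no block is open yet */
	unsigned long merges;   /* expensive whole-block copies */
};

static void toy_ftl_write(struct toy_ftl *ftl, long erase_block)
{
	assert(erase_block >= 0);
	if (ftl->open_block != erase_block) {
		if (ftl->open_block >= 0)
			ftl->merges++;  /* close (merge) the previously open block */
		ftl->open_block = erase_block;
	}
}

/* Replay a sequence of writes, each tagged with its erase block,
 * and return how many merges the FTL had to perform. */
static unsigned long toy_ftl_replay(const long *blocks, size_t n)
{
	struct toy_ftl ftl = { -1, 0 };

	for (size_t i = 0; i < n; i++)
		toy_ftl_write(&ftl, blocks[i]);
	return ftl.merges;
}
```

With the same eight writes, the sequential pattern 0,0,0,0,1,1,1,1 costs one merge in this model, while the alternating pattern 0,1,0,1,0,1,0,1 costs seven.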

10.11.07:
LogFS is going through a phase of instability. Indirect blocks are now kept in the page cache, which helps performance, particularly read performance. Caching is still write-through, so write performance improves only moderately, by about 50%-100% depending on the CPU speed. Next week will see more testing before a new patch is officially released. If anyone dares, patch 698 appears to be fairly stable.

14.11.07:
Patch 710 seems to be the best patch yet. Before adding indirect block caching I had one known bug and decided to ignore it until the caching was done. While working on the caching I noticed a race, and fixing it also fixed the bug. A rare example of writing new code fixing bugs. Also new is a mapping inode for MTD, which provides caching at the device level. This is useful when neighboring objects get read on NAND flash. Another advantage is that it will allow removing the last remaining mutex for reads.

18.11.07:
Looks like there will be at least one more incompatible format change. Logfs can run out of space if the system crashes during writeout N times, with N possibly being as low as 1. There is no way to fix this while staying 100% compatible.
While I'm at it, I'm calling for testers. The format should be finalized as soon as possible. If there are any remaining problems, please report them _now_. My goal is to push for inclusion into 2.6.25, and I'd like to avoid any embarrassment soon after release.

04.01.08:
Almost a month ago I implemented write-back caching for indirect blocks. The performance improvement was quite impressive. Even better, most writes now go to the same segment, so performance on consumer flashes (USB sticks, etc.) should become decent in spite of the bloody FTL on them.
However, I keep finding bugs in this code. In particular, I noticed yesterday that my current strategy for dealing with zombie blocks simply doesn't work. And now I get to introduce quite a bit of accounting code. So what is this all about?
Zombie blocks are blocks for which a new version has been written elsewhere and which should therefore be obsolete. But they only become obsolete once the new block actually becomes valid, which requires all nodes higher up in the tree to be written as well: all indirect blocks, the inode, any indirect blocks for the inode file, the master inode and the journal commit entry.
Write-back caching is efficient because those higher nodes don't get written yet. They remain in memory and just get marked as dirty. The hope is that future writes will cause further changes to those nodes, all of which can then be written back once instead of doing 500+ individual writes. Quite often this actually works. The only downside is that now we have to deal with zombie blocks.
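The difference between write-through and write-back for tree nodes can be counted with a toy model. The sketch below assumes a tree with two levels above the data (indirect blocks with a hypothetical fanout of 4, then the inode as root) and counts only tree-node writes, not data writes. All names are hypothetical. Write-through rewrites the whole path on every data write; write-back only marks the path dirty and writes each dirty node once on flush, lower levels first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical geometry: fanout 4, two levels above the data
 * (level 1 = indirect blocks, level 2 = the inode as root). */
#define TOY_FANOUT 4u
#define TOY_LEVELS 2
#define TOY_DATA_BLOCKS 16u /* TOY_FANOUT ^ TOY_LEVELS */

struct toy_tree {
	bool dirty[TOY_LEVELS + 1][TOY_DATA_BLOCKS]; /* [level][node index] */
	unsigned long node_writes; /* tree nodes written to the medium */
};

/* Index of the ancestor of data block @blk at tree level @level. */
static unsigned toy_node_index(unsigned blk, int level)
{
	for (int l = 0; l < level; l++)
		blk /= TOY_FANOUT;
	return blk;
}

/* Write-through: every data write immediately rewrites its whole path. */
static void toy_write_through(struct toy_tree *t, unsigned blk)
{
	assert(blk < TOY_DATA_BLOCKS);
	t->node_writes += TOY_LEVELS;
}

/* Write-back: only mark the path dirty; nothing is written yet. */
static void toy_write_back(struct toy_tree *t, unsigned blk)
{
	assert(blk < TOY_DATA_BLOCKS);
	for (int l = 1; l <= TOY_LEVELS; l++)
		t->dirty[l][toy_node_index(blk, l)] = true;
}

/* Write each dirty node once, lower levels first, because a parent
 * has to point at the new locations of its children. */
static void toy_flush(struct toy_tree *t)
{
	for (int l = 1; l <= TOY_LEVELS; l++)
		for (unsigned i = 0; i < TOY_DATA_BLOCKS; i++)
			if (t->dirty[l][i]) {
				t->dirty[l][i] = false;
				t->node_writes++;
			}
}
```

Writing all 16 data blocks costs 32 node writes with write-through and 5 with write-back (four indirect blocks plus the inode). The price is the one named above: until the flush, the blocks superseded by the new writes are zombies.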
Until now, logfs didn't try to identify zombie blocks at all. Instead it just asked whether a block could possibly be a zombie and, if the answer was yes, treated it as one. As a result, quite a number of well-behaved corpses were mixed up with the living and the living dead. And these otherwise harmless dead could wreak havoc simply by occupying space.
When the filesystem gets full, the safely available space can be occupied by live blocks, zombies or the dead. If all dead blocks get misaccounted as zombies, the filesystem can effectively become overfull, and all those coffins will block the garbage collection machinery.
One might add that exactly this problem appears to have occurred even before indirect blocks were cached, because inodes have been cached for years already. With sufficient effort it was possible to reproduce this problem before, and it is dead-easy to reproduce it now.
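The accounting problem can be sketched with four block states. The model below is an assumption, not LogFS's actual data structures, and all names are hypothetical: `supersede()` turns a block into a zombie when a newer copy is written elsewhere (the new copies themselves are left out for brevity), `commit()` turns all zombies into dead blocks once the parent nodes and the journal commit entry are written, and `pinned_blocks()` counts what garbage collection has to preserve. The conservative variant treats every superseded block as a possible zombie, which is how the dead end up occupying space.

```c
#include <assert.h>
#include <stddef.h>

enum blk_state {
	BLK_FREE,   /* erased and unused */
	BLK_LIVE,   /* current version, reachable from the tree */
	BLK_ZOMBIE, /* superseded, but the new version is not committed yet */
	BLK_DEAD,   /* superseded and committed: safe to reclaim */
};

/* A newer copy was written elsewhere: the old block becomes a zombie. */
static void supersede(enum blk_state *b, size_t i)
{
	assert(b[i] == BLK_LIVE);
	b[i] = BLK_ZOMBIE;
}

/* Writing all parent nodes and the journal commit entry makes the new
 * versions valid, so every zombie becomes a dead block. */
static void commit(enum blk_state *b, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (b[i] == BLK_ZOMBIE)
			b[i] = BLK_DEAD;
}

/* Count the blocks garbage collection must preserve. Precise accounting
 * pins live blocks and zombies only; the conservative variant also pins
 * every dead block, because any superseded block "could possibly be a
 * zombie". */
static size_t pinned_blocks(const enum blk_state *b, size_t n, int conservative)
{
	size_t pinned = 0;

	for (size_t i = 0; i < n; i++) {
		if (b[i] == BLK_LIVE || b[i] == BLK_ZOMBIE)
			pinned++;
		else if (b[i] == BLK_DEAD && conservative)
			pinned++;
	}
	return pinned;
}
```

For example, with three dead blocks, one zombie, two live blocks and two free blocks, precise accounting pins 3 blocks while the conservative variant pins 6, leaving garbage collection 2 free blocks to work with instead of 5.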

news (last edited 2008-03-05 11:22:58 by joern)