The git-annex assistant is being crowd funded on Kickstarter. I'll be blogging about my progress here on a semi-daily basis.

More bugfixes today. The assistant now seems to have enough users that they're turning up interesting bugs, which is good. But it does keep me too busy to add many more bugs^Wcode.

The fun one today made it bloat and eat all memory when logging out of a Linux desktop. I tracked that back to a bug in the Haskell DBUS library when a session connection is open and the session goes away. Developed a test case, and even profiled it, and sent it all off to the library's author. Hopefully there will be a quick fix; in the meantime today's release has DBUS turned off. Which is ok, it just makes it a little bit slower to notice some events.

Posted Tue Oct 16 20:48:55 2012

I was mostly working on other things today, but I did do some bug fixing. The worst of these is a bug introduced in 3.20121009 that breaks git-annex-shell configlist. That's pretty bad for using git-annex on servers, although you mostly won't notice unless you're just getting started using an ssh remote, since that's when it calls configlist. I will be releasing a new version as soon as I have bandwidth (tomorrow).

Also made the standalone Linux and OSX binaries build with ssh connection caching disabled, since they don't bundle their own ssh and need to work with whatever ssh is installed.

Posted Tue Oct 16 04:20:03 2012

Fixed the assistant to wait on all the zombie processes that would sometimes pile up. I didn't realize this was as bad as it was.

Zombies and git-annex have been a problem since I started developing it, because back then I made some rather poor choices, due to barely knowing how to write Haskell. So parts of the code that stream input from git commands don't clean up after them properly. Not normally a problem, because git-annex reaps the zombies after each file it processes. But this reaping is not thread-safe; it cannot be used in the assistant.

If I were starting git-annex today, I'd use one of the new Haskell things like Conduits, that allow for very clean control over finalization of resources. But switching it to Conduits now would probably take weeks of work; I've not yet felt it was worthwhile. (Also it's not clear Conduits are the last, best thing.)

For now, it keeps track of the pids it needs to wait on, and all the code run by the assistant is zombie-free. However, some code for fsck and unused that I anticipate the assistant using eventually still has some lurking zombies.


Solved the issue with preferred content expressions and dropping that I mentioned yesterday. My solution was to add a parameter to specify a set of repositories where content should be assumed not to be present. When deciding whether to drop, it can put the current repository in, and then if the expression fails to match, the content can be dropped.

Using yesterday's example "(not copies=trusted:2) and (not in=usbdrive)", when the local repo is one of the 2 trusted copies, the drop check will see only 1 trusted copy, so the expression matches, and so the content will not be dropped.

I've not tested my solution, but it type checks. :P I'll wire it up to get/drop/move --auto tomorrow and see how it performs.
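
To illustrate the idea, here's a rough sketch (not the actual Limit code) with a simplified stand-in for the copies limit:

[[!format haskell """
import qualified Data.Set as S

type Repo = String -- stand-in for the real repository type

-- Simplified "copies=trusted:N" limit, disregarding any repos in the
-- assumed-absent set.
copiesTrusted :: S.Set Repo -> [Repo] -> Int -> Bool
copiesTrusted assumeAbsent trusted n =
    length (filter (`S.notMember` assumeAbsent) trusted) >= n

-- A file may be dropped from here only when the preferred content
-- expression fails to match once here is assumed not to have it.
mayDrop :: Repo -> [Repo] -> Bool
mayDrop here trusted = not wanted
  where
    -- "not copies=trusted:2" (the in=usbdrive clause elided)
    wanted = not (copiesTrusted (S.singleton here) trusted 2)
"""]]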


Would preferred content expressions be more readable if they were inverted (becoming content filtering expressions)?

  1. "(not copies=trusted:2) and (not in=usbdrive)" becomes "copies=trusted:2 or in=usbdrive"
  2. "smallerthan=10mb and include=.mp3 and exclude=junk/" becomes "largerthan=10mb or exclude=.mp3" or include=junk/"
  3. "(not group=archival) and (not copies=archival:1)" becomes "group=archival or copies=archival:1"

1 and 3 are improved, but 2, less so. It's a trifle weird for "include" to mean "include in excluded content".

The other reason not to do this is that currently the expressions can be fed into git annex find on the command line, and it'll come back with the files that would be kept.

Perhaps a middle ground is to make "dontwant" be an alias for "not". Then we can write "dontwant (copies=trusted:2 or in=usbdrive)".


A user told me this:

I can confirm that the assistant does what it is supposed to do really well. I just hooked up my notebook to the network and it starts syncing from notebook to fileserver and the assistant on the fileserver also immediately starts syncing to the [..] backup

That makes me happy; it's the first truly real-world success report I've heard.

Posted Tue Oct 16 01:54:19 2012

Switched the OSX standalone app to use DYLD_ROOT_PATH. This is the third DYLD_* variable I've tried; neither of the other two worked in all situations. This one may do better. If not, I may be stuck modifying the library names in each executable using install_name_tool (good reference for doing that). As far as I know, every existing dynamic library lookup system is broken in some way or other; nothing I've seen about OSX's so far disproves that rule.

Fixed a nasty utf-8 encoding crash that could occur when merging the git-annex branch. I hope I'm almost done with those.

Made git-annex auto-detect when a git remote is on a server like GitHub that doesn't support git-annex, and automatically set annex-ignore.

Finished the UI for pausing syncing of a remote. Making the syncing actually stop still has some glitches to resolve.

Posted Tue Oct 16 01:54:19 2012

Preferred content control is wired up to --auto and working for get, copy, and drop. Note that drop --from remote --auto drops files that the remote's preferred content settings indicate it doesn't want; likewise copy --to remote --auto sends content that the remote does want.

Also implemented smallerthan, largerthan, and ingroup limits, which should be everything needed for the scenarios described in transfer control.

Dying to hook this up to the assistant, but a cloudy day is forcing me to curtail further computer use.


Also, last night I developed a patch for the hS3 library that should let git-annex upload large files to S3 without buffering their whole content in memory. I have an s3-memory-leak branch in git-annex that uses the new API I developed. Hopefully hS3's maintainer will release a new version with that soon.

Posted Tue Oct 16 01:54:19 2012

Started implementing transfer control. Although I'm currently calling the configuration for it "preferred content expressions". (What a mouthful!)

I was mostly able to reuse the Limit code (used to handle parameters like --not --in otherrepo), so it can already build Matchers for preferred content expressions in my little Domain Specific Language.
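
To give a flavor of it, here's a rough sketch of what such a matcher boils down to (the real Limit code is more involved):

[[!format haskell """
-- A boolean expression over some primitive limits, and its evaluator.
data Matcher a
    = MAnd (Matcher a) (Matcher a)
    | MOr (Matcher a) (Matcher a)
    | MNot (Matcher a)
    | MOp (a -> Bool)

match :: Matcher a -> a -> Bool
match (MAnd a b) v = match a v && match b v
match (MOr a b) v  = match a v || match b v
match (MNot a) v   = not (match a v)
match (MOp f) v    = f v

-- "(not copies=trusted:2) and (not in=usbdrive)" would build to:
-- MAnd (MNot (MOp copiesTrusted2)) (MNot (MOp inUsbdrive))
"""]]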

Preferred content expressions can be edited with git annex vicfg, which checks that they parse properly.

The plan is that the first place to use them is not going to be inside the assistant, but in commands that use the --auto parameter, which will use them as an additional constraint, in addition to the numcopies setting already used. Once I get it working there, I'll add it to the assistant.

Let's say a repo has a preferred content setting of "(not copies=trusted:2) and (not in=usbdrive)"

  • git annex get --auto will get files that have less than 2 trusted copies, and are not in the usb drive.
  • git annex drop --auto will drop files that have 2 or more trusted copies, and are not in the usb drive (assuming numcopies allows dropping them of course).
  • git annex copy --auto --to thatrepo run from another repo will only copy files that have less than 2 trusted copies. (And if that was run on the usb drive, it'd never copy anything!)

There is a complication here.. What if the repo with that preferred content setting is itself trusted? Then when it gets a file, its number of trusted copies increases, which will make it be dropped again. :-/

This is a nuance that the numcopies code already deals with, but it's much harder to deal with it in these complicated expressions. I need to think about this; the three ideas I'm working on are:

  1. Leave it to whoever/whatever writes these expressions to write ones that avoid such problems. Which is ok if I'm the only one writing pre-canned ones, in practice..
  2. Transform expressions into ones that avoid such problems. (For example, replace "not copies=trusted:2" with "not (copies=trusted:2 or (in=here and trusted=here and copies=trusted:3))".)
  3. Have some of the commands (mostly drop I think) pretend the drop has already happened, and check if it'd then want to get the file back again.
Posted Tue Oct 16 01:54:19 2012

Did a fair amount of testing and bug fixing today.

There is still some buggy behavior around pausing syncing to a remote, where transfers still happen to it, but I fixed the worst bug there.

Noticed that if a non-bare repo is set up on a removable drive, its file tree will not normally be updated as syncs come in -- because the assistant is not running on that repo, and so incoming syncs are not merged into the local master branch. For now I made it always use bare repos on removable drives, but I may want to revisit this.

The repository edit form now has a field for the name of the repo, so the ugly names that the assistant comes up with for ssh remotes can be edited as you like. git remote rename is a very nice thing.

Changed the preferred content expression for transfer repos to this: "not (inallgroup=client and copies=client:2)". This way, when there's just one client, files on it will be synced to transfer repos, even though those repos have no other clients to transfer them to. Presumably, if a transfer repo is set up, more clients are coming soon, so this avoids a wait. Particularly useful with removable drives, as the drive will start being filled as soon as it's added, and can then be brought to a client elsewhere. The "2" does mean that, once another client is found, the data on the transfer repo will be dropped, and so if it's brought to yet another new client, it won't have data for it right away. I can't see a way to generalize this workaround to more than 2 clients; the transfer repo has to start dropping apparently unwanted content at some point. Still, this will avoid a potentially very confusing behavior when getting started.


I need to get that dropping of non-preferred content to happen still. Yesterday, I did some analysis of all the events that can cause previously preferred content to no longer be preferred, so I know all the places I have to deal with this.

The one that's giving me some trouble is checking in the transfer scan. If it checks for content to drop at the same time as content to transfer, it could end up doing a lot of transfers before dropping anything. It'd be nicer to first drop as much as it can, before getting more data, so that transfer remotes stay as small as possible. But the scan is expensive, and it'd also be nice not to need two passes.

Posted Tue Oct 16 01:54:19 2012

today

Came up with four groups of repositories that it makes sense to define standard preferred content expressions for.

[[!format haskell """ preferredContent :: StandardGroup -> String preferredContent ClientGroup = "exclude=/archive/" preferredContent TransferGroup = "not inallgroup=client and " ++ preferredContent ClientGroup preferredContent ArchiveGroup = "not copies=archive:1" preferredContent BackupGroup = "" -- all content is preferred """]]

preferred content has the details about these groups, but as I was writing those three preferred content expressions, I realized they are some of the highest level programming I've ever done, in a way.

Anyway, these make for a very simple repository configuration UI:

form with simple select box

yesterday (forgot to post this)

Got the assistant honoring preferred content settings. Although so far that only determines what it transfers. Additional work will be needed to make content be dropped when it stops being preferred.


Added a "configure" link next to each repository on the repository config page. This will go to a form to allow setting things like repository descriptions, groups, and preferred content settings.


Cut a release.

Posted Tue Oct 16 01:54:19 2012

Bugfixes all day.

The most amusing bug, which I stumbled over randomly on my own (someone on IRC yesterday was possibly encountering the same issue), made git annex webapp go into an infinite memory-consuming loop on startup if the repository it had been using was no longer a valid git repository.

Then there was the place where HOME got unset, with also sometimes amusing results.

Also fixed several build problems, including a threaded runtime hang in the test suite. Hopefully the next release will build on all Debian architectures again.

I'll be cutting that release tomorrow. I also updated the Linux prebuilt tarballs today.


Hmm, not entirely bugfixes after all. Had time (and power) to work on the repository configuration form too, and added a check box to it that can be unchecked to disable syncing with a repository. Also, made that form be displayed after the webapp creates a new repository.

Posted Tue Oct 16 01:54:19 2012

Working toward getting the data syncing to happen robustly, so a bunch of improvements.

  • Got unmount events to be noticed, so unplugging and replugging a removable drive will resume the syncing to it. There's really no good unmount event available on dbus in KDE, so it uses a heuristic there.
  • Avoid requeuing a download from a remote that no longer has a key.
  • Run a full scan on startup, for multiple reasons, including dealing with crashes.

Ran into a strange issue: Occasionally the assistant will run git-annex copy and it will not transfer the requested file. It seems that when the copy command runs git ls-files, it does not see the file it's supposed to act on in its output.

Eventually I figured out what's going on: When updating the git-annex branch, it sets GIT_INDEX_FILE, and of course environment settings are not thread-safe! So there's a race between threads that access the git-annex branch, and the Transferrer thread, or any other thread that might expect to look at the normal git index.

Unfortunately, I don't have a fix for this yet.. Git's only interface for using a different index file is GIT_INDEX_FILE. It seems I have a lot of code to tear apart, to push back the setenv until after forking every git command. :(
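
The shape of the eventual fix is clear enough, though. Something like this sketch, which builds a modified environment and passes it to just the one forked git command, instead of calling the process-global setenv:

[[!format haskell """
import System.Process
import System.Environment (getEnvironment)

-- Sketch: run one git command against a different index file, passing
-- the modified environment to only that process. No thread-unsafe
-- setenv involved.
gitWithIndexFile :: FilePath -> [String] -> IO String
gitWithIndexFile indexfile args = do
    environ <- getEnvironment
    let environ' = ("GIT_INDEX_FILE", indexfile)
            : filter ((/= "GIT_INDEX_FILE") . fst) environ
    readCreateProcess (proc "git" args) { env = Just environ' } ""
"""]]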

Before I figured out the root problem, I developed a workaround for the symptom I was seeing. I added a git-annex transferkey, which is optimised to be run by the assistant, and avoids running git ls-files, so avoids the problem. While I plan to fix this environment variable problem properly, transferkey turns out to be so much faster than how it was using copy that I'm going to keep it.

Posted Wed Oct 10 15:36:01 2012

Managed to find a minimal, 20 line test case for at least one of the ways git-annex was hanging with GHC's threaded runtime. Sent it off to haskell-cafe for analysis (thread).

Further managed to narrow the bug down to MissingH's use of logging code, which git-annex doesn't use (bug report). So, I can at least get around this problem with a modified version of MissingH. Hopefully that was the only thing causing the hangs I was seeing!

Posted Wed Oct 10 15:36:01 2012

Spent most of the day making file content transfers robust. There were lots of bugs, hopefully I've fixed most of them. It seems to work well now, even when I throw a lot of files at it.

One of the changes also sped up transfers; it no longer roundtrips to the remote to verify it has a file. The idea here is that when the assistant is running, repos should typically be fairly tightly synced to their remotes by it, so some of the extra checks that the move command does are unnecessary.

Also spent some time trying to use ghc's threaded runtime, but continue to be baffled by the random hangs when using it. This needs fixing eventually; all the assistant's threads can potentially be blocked when it's waiting on an external command it has run.

Also changed how transfer info files are locked. The lock file is now separate from the info file, which allows the TransferWatcher thread to notice when an info file is created, and thus actually track transfers initiated by remotes.


I'm fairly close now to merging the assistant branch into master. The data syncing code is very brute-force, but it will work well enough for a first cut.

Next I can either add some repository network mapping, and use graph analysis to reduce the number of data transfers, or I can move on to the webapp. Not sure yet which I'll do. It's likely that since DebConf begins tomorrow I'll put off either of those big things until after the conference.

Posted Wed Oct 10 15:36:01 2012

Short day today.

  • Worked on fixing a number of build failures people reported.
  • Solved the problem that was making transfer pause/resume not always work. Although there is another bug where pausing a transfer sometimes lets another queued transfer start running.
  • Worked on getting the assistant to start on login on OSX.
Posted Wed Oct 10 15:36:01 2012

A bit under the weather, but got into building buttons to control running and queued transfers today. The html and javascript side is done, with each transfer now having a cancel button, as well as a pause/start button.

Canceling queued transfers works. Canceling running transfers will need some more work, because killing a thread doesn't kill the processes being run by that thread. So I'll have to make the assistant run separate git-annex processes for transfers, that can be individually sent signals.

Posted Wed Oct 10 15:36:01 2012

First day of Kickstarter funded work!

Worked on inotify today. The watch branch in git now does a pretty good job of following changes made to the directory, annexing files as they're added and staging other changes into git. Here's a quick transcript of it in action:

joey@gnu:~/tmp>mkdir demo
joey@gnu:~/tmp>cd demo
joey@gnu:~/tmp/demo>git init
Initialized empty Git repository in /home/joey/tmp/demo/.git/
joey@gnu:~/tmp/demo>git annex init demo
init demo ok
(Recording state in git...)
joey@gnu:~/tmp/demo>git annex watch &
[1] 3284
watch . (scanning...) (started)
joey@gnu:~/tmp/demo>dd if=/dev/urandom of=bigfile bs=1M count=2
add ./bigfile 2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.835976 s, 2.5 MB/s
(checksum...) ok
(Recording state in git...)
joey@gnu:~/tmp/demo>ls -la bigfile
lrwxrwxrwx 1 joey joey 188 Jun  4 15:36 bigfile -> .git/annex/objects/Wx/KQ/SHA256-s2097152--e5ced5836a3f9be782e6da14446794a1d22d9694f5c85f3ad7220b035a4b82ee/SHA256-s2097152--e5ced5836a3f9be782e6da14446794a1d22d9694f5c85f3ad7220b035a4b82ee
joey@gnu:~/tmp/demo>git status -s
A  bigfile
joey@gnu:~/tmp/demo>mkdir foo
joey@gnu:~/tmp/demo>mv bigfile foo
"del ./bigfile"
joey@gnu:~/tmp/demo>git status -s
AD bigfile
A  foo/bigfile

Due to Linux's inotify interface, this is surely some of the most subtle, race-heavy code that I'll need to deal with while developing the git annex assistant. But I can't start by wading in; need to jump off the deep end to make progress!

The hardest problem today involved the case where a directory is moved outside of the tree that's being watched. Inotify will still send events for such directories, but it doesn't make sense to continue to handle them.

Ideally I'd stop inotify watching such directories, but a lot of state would need to be maintained to know which inotify handle to stop watching. (Seems like Haskell's inotify API makes this harder than it needs to be...)

Instead, I put in a hack that will make it detect inotify events from directories moved away, and ignore them. This is probably acceptable, since this is an unusual edge case.


The notable omission in the inotify code, which I'll work on next, is staging deleting of files. This is tricky because adding a file to the annex happens to cause a deletion event. I need to make sure there are no races where that deletion event causes data loss.

Posted Wed Oct 10 15:36:01 2012

Mostly took a break from working on the assistant today. Instead worked on adding incremental fsck to git-annex. Well, that will be something that the assistant will use, eventually, probably.

Jimmy and I have been working on a self-contained OSX app for using the assistant, that doesn't depend on installing git, etc. More on that once we have something that works.

Posted Wed Oct 10 15:36:01 2012

I didn't plan to work on git-annex much while at DebConf, because the conference always prevents the kind of concentration I need. But I unexpectedly also had to deal with three dead drives and illness this week.

That said, I have been trying to debug a problem with git-annex and Haskell's threaded runtime all week. It just hangs, randomly. No luck so far isolating why, although I now have a branch that hangs fairly reliably, and in which I am trying to whittle the entire git-annex code base (all 18 thousand lines!) into a nice test case.

This threaded runtime problem doesn't affect the assistant yet, but if I want to use Yesod in developing the webapp, I'll need the threaded runtime, and using the threaded runtime in the assistant generally would make it more responsive and less hacky.

Since this is a task I can work on without much concentration, I'll probably keep beating on it until I return home. Then I need to spend some quality thinking time on where to go next in the assistant.

Posted Wed Oct 10 15:36:01 2012

Putting together a shortlist of things I want to sort out before the beta.

  • Progress bars for file uploads.
  • No mocked up parts in the webapp's UI. Think I implemented the last of those yesterday, although there are some unlinked repository configuration options.
  • The basic watching functionality, should work reliably. There are some known scalability issues with eg, kqueue on OSX that need to be dealt with, but needn't block a beta.
  • Should keep any configuration of repositories that can be set up using the webapp in sync whenever it's possible to do so. I think that'll work after the past few days work.
  • Should be easy to install and get running. Of course part of the point of the beta release is to get it out there, on Hackage, in Debian unstable, and in the other places that git-annex packagers put it. As to getting it running, the autostart files and menu items look good on Linux. The OSX equivalents still need work and testing.
  • No howlingly bad bugs. This bug is the one I'm most concerned with currently. OTOH, watcher commits unlocked files can be listed in the errata.

So I worked on progress bars for uploads today. Wrote a nice little parser for rsync's progress output, that parses arbitrary size chunks, returning any unparsable part. Added a ProgressCallback parameter to all the backends' upload methods. Wrote a nasty thing that intercepts rsync's output, currently a character at a time (horrible, but rsync doesn't output that much, so surprisingly acceptable), and outputs it and parses it. Hooked all this up, and got it working for uploads to git remotes. That's 1/10th of the total ways uploads can happen that have working progress bars. It'll take a while to fill in the rest..
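
Stripped of the details, the parser amounts to something like this sketch: split the stream on carriage returns, pull a leading byte count out of each complete segment, and hand back the trailing partial segment to prepend to the next chunk. (Real rsync output has wrinkles, like comma-grouped byte counts in some versions, that this toy version ignores.)

[[!format haskell """
import Data.Char (isDigit, isSpace)
import Data.Maybe (mapMaybe)

-- Parse a chunk of rsync progress output ("\r  1234567  45% ...").
-- Returns the byte counts seen, plus any unparsable trailing part,
-- to be prepended to the next chunk.
parseProgress :: String -> ([Integer], String)
parseProgress s = case splitOn '\r' s of
    []       -> ([], "")
    segments -> (mapMaybe bytecount (init segments), last segments)
  where
    bytecount seg = case span isDigit (dropWhile isSpace seg) of
        ("", _)     -> Nothing
        (digits, _) -> Just (read digits)

splitOn :: Char -> String -> [String]
splitOn c = foldr f [[]]
  where
    f x acc@(a:as)
        | x == c    = [] : acc
        | otherwise = (x:a) : as
    f _ [] = [[]]
"""]]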

Posted Wed Oct 10 15:36:01 2012

Starting to travel, so limited time today.

Yet Another Thread added to the assistant, all it does is watch for changes to transfer information files, and update the assistant's map of transfers currently in progress. Now the assistant will know if some other repository has connected to the local repo and is sending or receiving a file's content.

This seemed really simple to write, it's just 78 lines of code. It worked 100% correctly the first time. :) But it's only so easy because I've got this shiny new inotify hammer that I keep finding places to use in the assistant.

Also, the new thread does some things that caused a similar thread (the merger thread) to go into a MVar deadlock. Luckily, I spent much of day 19 investigating and fixing that deadlock, even though it was not a problem at the time.

So, good.. I'm doing things right and getting to a place where rather nontrivial features can be added easily.

--

Next up: Enough nonsense with tracking transfers... Time to start actually transferring content around!

Posted Wed Oct 10 15:36:01 2012

Syncing works well when the graph of repositories is strongly connected. Now I'm working on making it work reliably with less connected graphs.

I've been focusing on and testing a doubly-connected list of repositories, such as: A <-> B <-> C


I was seeing a lot of git-annex branch push failures occurring in this line-of-repositories topology. Sometimes it was able to recover from these, but when two repositories were trying to push to one another at the same time, and both failed, both would pull and merge, which actually leaves the git-annex branch still diverged. (The two merge commits differ.)

A large part of the problem was that it pushed directly into the git-annex branch on the remote; the same branch the remote modifies. I changed it to push to synced/git-annex on the remote, which avoids most push failures. Only when A and C are both trying to push into B/synced/git-annex at the same time would one fail, and need to pull, merge, and retry.


With that change, git syncing always succeeded in my tests, and without needing any retries. But with more complex sets of repositories, or more traffic, it could still fail.

I want to avoid repeated retries, exponential backoffs, and that kind of thing. It'd probably be good enough, but I'm not happy with it because it could take arbitrarily long to get git in sync.

I've settled on letting it retry once to push to the synced/git-annex and synced/master branches. If the retry fails, it enters a fallback mode, which is guaranteed to succeed, as long as the remote is accessible.
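
In outline, with hypothetical stubs standing in for the real git operations:

[[!format haskell """
type Remote = String -- stand-in; the helpers below are stubs

-- The retry ladder: try the synced/ branches, retry once after a pull
-- and merge, and only then fall back to the ugly branch names.
pushToRemote :: Remote -> IO Bool
pushToRemote remote = do
    ok <- pushSyncedBranches remote
    if ok
        then return True
        else do
            pullAndMerge remote
            retried <- pushSyncedBranches remote
            if retried
                then return True
                else fallbackPush remote

pushSyncedBranches, fallbackPush :: Remote -> IO Bool
pushSyncedBranches _ = return False -- would run git push
fallbackPush _ = return True -- guaranteed to succeed when reachable
pullAndMerge :: Remote -> IO ()
pullAndMerge _ = return ()
"""]]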

The problem with the fallback mode is it uses really ugly branch names. Which is why Joachim Breitner and I originally decided on making git annex sync use the single synced/master branch, despite the potential for failed syncs. But in the assistant, the requirements are different, and I'm ok with the uglier names.

It does seem to make sense to only use the uglier names as a fallback, rather than by default. This preserves compatibility with git annex sync, and it allows the assistant to delete fallback sync branches after it's merged them, so the ugliness is temporary.


Also worked some today on a bug that prevents C from receiving files added to A.

The problem is that file contents and git metadata sync independently. So C will probably receive the git metadata from B before B has finished downloading the file from A. C would normally queue a download of the content when it sees the file appear, but at this point it has nowhere to get it from.

My first stab at this was a failure. I made each download of a file result in uploads of the file being queued to every remote that doesn't have it yet. So rather than C downloading from B, B uploads to C. Which works fine, but then C sees this download from B has finished, and proceeds to try to re-upload to B. Which rejects it, but notices that this download has finished, so re-uploads it to C...

The problem with that approach is that I don't have an event when a download succeeds, just an event when a download ends. Of course, C could skip uploading back to the same place it just downloaded from, but loops are still possible with other network topologies (ie, if D is connected to both B and C, there would be an upload loop `B -> C -> D -> B`). So unless I can find a better event to hook into, this idea is doomed.

I do have another idea to fix the same problem. C could certainly remember that it saw a file and didn't know where to get the content from, and then when it receives a git push of a git-annex branch, try again.

Posted Wed Oct 10 15:36:01 2012

Since my last blog, I've been polishing the git annex watch command.

First, I fixed the double commits problem. There's still some extra committing going on in the git-annex branch that I don't understand. It seems like a shutdown event is somehow being triggered whenever a git command is run by the commit thread.

I also made git annex watch run as a proper daemon, with locking to prevent multiple copies running, and a pid file, and everything. I made git annex watch --stop stop it.


Then I managed to greatly increase its startup speed. At startup, it generates "add" events for every symlink in the tree. This is necessary because it doesn't really know if a symlink is already added, or was manually added before it started, or indeed was added while it started up. Problem was that these events were causing a lot of work staging the symlinks -- most of which were already correctly staged.

You'd think it could just check if the same symlink was in the index. But it can't, because the index is in a constant state of flux. The symlinks might have just been deleted and re-added, or changed, and the index may still have the old value.

Instead, I got creative. :) We can't trust what the index says about the symlink, but if the index happens to contain a symlink that looks right, we can trust that the SHA1 of its blob is the right SHA1, and reuse it when re-staging the symlink. Wham! Massive speedup!


Then I started running git annex watch on my own real git annex repos, and noticed some problems.. Like it turns normal files already checked into git into symlinks. And it leaks memory scanning a big tree. Oops..


I put together a quick screencast demoing git annex watch.

While making the screencast, I noticed that git-annex watch was spinning in strace, which is bad news for powertop and battery usage. This seems to be a GHC bug also affecting Xmonad. I tried switching to GHC's threaded runtime, which solves that problem, but causes git-annex to hang under heavy load. Tried to debug that for quite a while, but didn't get far. Will need to investigate this further.. Am seeing indications that this problem only affects ghc 7.4.1; in particular 7.4.2 does not seem to have the problem.

Posted Wed Oct 10 15:36:01 2012

Really productive day today, now that I'm out of the threaded runtime tarpit!

First, brought back --debug logging, better than before! As part of that, I wrote some 250 lines of code to provide an IMHO more pleasant interface to System.Process (itself only 650 lines of code) that avoids all the low-level setup, cleanup, and tuple unpacking. Now I can do things like write to a pipe to a process, and ensure it exits successfully, this easily:

withHandle StdinHandle createProcessSuccess (proc "git" ["hash-object", "--stdin"]) $ \h ->
    hPutStr h objectdata

My interface also makes it easy to run nasty background processes, reading their output lazily.

lazystring <- withHandle StdoutHandle createBackgroundProcess (proc "find" ["/"]) hGetContents

Any true Haskellers are shuddering here, I really should be using conduits or pipes, or something. One day..


The assistant needs to detect when removable drives are attached, and sync with them. This is a reasonable thing to be working on at this point, because it'll make the currently incomplete data transfer code fully usable for the sneakernet use case, and firming that up will probably be a good step toward handling other use cases involving data transfer over the network, including cases where network remotes are transiently available.

So I've been playing with using dbus to detect mount events. There's a very nice Haskell library to use dbus.

This simple program will detect removable drives being mounted, and works on Xfce (as long as you have automounting enabled in its configuration), and should also work on Gnome, and, probably, KDE:

[[!format haskell """ {-# LANGUAGE OverloadedStrings #-}

import Data.List (sort) import DBus import DBus.Client import Control.Monad

main = do client <- connectSession

listen client mountadded $ \s ->
    putStrLn (show s)

forever $ getLine -- let listener thread run forever

where
    mountadded = matchAny
        { matchInterface = Just "org.gtk.Private.RemoteVolumeMonitor"
        , matchMember = Just "MountAdded"
        }

"""]]

(Yeah... "org.gtk.Private.RemoteVolumeMonitor". There are so many things wrong with that string. What does gtk have to do with mounting a drive? Why is it Private? Bleagh. Should I only match the "MountAdded" member and not the interface? Seems everyone who does this relies on google to find other people who have cargo-culted it, or just runs dbus-monitor and picks out things. There seems to be no canonical list of events. Bleagh.)


Spent a while shaving a yak of needing a getmntent interface in Haskell. Found one in a hsshellscript library; since that library is not packaged in Debian, and I don't really want to depend on it, I extracted just the mtab and fstab parts of it into a little library in git-annex.


I've started putting together a MountWatcher thread. On systems without dbus (do OSX or the BSDs have dbus?), or if dbus is not running, it polls /etc/mtab every 10 seconds for new mounts. When dbus is available, it doesn't need the polling, and should notice mounts more quickly.

Open question: Should it still poll even when dbus is available? Some of us like to mount our drives by hand, and may have automounting disabled. It'd be good if the assistant supported that. This might need an annex.no-dbus setting, but I'd rather avoid needing such manual configuration.

One idea is to do polling in addition to dbus, if /etc/fstab contains mount points that seem to be removable drives, on which git remotes live. Or it could always do polling in addition to dbus, which is just some extra work. Or, it could try to introspect dbus to see if mount events will be generated.

The MountWatcher so far only detects new mounts and prints out what happened. Next up: Do something in response to them.

This will involve manipulating the Annex state to belatedly add the Remote on the mount point.. tricky. And then, for Git Remotes, it should pull/push the Remote to sync git data. Finally, for all remotes, it will need to queue Transfers of file contents from/to the newly available Remote.

Posted Wed Oct 10 15:36:01 2012

Short day today, but I again worked only on progress bars.

  • Added upload progress tracking for the directory special remote.
  • Some optimisations.
  • Added a git-annex-shell transferkey command. This isn't used yet, but the plan is to use it to feed back information about how much of a file has been sent when downloading it. So that the uploader can display a progress bar. This method avoids needing to parse the rsync protocol, which is approximately impossible without copying half of rsync. Happily, git-annex's automatic ssh connection caching will make the small amount of data this needs to send be efficiently pipelined over the same ssh connection that rsync is using.

I probably have less than 10 lines of code to write to finish up progressbars for now. Looking forward to getting that behind me, and on to something more interesting. Even doing mail merge to print labels to mail out Kickstarter rewards is more interesting than progress bars at this point. :)

Posted Wed Oct 10 15:36:01 2012

Only had a few hours to work today, but my current focus is speed, and I have indeed sped up parts of git annex watch.

One thing folks don't realize about git is that despite a rep for being fast, it can be rather slow in one area: Writing the index. You don't notice it until you have a lot of files, and the index gets big. So I've put a lot of effort into git-annex in the past to avoid writing the index repeatedly, and queue up big index changes that can happen all at once. The new git annex watch was not able to use that queue. Today I reworked the queue machinery to support the types of direct index writes it needs, and now repeated index writes are eliminated.

... Eliminated too far, it turns out, since it doesn't yet ever flush that queue until shutdown! So the next step here will be to have a worker thread that wakes up periodically, flushes the queue, and autocommits. (This will, in fact, be the start of the syncing phase of my roadmap!) There's lots of room here for smart behavior. Like, if a lot of changes are being made close together, wait for them to die down before committing. Or, if it's been idle and a single file appears, commit it immediately, since this is probably something the user wants synced out right away. I'll start with something stupid and then add the smarts.
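
A sketch of what that worker thread might look like, with pendingCount and commit as hypothetical hooks into the real changes queue:

[[!format haskell """
import Control.Concurrent (threadDelay)

-- Wake up once a second; commit only when there are queued changes
-- and nothing new arrived since the last wakeup, so bursts of
-- activity settle down before being committed.
committerThread :: IO Int -> IO () -> IO ()
committerThread pendingCount commit = go 0
  where
    go lastcount = do
        threadDelay 1000000 -- one second
        n <- pendingCount
        if n > 0 && n == lastcount
            then commit >> go 0
            else go n
"""]]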

(BTW, in all my years of programming, I have avoided threads like the nasty bug-prone plague they are. Here I already have three threads, and am going to add probably 4 or 5 more before I'm done with the git annex assistant. So far, it's working well -- I give credit to Haskell for making it easy to manage state in ways that make it possible to reason about how the threads will interact.)

What about the races I've been stressing over? Well, I have an ulterior motive in speeding up git annex watch, and that's to also be able to slow it down. Running in slow-mo makes it easy to try things that might cause a race and watch how it reacts. I'll be using this technique when I circle back around to dealing with the races.

Another tricky speed problem came up today that I also need to fix. On startup, git annex watch scans the whole tree to find files that have been added or moved etc while it was not running, and take care of them. Currently, this scan involves re-staging every symlink in the tree. That's slow! I need to find a way to avoid re-staging symlinks; I may use git cat-file to check if the currently staged symlink is correct, or I may come up with some better and faster solution. Sleeping on this problem.


Oh yeah, I also found one more race bug today. It only happens at startup and could only make it miss staging file deletions.

Posted Wed Oct 10 15:36:01 2012

I hear that people want the git-annex assistant to be easy to install without messing about building it from source..

on OSX

So Jimmy and I have been working all week on making an easily installed OSX app of the assistant. This is a .dmg file that bundles all the dependencies (git, etc) in, so it can be installed with one click.

It seems to basically work. You can get it here.

Unfortunately, the pasting into annex on OSX bug resurfaced while testing this.. So I can't really recommend using it on real data yet.

Still, any testing you can do is gonna be really helpful. I'm squashing OSX bugs right and left.

on Linux

First of all, the git-annex assistant is now available in Debian unstable, and in Arch Linux's AUR. Proper packages.

For all the other Linux distributions, I have a workaround. It's a big hack, but it seems to work.. at least on Debian stable.

I've just put up a linux standalone tarball, which has no library dependencies apart from glibc, and doesn't even need git to be installed on your system.

on FreeBSD

The FreeBSD port has been updated to include the git-annex assistant too..

Posted Wed Oct 10 15:36:01 2012

Today I worked on the race conditions, and fixed two of them. Both were fixed by avoiding using git add, which looks at the files currently on disk. Instead, git annex watch injects symlinks directly into git's index, using git update-index.
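
The injection amounts to feeding git update-index a line naming the symlink's mode, blob, and path. A sketch (the real code batches many updates together):

[[!format haskell """
import System.IO
import System.Process

-- Stage a symlink without looking at the file on disk: 120000 is
-- git's mode for symlinks, and sha names a blob containing the
-- link target.
stageSymlink :: String -> FilePath -> IO ()
stageSymlink sha path = do
    (Just h, _, _, pid) <- createProcess
        (proc "git" ["update-index", "-z", "--index-info"])
            { std_in = CreatePipe }
    hPutStr h ("120000 blob " ++ sha ++ "\t" ++ path ++ "\0")
    hClose h
    _ <- waitForProcess pid
    return ()
"""]]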

There is one bad race condition remaining. If multiple processes have a file open for write, one can close it, and it will be added to the annex. But then the other can still write to it.


Getting away from race conditions for a while, I made git annex watch not annex .gitignore and .gitattributes files.

And, I made it handle running out of inotify descriptors. By default, /proc/sys/fs/inotify/max_user_watches is 8192, and that's how many directories inotify can watch. Now when it needs more, it will print a nice message showing how to increase it with sysctl.

FWIW, DropBox also uses inotify and has the same limit. It seems to not tell the user how to fix it when it goes over. Here's what git annex watch will say:

Too many directories to watch! (Not watching ./dir4299)
Increase the limit by running:
  echo fs.inotify.max_user_watches=81920 | sudo tee -a /etc/sysctl.conf; sudo sysctl -p
Posted Wed Oct 10 15:36:01 2012

Kickstarter is over. Yay!

Today I worked on the bug where git annex watch turned regular files that were already checked into git into symlinks. So I made it check if a file is already in git before trying to add it to the annex.

The tricky part was doing this check quickly. Unless I want to write my own git index parser (or use one from Hackage), this check requires running git ls-files, once per file to be added. That won't fly if a huge tree of files is being moved or unpacked into the watched directory.

Instead, I made it only do the check during git annex watch's initial scan of the tree. This should be OK, because once it's running, you won't be adding new files to git anyway, since it'll automatically annex new files. This is good enough for now, but there are at least two problems with it:

  • Someone might git merge in a branch that has some regular files, and it would add the merged in files to the annex.
  • Once git annex watch is running, if you modify a file that was checked into git as a regular file, the new version will be added to the annex.

I'll probably come back to this issue, and may well find myself directly querying git's index.


I've started work to fix the memory leak I see when running git annex watch in a large repository (40 thousand files). As always with a Haskell memory leak, I crack open Real World Haskell's chapter on profiling.

Eventually this yields a nice graph of the problem:

memory profile

So, looks like a few minor memory leaks, and one huge leak. Stared at this for a while, tried a few things, and got a much better result:

memory profile

I may come back later and try to improve this further, but it's not bad memory usage. But, it's still rather slow to start up in such a large repository, and its initial scan is still doing too much work. I need to optimize more..

Posted Wed Oct 10 15:36:01 2012

After an all-nighter, I have git annex webapp launching a WebApp!

It doesn't do anything useful yet, just uses Yesod to display a couple of hyperlinked pages and a favicon, securely.

The binary size grew rather alarmingly, BTW. :) Indeed, it's been growing for months..

-rwxr-xr-x 1 root root 9.4M Jul 21 16:59 git-annex-no-assistant-stripped
-rwxr-xr-x 1 joey joey  12M Jul 25 20:54 git-annex-no-webapp-stripped
-rwxr-xr-x 1 joey joey  17M Jul 25 20:52 git-annex-with-webapp-stripped

Along the way, some Not Invented Here occurred:

I didn't use the yesod scaffolded site, because it's a lot of what seems mostly to be cruft in this use case. And because I don't like code generated from templates that people are then expected to edit. Ugh. That's my least favorite part of Yesod. This added some pain, since I had to do everything the hard way.

I didn't use wai-handler-launch because:

  • It seems broken on IPv6 capable machines (it always opens http://127.0.0.1:port/ even though it apparently doesn't always listen there.. I think it was listening on my machine's ipv6 address instead. I know, I know; I should file a bug about this..)
  • It always uses port 4587, which is insane. What if you have two webapps?
  • It requires javascript in the web browser, which is used to ping the server, and shut it down when the web browser closes (which behavior is wrong for git-annex anyway, since the daemon should stay running across browser closes).
  • It opens the webapp on web server startup, which is wrong for git-annex; instead the command git annex webapp will open the webapp, after git annex assistant started the web server.

Instead, I rolled my own WAI webapp launcher, which binds to any free port on localhost. It does use xdg-open to launch the web browser, like wai-handler-launch (or just open on OS X).
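
The core of such a launcher is small. Here's a sketch using warp's runSettingsSocket (binding port 0 makes the kernel pick a free port):

[[!format haskell """
import Network.Socket
import Network.Wai (Application)
import Network.Wai.Handler.Warp (defaultSettings, runSettingsSocket)

-- Bind to an OS-chosen free port on localhost, report which port we
-- got, and hand the listening socket to warp.
runOnFreePort :: Application -> (PortNumber -> IO ()) -> IO ()
runOnFreePort app onport = do
    sock <- socket AF_INET Stream defaultProtocol
    bind sock (SockAddrInet 0 (tupleToHostAddress (127, 0, 0, 1)))
    listen sock 10
    socketPort sock >>= onport -- eg, build the url to hand to xdg-open
    runSettingsSocket defaultSettings sock app
"""]]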

Also, I wrote my own WAI logger, which logs using System.Log.Logger, instead of to stdout, like runDebug does.


The webapp only listens for connections from localhost, but that's not sufficient "security". Instead, I added a secret token to every url in the webapp, that only git annex webapp knows about.

But, if that token is passed to xdg-open on its command line, it will be briefly visible to local attackers in the parameters of xdg-open.. And if the web browser's not already running, it'll run with it as a parameter, and be very visible.

So instead, I used a nasty hack. On startup, the assistant will create an html file, readable only by the user, that redirects the user to the real site url. Then git annex webapp will run xdg-open on that file.


Making Yesod check the auth= parameter (to verify that the secret token is right) is when using Yesod started to pay off. Yesod has a simple isAuthorized method that can be overridden to do your own authentication like this.

But Yesod really started to shine when I went to add the auth= parameter to every url in the webapp. There's a joinPath method that can be used to override the default url builder. And every type-safe url in the application goes through there, so it's perfect for this.

I just had to be careful to make it not add auth= to the url for the favicon, which is included in the "Permission Denied" error page. That'd be an amusing security hole..


Next up: Doing some AJAX to get a dynamic view of the state of the daemon, including currently running transfers, in the webapp. AKA stuff I've never done before, and that, unlike all this heavy Haskell Yesod, scares me. :)

Posted Wed Oct 10 15:36:01 2012

I've changed the default backend used by git-annex from SHA256 to SHA256E. Including the filename extension in the key is known to make repositories more usable on things like MP3 players, and I've recently learned it also avoids Weird behavior with OS X Finder and Preview.app.

I thought about only changing the default in repositories set up by the assistant, but it seemed simpler to change the main default. The old backend is really only better if you might have multiple copies of files with the same content that have different extensions.

Fixed the socket leak in pairing that eluded me earlier.

I've made a new polls page, and posted a poll: prioritizing special remotes

Posted Wed Oct 10 15:36:01 2012

Worked more on upload progress tracking. I'm fairly happy with its state now:

  • It's fully implemented for rsync special remotes.

  • Git remotes also fully support it, with the notable exception of file uploads run by git-annex-shell recvkey. That runs rsync --server --sender, and in that mode, rsync refuses to output progress info. Not sure what to do about this case. Maybe I should write a parser for the rsync wire protocol that can tell what chunk of the file is being sent, and shim it in front of the rsync server? That's rather hardcore, but it seems the best of a bad grab bag of options that include things like LD_PRELOAD hacks.

  • Also optimised the rsync progress bar reader to read whole chunks of data rather than one byte at a time.

  • Also got progress bars to actually update in the webapp for uploads.

    This turned out to be tricky because kqueue cannot be used to detect when existing files have been modified. (One of kqueue's worst shortcomings vs inotify.) Currently on kqueue systems it has to poll.

I will probably add upload progress tracking to the directory special remote, which should be very easy (it already implements its own progress bars), and leave the other special remotes for later. I can add upload progress tracking to each special remote when I add support for configuring it in the webapp.

Posted Wed Oct 10 15:36:01 2012

Not much available time today, only a few hours.

Main thing I did was fixed up the failed push tracking to use a better data structure. No need for a queue of failed pushes, all it needs is a map of remotes that have an outstanding failed push, and a timestamp. Now it won't grow in memory use forever anymore. :)

Finding the right thread mutex type for this turned out to be a bit of a challenge. I ended up with a STM TMVar, which is left empty when there are no pushes to retry, so the thread using it blocks until there are some. And, it can be updated transactionally, without races.
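
In sketch form (with Remote simplified to a string):

[[!format haskell """
import Control.Concurrent.STM
import Control.Monad (unless)
import Data.Maybe (fromMaybe)
import Data.Time.Clock (UTCTime)
import qualified Data.Map as M

type Remote = String -- stand-in for the real remote type
type FailedPushMap = TMVar (M.Map Remote UTCTime)

-- The TMVar is left empty when there is nothing to retry, so the
-- retry thread's takeTMVar blocks until some push fails.
changeFailedPushMap :: FailedPushMap
    -> (M.Map Remote UTCTime -> M.Map Remote UTCTime) -> STM ()
changeFailedPushMap v f = do
    m <- f . fromMaybe M.empty <$> tryTakeTMVar v
    unless (M.null m) $ putTMVar v m
"""]]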

I also fixed a bug outside the git-annex assistant code. It was possible to crash git-annex if a local git repository was configured as a remote, and the repository was not available on startup. git-annex now ignores such remotes. This does impact the assistant, since it is a long running process and git repositories will come and go. Now it ignores any that were not available when it started up. This will need to be dealt with when making it support removable drives.

Posted Wed Oct 10 15:36:01 2012

Spent yesterday and today making the WebApp handle adding removable drives.

While it needs more testing, I think that it's now possible to use the WebApp for a complete sneakernet usage scenario.

  • Start up the webapp, let it make a local repo.
  • Add some files, by clicking to open the file manager, and dragging them in.
  • Plug in a drive, and tell the webapp to add it.
  • Wait while files sync..
  • Take the drive to another computer, and repeat the process there.

No command-line needed, and files will automatically be synced between two or more computers using the drive.

Sneakernet is only one usage scenario for the git-annex assistant, but I'm really happy to have one scenario 100% working!

Indeed, since the assistant and webapp can now actually do something useful, I'll probably be merging them into master soon.

Details follow..


So, yesterday's part of this was building the configuration page to add a removable drive. That needs to be as simple as possible, and it currently consists of a list of things git-annex thinks might be mount points of removable drives, along with how much free space they have. Pick a drive, click the pretty button, and away it goes..

(I decided to make the page so simple it doesn't even ask where you want to put the directory on the removable drive. It always puts it in a "annex" directory. I might add an expert screen later, but experts can always set this up themselves at the command line too.)

I also fought with Yesod and Bootstrap rather a lot to make the form look good. Didn't entirely succeed, and had to file a bug on Yesod about its handling of check boxes. (Bootstrap also has a bug, IMHO; its drop down lists are not always sized wide enough for their contents.)

Ideally this configuration page would listen for mount events, and refresh its list. I may add that eventually; I didn't have a handy channel it could use to do that, so deferred it. Another idea is to have the mount event listener detect removable drives that don't have an annex on them yet, and pop up an alert with a link to this configuration page.


Making the form led to a somewhat interesting problem: How to tell if a mounted filesystem is a removable drive, or some random thing like /proc or a fuse filesystem. My answer, besides checking that the user can write to it, was various heuristics, which seem to work ok, at least here..

[[!format haskell """ sane Mntent { mnt_dir = dir, mnt_fsname = dev } {- We want real disks like /dev/foo, not - dummy mount points like proc or tmpfs or - gvfs-fuse-daemon. -} | not ('/' elem dev) = False {- Just in case: These mount points are surely not - removable disks. -} | dir == "/" = False | dir == "/tmp" = False | dir == "/run/shm" = False | dir == "/run/lock" = False """]]


Today I did all the gritty coding to make it create a git repository on the removable drive, and tell the Annex monad about it, and ensure it gets synced.

As part of that, it detects when the removable drive's filesystem doesn't support symlinks, and makes a bare repository in that case. Another expert level config option that's left out for now is to always make a bare repository, or even to make a directory special remote rather than a git repository at all. (But directory special remotes cannot support the sneakernet use case by themselves...)


Another somewhat interesting problem was what to call the git remotes that it sets up on the removable drive and the local repository. Again this could have an expert-level configuration, but the defaults I chose are to use the hostname as the remote name on the removable drive, and to use the basename of the mount point of the removable drive as the remote name in the local annex.


Originally, I had thought of this as cloning the repository to the drive. But, partly due to luck, I started out just doing a git init to make the repository (I had a function lying around to do that..).

And as I worked on it some more, I realized this is not as simple as a clone. It's a bi-directional sync/merge, and indeed the removable drive may have all the data already in it, and the local repository have just been created. Handling all the edge cases of that (like, the local repository may not have a "master" branch yet..) was fun!

Posted Wed Oct 10 15:36:01 2012

Turns out I was able to easily avoid the potential upload loops that would occur if each time a repo receives a download, it queues uploads to the repos it's connected to. With that done, I suspect, but have not proven, that the assistant is able to keep repos arranged in any shape of graph in sync, as long as it's connected (of course) and each connection is bi-directional. That's a good start .. or at least a nice improvement from only strongly connected graphs being kept in sync.

Eliminated some empty commits that would be made sometimes, which is a nice optimisation.


I wanted to get back to some UI work after this week's deep dive into the internals. So I filled in a missing piece, the repository switcher in the upper right corner. Now the webapp's UI allows setting up different repositories for different purposes, and switching between them.

Posted Wed Oct 10 15:36:01 2012

The webapp now displays actual progress bars, for the actual transfers that the assistant is making! And it's seriously shiny.

Yes, I used Bootstrap. I can see why so many people are using it; the common complaint is that everything looks the same. I spent a few hours mocking up the transfer display part of the WebApp using Bootstrap, and arrived at something that doesn't entirely suck remarkably quickly.

The really sweet thing about Bootstrap is that when I resized my browser to the shape of a cell phone, it magically redrew the WebApp like so:


To update the display, the WebApp uses two techniques. On noscript browsers, it just uses a meta refresh, which is about the best I can do. I welcome feedback; it might be better to just have an "Update" button in this case.

With javascript enabled, it uses long polling, done over AJAX. There are some other options I considered, including websockets, and server-sent events. Websockets seem too new, and while there's a WAI module supporting server-sent events, and even an example of them in the Yesod book, the module is not packaged for Debian yet. Anyway, long polling is the most widely supported, so a good starting place. It seems to work fine too, I don't really anticipate needing the more sophisticated methods.

(Incidentally, this is the first time I've ever written code that uses AJAX.)

Currently the status display is rendered in html by the web server, and just updated into place by javascript. I like this approach since it keeps the javascript code to a minimum and the pure haskell code to a maximum. But who knows, I may have to switch to JSON that gets rendered by javascript, for some reason, later on.


I was very happy with Yesod when I managed to factor out a general purpose widget that adds long-polling and meta-refresh to any other widget. I was less happy with Yesod when I tried to include jquery on my static site and it kept serving up a truncated version of it. Eventually worked around what's seemingly a bug in the default WAI middleware, by disabling that middleware.


Also yesterday I realized there were about 30 comments stuck in moderation on this website. I thought I had a feed of those, but obviously I didn't. I've posted them all, and also read them all.


Next up is probably some cleanup of bugs and minor todos. Including figuring out why watch has started to segfault on OSX when it was working fine before.

After that, I need to build a way to block the long polling request until the DaemonStatus and/or TransferQueue change from the version previously displayed by the WebApp. An interesting concurrency problem..

Once I have that working, I can reduce the current 3 second delay between refreshes to a very short delay, and the WebApp will update in near-realtime as changes come in.

Posted Wed Oct 10 15:36:01 2012

Focus today was writing a notification broadcaster library. This is a way to send a notification to a set of clients, any of which can be blocked waiting for a new notification to arrive. A complication is that any number of clients may be dead, and we don't want stale notifications for those clients to pile up and leak memory.

It took me 3 tries to find the solution, which turns out to be head-smackingly simple: An array of SampleVars, one per client.

Using SampleVars means that clients only see the most recent notification, but when the notification is just "the assistant's state changed somehow; display a refreshed rendering of it", that's sufficient.
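
For illustration, here's a minimal sketch of the technique (names made up; the real module uses an array rather than a list in an MVar, and SampleVar lived in base's Control.Concurrent.SampleVar at the time):

[[!format haskell """
import Control.Concurrent.SampleVar
import Control.Concurrent.MVar

type Broadcaster = MVar [SampleVar ()]

newBroadcaster :: IO Broadcaster
newBroadcaster = newMVar []

-- Each client registers its own SampleVar to block on.
addClient :: Broadcaster -> IO (SampleVar ())
addClient b = do
	v <- newEmptySampleVar
	modifyMVar_ b (return . (v :))
	return v

-- Wake every client. A SampleVar holds at most one sample, so a
-- dead client accumulates nothing beyond one stale notification.
sendNotification :: Broadcaster -> IO ()
sendNotification b = readMVar b >>= mapM_ (`writeSampleVar` ())

-- Block until the next notification arrives.
waitNotification :: SampleVar () -> IO ()
waitNotification = readSampleVar
"""]]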


First use of that was to make the thread that woke up every 10 minutes and checkpointed the daemon status to disk also wait for a notification that it changed. So that'll be more current, and use less IO.


Second use, of course, was to make the WebApp block long polling clients until there is really a change since the last time the client polled.

To do that, I made one change to my Yesod routes:

[[!format diff """ -/status StatusR GET +/status/#NotificationId StatusR GET """]]

Now I find another reason to love Yesod, because after doing that, I hit "make".. and fixed the type error. And hit make.. and fixed the type error. And then it just freaking worked! Html was generated with all urls to /status including a NotificationId, and the handler for that route got it and was able to use it:

[[!format haskell """ {- Block until there is an updated status to display. -} b <- liftIO $ getNotificationBroadcaster webapp liftIO $ waitNotification $ notificationHandleFromId b nid """]]

And now the WebApp is able to display transfers in realtime! When I have both the WebApp and git annex get running on the same screen, the WebApp displays files that git-annex is transferring about as fast as the terminal updates.

The progressbars still need to be sorted out, but otherwise the WebApp is a nice live view of file transfers.


I also had some fun with Software Transactional Memory. Now when the assistant moves a transfer from its queue of transfers to do, to its map of transfers that are currently running, it does so in an atomic transaction. This will avoid the transfer seeming to go missing (or be listed twice) if the webapp refreshes at just the wrong point in time. I'm really starting to get into STM.
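
Sketched with simplified, hypothetical types (not the assistant's actual TransferQueue code), the move looks something like this:

[[!format haskell """
import Control.Concurrent.STM
import qualified Data.Map as M

-- Take the next transfer off the queue and record it as running,
-- in one atomic transaction; no reader can see it in both places,
-- or in neither.
startNextTransfer
	:: Ord transfer
	=> TChan (transfer, info)
	-> TVar (M.Map transfer info)
	-> IO (transfer, info)
startNextTransfer queue running = atomically $ do
	(t, i) <- readTChan queue
	modifyTVar' running (M.insert t i)
	return (t, i)
"""]]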


Next up, I will be making the WebApp maintain a list of notices, displayed on its sidebar, scrolling new notices into view, and removing ones the user closes, and ones that expire. This will be used for displaying errors, as well as other communication with the user (such as displaying a notice while a git sync is in progress with a remote, etc). Seems worth doing now, so the basic UI of the WebApp is complete with no placeholders.

Posted Wed Oct 10 15:36:01 2012

Implemented everything I planned out yesterday: Expensive scans are only done once per remote (unless the remote changed while it was disconnected), and failed transfers are logged so they can be retried later.

Changed the TransferScanner to prefer to scan low cost remotes first, as a crude form of scheduling lower-cost transfers first.

A whole bunch of interesting syncing scenarios should work now. I have not tested them all in detail, but to the best of my knowledge, all these should work:

  • Connect to the network. It starts syncing with a networked remote. Disconnect the network. Reconnect, and it resumes where it left off.
  • Migrate between networks (ie, home to cafe to work). Any transfers that can only happen on one LAN are retried on each new network you visit, until they succeed.

One that is not working, but is soooo close:

  • Plug in a removable drive. Some transfers start. Yank the plug. Plug it back in. All necessary transfers resume, and it ends up fully in sync, no matter how many times you yank that cable.

That's not working because of an infelicity in the MountWatcher. It doesn't notice when the drive gets unmounted, so it ignores the new mount event.

Posted Wed Oct 10 15:36:01 2012

Started today doing testing of syncing, and found some bugs and things it needs to do better. But was quickly sidetracked when I noticed that transferkey was making a commit to the git-annex branch for every file it transferred, which is too slow and bloats history too much.

Fixing that actually involved fixing a long-standing annoyance: read-only git-annex commands like whereis sometimes start off with "(Recording state in git)", when the journal contains some not yet committed changes to the git-annex branch. I had to carefully think through the cases to avoid those commits.

As I was working on that, I found a real nasty lurking bug in the git-annex branch handling. It's unlikely to happen unless annex.autocommit=false is set, but it could also occur when two git-annex processes race one another just right. The root of the bug is that git cat-file --batch does not always show changes made to the index after it started. I think it does in enough cases to have tricked me before, but in general it can't be trusted to report the current state of the index, only some past state.

I was able to fix the bug, by ensuring that changes being made to the branch are always visible in either the journal or the branch -- never in the index alone.


Hopefully something less low-level tomorrow..!

Posted Wed Oct 10 15:36:01 2012

Got ssh probing implemented. It checks if it can connect to the server, and probes the server to see how it should be used.

Turned out to need two ssh probes. The first uses the system's existing ssh configuration, but disables password prompts. If that's able to get in without prompting for a password, then the user must have set that up, and doesn't want to be bothered with password prompts, and it'll respect that configuration.

Otherwise, it sets up a per-host ssh key, and configures a hostname alias in ~/.ssh/config to use that key, and probes using that. Configuring ssh this way is nice because it avoids changing ssh's behavior except when git-annex uses it, and it does not open up the server to arbitrary commands being run without a password.
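
For illustration, the ~/.ssh/config stanza has roughly this shape (the mangled hostname follows the real scheme; the key path here is made up):

Host git-annex-example.com-user
  HostName example.com
  User user
  IdentityFile ~/.ssh/key.git-annex-example.com-user
  IdentitiesOnly yes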

--

Next up will be creating the repositories. When there's a per-host key, this will also involve setting up authorized_keys, locking down the ssh key to only allow running git-annex-shell or rsync.
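
The lockdown amounts to a forced command in authorized_keys, along these lines (an illustrative line using the standard ssh pattern, not necessarily the exact one git-annex will write):

command="git-annex-shell -c \"$SSH_ORIGINAL_COMMAND\"",no-agent-forwarding,no-port-forwarding,no-X11-forwarding ssh-rsa AAAA... comment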

I decided to keep that separate from the ssh probing, even though it means the user will be prompted twice for their ssh password. It's cleaner and allows the probing to do other checks -- maybe it'll later check the amount of free disk space -- and the user should be able to decide after the probe whether or not to proceed with making the repository.

Posted Wed Oct 10 15:36:01 2012

About halfway done with implementing pairing. The webapp's interface to prompt for a secret and start pairing is done; the protocol is implemented; broadcasting of pairing requests is working; added Yet Another Thread to listen for incoming pairing traffic.

Very happy with how this came together; starting with defining the protocol with data types let me rapidly iterate until I had designed a simple, clean, robust protocol. The implementation works well too; it's even possible to start pairing, and only then bring up the network interface to the machine you intended to pair with, and it'll detect the new interface and start sending requests to it.

Next, I need to make alerts have a button that performs a stored IO action. So that the incoming pair request alert can have a button to respond to the pair request. And then I need to write the code to actually perform the pairing, including ssh key setup.

Posted Wed Oct 10 15:36:01 2012

Spent a lot of time this weekend thinking about/stuck on the cloud notification problem. Currently IRC is looking like the best way for repositories to notify one-another when changes are made, but I'm not sure about using that, and not ready to start on it.

Instead, laid some groundwork for transfer control today. Added some simple commands to manage groups of repositories, and find files that are present in repositories in a group. I'm not completely happy with the syntax for that, and need to think up some good syntax to specify files that are present in all repositories in a group.

The plan is to have the assistant automatically guess at groups to put new repositories it makes in (it should be able to make good guesses), as well as have an interface to change them, and an interface to configure transfer control using these groups (and other ways of matching files). And, probably, some canned transfer control recipes for common setups.


Collected up the past week's work and made a release today. I'm probably back to making regular releases every week or two.

Posted Wed Oct 10 15:36:01 2012

Actually did do some work on the webapp today, just fixing a bug I noticed in a spare moment. Also managed a bit in the plane earlier this week, implementing resuming of paused transfers. (Still need to test that.)

But the big thing today was dinner with one of my major Kickstarter backers, and as it turned out, "half the Haskell community of San Francisco" (3 people). Enjoyed talking about git-annex and haskell with them.

I'm looking forward to getting back home and back to work on Monday..

Posted Wed Oct 10 15:36:01 2012

It's done! The assistant branch is merged into master.

Updated the assistant page with some screenshots and instructions for using it.

Made some cosmetic fixes to the webapp.

Fixed the transferrer to use ~/.config/git-annex/program to find the path to git-annex when running it. (There are ways to find the path of the currently running program in unix, but they all suck, so I'm avoiding them this way.)

Read some OSX launchd documentation, and it seems it'd be pretty easy to get the assistant to autostart on login on OSX. If someone would like to test launchd files for me, get in touch.


AKA: Procrastinating really hard on those progress bars. ;)

Posted Wed Oct 10 15:36:01 2012

Today I added a "Files" link in the navbar of the WebApp. It looks like a regular hyperlink, but clicking on it opens up your desktop's native file manager, to manage the files in the repository!

Quite fun to be able to do this kind of thing from a web page. :)


Made git annex init (and the WebApp) automatically generate a description of the repo when none is provided.


Also worked on the configuration pages some. I don't want to get ahead of myself by diving into the full configuration stage yet, but I am at least going to add a configuration screen to clone the repo to a removable drive.

After that, the list of transfers on the dashboard needs some love. I'll probably start by adding UI to cancel running transfers, and then try to get drag and drop reordering of transfers working.

Posted Wed Oct 10 15:36:01 2012

Now installing git-annex automatically generates a freedesktop.org .desktop file, and installs it, either system-wide (root) or locally (user). So Menu -> Internet -> Git Annex will start up the web app.

(I don't entirely like putting it on the Internet menu, but the Accessories menu is not any better (and much more crowded here), and there's really no menu where it entirely fits.)

I generated that file by writing a generic library to deal with freedesktop.org desktop files and locations. Which seemed like overkill at the time, but then I found myself continuing to use that library. Funny how that happens.

So, there's also another .desktop file that's used to autostart the git-annex assistant daemon when the user logs into the desktop.

This even works when git-annex is installed to the ugly non-PATH location .cabal/bin/git-annex by Cabal! To make that work, it records the path the binary is at to a freedesktop.org data file, at install time.


That should all work in Gnome, KDE, XFCE, etc. Not Mac OSX I'm guessing...


Also today, I added a sidebar notification when the assistant notices new files. To make that work well, I implemented merging of related sidebar action notifications, so the effect is that there's one notification that collects a list of recently added files, and transient notifications that show up if a really big file is taking a while to checksum.

I'm pleased that the notification interface is at a point where I was able to implement all that, entirely in pure functional code.

Posted Wed Oct 10 15:36:01 2012

Tons of pairing work, which culminated today in pairing fully working for the very first time. And it works great! Type something like "my hovercraft is full of eels" into two git annex webapps on the same LAN and the two will find each other, automatically set up ssh keys, and sync up, like magic. Magic based on math.

  • Revert changes made to authorized_keys when the user cancels a pairing response. Which could happen if the machine that sent the pairing request originally is no longer on the network.
  • Some fixes to handle lossy UDP better. Particularly tricky at the end of the conversation -- how do both sides reliably know when a conversation is over when it's carried over a lossy wire? My solution is just to remember some conversations we think are over, and keep saying "this conversation is over" if we see messages in that conversation. Works.
  • Added a UUID that must be the same in related pairing messages. This has a nice security feature: It allows detection of brute-force attacks to guess the shared secret, after the first wrong guess! In which case the pairing is canceled and a warning printed.
  • That led to a thorough security overview, which I've added to the pairing page. Added some guards against unusual attacks, like console poisoning attacks. I feel happy with the security of pairing now, with the caveats that only I have reviewed it (and reviewing your own security designs is never ideal), and that the out-of-band shared secret communication between users is only as good as they make it.
  • Found a bug in Yesod's type safe urls. At least, I think it's a bug. Worked around it.
  • Got very stuck trying to close the sockets that are opened to send multicast pairing messages. Nothing works, down to and including calling C close(). At the moment I have a socket leak. :( I need to understand the details of multicast sockets better to fix this. Emailed the author of the library I'm using for help.
Posted Wed Oct 10 15:36:01 2012

Lots of WebApp UI improvements, mostly around the behavior when displaying alert messages. Trying to make the alerts informative without being intrusively annoying; I think I've mostly succeeded now.

Also, added an intro display. Shown is the display with only one repo; if there are more repos it also lists them all.

Posted Wed Oct 10 15:36:01 2012

Today, added a thread that deals with recovering when there's been a loss of network connectivity. When the network's down, the normal immediate syncing of changes of course doesn't work. So this thread detects when the network comes back up, and does a pull+push to network remotes, and triggers scanning for file content that needs to be transferred.

I used dbus again, to detect events generated by both network-manager and wicd when they've successfully brought an interface up. Or, if they're not available, it polls every 30 minutes.

When the network comes up, in addition to the git pull+push, it also currently does a full scan of the repo to find files whose contents need to be transferred to get fully back into sync.

I think it'll be ok for some git pulls and pushes to happen when moving to a new network, or resuming a laptop (or every 30 minutes when resorting to polling). But the transfer scan is currently really too heavy to be appropriate to do every time in those situations. I have an idea for avoiding that scan when the remote's git-annex branch has not changed. But I need to refine it, to handle cases like this:

  1. a new remote is added
  2. file contents start being transferred to (or from it)
  3. the network is taken down
  4. all the queued transfers fail
  5. the network comes back up
  6. the transfer scan needs to know the remote was not all in sync before #3, and so should do a full scan despite the git-annex branch not having changed

Doubled the ram in my netbook, which I use for all development. Yesod needs rather a lot of ram to compile and link, and this should make me quite a lot more productive. I was struggling with OOM killing bits of chromium during my last week of development.

Posted Wed Oct 10 15:36:01 2012

Based on the results of yesterday's poll, the WebApp defaults to ~/Desktop/annex when run in the home directory. If there's no Desktop directory, it uses just ~/annex. And if run from some other place than the home directory, it assumes you want to use cwd. Of course, you can change this default, but I think it's a good one for most use cases.
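
A sketch of that decision logic (hypothetical function name, but the behavior just described):

[[!format haskell """
import System.Directory (doesDirectoryExist, getCurrentDirectory, getHomeDirectory)
import System.FilePath ((</>))

defaultRepoDir :: IO FilePath
defaultRepoDir = do
	home <- getHomeDirectory
	cwd <- getCurrentDirectory
	if cwd /= home
		then return cwd -- run from elsewhere: assume cwd is wanted
		else do
			hasdesktop <- doesDirectoryExist (home </> "Desktop")
			return $ if hasdesktop
				then home </> "Desktop" </> "annex"
				else home </> "annex"
"""]]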


My work today has all been on making one second of the total lifetime of the WebApp work. It's the very tricky second in between clicking on "Make repository" and being redirected to a WebApp running in your new repository. The trickiness involves threads, and MVars, and multiple web servers, and I don't want to go into details here. I'd rather forget. ;-)

Anyway, it works; you can run "git annex webapp" and be walked right through to having a usable repository! Now I need to see about adding that to the desktop menus, and making "git annex webapp", when run a second time, remember where your repository is. I'll use ~/.config/git-annex/repository for storing that.

Posted Wed Oct 10 15:36:01 2012

Syncing works! I have two clones, and any file I create in the first is immediately visible in the second. Delete that file from the second, and it's immediately removed from the first.

Most of my work today felt like stitching existing limbs onto a pre-existing monster. Took the committer thread, that waits for changes and commits them, and refashioned it into a pusher thread, that waits for commits and pushes them. Took the watcher thread, that watches for files being made, and refashioned it into a merger thread, that watches for git refs being updated. Pulled in bits of the git annex sync command to reanimate this.

It may be a shambling hulk, but it works.

Actually, it's not much of a shambling hulk; I refactored my code after copying it. ;)

I think I'm up to 11 threads now in the new git annex assistant command, each with its own job, and each needing to avoid stepping on the other's toes. I did see one MVar deadlock error today, which I have not managed to reproduce after some changes. I think the committer thread was triggering the merger thread, which probably then waited on the Annex state MVar the committer thread had held.

Anyway, it even pushes to remotes in parallel, and keeps track of remotes it failed to push to, although as of yet it doesn't make any attempt at periodically retrying.

One bug I need to deal with is that the push code assumes any change made to the remote has already been pushed back to it. When it hasn't, the push will fail due to not being a fast-forward. I need to make it detect this case and pull before pushing.

(I've pushed this work out in a new assistant branch.)

Posted Wed Oct 10 15:36:01 2012

Some days I spend 2 hours chasing red herrings (like "perhaps my JSON ajax calls aren't running asynchronously?") that turn out to be a simple one-word typo. This was one of them.

However, I did get the sidebar displaying alert messages, which can be easily sent to the user from any part of the assistant. This includes transient alerts of things it's doing, which disappear once the action finishes, and long-term alerts that are displayed until the user closes them. It even supports rendering arbitrary Yesod widgets as alerts, so they can also be used for asking questions, etc.

Time for a screencast!

Posted Wed Oct 10 15:36:01 2012

... I'm getting tired of kqueue.

But the end of the tunnel is in sight. Today I made git-annex handle files that are still open for write after a kqueue creation event is received. Unlike with inotify, which has a new event each time a file is closed, kqueue only gets one event when a file is first created, and so git-annex needs to retry adding files until there are no writers left.

Eventually I found an elegant way to do that. The committer thread already wakes up every second as long as there's a pending change to commit. So for adds that need to be retried, it can just push them back onto the change queue, and the committer thread will wait one second and retry the add. One second might be too frequent to check, but it will do for now.

This means that git annex watch should now be usable on OSX, FreeBSD, and NetBSD! (It'll also work on Debian kFreeBSD once lsof is ported to it.) I've merged kqueue support to master.

I also think I've squashed the empty commits that were sometimes made.

Incidentally, I'm 50% through my first month, and finishing inotify was the first half of my roadmap for this month. Seem to be right on schedule.. Now I need to start thinking about syncing.

Posted Wed Oct 10 15:36:01 2012

Almost done with the data transfer code.. Today I filled in some bits and pieces.

Made the expensive transfer scan handle multiple remotes in one pass. So on startup, it only runs once, not N times. And when reconnecting to the network, when a remote has changed, it scans all network remotes in one pass, rather than making M redundant passes.

Got syncing with special remotes all working. Pretty easy actually. Just had to avoid doing any git repo push/pull with them, while still queueing data transfers.

It'll even download anything it can from the web special remote. To support that, I added generic support for readonly remotes; it'll only download from those and not try to upload to them.

(Oh, and I properly fixed the nasty GIT_INDEX_FILE environment variable problem I had the other day.)

I feel I'm very close to being able to merge the assistant branch into master now. I'm reasonably confident the data transfer code will work well, and manage to get things in sync eventually in all circumstances. (Unless there are bugs.) All the other core functionality of the assistant and webapp is working. The only thing that might delay the merge is the missing progress bars in the webapp .. but that's a silly thing to block on.

Still, I might spend a day and get a dumb implementation of progress bars for downloads working first (progress bars for uploads are probably rather harder). I'd spend longer on progress bars, but there are so many more exciting things I'm now ready to develop, like automatic configurators for using your git annex with Amazon S3, rsync.net, and the computer across the room..!

Posted Wed Oct 10 15:36:01 2012

Alerts can now have buttons, that go to some url when clicked. Yay.

Implementing that was a PITA, because Yesod really only wants its type-safe urls to be rendered from within its Handler monad, and most things that create alerts are not running in it. I managed to work around Yesod's insistence on this only by using a MVar to store the pure url-rendering function that Yesod uses internally. That function can only be obtained once the webapp is running.


Fixed a nasty bug where using gpg would cause hangs. I introduced this back when I was reworking all the code in git-annex that runs processes, so it would work with threading. In the process, a spot that forked a process to feed input to gpg was lost. Fixed it by spawning a thread to feed gpg. Luckily I have never released a version of git-annex with that bug, but the many users who are building from the master branch should update.


Made alerts be displayed while pairing is going on, with buttons to cancel pairing or respond to a pairing request.

Posted Wed Oct 10 15:36:01 2012

Beating my head against the threaded runtime some more. I can reproduce one of the hangs consistently by running 1000 git annex add commands in a loop. It hangs around 1% of the time, reading from git cat-file.

Interestingly, git cat-file is not yet running at this point -- git-annex has forked a child process, but the child has not yet exec'd it. Stracing the child git-annex, I see it stuck in a futex. Adding tracing, I see the child never manages to run any code at all.

This really looks like the problem is once again in MissingH, which uses forkProcess. Which happens to come with a big warning about being very unsafe, in very subtle ways. Looking at the C code that the newer process library uses when spawning a pipe to a process, it messes around with lots of things: blocking signals, stopping a timer, etc. Hundreds of lines of C code to safely start a child process, all doing things that MissingH omits.

That's the second time I've seemingly isolated a hang in the GHC threaded runtime to MissingH.

And so I've started converting git-annex to use the new process library, for running all its external commands. John Goerzen had once mentioned process to me, when I found a nasty bug in MissingH, as the cool new thing that would probably eliminate the System.Cmd.Utils part of MissingH, but I'd not otherwise heard much about it. (It also seems to have the benefit of supporting Windows.)

This is a big change and it's early days, but each time I see a hang, I'm converting the code to use process, and so far the hangs have just gone away when I do that.


Hours later... I've converted all of git-annex to use process.

In the er, process, the --debug switch stopped printing all the commands it runs. I may try to restore that later.

I've not tested everything, but the test suite passes, even when using the threaded runtime. MILESTONE

Looking forward to getting out of these weeds and back to useful work..


Hours later yet.... The assistant branch in git now uses the threaded runtime. It works beautifully, using proper threads to run file transfers in.

That should fix the problem I was seeing on OSX yesterday. Too tired to test it now.

--

Amazingly, the assistant's own dozen or so threads and thread synch variables etc all work great under the threaded runtime. I had assumed I'd see yet more concurrency problems there when switching to it, but it all looks good. (Or whatever problems there are are subtle ones?)

I'm very relieved. The threaded logjam is broken! I had been getting increasingly worried that not having the threaded runtime available would make it very difficult to make the assistant perform really well, and cause problems with the webapp, perhaps preventing me from using Yesod.

Now it looks like smooth sailing ahead. Still some hard problems, but it feels like with inotify and kqueue and the threaded runtime all dealt with, the really hard infrastructure-level problems are behind me.

Posted Wed Oct 10 15:36:01 2012

Pondering syncing today. I will be doing syncing of the git repository first, and working on syncing of file data later.

The former seems straightforward enough, since we just want to push all changes to everywhere. Indeed, git-annex already has a sync command that uses a smart technique to allow syncing between clones without a central bare repository. (Props to Joachim Breitner for that.)

But it's not all easy. Syncing should happen as fast as possible, so changes show up without delay. Eventually it'll need to support syncing between nodes that cannot directly contact one-another. Syncing needs to deal with nodes coming and going; one example of that is a USB drive being plugged in, which should immediately be synced, but network can also come and go, so it should periodically retry nodes it failed to sync with. To start with, I'll be focusing on fast syncing between directly connected nodes, but I have to keep this wider problem space in mind.

One problem with git annex sync is that it has to be run in both clones in order for changes to fully propagate. This is because git doesn't allow pushing changes into a non-bare repository; so instead it drops off a new branch in .git/refs/remotes/$foo/synced/master. Then when it's run locally it merges that new branch into master.

So, how to trigger a clone to run git annex sync when syncing to it? Well, I just realized I have spent two weeks developing something that can be repurposed to do that! Inotify can watch for changes to .git/refs/remotes, and the instant a change is made, the local sync process can be started. This avoids needing to make another ssh connection to trigger the sync, so is faster and allows the data to be transferred over another protocol than ssh, which may come in handy later.

So, in summary, here's what will happen when a new file is created:

  1. inotify event causes the file to be added to the annex, and immediately committed.
  2. new branch is pushed to remotes (probably in parallel)
  3. remotes notice new sync branch and merge it
  4. (data sync, TBD later)
  5. file is fully synced and available

Steps 1, 2, and 3 should all be able to be accomplished in under a second. The speed of git push making a ssh connection will be the main limit to making it fast. (Perhaps I should also reuse git-annex's existing ssh connection caching code?)
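
(For reference, ssh connection caching comes down to a few OpenSSH options; this ~/.ssh/config stanza is illustrative -- git-annex sets up the equivalent itself:)

Host *
  ControlMaster auto
  ControlPersist yes
  ControlPath ~/.ssh/master-%r@%h:%p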

Posted Wed Oct 10 15:36:01 2012

Nothing flashy today; I was up all night trying to download photos taken by a robot lowered onto Mars by a skycrane.

Some work on alerts. Added an alert when a file transfer succeeds or fails. Improved the alert combining code so it handles those alerts, and simplified it a lot, and made it more efficient.

Also made the text of action alerts change from present to past tense when the action finishes. To support that I wrote a fun data type, a TenseString that can be rendered in either tense.

Posted Wed Oct 10 15:36:01 2012

Amazon S3 was the second most popular choice in the prioritizing special remotes poll, and since I'm not sure how I want to support phone/mp3 players, I did it first.

So I added a configurator today to easily set up an Amazon S3 repository. That was straightforward and didn't take long since git-annex already supported S3.

The hard part, of course, is key distribution. Since the webapp so far can only configure the shared encryption method, and not full-blown gpg keys, I didn't feel it would be secure to store the S3 keys in the git repository. Anyone with access to that git repo would have full access to S3 ... just not acceptable. Instead, the webapp stores the keys in a 600 mode file locally, and they're not distributed at all.

When the same S3 repository is enabled on another computer, it prompts for keys then too. I did add a hint about using the IAM Management Console in this case -- it should be possible to set up users in IAM who can only access a single bucket, although I have not tried to set that up.


Also, more work on the standalone OSX app.

Posted Wed Oct 10 15:36:01 2012

Implemented deferred downloads. So my example from yesterday, of three repositories in a line, stays fully in sync now!

I punted on one problem while doing it. It might be possible to get a really big list of deferred downloads in some situation. That all lives in memory. I aim for git-annex to always have a constant upper bound on memory use, so that's not really acceptable. I have TODOed a reminder to do something about limiting the size of this list.


I also ran into a nasty crash while implementing this, where two threads were trying to do things to git HEAD at the same time, and so one crashed, and in a way I don't entirely understand, that crash took down another thread with a BlockedIndefinitelyOnSTM exception. I think I've fixed this, but it's bothersome that this is the second time that modifications to the Merger thread have led to a concurrency related crash that I have not fully understood.

My guess is that STM can get confused when it's retrying, and the thread that was preventing it from completing a transaction crashes, because it suddenly does not see any other references to the TVar(s) involved in the transaction. Any GHC STM gurus out there?


Still work to be done on making data transfers keep everything fully in sync in all circumstances. One case I've realized needs work occurs when a USB drive is plugged in. Files are downloaded from it to keep the repo in sync, but the repo neglects to queue uploads of those just-received files out to the other repositories it's in contact with. Seems I still need to do something to detect when a successful download is done, and queue uploads.

Posted Wed Oct 10 15:36:01 2012

In a series of airport layovers all day. Since I woke up at 3:45 am, didn't feel up to doing serious new work, so instead I worked through some OSX support backlog.

git-annex will now use Haskell's SHA library if the sha256sum command is not available. That library is slow, but it's guaranteed to be available; git-annex already depended on it to calculate HMACs.

Then I decided to see if it makes sense to use the SHA library when adding smaller files. At some point, its slower implementation should win over needing to fork and parse the output of sha256sum. This was the first time I tried out Haskell's Criterion benchmarker, and I built this simple benchmark in short order.

[[!format haskell """ import Data.Digest.Pure.SHA import Data.ByteString.Lazy as L import Criterion.Main import Common

testfile :: FilePath testfile = "/tmp/bar" -- on ram disk

main = defaultMain [ bgroup "sha256" [ bench "internal" $ whnfIO internal , bench "external" $ whnfIO external ] ]

internal :: IO String internal = showDigest . sha256 <$> L.readFile testfile

external :: IO String external = pOpen ReadFromPipe "sha256sum" [testfile] $ \h -> fst . separate (== ' ') <$> hGetLine h """]]

The nice thing about benchmarking in airports is that when you're running a benchmark locally, you don't want to do anything else with the computer, so you can alternate people-watching, spacing out, and analyzing results.

100 kb file:

benchmarking sha256/internal
mean: 15.64729 ms, lb 15.29590 ms, ub 16.10119 ms, ci 0.950
std dev: 2.032476 ms, lb 1.638016 ms, ub 2.527089 ms, ci 0.950

benchmarking sha256/external
mean: 8.217700 ms, lb 7.931324 ms, ub 8.568805 ms, ci 0.950
std dev: 1.614786 ms, lb 1.357791 ms, ub 2.009682 ms, ci 0.950

75 kb file:

benchmarking sha256/internal
mean: 12.16099 ms, lb 11.89566 ms, ub 12.50317 ms, ci 0.950
std dev: 1.531108 ms, lb 1.232353 ms, ub 1.929141 ms, ci 0.950

benchmarking sha256/external
mean: 8.818731 ms, lb 8.425744 ms, ub 9.269550 ms, ci 0.950
std dev: 2.158530 ms, lb 1.916067 ms, ub 2.487242 ms, ci 0.950

50 kb file:

benchmarking sha256/internal
mean: 7.699274 ms, lb 7.560254 ms, ub 7.876605 ms, ci 0.950
std dev: 801.5292 us, lb 655.3344 us, ub 990.4117 us, ci 0.950

benchmarking sha256/external
mean: 8.715779 ms, lb 8.330540 ms, ub 9.102232 ms, ci 0.950
std dev: 1.988089 ms, lb 1.821582 ms, ub 2.181676 ms, ci 0.950

10 kb file:

benchmarking sha256/internal
mean: 1.586105 ms, lb 1.574512 ms, ub 1.604922 ms, ci 0.950
std dev: 74.07235 us, lb 51.71688 us, ub 108.1348 us, ci 0.950

benchmarking sha256/external
mean: 6.873742 ms, lb 6.582765 ms, ub 7.252911 ms, ci 0.950
std dev: 1.689662 ms, lb 1.346310 ms, ub 2.640399 ms, ci 0.950

It's possible to get nice graphical reports out of Criterion, but this is clear enough, so I stopped here. 50 kb seems a reasonable cutoff point.
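
To make the cutoff concrete, here's a sketch of how the choice could be wired up (the function name is made up; the threshold is the 50 kb suggested by the benchmarks):

[[!format haskell """
import Data.Digest.Pure.SHA (sha256, showDigest)
import qualified Data.ByteString.Lazy as L
import System.Process (readProcess)

-- Below the cutoff, the pure Haskell SHA library wins; above it,
-- forking sha256sum wins. sha256sum prints "<digest>  <file>".
sha256File :: FilePath -> Integer -> IO String
sha256File file size
	| size < 50 * 1024 = showDigest . sha256 <$> L.readFile file
	| otherwise = takeWhile (/= ' ') <$> readProcess "sha256sum" [file] ""
"""]]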

I also used this to benchmark the SHA256 in Haskell's Crypto package. Surprisingly, it's a lot slower than even the Pure.SHA code. On a 50 kb file:

benchmarking sha256/Crypto
collecting 100 samples, 1 iterations each, in estimated 6.073809 s
mean: 69.89037 ms, lb 69.15831 ms, ub 70.71845 ms, ci 0.950
std dev: 3.995397 ms, lb 3.435775 ms, ub 4.721952 ms, ci 0.950

There's another Haskell library, SHA2, which I should try some time.

Posted Wed Oct 10 15:36:01 2012

Woke up this morning with most of the design for a smarter approach to syncing in my head. (This is why I sometimes slip up and tell people I work on this project 12 hours a day..)

To keep the current assistant branch working while I make changes that break use cases that are working, I've started developing in a new branch, assistant-wip.

In it, I've started getting rid of unnecessary expensive transfer scans.

First optimisation I've done is to detect when a remote that was disconnected has diverged its git-annex branch from the local branch. Only when that's the case does a new transfer scan need to be done, to find out what new stuff might be available on that remote, to have caused the change to its branch, while it was disconnected.

That broke a lot of stuff. I have a plan to fix it written down in syncing. It'll involve keeping track of whether a transfer scan has ever been done (if not, one should be run), and recording logs when transfers failed, so those failed transfers can be retried when the remote gets reconnected.

Posted Wed Oct 10 15:36:01 2012

Started reading about ZeroMQ with the hope that it could do some firewall traversal thing, to connect mutually-unroutable nodes. Well, it could, but it'd need a proxy to run on a server both can contact, and lots of users won't have a server to run that on. The XMPP approach used by dvcs-autosync is looking like the likeliest way for git-annex to handle that use case.

However, ZeroMQ did point in promising directions to handle another use case I need to support: Local pairing. In fairly short order, I got ZeroMQ working over IP Multicast (PGM), with multiple publishers sending messages that were all seen by multiple clients on the LAN (actually the WAN; works over OpenVPN too). I had been thinking about using Avahi/ZeroConf for discovery of systems to pair with, but ZeroMQ is rather more portable and easy to work with.

Unfortunately, I wasn't able to get ZeroMQ to behave reliably enough. It seems to have some timeout issues the way I'm trying to use it, or perhaps its haskell bindings are buggy? Anyway, it's really overkill to use PGM when all I need for git-annex pairing discovery is lossy UDP Multicast. Haskell has a simple network-multicast library for that, and it works great.

With discovery out of the way (theoretically), the hard part about pairing is going to be verifying that the desired repository is being paired with, and not some imposter. My plan to deal with this involves a shared secret, that can be communicated out of band, and HMAC. The webapp will prompt both parties to enter the same agreed upon secret (which could be any phrase, ideally with 64 bytes of entropy), and will then use it as the key for HMAC on the ssh public key. The digest will be sent over the wire, along with the ssh public key, and the other side can use the shared secret to verify the key is correct.
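
Here's a sketch of that verification in Haskell, using hmacSha1 from the SHA package git-annex already depends on for HMAC (the function names are made up):

[[!format haskell """
import Data.Digest.Pure.SHA (hmacSha1, showDigest)
import qualified Data.ByteString.Lazy.Char8 as L

-- The shared secret keys an HMAC over the ssh public key.
pairDigest :: String -> String -> String
pairDigest secret sshpubkey =
	showDigest $ hmacSha1 (L.pack secret) (L.pack sshpubkey)

-- The wire carries (sshpubkey, digest); the receiver recomputes
-- the digest from the out-of-band secret and compares.
verifyPairMsg :: String -> String -> String -> Bool
verifyPairMsg secret sshpubkey digest =
	pairDigest secret sshpubkey == digest
"""]]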

The other hard part about pairing will be finding the best address to use for git, etc to connect to the other host. If MDNS is available, it's ideal, but if not the pair may have to rely on local DNS, or even hard-coded IPs, which will be much less robust. Or, the assistant could broadcast queries for a peer's current IP address itself, as a poor man's MDNS.

All right then! That looks like a good week's worth of work to embark on.


Slight detour to package the haskell network-multicast library and upload to Debian unstable.

Roughed out a data type that models the whole pairing conversation, and can be serialized to implement it. And a state machine to run that conversation. Not yet hooked up to any transport such as multicast UDP.

Posted Wed Oct 10 15:36:01 2012

Random improvements day..

Got the merge conflict resolution code working in git annex assistant.

Did some more fixes to the pushing and pulling code, covering some cases I missed earlier.

Git syncing seems to work well for me now; I've seen it recover from a variety of error conditions, including merge conflicts and repos that were temporarily unavailable.


There is definitely a MVar deadlock if the merger thread's inotify event handler tries to run code in the Annex monad. Luckily, it doesn't currently seem to need to do that, so I have put off debugging what's going on there.

Reworked how the inotify thread runs, to avoid the two inotify threads in the assistant now from both needing to wait for program termination, in a possibly conflicting manner.

Hmm, that seems to have fixed the MVar deadlock problem.


Been thinking about how to fix watcher commits unlocked files. Posted some thoughts there.

It's about time to move on to data syncing. While eventually that will need to build a map of the repo network to efficiently sync data over the fastest paths, I'm thinking that I'll first write a dumb version. So, two more threads:

  1. Uploads new data to every configured remote. Triggered by the watcher thread when it adds content. Easy; just use a TSet of Keys to send.

  2. Downloads new data from the cheapest remote that has it. Could be triggered by the merger thread, after it merges in a git sync. Rather hard; how does it work out what new keys are in the tree without scanning it all? Scan through the git history to find newly created files? Maybe the watcher triggers this thread instead, when it sees a new symlink, without data, appear.

Both threads will need to be able to be stopped, and restarted, as needed to control the data transfer. And a lot of other control smarts will eventually be needed, but my first pass will be to do a straightforward implementation. Once it's done, the git annex assistant will be basically usable.

Posted Wed Oct 10 15:36:01 2012

Since last post, I've worked on speeding up git annex watch's startup time in a large repository.

The problem was that its initial scan was naively staging every symlink in the repository, even though most of them are, presumably, staged correctly already. This was done in case the user copied or moved some symlinks around while git annex watch was not running -- we want to notice and commit such changes at startup.

Since I already had the stat info for the symlink, it can look at the ctime to see if the symlink was made recently, and only stage it if so. This sped up startup in my big repo from longer than I cared to wait (10+ minutes, or half an hour while profiling) to a minute or so. Of course, inotify events are already serviced during startup, so making it scan quickly is really only important so people don't think it's a resource hog. First impressions are important. :)

But what does "made recently" mean exactly? Well, my answer is possibly over engineered, but most of it is really groundwork for things I'll need later anyway. I added a new data structure for tracking the status of the daemon, which is periodically written to disk by another thread (thread #6!) to .git/annex/daemon.status Currently it looks like this; I anticipate adding lots more info as I move into the syncing stage:

lastRunning:1339610482.47928s
scanComplete:True

So, only symlinks made after the daemon was last running need to be expensively staged on startup. Although, as RichiH pointed out, this fails if the clock is changed. But I have been planning to have a cleanup thread anyway, that will handle this, and other potential problems, so I think that's ok.
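
The check itself is tiny. A sketch (names made up), comparing the symlink's ctime from the already-gathered stat info against daemon.status's lastRunning:

[[!format haskell """
import System.Posix.Types (EpochTime)
import System.Posix.Files (FileStatus, statusChangeTime)

-- Only symlinks changed since the daemon last ran need staging.
needsStaging :: EpochTime -> FileStatus -> Bool
needsStaging lastRunning s = statusChangeTime s >= lastRunning
"""]]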

Stracing its startup scan, it's fairly tight now. There are some repeated getcwd syscalls that could be optimised out for a minor speedup.


Added the sanity check thread. Thread #7! It currently only does one sanity check per day, but the sanity check is a fairly lightweight job, so I may make it run more frequently. OTOH, it may never ever find a problem, so once per day seems a good compromise.

Currently it's only checking that all files in the tree are properly staged in git. I might make it git annex fsck later, but fscking the whole tree once per day is a bit much. Perhaps it should only fsck a few files per day? TBD

Currently any problems found in the sanity check are just fixed and logged. It would be good to do something about getting problems that might indicate bugs fed back to me, in a privacy-respecting way. TBD


I also refactored the code, which was getting far too large to all be in one module.

I have been thinking about renaming git annex watch to git annex assistant, but I think I'll leave the command name as-is. Some users might want a simple watcher and stager, without the assistant's other features like syncing and the webapp. So the next stage of the roadmap will be a different command that also runs watch.

At this point, I feel I'm done with the first phase of inotify. It has a couple known bugs, but it's ready for brave beta testers to try. I trust it enough to be running it on my live data.

Posted Wed Oct 10 15:36:01 2012

It's possible for one git annex repository to configure a special remote that it makes sense for other repositories to also be able to use. Today I added the UI to support that; in the list of repositories, such repositories have an "enable" link.

To enable pre-existing rsync special remotes, the webapp has to do the same probing and ssh key setup that it does when initially creating them. Rsync.net is also handled as a special case in that code. There was one ugly part to this.. When a rsync remote is configured in the webapp, it uses a mangled hostname like "git-annex-example.com-user", to make ssh use the key it sets up. That gets stored in the remote.log, and so the enabling code has to unmangle it to get back to the real hostname.


Based on the still-running prioritizing special remotes poll, a lot of people want special remote support for their phone or mp3 player. (As opposed to running git-annex on an Android phone, which comes later..) It'd be easy enough to make the webapp set up a directory special remote on such a device, but that makes consuming some types of content on the phone difficult (mp3 players seem to handle them ok based on what people tell me). I need to think more about some of the ideas mentioned in android for more suitable ways of storing files.

One thing's for sure: You won't want the assistant to sync all your files to your phone! So I also need to start coming up with partial syncing controls. One idea is for each remote to have a configurable matcher for files it likes to receive. That could be only mp3 files, or all files inside a given subdirectory, or all files not in a given subdirectory. That means that when the assistant detects a file has been moved, it'll need to add (or remove) a queued transfer. Lots of other things could be matched on, like file size, number of copies, etc. Oh look, I have a beautiful library I wrote earlier that I can reuse!

Posted Wed Oct 10 15:36:01 2012

Milestone: I can run git annex assistant, plug in a USB drive, and it automatically transfers files to get the USB drive and current repo back in sync.

I decided to implement the naive scan, to find files needing to be transferred. So it walks through git ls-files and checks each file in turn. I've deferred less expensive, more sophisticated approaches to later.

I did some work on the TransferQueue, which now keeps track of the length of the queue, and can block attempts to add Transfers to it if it gets too long. This was a nice use of STM, which let me implement that without using any locking.

[[!format haskell """ atomically $ do sz <- readTVar (queuesize q) if sz <= wantsz then enqueue schedule q t (stubInfo f remote) else retry -- blocks until queuesize changes """]]

Anyway, the point was that, as the scan finds Transfers to do, it doesn't build up a really long TransferQueue, but instead is blocked from running further until some of the files get transferred. The resulting interleaving of the scan thread with transfer threads means that transfers start fairly quickly upon a USB drive being plugged in, and kind of hides the inefficiencies of the scanner, which will most of the time be swamped out by the IO bound large data transfers.


At this point, the assistant should do a good job of keeping repositories in sync, as long as they're all interconnected, or on removable media like USB drives. There's lots more work to be done to handle use cases where repositories are not well-connected, but since the assistant's syncing now covers at least a couple of use cases, I'm ready to move on to the next phase. Webapp, here we come!

Posted Wed Oct 10 15:36:01 2012

Well, sometimes you just have to go for the hack. Trying to find a way to add additional options to git-annex-shell without breaking backwards compatibility, I noticed that it ignores all options after --, because those tend to be random rsync options due to the way rsync runs it.

So, I've added a new class of options, that come in between, like -- opt=val opt=val ... --

The parser for these will not choke on unknown options, unlike normal getopt. So this let me add the additional info I needed to pass to git-annex-shell to make it record transfer information. And if I need to pass more info in the future, that's covered too.

It's ugly, but since only git-annex runs git-annex-shell, this is an ugliness only I (and now you, dear reader) have to put up with.

Note to self: Command-line programs are sometimes an API, particularly if designed to be called remotely, and so it makes sense to consider whether they are, and design expandability into them from day 1.


Anyway, we now have full transfer tracking in git-annex! Both sides of a transfer know what's being transferred, and from where, and have the info necessary to interrupt the transfer.


Also did some basic groundwork, adding a queue of transfers to perform, and adding to the daemon's status information a map of currently running transfers.

Next up: The daemon will use inotify to notice new and deleted transfer info files, and update its status info.

Posted Wed Oct 10 15:36:01 2012

Good news! My beta testers report that the new kqueue code works on OSX. At least "works" as well as it does on Debian kFreeBSD. My crazy development strategy of developing on Debian kFreeBSD while targeting Mac OSX is vindicated. ;-)

So, I've been beating the kqueue code into shape for the last 12 hours, minus a few hours sleep.

First, I noticed it was seeming to starve the other threads. I'm using Haskell's non-threaded runtime, which does cooperative multitasking between threads, and my C code was never returning to let the other threads run. Changed that around, so the C code runs until SIGALARMed, and then that thread calls yield before looping back into the C code. Wow, cooperative multitasking.. I last dealt with that when programming for Windows 3.1! (Should try to use Haskell's -threaded runtime sometime, but git-annex doesn't work under it, and I have not tried to figure out why not.)

Then I made a single commit, with no testing, in which I made the kqueue code maintain a cache of what it expects in the directory tree, and use that to determine what files changed how when a change is detected. Serious code. It worked on the first go. If you were wondering why I'm writing in Haskell ... yeah, that's why.

And I've continued to hammer on the kqueue code, making lots of little fixes, and at this point it seems almost able to handle the changes I throw at it. It does have one big remaining problem; kqueue doesn't tell me when a writer closes a file, so it will sometimes miss adding files. To fix this, I'm going to need to make it maintain a queue of new files, and periodically check them, with lsof, to see when they're done being written to, and add them to the annex. So while a file is being written to, git annex watch will have to wake up every second or so, and run lsof ... and it'll take it at least 1 second to notice a file's complete. Not ideal, but the best that can be managed with kqueue.

Posted Wed Oct 10 15:36:01 2012

As I prepare to dive back into development, now is a good time to review what I've built so far, and how well I'm keeping up with my planned roadmap.

I started working two and a half months ago, so am nearing the end of the three months I originally asked to be funded for on Kickstarter.

I've built much of what I planned to build in the first three months -- inotify is done (and kqueue is basically working, but needs scalability work), local syncing is done, the webapp works, and I've built some of the first configurators. It's all functional in a narrow use case involving syncing to removable drives.

Progress bars still need to be dealt with, and network syncing needs to be revisited soon, so that I can start building easy configurators for further use cases, like using the cloud, or another machine on the local network.

I think I'm a little behind my original schedule, but not too bad, and at the same time, I think I've built things rather more solidly than I expected them to be at this point. I'm particularly happy with how well the inotify code works, no matter what is thrown at it, and how nice the UI in the webapp is shaping up to be.


I also need to get started on fulfilling my Kickstarter rewards, and I was happy to spend some time in the airport working on the main blocker toward that, a lack of a scalable git-annex logo, which is needed for printing on swag.

Turns out that inkscape has some amazing bitmap tracing capabilities. I was able to come up with this scalable logo in short order, it actually took longer to add back the colors, as the tracer generated a black and white version.

With that roadblock out of the way, I am moving toward ordering large quantities of usb drives, etc.

logo.svg

Posted Wed Oct 10 15:36:01 2012

After a few days otherwise engaged, back to work today.

My focus was on adding the committing thread mentioned in day 4 speed. I got rather further than expected!

First, I implemented a really dumb thread, that woke up once per second, checked if any changes had been made, and committed them. Of course, this rather sucked. In the middle of a large operation like untarring a tarball, or rm -r of a large directory tree, it made lots of commits and made things slow and ugly. This was not unexpected.

So next, I added some smarts to it. First, I wanted to stop it waking up every second when there was nothing to do, and instead block waiting on a change occurring. Secondly, I wanted it to know when past changes happened, so it could detect batch mode scenarios, and avoid committing too frequently.

I played around with combinations of various Haskell thread communications tools to get that information to the committer thread: MVar, Chan, QSem, QSemN. Eventually, I realized all I needed was a simple channel through which the timestamps of changes could be sent. However, Chan wasn't quite suitable, and I had to add a dependency on Software Transactional Memory, and use a TChan. Now I'm cooking with gas!

With that data channel available to the committer thread, it quickly got some very nice smart behavior. Playing around with it, I find it commits instantly when I'm making some random change that I'd want the git-annex assistant to sync out instantly; and that its batch job detection works pretty well too.

There's surely room for improvement, and I made this part of the code be an entirely pure function, so it's really easy to change the strategy. This part of the committer thread is so nice and clean, that here's the current code, for your viewing pleasure:

[[!format haskell """ {- Decide if now is a good time to make a commit.

- Note that the list of change times has an undefined order.

  • Current strategy: If there have been 10 commits within the past second,
  • a batch activity is taking place, so wait for later. -} shouldCommit :: UTCTime -> [UTCTime] -> Bool shouldCommit now changetimes | len == 0 = False | len > 4096 = True -- avoid bloating queue too much | length (filter thisSecond changetimes) < 10 = True | otherwise = False -- batch activity where len = length changetimes thisSecond t = now diffUTCTime t <= 1 """]]

Still some polishing to do to eliminate minor inefficiencies and deal with more races, but this part of the git-annex assistant is now very usable, and will be going out to my beta testers soon!

Posted Wed Oct 10 15:36:01 2012

Decided to only make bare git repos on remote ssh servers. This configurator is aimed at using a server somewhere, which is probably not going to be running the assistant. So it doesn't need a non-bare repo, and there's nothing to keep the checked out branch in a non-bare repo up-to-date on such a server, anyway. For non-bare repos on locally accessible boxes, the pairing configurator will be the thing to use, instead of this one.

Note: While the remote ssh configurator works great, and you could even have the assistant running on multiple computers and use it to point them all at the same repo on a server, the assistant does not yet support keeping such a network topology in sync. That needs some of the ideas in cloud to happen, so clients can somehow inform each other when there are changes. Until that happens, the assistant polls only every 30 minutes, so it'll keep in sync with a 30 minute delay.


This configurator can also set up encrypted rsync special remotes. Currently it always encrypts them, using the shared cipher mode of git-annex's encryption. That avoids issues with gpg key generation and distribution, and was easy to get working.


I feel I'm in a good place now WRT adding repository configurator wizards to the webapp. This one took about 2.5 days, and involved laying some groundwork that will be useful for other repository configurators. And it was probably one of the more complex ones.

Now I should be able to crank out configurators for things like Amazon S3, Bup, Rsync.net, etc fairly quickly. First, I need to do a beta release of the assistant, and start getting feedback from my backers to prioritize what to work on.

Posted Wed Oct 10 15:36:01 2012
git merge watch_

My cursor has been mentally poised here all day, but I've been reluctant to merge watch into master. It seems solid, but is it correct? I was able to think up a lot of races it'd be subject to, and deal with them, but did I find them all?

Perhaps I need to do some automated fuzz testing to reassure myself. I looked into using genbackupdata to that end. It's not quite what I need, but could be moved in that direction. Or I could write my own fuzz tester, but it seems better to use someone else's, because a) laziness and b) they're less likely to have the same blind spots I do.

My reluctance to merge isn't helped by the known bugs with files that are already open before git annex watch starts, or that are opened by two processes at once; these confuse it into annexing the still-open file when one process closes it.

I've been thinking about just running lsof on every file as it's being annexed to check for that, but in the end, lsof is too slow. Since its check involves trawling through all of /proc, it takes a good half a second to check a file, and adding 50 seconds to the time it takes to process 100 files is just not acceptable.

But an option that could work is to run lsof after a bunch of new files have been annexed. It can check a lot of files nearly as fast as a single one. In the rare case that an annexed file is indeed still open, it could be moved back out of the annex. Then when its remaining writer finally closes it, another inotify event would re-annex it.

Posted Wed Oct 10 15:36:01 2012

Today I revisited something from way back in day 7 bugfixes. Back then, it wasn't practical to run git ls-files on every file the watcher noticed, to check if it was already in git. Revisiting this, I found I could do that check efficiently at the same point where the lsof check runs. When there's a lot of files being added, they're batched up at that point, so it won't be calling git ls-files repeatedly.

Result: It's safe to mix use of the assistant with files stored in git in the normal way. And it's safe to mix use of git annex unlock with the assistant; it won't immediately re-lock files. Yay!


Also fixed a crash in the committer, and made git annex status display repository groups.


Been thinking through where to store the transfer control expressions. Since repositories need to know about the transfer controls of other remotes, storing them in .git/config isn't right. I thought it might be nice to configure the expressions in .gitattributes, but it seems the file format doesn't allow complicated multi-word attributes. Instead, they'll be stored in the git-annex branch.

Posted Wed Oct 10 15:36:01 2012

Started work on the interface displayed when the webapp is started with no existing git-annex repository. All this needs to do is walk the user through setting up a repository, as simply as possible.

A tricky part of this is that most of git-annex runs in the Annex monad, which requires a git-annex repository. Luckily, much of the webapp does not run in Annex, and it was pretty easy to work around the parts that do. Dodged a bullet there.

There will, however, be a tricky transition from this first run webapp, to a normally fully running git-annex assistant and webapp. I think the first webapp will have to start up all the normal threads once it makes the repository, and then redirect the user's web browser to the full webapp.

Anyway, the UI I've made is very simple: a single prompt, for the directory where the repository should go. Eventually it will get tab completion and sanity checking (putting the repository in "/" is not good, and making it all of "$HOME" is probably unwise).

Ideally most users will accept the default, which will be something like /home/username/Desktop/Annex, and be through this step in seconds.

Suggestions for a good default directory name appreciated.. Putting it in a folder that will appear on the desktop seems like a good idea, when there's a Desktop directory. I'm unsure if I should name it something specific like "GitAnnex", or something generic like "Synced".

Time for the first of probably many polls!

What should the default directory name used by the git-annex assistant be?

[[!poll open=no 19 "Annex" 7 "GitAnnex" 1 "~/git-annex/" 10 "Synced" 0 "AutoSynced" 1 "Shared" 10 "something lowercase!" 1 "CowboyNeal" 1 "Annexbox"]]

(Note: This is a wiki. You can edit this page to add your own poll options!)

Posted Wed Oct 10 15:36:01 2012

A rather frustrating and long day coding went like this:

1-3 pm

Wrote a single function; all any Haskell programmer needs to know about it is its type signature:

Lsof.queryDir :: FilePath -> IO [(FilePath, LsofOpenMode, ProcessInfo)]

When I'm spending another hour or two taking a unix utility like lsof and parsing its output, which in this case is in a rather complicated machine-parsable output format, I often wish unix streams were strongly typed, which would avoid this bother.
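For a flavor of what that parsing involves, here's a much-simplified sketch (not the real Lsof module) of handling lsof's -F output, where each line is a single tagged field: p for pid, a for access mode, n for file name.

[[!format haskell """
{- Much-simplified sketch, not the real Lsof module. -}
data OpenMode = OpenReadOnly | OpenWriteOnly | OpenReadWrite | OpenUnknown
        deriving (Show, Eq)

parseLsof :: String -> [(FilePath, OpenMode, Int)]
parseLsof = go 0 OpenUnknown . lines
  where
        -- carry along the most recently seen pid and access mode
        go _ _ [] = []
        go pid mode (l:ls) = case l of
                ('p':rest) -> go (read rest) mode ls
                ('a':rest) -> go pid (parseMode rest) ls
                ('n':file) -> (file, mode, pid) : go pid mode ls
                _          -> go pid mode ls
        parseMode "r" = OpenReadOnly
        parseMode "w" = OpenWriteOnly
        parseMode "u" = OpenReadWrite
        parseMode _   = OpenUnknown
"""]]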

3-9 pm

Six hours spent making it defer annexing files until the commit thread wakes up and is about to make a commit. Why did it take so horribly long? Well, there were a number of complications, and some really bad bugs involving races that were hard to reproduce reliably enough to deal with.

In other words, I was lost in the weeds for a lot of those hours...

At one point, something glorious happened, and it was always making exactly one commit for batch mode modifications of a lot of files (like untarring them). Unfortunately, I had to lose that gloriousness due to another potential race, which, while unlikely, would have made the program deadlock if it happened.

So, it's back to making 2 or 3 commits per batch mode change. I also have a buglet that sometimes causes a second, empty commit after a file is added. I know why (the inotify event for the symlink gets in late, after the commit); will try to improve commit frequency later.

9-11 pm

Put the capstone on the day's work, by calling lsof on a directory full of hardlinks to the files that are about to be annexed, to check if any are still open for write.

This works great! Starting up git annex watch when processes have files open is no longer a problem, and even if you're evil enough to try having multiple processes open the same file, it will complain and not annex it until all the writers close it.

(Well, someone really evil could turn the write bit back on after git annex clears it, and open the file again, but then really evil people can do that to files in .git/annex/objects too, and they'll get their just deserts when git annex fsck runs. So, that's ok..)


Anyway, will beat on it more tomorrow, and if all is well, this will finally go out to the beta testers.

Posted Wed Oct 10 15:36:01 2012

Back home and laptop is fixed.. back to work.

Warmup exercises:

  • Went in to make it queue transfers when a broken symlink is received, only to find I'd already written code to do that, and forgotten about it. Heh. Did check that the git-annex branch is always sent first, which will ensure that code always knows where to transfer a key from. I had probably not considered this wrinkle when first writing the code; it worked by accident.

  • Made the assistant check that a remote is known to have a key before queueing a download from it.

  • Fixed a bad interaction between the git annex map command and the assistant.


Tried using a modified version of MissingH that doesn't use HSLogger to make git-annex work with the threaded GHC runtime. Unfortunately, I am still seeing hangs in at least 3 separate code paths when running the test suite. I may have managed to fix one of the hangs, but have not grokked what's causing the others.


I now have access to a Mac OSX system, thanks to Kevin M. I've fixed some portability problems in git-annex with it before, but today I tested the assistant on it:

  • Found a problem with the kqueue code that prevents incoming pushes from being noticed.

    The problem was that the newly added git ref file does not trigger an add event. The kqueue code saw a generic change event for the refs directory, but since the old file was being deleted and replaced by the new file, the kqueue code, which already had the old file in its cache, did not notice the file had been replaced.

    I fixed that by making the kqueue code also track the inode of each file. Currently that adds the overhead of a stat of each file, which could be avoided if haskell exposed the inode returned by readdir. Room to optimise this later...

  • Also noticed that the kqueue code was not separating out file deletions from directory deletions. IIRC Jimmy had once mentioned a problem with file deletions not being noticed by the assistant, and this could be responsible for that, although the directory deletion code seems to handle them ok normally. It was making the transfer watching thread not notice when any transfers finished, for sure. I fixed this oversight, looking in the cache to see if there used to be a file or a directory, and running the appropriate hook.

Even with these fixes, the assistant does not yet reliably transfer file contents on OSX. I think the problem is that with kqueue we're not guaranteed to get an add event and a deletion event for a transfer info file -- if it's created and quickly deleted, the code that synthesizes those events doesn't run in time to know it existed. Since the transfer code relies on deletion events to tell when transfers are complete, it stops sending files after the first transfer, if the transfer ran so quickly that it doesn't get the expected events.

So, will need to work on OSX support some more...

Posted Wed Oct 10 15:36:01 2012

Now finished building a special configurator for rsync.net. While this is just a rsync remote to git-annex, there are some tricky bits to setting up the ssh key using rsync.net's restricted shell. The configurator automates that nicely. It took about 3 hours of work, and 49 lines of rsync.net specific code to build this.

Thanks to rsync.net who heard of my Kickstarter and gave me a really nice free lifetime account. BTW guys, I wish your restricted shell supported '&&' in between commands, and returned a nonzero exit status when the command fails. This would make my error handling work better.

I've also reworked the repository management page. Nice to see those configurators start to fill in!

Posted Wed Oct 10 15:36:01 2012

Today is a planning day. I have only a few days left before I'm off to Nicaragua for DebConf, where I'll only have smaller chunks of time without interruptions. So it's important to get some well-defined smallish chunks designed that I can work on later. See bulleted action items below (now moved to syncing). Each should be around 1-2 hours unless it turns out to be 8 hours... :)

First, worked on writing down a design, and some data types, for data transfer tracking (see syncing page). Found that writing down these simple data types before I started slinging code has clarified things a lot for me.

Most importantly, I realized that I will need to modify git-annex-shell to record on disk what transfers it's doing, so the assistant can get that information and use it to both avoid redundant transfers (potentially a big problem!), and later to allow the user to control them using the web app.

While eventually the user will be able to use the web app to prioritize transfers, stop and start, throttle, etc, it's important to get the default behavior right. So I'm thinking about things like how to prioritize uploads vs downloads, when it's appropriate to have multiple downloads running at once, etc.

Posted Wed Oct 10 15:36:01 2012

Worked on automatic merge conflict resolution today. I had expected to be able to use git's merge driver interface for this, but that interface is not sufficient. There are two problems with it:

  1. The merge program is run when git is in the middle of an operation that locks the index. So it cannot delete or stage files. I need to do both as part of my conflict resolution strategy.
  2. The merge program is not run at all when the merge conflict is caused by one side deleting a file, and the other side modifying it. This is an important case to handle.

So, instead, git-annex will use a regular git merge, and if it fails, it will fix up the conflicts.

That presented its own difficulty, of finding which files in the tree conflict. git ls-files --unmerged is the way to do that, but its output is in a quite raw form:

120000 3594e94c04db171e2767224db355f514b13715c5 1   foo
120000 35ec3b9d7586b46c0fd3450ba21e30ef666cfcd6 3   foo
100644 1eabec834c255a127e2e835dadc2d7733742ed9a 2   bar
100644 36902d4d842a114e8b8912c02d239b2d7059c02b 3   bar

I had to stare at the rather impenetrable documentation for hours and write a lot of parsing and processing code to get from that to these mostly self explanatory data types:

data Conflicting v = Conflicting
        { valUs :: Maybe v
        , valThem :: Maybe v
        } deriving (Show)

data Unmerged = Unmerged
        { unmergedFile :: FilePath
        , unmergedBlobType :: Conflicting BlobType
        , unmergedSha :: Conflicting Sha
        } deriving (Show)

Not the first time I've whined here about time spent parsing unix command output, is it? :)
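The core of that parsing boils down to something like this simplified sketch (not the actual code). In an unmerged index entry, stage 2 is our side of the merge and stage 3 is theirs, which is how the Conflicting values get filled in.

[[!format haskell """
{- Simplified sketch, not the actual code. Each output line has the
 - form "<mode> SP <sha> SP <stage> TAB <file>". -}
parseUnmerged :: String -> Maybe (String, String, Int, FilePath)
parseUnmerged s = case words metadata of
        [mode, sha, stage] -> Just (mode, sha, read stage, file)
        _ -> Nothing
  where
        (metadata, rest) = break (== '\t') s
        file = drop 1 rest -- skip the tab
"""]]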

From there, it was relatively easy to write the actual conflict cleanup code, and make git annex sync use it. Here's how it looks:

$ ls -1
foo.png
bar.png
$ git annex sync
commit  
# On branch master
nothing to commit (working directory clean)
ok
merge synced/master 
CONFLICT (modify/delete): bar.png deleted in refs/heads/synced/master and modified in HEAD. Version HEAD of bar.png left in tree.
Automatic merge failed; fix conflicts and then commit the result.
bar.png: needs merge
(Recording state in git...)
[master 0354a67] git-annex automatic merge conflict fix
ok
$ ls -1
foo.png
bar.variant-a1fe.png
bar.variant-93a1.png

There are very few options for ways for the conflict resolution code to name conflicting variants of files. The conflict resolver can only use data present in git to generate the names, because the same conflict needs to be resolved the same everywhere.

So I had to choose between using the full key name in the filenames produced when resolving a merge, and using a shorter checksum of the key, that would be more user-friendly, but could theoretically collide with another key. I chose the checksum, and weakened it horribly by only using 32 bits of it!

Surprisingly, I think this is a safe choice. The worst that can happen if such a collision occurs is another conflict, and the conflict resolution code will work on conflicts produced by the conflict resolution code! In such a case, it does fall back to putting the whole key in the filename: "bar.variant-SHA256-s2550--2c09deac21fa93607be0844fefa870b2878a304a7714684c4cc8f800fda5e16b.png"

Still need to hook this code into git annex assistant.

Posted Wed Oct 10 15:36:01 2012

Just released git-annex 3.20120924, which includes beta versions of the assistant and webapp. Read the errata, then give it a try!

I've uploaded it to Haskell's cabal, and to Debian unstable, and hope my helpers for other distributions will update them soon. (Although the additional dependencies to build the webapp may take a while on some.) I also hope something can be done to make a prebuilt version available on OSX soonish.

I've decided to license the webapp under the AGPL. This should not impact normal users of it, and git-annex can be built without the webapp as a pure GPL licensed program. This is just insurance to prevent someone turning the webapp into a proprietary web-only service, by requiring that anyone who does so provide the source of the webapp.

Posted Wed Oct 10 15:36:01 2012

Followed my plan from yesterday, and wrote a simple C library to interface to kqueue, and Haskell code to use that library. By now I think I understand kqueue fairly well -- there are some very tricky parts to the interface.

But... it still didn't work. After building all this, my code was failing the same way that the haskell kqueue library failed yesterday. I filed a bug report with a testcase.

Then I thought to ask on #haskell. Got sorted out in quick order! The problem turns out to be that haskell's runtime has a periodic SIGALRM, which was interrupting my kevent call. It can be worked around with +RTS -V0, but I put in a fix to retry the kevent call when it's interrupted.
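The retry is the usual treat-EINTR-as-transient dance; a minimal sketch (not the actual binding) around the foreign call looks like this:

[[!format haskell """
{- Minimal sketch, not the actual binding: retry a C call that
 - returns -1 when it was merely interrupted by a signal (such as
 - the GHC runtime's periodic SIGALRM), throwing on real errors. -}
import Foreign.C.Error (eINTR, getErrno, throwErrno)

retryInterrupted :: String -> IO Int -> IO Int
retryInterrupted desc call = do
        r <- call
        if r == -1
                then do
                        e <- getErrno
                        if e == eINTR
                                then retryInterrupted desc call -- interrupted; try again
                                else throwErrno desc -- a real error
                else return r
"""]]

(Foreign.C.Error's throwErrnoIfMinus1Retry packages up the same pattern.)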

And now git-annex watch can detect changes to directories on BSD and OSX!

Note: I said "detect", not "do something useful in response to". Getting from the limited kqueue events to actually staging changes in the git repo is going to be another day's work. Still, brave FreeBSD or OSX users might want to check out the watch branch from git and see if git annex watch will at least say it sees changes you make to your repository.

Posted Wed Oct 10 15:36:01 2012

Finally wrapped up progress bars; upload progress is now reported in all situations.

After all that, I was pleased to find a use for the progress info, beyond displaying it to the user. Now the assistant uses it to decide whether it makes sense to immediately retry a failed transfer. This should make it work nicely, or at least better, with flaky network or drives.
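The heuristic is simple; something like this sketch (hypothetical names -- the real check may differ): a failed transfer is worth retrying immediately only if it got further than the previous attempt.

[[!format haskell """
{- Sketch with hypothetical names; the real check may differ. -}
import Data.Maybe (fromMaybe)

shouldRetry :: Maybe Integer -> Maybe Integer -> Bool
shouldRetry prevbytes newbytes =
        fromMaybe 0 newbytes > fromMaybe 0 prevbytes
"""]]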

The webapp crashed on startup when there was no ~/.gitconfig. Guess all of us who have tried it so far are actual git users, but I'm glad I caught this before releasing the beta.

Jimmy Tang kindly took on making an OS X .app directory for git-annex. So it now has an icon that will launch the webapp.

I'm getting lots of contributors to git-annex all of a sudden. I've had 3 patches this weekend, and 2 of them have been to Haskell code. Justin Azoff is working on incremental fsck, and Robie Basak has gotten Amazon Glacier working using the hook special remote.

Started doing some design for transfer control. I will start work on this after releasing the first beta.

Posted Wed Oct 10 15:36:01 2012

So as not to bury the lead, I've been hard at work on my first day in Nicaragua, and the git-annex assistant fully syncs files (including their contents) between remotes now!!

Details follow..

Made the committer thread queue Upload Transfers when new files are added to the annex. Currently it tries to transfer the new content to every remote; this inefficiency needs to be addressed later.

Made the watcher thread queue Download Transfers when new symlinks appear that point to content we don't have. Typically, that will happen after an automatic merge from a remote. This needs to be improved as it currently adds Transfers from every remote, not just those that have the content.

This was the second place that needed an ordered list of remotes to talk to. So I cached such a list in the DaemonStatus state info. This will also be handy later on, when the webapp is used to add new remotes, so the assistant can know about them immediately.

Added YAT (Yet Another Thread), number 15 or so, the transferrer thread that waits for transfers to be queued and runs them. Currently a naive implementation, it runs one transfer at a time, and does not do anything to recover when a transfer fails.

Actually transferring content requires YAT, so that the transfer action can run in a copy of the Annex monad, without blocking all the assistant's other threads from entering that monad while a transfer is running. This is also necessary to allow multiple concurrent transfers to run in the future.

This is a very tricky piece of code, because that thread will modify the git-annex branch, and its parent thread has to invalidate its cache in order to see any changes the child thread made. Hopefully that's the extent of the complication of doing this. The only reason this was possible at all is that git-annex already supports multiple concurrent processes running and all making independent changes to the git-annex branch, etc.

After all my groundwork this week, file content transferring is now fully working!

Posted Wed Oct 10 15:36:01 2012

Made the MountWatcher update state for remotes located in a drive that gets mounted. This was tricky code. First I had to make remotes declare when they're located in a local directory. Then it has to rescan git configs of git remotes (because the git repo mounted at a mount point may change), and update all the state that a newly available remote can affect.

And it works: I plug in a drive containing one of my git remotes, and the assistant automatically notices it and syncs the git repositories.


But, data isn't transferred yet. When a disconnected remote becomes connected, keys should be transferred in both directions to get back into sync.

To that end, added Yet Another Thread; the TransferScanner thread will scan newly available remotes to find keys, and queue low priority transfers to get them fully in sync.

(Later, this will probably also be used for network remotes that become available when moving between networks. I think network-manager sends dbus events it could use..)

This new thread is missing a crucial piece: it doesn't yet have a way to find the keys that need to be transferred. Doing that efficiently (without scanning the whole git working copy) is Hard. I'm considering design possibilities..

Posted Wed Oct 10 15:36:01 2012

I released a version of git-annex over the weekend that includes the git annex watch command. There's a minor issue installing it from cabal on OSX, which I've fixed in my tree. Nice timing: At least the watch command should be shipped in the next Debian release, which freezes at the end of the month.

Jimmy found out how kqueue blows up when there are too many directories to keep all open. I'm not surprised this happens, but it's nice to see exactly how. Odd that it happened to him at just 512 directories; I'd have guessed more. I have plans to fork watcher programs that each watch 512 directories (or whatever the ulimit is), to deal with this. What a pitiful interface is kqueue.. I have not thought yet about how the watcher programs would communicate back to the main program.


Back on the assistant front, I've worked today on making git syncing more robust. Now when a push fails, it tries a pull, and a merge, and repushes. That ensures that the push is, almost always, a fast-forward. Unless something else gets in a push first, anyway!

If a push still fails, there's Yet Another Thread, added today, that will wake up after 30 minutes and retry the push. It currently keeps retrying every 30 minutes until the push finally gets through. This will deal, to some degree, with those situations where a remote is only sometimes available. In outline, the new thread is just the sketch below.
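A sketch with hypothetical names, not the actual thread:

[[!format haskell """
{- Sketch with hypothetical names: every 30 minutes, retry pushes
 - to any remotes that have failed pushes queued. -}
import Control.Concurrent (threadDelay)
import Control.Monad (forever, unless)

pushRetryThread :: IO [r] -> ([r] -> IO ()) -> IO ()
pushRetryThread getFailedRemotes repushTo = forever $ do
        threadDelay (1000000 * 60 * 30) -- 30 minutes
        failed <- getFailedRemotes
        unless (null failed) $ repushTo failed
"""]]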

I need to refine the code a bit, to avoid it keeping an ever-growing queue of failed pushes, if a remote is just dead. And to clear old failed pushes from the queue when a later push succeeds.

I also need to write a git merge driver that handles conflicts in the tree. If two conflicting versions of a file foo are saved, this would merge them, renaming them to foo.X and foo.Y. Probably X and Y are the git-annex keys for the content of the files; this way all clones will resolve the conflict in a way that leads to the same tree. It's also possible to get a conflict by one repo deleting a file, and another modifying it. In this case, renaming the deleted file to foo.Y may be the right approach, I am not sure.

I glanced through some Haskell dbus bindings today. I believe there are dbus events available to detect when drives are mounted, and on Linux this would let git-annex notice and sync to usb drives, etc.

Posted Wed Oct 10 15:36:01 2012

Today I built the UI in the webapp to set up a ssh or rsync remote.

This is the most generic type of remote, and so it's surely got the most complex description. I've tried to word it as clearly as I can; suggestions most appreciated. Perhaps I should put in a diagram?

The idea is that this will probe the server, using ssh. If git-annex-shell is available there, it'll go on to set up a full git remote. If not, it'll fall back to setting up a rsync special remote. It'll even fall all the way back to using rsync:// protocol if it can't connect by ssh. So the user can just point it at a server and let it take care of the details, generally.

The trickiest part of this will be authentication, of course. I'm relying on ssh using ssh-askpass to prompt for any passwords, etc, when there's no controlling terminal. But beyond passwords, this has to deal with ssh keys.

I'm planning to make it check if you have a ssh key configured already. If you do, it doesn't touch your ssh configuration. I don't want to get in the way of people who have a manual configuration or are using MonkeySphere.

But for a user who has never set up a ssh key, it will prompt asking if they'd like a key to be set up. If so, it'll generate a key and configure ssh to only use it with the server.. and as part of its ssh probe, that key will be added to authorized_keys.

(Obviously, advanced users can skip this entirely; git remote add ssh://... still works..)


Also today, fixed more UI glitches in the transfer display. I think I have them all fixed now, except for the one that needs lots of javascript to be written to fix it.

Amusingly, while I was working on UI glitches, it turned out that all the fixes involved 100% pure code that has nothing to do with UI. The UI was actually just exposing bugs.

For example, closing a running transfer had a bug that weirdly reordered the queue. This turned out to be due to the transfer queue actually maintaining two versions of the queue, one in a TChan and one in a list. Some unknown bugs caused these to get out of sync. That was fixed very handily by deleting the TChan, so there's only one copy of the data.

I had only been using that TChan because I wanted a way to block while the queue was empty. But now that I'm more comfortable with STM, I know how to do that easily using a list:

[[!format haskell """ getQueuedTransfer q = atomically $ do sz <- readTVar (queuesize q) if sz < 1 then retry -- blocks until size changes else ... """]]

Ah, the times before STM were dark times indeed. I'm writing more and more STM code lately, building up more and more complicated and useful transactions. If you use threads and don't know about STM, it's a great thing to learn, to get out of the dark ages of dealing with priority inversions, deadlocks, and races.

Posted Wed Oct 10 15:36:01 2012

Got the webapp's progress bars updating for downloads. Updated progressbars with all the options for getting progress info. For downloads, it currently uses the easy, and not very expensive, approach of periodically polling the sizes of files that are being downloaded.
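Roughly like this sketch (hypothetical names; polling once per second):

[[!format haskell """
{- Sketch with hypothetical names: poll the size of a file being
 - downloaded once per second, reporting each change to a callback. -}
import Control.Concurrent (threadDelay)
import Control.Exception (try, IOException)
import Control.Monad (when)
import System.Posix.Files (FileStatus, getFileStatus, fileSize)

pollProgress :: FilePath -> (Integer -> IO ()) -> IO ()
pollProgress file report = loop 0
  where
        loop old = do
                threadDelay 1000000 -- 1 second
                v <- try (getFileStatus file)
                case (v :: Either IOException FileStatus) of
                        Left _ -> loop old -- file may not exist yet
                        Right st -> do
                                let sz = fromIntegral (fileSize st)
                                when (sz /= old) $ report sz
                                loop sz
"""]]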

For uploads, something more sophisticated will be called for..


The webapp really feels alive now that it has progress bars!

Posted Wed Oct 10 15:36:01 2012

Various bug fixes, and work on the OSX app today:

  • Avoid crashing when ssh-keygen fails due to not being able to parse authorized_keys.. seems a lot of people have crufty unparsable authorized_keys files.
  • On OSX, for some reason the webapp was failing to start sometimes due to bind failing with EINVAL. I don't understand why, as that should only happen if the socket is already bound, which it should not as it's just been created. I was able to work around this by retrying with a new socket when bind fails.
  • When setting up authorized_keys to let git-annex-shell be run, it had been inserting a perl oneliner into it. I changed that to instead call a ~/.ssh/git-annex-shell wrapper script that it sets up. The benefits are it no longer needs perl, and it's less ugly, and the standalone OSX app can modify the wrapper script to point to wherever it's installed today (people like to move these things around I guess).
  • Made the standalone OSX app set up autostarting when it's first run.
  • Spent rather a long time collecting the licenses of all the software that will be bundled with the standalone OSX app. Ended up with a file containing 3954 lines of legalese. Happily, all the software appears redistributable, and free software; even the couple of OSX system libraries we're bundling are licensed under the APSL.
Posted Wed Oct 10 15:36:01 2012

Worked on pairing all day. It's complicated and I was close to being in the weeds at times. I think it probably works now, but I have not tested it at all. Tomorrow, testing, and cleaning up known problems.


Also ordered 1.5 terabytes of USB keys and a thousand git-annex stickers today.

Posted Wed Oct 10 15:36:01 2012

Not a lot of programming today; I spent most of the day stuffing hundreds of envelopes for this Kickstarter thing you may have heard of. Some post office is going to be very surprised with all the international mail soon.


That said, I did write 184 lines of code. (Actually rather a lot, but it was mostly pure functional code, so easy to write.) It pops up your text editor on a file with the trust and group configurations of repositories, which are stored in the git-annex branch. Handy for both viewing that stuff all in one place, and changing it.

The real reason for doing that is to provide a nice interface for editing transfer control expressions, which I'll be adding next.

Posted Wed Oct 10 15:36:01 2012
  • On OSX, install a launcher plist file, to run the assistant on login, and a git-annex-webapp.command file on the desktop. This is not tested yet.
  • Made the webapp display alerts when the inotify/kqueue layer has a warning message.
  • Handle any crashes of each of the 15 or so named threads by displaying an alert. (Of course, this should never happen.)
Posted Wed Oct 10 15:36:01 2012

More work on the display and control of transfers.

  • Hide redundant downloads from the transfer display. It seemed simplest to keep the behavior of queuing downloads from every remote that has a file, rather than going to some other data structure, but it's clutter to display those to the user, especially when you often have 7 copies of each file, like I do.
  • When canceling a download, cancel all other queued downloads of that key too.
  • Fixed unsetting of the paused flag when resuming a paused transfer.
  • Implemented starting queued transfers by clicking on the start button.
  • Spent a long time debugging why pausing, then resuming, and then pausing a transfer doesn't successfully pause it the second time. I see where the code is seemingly locking up in a throwTo, but I don't understand why that blocks forever. Urgh..
Posted Wed Oct 10 15:36:01 2012

My laptop's SSD died this morning. I had some work from yesterday committed to the git repo on it, but not pushed as it didn't build. Luckily I was able to get that off the SSD, which is now a read-only drive -- even mounting it fails with fsck write errors.

Wish I'd realized the SSD was dying before the day before my trip to Nicaragua.. Getting back to a useful laptop used most of my time and energy today.

I did manage to fix transfers to not block the rest of the assistant's threads. Problem was that, without Haskell's threaded runtime, waiting on something like a rsync command blocks all threads. To fix this, transfers now are run in separate processes.

Also added code to allow multiple transfers to run at once. Each transfer takes up a slot, with the number of free slots tracked by a QSemN. This allows the transfer starting thread to block until a slot frees up, and then run the transfer.
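The slot handling is essentially this (a sketch with hypothetical names):

[[!format haskell """
{- Sketch with hypothetical names: a fixed pool of transfer slots,
 - implemented with a QSemN counting the free slots. -}
import Control.Concurrent.QSemN
import Control.Exception (bracket_)

newTransferSlots :: Int -> IO QSemN
newTransferSlots = newQSemN

inTransferSlot :: QSemN -> IO a -> IO a
inTransferSlot slots transfer = bracket_
        (waitQSemN slots 1) -- blocks until a slot frees up
        (signalQSemN slots 1) -- frees the slot even if the transfer fails
        transfer
"""]]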

This needs to be extended to be aware of transfers initiated by remotes. The transfer watcher thread should detect those starting and stopping and update the QSemN accordingly. It would also be nice if transfers initiated by remotes would be delayed when there are no free slots for them ... but I have not thought of a good way to do that.

There's a bug somewhere in the new transfer code, when two transfers are queued close together, the second one is lost and doesn't happen. Would debug this, but I'm spent for the day.

Posted Wed Oct 10 15:36:01 2012

I made the MountWatcher only use dbus if it sees a client connected to dbus that it knows will send mount events, or if it can start up such a client via dbus. (Fancy!) Otherwise it falls back to polling. This should be enough to support users who manually mount things -- if they have gvfs installed, it'll be used to detect their manual mounts, even when a desktop is not running, and if they don't have gvfs, they get polling.

Also, I got the MountWatcher to work with KDE. Found a dbus event that's emitted when KDE mounts a drive, and this is also used. If anyone with some other desktop environment wants me to add support for it, and it uses dbus, it should be easy: Run dbus-monitor, plug in a drive, get it mounted, and send me a transcript.

Of course, it'd also be nice to support anything similar on OSX that can provide mount event notifications. Not a priority though, since the polling code will work.


Some OS X fixes today..

  • Jimmy pointed out that my getmntent code broke the build on OSX again. Sorry about that.. I keep thinking Unix portability nightmares are an 80's thing, not a 2010's thing. Anyway, adapted a lot of hackish C code to emulate getmntent on BSD systems, and it seems to work. (I actually think the BSD interface to this is saner than Linux's, but I'd rather have either one than both, sigh..)
  • Kqueue was blocking all the threads on OSX. This is fixed, and the assistant seems to be working on OSX again.

I put together a preliminary thanks page for everyone who contributed to the git-annex Kickstarter. The wall-o-names is scary crazy humbling.


Improved --debug mode for the assistant, now every thread says whenever it's doing anything interesting, and also there are timestamps.


Had been meaning to get on with syncing to drives when they're mounted, but got sidetracked with the above. Maybe tomorrow. I did think through it in some detail as I was waking up this morning, and think I have a pretty good handle on it.

Posted Wed Oct 10 15:36:01 2012

I've been investigating how to make git annex watch work on FreeBSD, and by extension, OSX.

One option is kqueue, which works on both operating systems, and allows very basic monitoring of file changes. There's also an OSX specific hfsevents interface.

Kqueue is far from optimal for git annex watch, because it provides even less information than inotify (which didn't really provide everything I needed, thus the lsof hack). Kqueue doesn't have events for files being closed, only an event when a file is created. So it will be difficult for git annex watch to know when a file is done being written to and can be annexed. git annex will probably need to run lsof periodically to check when recently added files are complete. (hfsevents shares this limitation)

Kqueue also doesn't provide specific events when a file or directory is moved. Indeed, it doesn't provide specific events about what changed at all. All you get with kqueue is a generic "oh hey, the directory you're watching changed in some way", and it's up to you to scan it to work out how. So git annex will probably need to run git ls-files --others to find changes in the directory tree. This could be expensive with large trees. (hfsevents has per-file events on current versions of OSX)

Despite these warts, I want to try kqueue first, since it's more portable than hfsevents, and will surely be easier for me to develop support for, since I don't have direct access to OSX.

So I went to a handy Debian kFreeBSD porter box, and tried some kqueue stuff to get a feel for it. I got a python program that does basic directory monitoring with kqueue to work, so I know it's usable there.

Next step was getting kqueue working from Haskell. Should be easy, there's a Haskell library already. I spent a while trying to get it to work on Debian kFreeBSD, but ran into a problem that could be caused by the Debian kFreeBSD being different, or just a bug in the Haskell library. I didn't want to spend too long shaving this yak; I might install "real" FreeBSD on a spare laptop and try to get it working there instead.

But for now, I've dropped down to C instead, and have a simple C program that can monitor a directory with kqueue. Next I'll turn it into a simple library, which can easily be linked into my Haskell code. The Haskell code will pass it a set of open directory descriptors, and it'll return the one that it gets an event on. This is necessary because kqueue doesn't recurse into subdirectories on its own.

I've generally had good luck with this approach to adding stuff in Haskell; rather than writing a bit-banging and structure packing low level interface in Haskell, write it in C, with a simpler interface between C and Haskell.

Posted Wed Oct 10 15:36:01 2012

Worked today on two action items from my last blog post:

  • on-disk transfers in progress information files (read/write/enumerate)
  • locking for the files, so redundant transfer races can be detected, and failed transfers noticed

That's all done, and used by the get, copy, and move subcommands.

Also, I made git-annex status use that information to display any file transfers that are currently in progress:

joey@gnu:~/lib/sound/misc>git annex status
[...]
transfers in progress: 
    downloading Vic-303.mp3 from leech

(Webapp, here we come!)

However... Files being sent or received by git-annex-shell don't yet have this transfer info recorded. The problem is that to do so, git-annex-shell will need to be run with a --remote= parameter. But old versions will of course fail when run with such an unknown parameter.

This is a problem I last faced in December 2011 when adding the --uuid= parameter. That time I punted and required the remote git-annex-shell be updated to a new enough version to accept it. But as git-annex gets more widely used and packaged, that's becoming less an option. I need to find a real solution to this problem.

Posted Wed Oct 10 15:36:01 2012

Last night I got git annex watch to also handle deletion of files. This was not as tricky as feared; the key is using git rm --ignore-unmatch, which avoids most problematic situations (such as a just deleted file being added back before git is run).

Also fixed some races when git annex watch is doing its startup scan of the tree, which might be changed as it's being traversed. Now only one thread performs actions at a time, so inotify events are queued up during the scan, and dealt with once it completes. It's worth noting that inotify can only buffer so many events, which might have been a problem except for a very nice feature of Haskell's inotify interface: it has a thread that drains the limited inotify buffer and does its own buffering.


Right now, git annex watch is not as fast as it could be when doing something like adding a lot of files, or deleting a lot of files. For each file, it currently runs a git command that updates the index. I did some work toward coalescing these into one command (which git annex already does normally). It's not quite ready to be turned on yet, because of some races involving git add that become much worse if it's delayed by event coalescing.


And races were the theme of today. Spent most of the day really getting to grips with all the fun races that can occur between modification happening to files, and git annex watch. The inotify page now has a long list of known races, some benign, and several, all involving adding files, that are quite nasty.

I fixed one of those races this evening. The rest will probably involve moving away from using git add, which necessarily examines the file on disk, to directly shoving the symlink into git's index.
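The likely shape of that change is a sketch like this (assuming git update-index --index-info's ls-tree style input format; not the final code):

[[!format haskell """
{- Sketch, not the final code: stage a symlink in git's index
 - directly, without git add examining the file on disk, by feeding
 - an ls-tree formatted line to git update-index. Assumes the
 - symlink's blob sha is already known. -}
import System.IO (hPutStr, hClose)
import System.Process

stageSymlink :: FilePath -> String -> IO ()
stageSymlink file sha = do
        (Just h, _, _, p) <- createProcess
                (proc "git" ["update-index", "-z", "--index-info"])
                        { std_in = CreatePipe }
        -- 120000 is the symlink mode; -z means NUL-terminated input
        hPutStr h ("120000 blob " ++ sha ++ "\t" ++ file ++ "\0")
        hClose h
        _ <- waitForProcess p
        return ()
"""]]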

BTW, it turns out that dvcs-autosync has grappled with some of these same races: http://comments.gmane.org/gmane.comp.version-control.home-dir/665 I hope that git annex watch will be in a better place to deal with them, since it's only dealing with git, and with a restricted portion of it relevant to git-annex.

It's important that git annex watch be rock solid. It's the foundation of the git annex assistant. Users should not need to worry about races when using it. Most users won't know what race conditions are. If only I could be so lucky!

Posted Wed Oct 10 15:36:01 2012

Probably won't be doing any big coding on the git-annex assistant in the upcoming week, as I'll be traveling and/or slightly ill enough that I can't fully get into flow.


There was a new Yesod release this week, which required minor changes to make the webapp build with it. I managed to keep the old version of Yesod also supported, and plan to keep that working so it can be built with the version of Yesod available in, eg, Linux distributions. TBD how much pain that will involve going forward.


I'm mulling over how to support stopping/pausing transfers. The problem is that if the assistant is running a transfer in one thread, and the webapp is used to cancel it, killing that thread won't necessarily stop the transfer, because, at least in Haskell's thread model, killing a thread does not kill processes started by the thread (like rsync).

So one option is to have the transfer thread run a separate git-annex process, which will run the actual transfer. And killing that process will stop the transfer nicely. However, using a separate git-annex process means a little startup overhead for each file transferred (I don't know if it'd be enough to matter). Also, there's the problem that git-annex is sometimes not installed in PATH (wish I understood why cabal does that), which makes it kind of hard for it to run itself. (It can't simply fork, sadly. See past horrible pain with forking and threads.)

The other option is to change the API for git-annex remotes, so that their storeKey and retrieveKeyFile methods return a pid of the program that they run. When they do run a program.. not all remotes do. This seems like it'd make the code in the remotes hairier, and it is also asking for bugs, when a remote's implementation changes. Or I could go lower-level, and make every place in the utility libraries that forks a process record its pid in a per-thread MVar. Still seems to be asking for bugs.

Oh well, at least git-annex is already crash-safe, so once I figure out how to kill a transfer process, I can kill it safely. :)

Posted Wed Oct 10 15:36:01 2012

Unexpectedly managed a mostly productive day today.

Went ahead with making the assistant run separate git-annex processes for transfers. This will currently fail if git-annex is not installed in PATH. (Deferred dealing with that.)

To stop a transfer, the webapp needs to signal not just the git-annex process, but all its children. I'm using process groups for this, which is working, but I'm not extremely happy with it.
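In outline (a sketch using the process library; not necessarily how the final code is structured):

[[!format haskell """
{- Sketch, using the process library: start a transfer in its own
 - process group, so the git-annex process and everything it spawns
 - (rsync, ssh, ..) can be signaled together. -}
import System.Process

startTransferProcess :: [String] -> IO ProcessHandle
startTransferProcess args = do
        (_, _, _, p) <- createProcess (proc "git-annex" args)
                { create_group = True } -- new process group
        return p

stopTransfer :: ProcessHandle -> IO ()
stopTransfer = interruptProcessGroupOf -- signals the whole group
"""]]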

Anyway, the webapp's UI can now be used for stopping transfers, and it wasn't far from there to also implementing pausing of transfers.

Pausing a transfer is actually the same as stopping it, except that a special signal is sent to the transfer control thread. That thread keeps running despite the git-annex process having been killed; it waits for a resume signal, and then restarts the transfer. This way a paused transfer continues to occupy a transfer slot, which prevents other queued transfers from running. This seems to be the behavior that makes sense.

Still need to wire up the webapp's button for starting a transfer. For a paused transfer, that will just need to resume it. I have not decided what the button should do when used on a transfer that is queued but not running yet. Maybe it forces it to run even if all transfer slots are already in use? Maybe it stops one of the currently running transfers to free up a slot?

Posted Wed Oct 10 15:36:01 2012