Idea: Git as basis for future CTAN and TeX Live. (Discuss here or at tomorrow's TeX Hour)

Patrick std300 at gmail.com
Sun Jun 27 19:58:27 CEST 2021


(Sorry for the double post, Jonathan)

I used to mirror CTAN with a git repository (a commit of the current
status every day). It grew so big that it was completely unmaintainable.
Git was not suitable for that. I have not tried Git Large File
Storage (LFS), but I doubt that it would have helped me. My goal was to
create a real archive, which CTAN, despite its name, is not.
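
A minimal sketch of the kind of daily snapshot described above, assuming the mirror directory has already been refreshed (for example by rsync). The name snapshot_mirror, the committer identity and the commit message are illustrative assumptions, not the script that was actually used.

```shell
#!/bin/sh
set -eu

# snapshot_mirror DIR: hypothetical helper that commits the current state
# of DIR (an already refreshed local mirror) as one snapshot.
snapshot_mirror() {
    (
        cd "$1"
        # Create the repository on first use.
        [ -d .git ] || git init -q
        # Stage additions, modifications and deletions.
        git add -A
        # Commit only when the staged tree differs from the last snapshot.
        if ! git diff --cached --quiet; then
            git -c user.name=ctan-mirror -c user.email=mirror@example.invalid \
                commit -q -m "CTAN snapshot $(date -u +%Y-%m-%d)"
        fi
        # Report how much disk space the object store uses.
        git count-objects -vH
    )
}

# Run only when a directory is given on the command line.
[ "$#" -eq 0 ] || snapshot_mirror "$1"
```

Since every snapshot stays in the history, the count-objects output is a quick way to watch how fast such a repository grows.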

That said, I think it would not take much change to CTAN to make it
more suitable for distributing as a Git repository.

Patrick

On Wed, 23 June 2021 at 18:58, Jonathan Fine <jfine2358 at gmail.com> wrote:
>
> Hi
>
> As well as being a version control system, Git is a distributed peer-to-peer content-addressable store. It's also efficient in its use of network bandwidth and mass storage. It uses multiple cores when possible, so it's also quick. And it is, of course, widely used.
>
> All this makes git a good foundation for rethinking CTAN and TeX Live. This post explores this idea. We focus on git's use of PACKFILES to do peer-to-peer file sharing.
>
> When you clone a repository, the repository being cloned creates a single git pack file (and associated index file perhaps), which is then sent to the newly created local repository. From this, if required, the working files are created.
>
> If you do a pull from a source, the same process takes place, except that the two repositories first do some negotiation to determine what should be sent. And then as before a pack file is sent. And a push is similar. (Actually, in both cases, it might be several pack files.) Rsync, used by CTAN, also does peer-to-peer negotiation.
>
> Here's an example of a git clone
>
> $ git clone git@github.com:jgm/pandoc.git
> Cloning into 'pandoc'...
> [snip]
>
> $ ls -l pandoc/.git/objects/pack/
> total 53480
> -r--r--r-- 1 jfine jfine  2.8M Jun 23 17:12 pack-53640....idx
> -r--r--r-- 1 jfine jfine 50M Jun 23 17:12 pack-53640....pack
>
> And now I've got every version of every file in the history of pandoc (up to the commit I cloned). That's not bad for 50M. (The index can be computed from the pack. It speeds disc access.)
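
The remark that the index can be computed from the pack can be checked with git index-pack. A minimal sketch, using a small throwaway repository rather than pandoc; the directory names and committer identity are assumptions for illustration.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
repo=$work/demo

# Build a tiny repository with a single commit.
git init -q "$repo"
echo 'hello' > "$repo/a.txt"
git -C "$repo" add a.txt
git -C "$repo" -c user.name=demo -c user.email=demo@example.invalid \
    commit -q -m 'first commit'

# gc moves the loose objects into one pack file plus its index.
git -C "$repo" gc -q

pack=$(ls "$repo"/.git/objects/pack/pack-*.pack)
orig_idx=${pack%.pack}.idx
rebuilt_idx=$work/rebuilt.idx

# Recompute the index from the pack alone, writing it to a separate file.
git -C "$repo" index-pack -o "$rebuilt_idx" "$pack" >/dev/null

# Compare the recomputed index with the one gc wrote.
if cmp -s "$orig_idx" "$rebuilt_idx"; then
    echo "rebuilt index matches the original"
fi
```

So only the .pack file carries information; the .idx is a lookup table derived from it.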
>
> For GitLab the size limit is 10GB per repository. For GitHub the size limit is about 5GB. Norbert Preining's git-svn mirror of TeX Live is about 40GB.
>
> https://about.gitlab.com/blog/2015/04/08/gitlab-dot-com-storage-limit-raised-to-10gb-per-repo/
> https://github.community/t/working-with-large-files-and-repositories/10203
> https://texlive.info/
>
> Let me end with a question. It's related to hosting TeX Live on GitHub and GitLab.
>
> First, consider all files in any version of TeX Live that are used by any subscriber to this list as inputs to TeX or any of its associated programs. (This definition is crafted to exclude documentation files. And files not in TeX Live. It's the files in TeX Live that TeX or whatever inputs when typesetting.)
>
> Now for the question. Put all these files in a git pack file. How big will that pack file be? Perhaps powers of 2 is the way to ask this. In other words, at most 250M? At most 500M? At most 1G? At most 2G? At most 4G? At most 8G? At most 16G? At most 32G? At most 64G? [Stop here because Norbert's git-svn mirror provides 40G as a bound.]
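
One way to get a first estimate is to commit a candidate set of files into a scratch repository and ask Git how large a single pack of everything would be. A minimal sketch; pack_size_bytes is a hypothetical helper, and the repository is assumed to already hold the files of interest. It only shows how to measure and does not answer the question for TeX Live.

```shell
#!/bin/sh
set -eu

# pack_size_bytes REPO: print the size in bytes of one pack holding
# every object reachable from any ref in REPO.
pack_size_bytes() {
    git -C "$1" rev-list --objects --all |
        git -C "$1" pack-objects --stdout -q |
        wc -c | tr -d ' '
}

# Run only when a repository path is given on the command line.
[ "$#" -eq 0 ] || pack_size_bytes "$1"
```

The pack is streamed to wc and never written to disk, so the measurement does not need extra storage. The --window and --depth options of pack-objects may give a smaller pack at the cost of more time.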
>
> If we're at most 5GB then we can use both GitHub and GitLab to host these files. And the TeX Collection / TeX Live could store this material as git pack files. This would make the DVD a sneakernet (https://en.wikipedia.org/wiki/Sneakernet) for some TeX-related git repositories.
>
> Still here? Well done. I'll be discussing this, read-only file systems, immutable OSes and related methods at tomorrow evening's TeX Hour.
>
> When and where. Thursday 24 June, 6.30 to 7.30pm UK time. The UK time now is at https://time.is/UK. The zoom details are
> https://us02web.zoom.us/j/78551255396?pwd=cHdJN0pTTXRlRCtSd1lCTHpuWmNIUT09
> Meeting ID: 785 5125 5396
> Passcode: knuth
>
> For the keen: READ-ONLY FILE SYSTEMS
> https://en.wikipedia.org/wiki/Zero_Install
> https://en.wikipedia.org/wiki/Snap_(package_manager)
> https://archive.fosdem.org/2017/schedule/event/desktops_bundling_kde/
> https://en.wikipedia.org/wiki/Flatpak
>
> For the very keen: IMMUTABLE OSes
> https://www.theregister.com/2021/06/16/systemd_249_release_candidate/
> https://www.theregister.com/2021/04/01/systemd_248/
> https://www.theregister.com/2021/02/18/kinoite_immutable_fedora/
>
> Finally, video from last week's TeX Hour is available at
> https://www.youtube.com/playlist?list=PLw1FZfIX1w7hwBDqZoii3eOtd-RMivznf
>
> --
> Jonathan


