Scripting webpages
I record here a workflow to fix some problems with the links in the
website. It can also serve as a tutorial on TeXmacs scripting. The goal is
to batch-modify pages. Written May 13th, 2023.
You can run the scripts on this page. However, be aware that
some of them will modify your filesystem: towards the end we
use system procedures (system-mkdir, system-move) to create
directories and overwrite files in the src/ subdirectory. Be careful.
This document refers to the state of the mgubi.github.io
repository checked out at commit
dad6b1358b44534c0f6b0d3eb2e721cf04f440c1.
The goal is to change certain hyperlinks to use a user-defined macro
instead. This will make it easier in the future to customize the rendering
or the location of the linked files.
Batch-modify pages
We need to set up the base directory first.
Scheme] |
(setenv "NOTES" (url->system
(url-append (url-head (current-buffer)) "..")))
|
"/Users/mgubi/Library/CloudStorage/Dropbox/Safe/webpages"
Here we read and write a file to check that all is ok.
Scheme] |
(define t (tree-import
(url->system "$NOTES/src/main.tm") "texmacs"))
|
Scheme] |
(tree-export t (url->system "$NOTES/test.tm") "texmacs")
|
Scheme] |
(system "diff $NOTES/test.tm $NOTES/src/main.tm")
|
All good (0 means success): we can read and write TeXmacs
files. Let's have a look at all hyperlinks in the file, for example.
Scheme] |
(select t '(:* hlink))
|
(<tree <hlink|google
scholar|http://scholar.google.ca/citations?hl=en&user=D4PR4LYAAAAJ&view_op=list_works&pagesize=100>>
<tree
<hlink|arXiv|http://arxiv.org/a/gubinelli_m_1>>
<tree
<hlink|ORCID|http://orcid.org/0000-0002-4014-2949>>
<tree
<hlink|@maxgubi@twitter|https://twitter.com/maxgubi>>
<tree
<hlink|@maxgubi@mathstdon.xyz|https://mathstodon.xyz/@maxgubi>>
<tree
<hlink|linkedin|https://www.linkedin.com/in/massimiliano-gubinelli-39bb8467/>>
<tree <hlink|my institutional
page|https://www.maths.ox.ac.uk/people/massimiliano.gubinelli>>
<tree <hlink|Mathematical
Institute|https://www.maths.ox.ac.uk>> <tree
<hlink|vita|./curriculum-vitae.tm>> <tree
<hlink|research|research.tm>> <tree
<hlink|teaching|teaching/teaching.tm>> <tree
<hlink|programming|./programming.tm>> <tree
<hlink|writings|writings.tm>> <tree
<hlink|list|./list-articles.tm>> <tree
<hlink|Atom|notes.atom>> <tree
<hlink|template|template.tm>> <tree
<hlink|<TeXmacs>|http://www.texmacs.org>>
<tree
<hlink|github|https://github.com/mgubi/webpages>>)
We want to replace all the hlink tags with notes-link tags, so that we
will be able to customize them afterwards. select returns a list of
trees which should remember their positions in the document tree, so we
can try to just modify them in place, one after the other. Let's try
with the first:
Scheme] |
(define h (car (select t '(:* hlink))))
|
(<tree <hlink|google
scholar|http://scholar.google.ca/citations?hl=en&user=D4PR4LYAAAAJ&view_op=list_works&pagesize=100>>)
I wrap the tree in a list, otherwise TeXmacs will try to typeset it. Let's
now change the label of this tree.
Scheme] |
(list (tree-assign-node! h 'notes-link))
|
(<tree <notes-link|google
scholar|http://scholar.google.ca/citations?hl=en&user=D4PR4LYAAAAJ&view_op=list_works&pagesize=100>>)
Scheme] |
(select t '(:* notes-link))
|
(<tree <notes-link|google
scholar|http://scholar.google.ca/citations?hl=en&user=D4PR4LYAAAAJ&view_op=list_works&pagesize=100>>)
It worked! The document's first hlink has now been changed into
notes-link. I can now proceed and change all the others.
Scheme] |
(map (lambda (h) (tree-assign-node! h 'notes-link))
(select t '(:* hlink)))
|
(<tree
<notes-link|arXiv|http://arxiv.org/a/gubinelli_m_1>>
<tree
<notes-link|ORCID|http://orcid.org/0000-0002-4014-2949>>
<tree
<notes-link|@maxgubi@twitter|https://twitter.com/maxgubi>>
<tree
<notes-link|@maxgubi@mathstdon.xyz|https://mathstodon.xyz/@maxgubi>>
<tree
<notes-link|linkedin|https://www.linkedin.com/in/massimiliano-gubinelli-39bb8467/>>
<tree <notes-link|my institutional
page|https://www.maths.ox.ac.uk/people/massimiliano.gubinelli>>
<tree <notes-link|Mathematical
Institute|https://www.maths.ox.ac.uk>> <tree
<notes-link|vita|./curriculum-vitae.tm>> <tree
<notes-link|research|research.tm>> <tree
<notes-link|teaching|teaching/teaching.tm>> <tree
<notes-link|programming|./programming.tm>> <tree
<notes-link|writings|writings.tm>> <tree
<notes-link|list|./list-articles.tm>> <tree
<notes-link|Atom|notes.atom>> <tree
<notes-link|template|template.tm>> <tree
<notes-link|<TeXmacs>|http://www.texmacs.org>>
<tree
<notes-link|github|https://github.com/mgubi/webpages>>)
And check that there are indeed no more hlink tags in the document.
Let's save the result in a new file.
Scheme] |
(tree-export t (url->system "$NOTES/test.tm")
"texmacs")
|
I would like to be more selective and change only the internal links,
i.e. those which do not start with http:// or https://. So let's start
again. tree-ref extracts a subtree of a given tree by its index.
Scheme] |
(define t (tree-import (url->system "$NOTES/src/main.tm")
"texmacs"))
|
Scheme] |
(define l (select t '(:* hlink)))
|
(<tree <hlink|google
scholar|http://scholar.google.ca/citations?hl=en&user=D4PR4LYAAAAJ&view_op=list_works&pagesize=100>>)
Scheme] |
(list (tree-ref (car l) 0))
|
Scheme] |
(list (tree-ref (car l) 1))
|
(<tree
http://scholar.google.ca/citations?hl=en&user=D4PR4LYAAAAJ&view_op=list_works&pagesize=100>)
Scheme] |
(tree-atomic? (tree-ref (car l) 1))
|
Scheme] |
(tree->string (tree-ref (car l) 1))
|
"http://scholar.google.ca/citations?hl=en&user=D4PR4LYAAAAJ&view_op=list_works&pagesize=100"
Atomic trees are strings: I need to check whether the subtree is atomic
and then whether it starts with http://.
Scheme] |
(string-starts? (tree->string (tree-ref (car l) 1))
"http://")
|
Which indeed it does. I can now filter out all such external links and
change only the remaining ones.
Scheme] |
(define l1 (filter (lambda (h)
(not (and (tree-atomic? (tree-ref h 1))
(or (string-starts? (tree->string (tree-ref h 1))
"http://")
(string-starts? (tree->string (tree-ref h 1))
"https://")))))
l))
|
(<tree <hlink|vita|./curriculum-vitae.tm>> <tree
<hlink|research|research.tm>> <tree
<hlink|teaching|teaching/teaching.tm>> <tree
<hlink|programming|./programming.tm>> <tree
<hlink|writings|writings.tm>> <tree
<hlink|list|./list-articles.tm>> <tree
<hlink|Atom|notes.atom>> <tree
<hlink|template|template.tm>>)
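The test used in the filter above can also be tried on plain strings, outside TeXmacs. The following is only a minimal sketch in plain Guile: external-target? and internal-targets are hypothetical names, and the SRFI-13 procedure string-prefix? stands in for the TeXmacs string-starts? of the session (note that its arguments come in the opposite order).

```scheme
(use-modules (srfi srfi-1) (srfi srfi-13))

;; Hypothetical helper: #t when the link target is an absolute web URL.
(define (external-target? s)
  (or (string-prefix? "http://" s)
      (string-prefix? "https://" s)))

;; Keep only the internal targets, mirroring what l1 does for hlink trees.
(define (internal-targets targets)
  (filter (lambda (s) (not (external-target? s))) targets))

(internal-targets
 '("http://arxiv.org/a/gubinelli_m_1" "./curriculum-vitae.tm" "research.tm"))
;; => ("./curriculum-vitae.tm" "research.tm")
```

Working on strings first makes it easy to test the rule before applying it to the trees of a real document.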
Scheme] |
(map (lambda (h) (tree-assign-node! h 'notes-link)) l1)
|
(<tree <notes-link|vita|./curriculum-vitae.tm>>
<tree <notes-link|research|research.tm>> <tree
<notes-link|teaching|teaching/teaching.tm>> <tree
<notes-link|programming|./programming.tm>> <tree
<notes-link|writings|writings.tm>> <tree
<notes-link|list|./list-articles.tm>> <tree
<notes-link|Atom|notes.atom>> <tree
<notes-link|template|template.tm>>)
I would also like to normalize the filenames by dropping the leading ./,
since it is redundant. Note that l1 has now changed.
(<tree <notes-link|vita|./curriculum-vitae.tm>>
<tree <notes-link|research|research.tm>> <tree
<notes-link|teaching|teaching/teaching.tm>> <tree
<notes-link|programming|./programming.tm>> <tree
<notes-link|writings|writings.tm>> <tree
<notes-link|list|./list-articles.tm>> <tree
<notes-link|Atom|notes.atom>> <tree
<notes-link|template|template.tm>>)
So I can iterate again and change the second child to drop ./ whenever
needed. To do this we use tree-set!, which replaces in place a given
subtree relative to a base tree.
Scheme] |
(map (lambda (h)
(if (and (tree-atomic? (tree-ref h 1))
(string-starts? (tree->string (tree-ref h 1))
"./"))
(tree-set! h 1 (string-drop
(tree->string (tree-ref h 1)) 2))))
(select t '(:* notes-link)))
|
(<tree curriculum-vitae.tm> #<unspecified>
#<unspecified> <tree programming.tm>
#<unspecified> <tree list-articles.tm>
#<unspecified> #<unspecified>)
(<tree <notes-link|vita|curriculum-vitae.tm>>
<tree <notes-link|research|research.tm>> <tree
<notes-link|teaching|teaching/teaching.tm>> <tree
<notes-link|programming|programming.tm>> <tree
<notes-link|writings|writings.tm>> <tree
<notes-link|list|list-articles.tm>> <tree
<notes-link|Atom|notes.atom>> <tree
<notes-link|template|template.tm>>)
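As an aside, the string manipulation behind this step can be written as a small helper on plain strings. This is a sketch in plain Guile, not TeXmacs code: strip-dot-slash is a hypothetical name, and string-prefix? and string-drop are the SRFI-13 procedures.

```scheme
(use-modules (srfi srfi-13))

;; Hypothetical helper: drop one leading "./" from a relative path,
;; and return any other string unchanged.
(define (strip-dot-slash s)
  (if (string-prefix? "./" s)
      (string-drop s 2)
      s))

(map strip-dot-slash '("./curriculum-vitae.tm" "research.tm"))
;; => ("curriculum-vitae.tm" "research.tm")
```

Returning the string unchanged in the other branch gives a uniform result list, unlike the #<unspecified> entries printed in the session, where the if has no else branch.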
Let's check that this actually modified the document.
Scheme] |
(select t '(:* notes-link))
|
(<tree <notes-link|vita|curriculum-vitae.tm>>
<tree <notes-link|research|research.tm>> <tree
<notes-link|teaching|teaching/teaching.tm>> <tree
<notes-link|programming|programming.tm>> <tree
<notes-link|writings|writings.tm>> <tree
<notes-link|list|list-articles.tm>> <tree
<notes-link|Atom|notes.atom>> <tree
<notes-link|template|template.tm>>)
Yes, it did. We now wrap the whole process in a function which operates on
a given tree.
Scheme] |
(define (handle-links t)
  (let* ((l (select t '(:* hlink)))
         ;; keep only internal links: targets not starting with http(s)://
         (l1 (filter (lambda (h)
                       (not (and (tree-atomic? (tree-ref h 1))
                                 (or (string-starts? (tree->string (tree-ref h 1))
                                                     "http://")
                                     (string-starts? (tree->string (tree-ref h 1))
                                                     "https://")))))
                     l)))
    (map (lambda (h)
           ;; rename the tag, then drop a redundant leading "./"
           (tree-assign-node! h 'notes-link)
           (if (and (tree-atomic? (tree-ref h 1))
                    (string-starts? (tree->string (tree-ref h 1))
                                    "./"))
               (tree-set! h 1 (string-drop
                               (tree->string (tree-ref h 1)) 2))))
         l1)))
|
And check that it actually works as expected.
Scheme] |
(define t (tree-import (url->system "$NOTES/src/main.tm")
"texmacs"))
|
Scheme] |
(handle-links t)
|
(<tree curriculum-vitae.tm> #<unspecified>
#<unspecified> <tree programming.tm>
#<unspecified> <tree list-articles.tm>
#<unspecified> #<unspecified>)
Scheme] |
(select t '(:* (:or hlink notes-link)))
|
(<tree <hlink|google
scholar|http://scholar.google.ca/citations?hl=en&user=D4PR4LYAAAAJ&view_op=list_works&pagesize=100>>
<tree
<hlink|arXiv|http://arxiv.org/a/gubinelli_m_1>>
<tree
<hlink|ORCID|http://orcid.org/0000-0002-4014-2949>>
<tree
<hlink|@maxgubi@twitter|https://twitter.com/maxgubi>>
<tree
<hlink|@maxgubi@mathstdon.xyz|https://mathstodon.xyz/@maxgubi>>
<tree
<hlink|linkedin|https://www.linkedin.com/in/massimiliano-gubinelli-39bb8467/>>
<tree <hlink|my institutional
page|https://www.maths.ox.ac.uk/people/massimiliano.gubinelli>>
<tree <hlink|Mathematical
Institute|https://www.maths.ox.ac.uk>> <tree
<notes-link|vita|curriculum-vitae.tm>> <tree
<notes-link|research|research.tm>> <tree
<notes-link|teaching|teaching/teaching.tm>> <tree
<notes-link|programming|programming.tm>> <tree
<notes-link|writings|writings.tm>> <tree
<notes-link|list|list-articles.tm>> <tree
<notes-link|Atom|notes.atom>> <tree
<notes-link|template|template.tm>> <tree
<hlink|<TeXmacs>|http://www.texmacs.org>>
<tree
<hlink|github|https://github.com/mgubi/webpages>>)
Which it does.
Improving references to stored files
We also want to perform another batch change: replace all the links
into the store with a new tag notes-store.
Scheme] |
(define t (tree-import (url->system "$NOTES/src/people.tm")
"texmacs"))
|
Scheme] |
(define l (filter (lambda (h)
(string-contains? (tree->string (tree-ref h 1))
"store/"))
(select t '(:* hlink))))
|
(<tree <hlink|pdf|../store/master-thesis-song.pdf>>
<tree
<hlink|slides|../store/master-thesis-seminar-song.pdf>>
<tree <hlink|pdf|../store/master-thesis-meyer.pdf>>
<tree
<hlink|slides|../store/master-thesis-seminar-meyer.pdf>>
<tree
<hlink|pdf|../store/master-thesis-noeller.pdf>>
<tree
<hlink|pdf|../store/master-thesis-zografos.pdf>>
<tree
<hlink|pdf|../store/master-thesis-martini.pdf>>
<tree
<hlink|pdf|../store/master-thesis-orenday.pdf>>
<tree
<hlink|pdf|../store/master-thesis-barashkov.pdf>>)
We check that a hyperlink has store/ in its target and we
perform the substitution in this case.
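The check and the substitution can be tried on plain strings first. Here is a minimal sketch in plain Guile, outside TeXmacs: store-file-name is a hypothetical name, and string-contains and string-drop are the SRFI-13 procedures, where string-contains returns the index of the match or #f.

```scheme
(use-modules (srfi srfi-13))

;; Hypothetical helper: the part of a target after the first "store/",
;; or #f when the target does not point into the store.
(define (store-file-name s)
  (let ((i (string-contains s "store/")))
    (and i (string-drop s (+ i (string-length "store/"))))))

(store-file-name "../store/master-thesis-barashkov.pdf")
;; => "master-thesis-barashkov.pdf"
(store-file-name "research.tm")
;; => #f
```

Using string-length instead of a literal 6 keeps the offset correct if the directory name ever changes.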
Scheme] |
(string-contains "../store/master-thesis-barashkov.pdf" "store/")
|
Scheme] |
(string-drop "../store/master-thesis-barashkov.pdf" 9)
|
"master-thesis-barashkov.pdf"
Scheme] |
(for-each (lambda (h)
(tree-assign-node! h 'notes-store)
(let* ((s (tree->string (tree-ref h 1)))
(s1 (string-drop s
(+ 6 (string-contains s "store/")))))
(tree-set! h 1 s1)))
l)
|
(<tree <notes-store|pdf|master-thesis-song.pdf>>
<tree
<notes-store|slides|master-thesis-seminar-song.pdf>>
<tree <notes-store|pdf|master-thesis-meyer.pdf>>
<tree
<notes-store|slides|master-thesis-seminar-meyer.pdf>>
<tree <notes-store|pdf|master-thesis-noeller.pdf>>
<tree <notes-store|pdf|master-thesis-zografos.pdf>>
<tree <notes-store|pdf|master-thesis-martini.pdf>>
<tree <notes-store|pdf|master-thesis-orenday.pdf>>
<tree
<notes-store|pdf|master-thesis-barashkov.pdf>>)
OK. We wrap the whole process in a handy function (which has to be
applied after handle-links).
Scheme] |
(define (handle-notes-store t)
  ;; must run after handle-links: it looks for notes-link tags
  (with l (filter (lambda (h)
                    (string-contains? (tree->string (tree-ref h 1))
                                      "store/"))
                  (select t '(:* notes-link)))
    (for-each (lambda (h)
                (tree-assign-node! h 'notes-store)
                ;; keep only what follows "store/" (6 characters)
                (let* ((s (tree->string (tree-ref h 1)))
                       (s1 (string-drop s
                                        (+ 6 (string-contains s "store/")))))
                  (tree-set! h 1 s1)))
              l)))
|
(#<unspecified> #<unspecified> #<unspecified>
#<unspecified> #<unspecified> #<unspecified>
#<unspecified> #<unspecified> #<unspecified>)
Scheme] |
(select t '(:* notes-store))
|
(<tree <notes-store|pdf|master-thesis-song.pdf>>
<tree
<notes-store|slides|master-thesis-seminar-song.pdf>>
<tree <notes-store|pdf|master-thesis-meyer.pdf>>
<tree
<notes-store|slides|master-thesis-seminar-meyer.pdf>>
<tree <notes-store|pdf|master-thesis-noeller.pdf>>
<tree <notes-store|pdf|master-thesis-zografos.pdf>>
<tree <notes-store|pdf|master-thesis-martini.pdf>>
<tree <notes-store|pdf|master-thesis-orenday.pdf>>
<tree
<notes-store|pdf|master-thesis-barashkov.pdf>>)
All good.
Putting all together
Now we aim to batch-process all the files in a given directory. We
gather all the files in the $NOTES/src directory,
removing the prefix to make later manipulations easier.
Scheme] |
(define u1 (url-append (url-append "$NOTES/src"
(url-any))
"*.tm"))
|
<url
/Users/mgubi/Library/CloudStorage/Dropbox/Safe/webpages/src/{**}/*.tm>
Scheme] |
(define files (url->list (url-delta "$NOTES/src/dummy"
(url-expand (url-complete u1 "fr")))))
|
(<url curriculum-vitae.tm> <url events.tm> <url
list-articles.tm> <url main.tm> <url
old-publications.tm>)
Scheme] |
(url-directory? (url-append (url-append "$NOTES/src"
(car files))
(url-parent)))
|
Now we want to create under $NOTES/src2 a directory tree similar to the one in $NOTES/src.
Scheme] |
(define (make-dir-tree url)
  (when (!= url (system->url "$NOTES/src2"))
    (make-dir-tree (url-expand
                    (url-append url (url-parent))))
    (when (not (url-exists? url))
      (system-mkdir url)
      (system-1 "chmod a+x" url))))
|
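The recursion in make-dir-tree creates parent directories before their children. The same idea can be illustrated on plain path strings; this is only a sketch in plain Guile, where parent-dirs and dirs-needed are hypothetical names and teaching/2023/notes.tm is a made-up path used for the example.

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical helper: all parent directories of a relative file path,
;; outermost first, e.g. "a/b/c.tm" -> ("a" "a/b").
(define (parent-dirs path)
  (let loop ((parts (drop-right (string-split path #\/) 1))
             (prefix "")
             (acc '()))
    (if (null? parts)
        (reverse acc)
        (let ((dir (if (string-null? prefix)
                       (car parts)
                       (string-append prefix "/" (car parts)))))
          (loop (cdr parts) dir (cons dir acc))))))

;; Directories needed for a set of files, parents first, no duplicates.
(define (dirs-needed paths)
  (delete-duplicates (append-map parent-dirs paths)))

(dirs-needed '("main.tm" "teaching/teaching.tm" "teaching/2023/notes.tm"))
;; => ("teaching" "teaching/2023")
```

Creating the directories in this order means each directory already has its parent when it is made, which is what the recursive call achieves in the TeXmacs version.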
Scheme] |
(define dirs (ahash-set->list (list->ahash-set
(map (lambda (f) (url-head
(url-append "$NOTES/src2" f)))
files))))
|
Scheme] |
(for-each make-dir-tree dirs)
|
This created the right directory tree. We are now ready to convert each
file in files.
Scheme] |
(define (proc-file url)
  (define t (tree-import (url->system
                          (url-append "$NOTES/src" url))
                         "texmacs"))
  (handle-links t)
  (handle-notes-store t)
  (tree-export t (url->system
                  (url-append "$NOTES/src2" url))
               "texmacs"))
|
After these checks, we are ready to process all the files at once.
Scheme] |
(map proc-file files)
|
(#f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f
#f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f
#f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f
#f)
OK. After checking that there are no problems, we can move the modified
files from src2 back to src.
Scheme] |
(for-each (lambda (url)
(system-move (url-append "$NOTES/src2" url)
(url-append "$NOTES/src" url)))
files)
|
And we are done.