Scripting webpages

I record here a workflow to fix some problems with the links on this website. It can also serve as a tutorial on TeXmacs scripting. The goal is to batch-modify pages. Written May 13th, 2023.

You can run the scripts on this page. However, be aware that some of them modify your filesystem: towards the end we use the system procedure and its relatives (system-mkdir, system-move) to overwrite files in the src/ subdirectory. Be careful.

This document refers to the state of the mgubi.github.io repository checked out at commit

dad6b1358b44534c0f6b0d3eb2e721cf04f440c1.

The goal is to change certain hyperlinks to use a user-defined macro instead; this will make it easier in the future to customize the rendering or the location of the linked files.

Batch-modify pages

We need to set up the base directory first.

Scheme]
(setenv "NOTES" (url->system 
           (url-append (url-head (current-buffer)) "..")))
Scheme] 
(getenv "NOTES")

"/Users/mgubi/Library/CloudStorage/Dropbox/Safe/webpages"

Here we read and write a file to check that all is ok.

Scheme]
(define t (tree-import 
        (url->system "$NOTES/src/main.tm") "texmacs"))
Scheme] 
(tree-export t  (url->system "$NOTES/test.tm")  "texmacs")

#f

Scheme] 
(system "diff $NOTES/test.tm $NOTES/src/main.tm")

0

All good (0 means success, i.e. diff found no differences): we can read and write TeXmacs files. Let's have a look at all the hyperlinks in the file, for example.

Scheme] 
(select t '(:* hlink))

(<tree <hlink|google scholar|http://scholar.google.ca/citations?hl=en&user=D4PR4LYAAAAJ&view_op=list_works&pagesize=100>> <tree <hlink|arXiv|http://arxiv.org/a/gubinelli_m_1>> <tree <hlink|ORCID|http://orcid.org/0000-0002-4014-2949>> <tree <hlink|@maxgubi@twitter|https://twitter.com/maxgubi>> <tree <hlink|@maxgubi@mathstdon.xyz|https://mathstodon.xyz/@maxgubi>> <tree <hlink|linkedin|https://www.linkedin.com/in/massimiliano-gubinelli-39bb8467/>> <tree <hlink|my institutional page|https://www.maths.ox.ac.uk/people/massimiliano.gubinelli>> <tree <hlink|Mathematical Institute|https://www.maths.ox.ac.uk>> <tree <hlink|vita|./curriculum-vitae.tm>> <tree <hlink|research|research.tm>> <tree <hlink|teaching|teaching/teaching.tm>> <tree <hlink|programming|./programming.tm>> <tree <hlink|writings|writings.tm>> <tree <hlink|list|./list-articles.tm>> <tree <hlink|Atom|notes.atom>> <tree <hlink|template|template.tm>> <tree <hlink|<TeXmacs>|http://www.texmacs.org>> <tree <hlink|github|https://github.com/mgubi/webpages>>)

We want to replace all the hlink tags with notes-link tags, so that we will be able to customize them afterwards. select returns a list of trees which should remember their positions in the document tree, so we can try to just modify them in place, one after the other. Let's try with the first:

Scheme]
(define h (car (select t '(:* hlink))))
Scheme] 
(list h)

(<tree <hlink|google scholar|http://scholar.google.ca/citations?hl=en&user=D4PR4LYAAAAJ&view_op=list_works&pagesize=100>>)

I wrap the tree in a list, otherwise TeXmacs will try to typeset it. Let's now change the label of this tree.

Scheme] 
(list (tree-assign-node! h 'notes-link))

(<tree <notes-link|google scholar|http://scholar.google.ca/citations?hl=en&user=D4PR4LYAAAAJ&view_op=list_works&pagesize=100>>)

Scheme] 
(select t '(:* notes-link))

(<tree <notes-link|google scholar|http://scholar.google.ca/citations?hl=en&user=D4PR4LYAAAAJ&view_op=list_works&pagesize=100>>)

It worked! The document's first hlink has been changed into a notes-link. I can now proceed and change all the others.

Scheme] 
(map (lambda (h) (tree-assign-node! h 'notes-link)) 
    (select t '(:* hlink)))

(<tree <notes-link|arXiv|http://arxiv.org/a/gubinelli_m_1>> <tree <notes-link|ORCID|http://orcid.org/0000-0002-4014-2949>> <tree <notes-link|@maxgubi@twitter|https://twitter.com/maxgubi>> <tree <notes-link|@maxgubi@mathstdon.xyz|https://mathstodon.xyz/@maxgubi>> <tree <notes-link|linkedin|https://www.linkedin.com/in/massimiliano-gubinelli-39bb8467/>> <tree <notes-link|my institutional page|https://www.maths.ox.ac.uk/people/massimiliano.gubinelli>> <tree <notes-link|Mathematical Institute|https://www.maths.ox.ac.uk>> <tree <notes-link|vita|./curriculum-vitae.tm>> <tree <notes-link|research|research.tm>> <tree <notes-link|teaching|teaching/teaching.tm>> <tree <notes-link|programming|./programming.tm>> <tree <notes-link|writings|writings.tm>> <tree <notes-link|list|./list-articles.tm>> <tree <notes-link|Atom|notes.atom>> <tree <notes-link|template|template.tm>> <tree <notes-link|<TeXmacs>|http://www.texmacs.org>> <tree <notes-link|github|https://github.com/mgubi/webpages>>)

And check that there are indeed no more hlink tags in the document.

Scheme] 
(select t '(:* hlink))

()

Let's save the result in a new file.

Scheme] 
(tree-export t (url->system "$NOTES/test.tm")
             "texmacs")

#f

I would like to be more selective and change only the internal links, i.e. those which do not start with http:// or https://. So let's start again. tree-ref extracts a subtree of a given tree, here the child at a given index.

Scheme]
(define t (tree-import (url->system "$NOTES/src/main.tm")
                       "texmacs"))
Scheme]
(define l (select t '(:* hlink)))
Scheme] 
(list (car l))

(<tree <hlink|google scholar|http://scholar.google.ca/citations?hl=en&user=D4PR4LYAAAAJ&view_op=list_works&pagesize=100>>)

Scheme] 
(list (tree-ref (car l) 0))

(<tree google scholar>)

Scheme] 
(list (tree-ref (car l) 1))

(<tree http://scholar.google.ca/citations?hl=en&user=D4PR4LYAAAAJ&view_op=list_works&pagesize=100>)

Scheme] 
(tree-atomic? (tree-ref (car l) 1))

#t

Scheme] 
(tree->string (tree-ref (car l) 1))

"http://scholar.google.ca/citations?hl=en&user=D4PR4LYAAAAJ&view_op=list_works&pagesize=100"

Atomic trees are strings: I need to check whether the link target is atomic and then whether it starts with http://.

Scheme] 
(string-starts? (tree->string (tree-ref (car l) 1))
                "http://")

#t

Which indeed it does. I can now filter out all the links of this kind (http:// or https://) and change only the remaining, internal ones.

Scheme]
(define l1 (filter (lambda (h) 
   (not (and (tree-atomic? (tree-ref h 1))
       (or  (string-starts? (tree->string (tree-ref h 1))
                           "http://")  
        (string-starts? (tree->string (tree-ref h 1))
                        "https://")))))
      l))
Scheme] 
l1

(<tree <hlink|vita|./curriculum-vitae.tm>> <tree <hlink|research|research.tm>> <tree <hlink|teaching|teaching/teaching.tm>> <tree <hlink|programming|./programming.tm>> <tree <hlink|writings|writings.tm>> <tree <hlink|list|./list-articles.tm>> <tree <hlink|Atom|notes.atom>> <tree <hlink|template|template.tm>>)

Scheme] 
(map (lambda (h) (tree-assign-node! h 'notes-link)) l1)

(<tree <notes-link|vita|./curriculum-vitae.tm>> <tree <notes-link|research|research.tm>> <tree <notes-link|teaching|teaching/teaching.tm>> <tree <notes-link|programming|./programming.tm>> <tree <notes-link|writings|writings.tm>> <tree <notes-link|list|./list-articles.tm>> <tree <notes-link|Atom|notes.atom>> <tree <notes-link|template|template.tm>>)

I would also like to normalize the filenames by dropping the leading ./, since it is redundant. Note that l1 has changed in the meantime, because its trees were modified in place.

Scheme] 
l1

(<tree <notes-link|vita|./curriculum-vitae.tm>> <tree <notes-link|research|research.tm>> <tree <notes-link|teaching|teaching/teaching.tm>> <tree <notes-link|programming|./programming.tm>> <tree <notes-link|writings|writings.tm>> <tree <notes-link|list|./list-articles.tm>> <tree <notes-link|Atom|notes.atom>> <tree <notes-link|template|template.tm>>)

So I can iterate again and change the second child to drop ./ whenever needed. To do this we use tree-set!, which replaces in place a subtree of a base tree (here the child at index 1 of h).

Scheme] 
(map (lambda (h)
       (if (and (tree-atomic? (tree-ref h 1)) 
                (string-starts? (tree->string (tree-ref h 1))
                                "./"))
            (tree-set! h 1 (string-drop 
                            (tree->string (tree-ref h 1)) 2))))
      (select t '(:* notes-link)))

(<tree curriculum-vitae.tm> #<unspecified> #<unspecified> <tree programming.tm> #<unspecified> <tree list-articles.tm> #<unspecified> #<unspecified>)

Scheme] 
l1

(<tree <notes-link|vita|curriculum-vitae.tm>> <tree <notes-link|research|research.tm>> <tree <notes-link|teaching|teaching/teaching.tm>> <tree <notes-link|programming|programming.tm>> <tree <notes-link|writings|writings.tm>> <tree <notes-link|list|list-articles.tm>> <tree <notes-link|Atom|notes.atom>> <tree <notes-link|template|template.tm>>)

Let's check that this actually modified the document.

Scheme] 
(select t '(:* notes-link))

(<tree <notes-link|vita|curriculum-vitae.tm>> <tree <notes-link|research|research.tm>> <tree <notes-link|teaching|teaching/teaching.tm>> <tree <notes-link|programming|programming.tm>> <tree <notes-link|writings|writings.tm>> <tree <notes-link|list|list-articles.tm>> <tree <notes-link|Atom|notes.atom>> <tree <notes-link|template|template.tm>>)

Yes, it did. We now wrap the whole process in a function which operates on a given tree.

Scheme]
(define (handle-links t)
  ;; hlink tags whose target is not an http:// or https:// URL
  (let* ((l (select t '(:* hlink)))
         (l1 (filter
              (lambda (h)
                (not (and (tree-atomic? (tree-ref h 1))
                          (or (string-starts? (tree->string (tree-ref h 1))
                                              "http://")
                              (string-starts? (tree->string (tree-ref h 1))
                                              "https://")))))
              l)))
    ;; rename each internal link and drop a leading "./" from its target
    (map (lambda (h)
           (tree-assign-node! h 'notes-link)
           (if (and (tree-atomic? (tree-ref h 1))
                    (string-starts? (tree->string (tree-ref h 1)) "./"))
               (tree-set! h 1 (string-drop
                               (tree->string (tree-ref h 1)) 2))))
         l1)))

And check that it actually works as expected.

Scheme]
(define t (tree-import (url->system "$NOTES/src/main.tm")
                       "texmacs"))
Scheme] 
(handle-links t)

(<tree curriculum-vitae.tm> #<unspecified> #<unspecified> <tree programming.tm> #<unspecified> <tree list-articles.tm> #<unspecified> #<unspecified>)

Scheme] 
(select t '(:* (:or hlink notes-link)))

(<tree <hlink|google scholar|http://scholar.google.ca/citations?hl=en&user=D4PR4LYAAAAJ&view_op=list_works&pagesize=100>> <tree <hlink|arXiv|http://arxiv.org/a/gubinelli_m_1>> <tree <hlink|ORCID|http://orcid.org/0000-0002-4014-2949>> <tree <hlink|@maxgubi@twitter|https://twitter.com/maxgubi>> <tree <hlink|@maxgubi@mathstdon.xyz|https://mathstodon.xyz/@maxgubi>> <tree <hlink|linkedin|https://www.linkedin.com/in/massimiliano-gubinelli-39bb8467/>> <tree <hlink|my institutional page|https://www.maths.ox.ac.uk/people/massimiliano.gubinelli>> <tree <hlink|Mathematical Institute|https://www.maths.ox.ac.uk>> <tree <notes-link|vita|curriculum-vitae.tm>> <tree <notes-link|research|research.tm>> <tree <notes-link|teaching|teaching/teaching.tm>> <tree <notes-link|programming|programming.tm>> <tree <notes-link|writings|writings.tm>> <tree <notes-link|list|list-articles.tm>> <tree <notes-link|Atom|notes.atom>> <tree <notes-link|template|template.tm>> <tree <hlink|<TeXmacs>|http://www.texmacs.org>> <tree <hlink|github|https://github.com/mgubi/webpages>>)

Which it does.
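
The rule applied by handle-links can also be isolated as plain string helpers, which makes it easy to try on a few targets without touching any document. This is only a minimal sketch in portable Scheme: the names starts-with?, external-target? and strip-dot-slash are hypothetical (they are not part of TeXmacs), and starts-with? is defined here just so that the code runs outside TeXmacs, where string-starts? plays the same role.

```scheme
;; Hypothetical helpers, written in portable Scheme so they run outside TeXmacs.
;; Inside TeXmacs, string-starts? plays the role of starts-with?.
(define (starts-with? s prefix)
  (let ((n (string-length prefix)))
    (and (>= (string-length s) n)
         (string=? (substring s 0 n) prefix))))

;; #t for the targets that handle-links leaves as hlink.
(define (external-target? s)
  (or (starts-with? s "http://")
      (starts-with? s "https://")))

;; Drop a redundant leading "./", leave everything else untouched.
(define (strip-dot-slash s)
  (if (starts-with? s "./")
      (substring s 2 (string-length s))
      s))

(external-target? "https://www.maths.ox.ac.uk")  ; #t: stays an hlink
(external-target? "teaching/teaching.tm")        ; #f: becomes a notes-link
(strip-dot-slash "./curriculum-vitae.tm")        ; "curriculum-vitae.tm"
```

Keeping the string logic separate from the tree manipulation means the classification can be checked on plain strings before it is applied to a whole page.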

Improving references to stored files

We also want to perform another batch change: replace all the links pointing into the store with a new tag, notes-store.

Scheme]
(define t (tree-import (url->system "$NOTES/src/people.tm")
                       "texmacs"))
Scheme]
(define l (filter (lambda (h) 
              (string-contains?  (tree->string (tree-ref h 1))
                                 "store/")) 
                (select t '(:* hlink))))
Scheme] 
l

(<tree <hlink|pdf|../store/master-thesis-song.pdf>> <tree <hlink|slides|../store/master-thesis-seminar-song.pdf>> <tree <hlink|pdf|../store/master-thesis-meyer.pdf>> <tree <hlink|slides|../store/master-thesis-seminar-meyer.pdf>> <tree <hlink|pdf|../store/master-thesis-noeller.pdf>> <tree <hlink|pdf|../store/master-thesis-zografos.pdf>> <tree <hlink|pdf|../store/master-thesis-martini.pdf>> <tree <hlink|pdf|../store/master-thesis-orenday.pdf>> <tree <hlink|pdf|../store/master-thesis-barashkov.pdf>>)

We check that a hyperlink target contains store/ and, in that case, we perform the substitution: the target is cut down to what follows store/.

Scheme] 
(string-contains "../store/master-thesis-barashkov.pdf" "store/")

3

Scheme] 
(string-drop "../store/master-thesis-barashkov.pdf" 9)

"master-thesis-barashkov.pdf"

Scheme]
(for-each (lambda (h) 
     (tree-assign-node! h 'notes-store) 
     (let* ((s (tree->string (tree-ref h 1)))
            (s1 (string-drop s 
                    (+ 6 (string-contains s "store/")))))
       (tree-set! h 1 s1))) 
          l)
Scheme] 
l

(<tree <notes-store|pdf|master-thesis-song.pdf>> <tree <notes-store|slides|master-thesis-seminar-song.pdf>> <tree <notes-store|pdf|master-thesis-meyer.pdf>> <tree <notes-store|slides|master-thesis-seminar-meyer.pdf>> <tree <notes-store|pdf|master-thesis-noeller.pdf>> <tree <notes-store|pdf|master-thesis-zografos.pdf>> <tree <notes-store|pdf|master-thesis-martini.pdf>> <tree <notes-store|pdf|master-thesis-orenday.pdf>> <tree <notes-store|pdf|master-thesis-barashkov.pdf>>)

Ok. We wrap the whole process in a handy function (which has to be applied after handle-links, since it looks for notes-link tags).

Scheme]
(define (handle-notes-store t)
  ;; notes-link tags whose target points into the store
  (with l (filter (lambda (h)
                    (string-contains? (tree->string (tree-ref h 1))
                                      "store/"))
                  (select t '(:* notes-link)))
    (for-each (lambda (h)
                (tree-assign-node! h 'notes-store)
                ;; keep only what follows "store/" (6 characters)
                (let* ((s (tree->string (tree-ref h 1)))
                       (s1 (string-drop
                            s (+ 6 (string-contains s "store/")))))
                  (tree-set! h 1 s1)))
              l)))
Scheme] 
 (handle-links t)

(#<unspecified> #<unspecified> #<unspecified> #<unspecified> #<unspecified> #<unspecified> #<unspecified> #<unspecified> #<unspecified>)

Scheme]
(handle-notes-store t)
Scheme] 
(select t '(:* notes-store))

(<tree <notes-store|pdf|master-thesis-song.pdf>> <tree <notes-store|slides|master-thesis-seminar-song.pdf>> <tree <notes-store|pdf|master-thesis-meyer.pdf>> <tree <notes-store|slides|master-thesis-seminar-meyer.pdf>> <tree <notes-store|pdf|master-thesis-noeller.pdf>> <tree <notes-store|pdf|master-thesis-zografos.pdf>> <tree <notes-store|pdf|master-thesis-martini.pdf>> <tree <notes-store|pdf|master-thesis-orenday.pdf>> <tree <notes-store|pdf|master-thesis-barashkov.pdf>>)

All good.
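
The path computation inside handle-notes-store can be written as a stand-alone function on strings. A minimal sketch in portable Scheme, under the assumption that only the first occurrence of store/ matters: index-of and store-relative are hypothetical names (not TeXmacs procedures), and index-of stands in for string-contains so that the code runs in any Scheme.

```scheme
;; Hypothetical helper in portable Scheme: index of the first occurrence
;; of pat in s, or #f (the role string-contains plays in the session).
(define (index-of s pat)
  (let ((n (string-length s))
        (m (string-length pat)))
    (let loop ((i 0))
      (cond ((> (+ i m) n) #f)
            ((string=? (substring s i (+ i m)) pat) i)
            (else (loop (+ i 1)))))))

;; Part of the target that follows "store/", or #f when the target
;; does not point into the store.
(define (store-relative s)
  (let ((i (index-of s "store/")))
    (and i
         (substring s (+ i (string-length "store/"))
                    (string-length s)))))

(store-relative "../store/master-thesis-song.pdf") ; "master-thesis-song.pdf"
(store-relative "research.tm")                     ; #f
```

Using the length of "store/" instead of the literal 6 keeps the offset in sync with the pattern if the directory name ever changes.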

Putting it all together

Now we aim to batch-process all the files in a given directory. We gather all the files in the $NOTES/src directory, removing the prefix to make later manipulations easier.

Scheme]
(define u1 (url-append (url-append "$NOTES/src" 
                                   (url-any))  
                      "*.tm"))
Scheme] 
u1

<url /Users/mgubi/Library/CloudStorage/Dropbox/Safe/webpages/src/{**}/*.tm>

Scheme]
(define files (url->list (url-delta "$NOTES/src/dummy" 
                (url-expand (url-complete u1 "fr")))))
Scheme] 
(list-take files 5)

(<url curriculum-vitae.tm> <url events.tm> <url list-articles.tm> <url main.tm> <url old-publications.tm>)

Scheme] 
(url-directory? (url-append (url-append "$NOTES/src" 
                                    (car files)) 
                         (url-parent)))

#t

Scheme] 
(url-head (car files))

<url .>

Now we want to recreate under $NOTES/src2 the same directory tree as the one in $NOTES/src.

Scheme]
(define (make-dir-tree url)
  (when (!= url (system->url "$NOTES/src2")) 
      (make-dir-tree (url-expand 
                      (url-append url (url-parent))))
      (when (not (url-exists? url))     
        (system-mkdir url)
        (system-1 "chmod a+x" url))))
Scheme]
(define dirs (ahash-set->list (list->ahash-set 
          (map (lambda (f) (url-head 
                          (url-append "$NOTES/src2" f)))
               files))))
Scheme]
(for-each make-dir-tree dirs)

This created the right directory tree. We are now ready to convert each file in files.

Scheme]
(define (proc-file url)
  (define t (tree-import (url->system 
                          (url-append "$NOTES/src" url))  
                         "texmacs"))
  (handle-links t)
  (handle-notes-store t)
  (tree-export t (url->system 
                  (url-append "$NOTES/src2" url)) 
               "texmacs"))
Scheme] 
(proc-file "main.tm")

#f

After these checks, we are ready to process all the files at once.

Scheme] 
(map proc-file files)

(#f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
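
One check worth making before overwriting anything is that every entry of files has a counterpart under src2. A minimal sketch in portable Scheme: missing-outputs is a hypothetical helper (not part of TeXmacs) which takes the existence test as a parameter, so that it can be tried outside TeXmacs. In the session one would presumably pass a test built from url-exists? and url-append, as in the final comment, which is an untested assumption.

```scheme
;; Hypothetical helper: the elements of names for which exists? is false,
;; in their original order.
(define (missing-outputs names exists?)
  (let loop ((ns names) (acc '()))
    (cond ((null? ns) (reverse acc))
          ((exists? (car ns)) (loop (cdr ns) acc))
          (else (loop (cdr ns) (cons (car ns) acc))))))

;; Stand-alone illustration with a fake existence test.
(define written '("main.tm" "events.tm"))
(missing-outputs '("main.tm" "people.tm" "events.tm")
                 (lambda (f) (member f written)))   ; ("people.tm")

;; In the TeXmacs session (assumption, untested):
;; (missing-outputs files
;;   (lambda (f) (url-exists? (url-append "$NOTES/src2" f))))
```

An empty result means that every input has an output; anything else lists the files to look at before going further.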

Ok. After checking that there are no problems, we can move the modified files from src2 to src, overwriting the originals.

Scheme]
(for-each (lambda (url) 
      (system-move (url-append "$NOTES/src2" url) 
                   (url-append "$NOTES/src" url))) 
          files)

And we are done.