Quick tip - Feed nci-file-expiry output back into itself

Quite often, you may need to recover many files from quarantine on Gadi that match a particular pattern. Unfortunately, this is not simple, as the output of nci-file-expiry list-quarantined looks like this:

8a17868a-25ec-40bf-a41d-264eb3505da8  2023-03-17 00:22:38  v45     528.0K  /scratch/v45/dr4292/tmp/tmpsuh6nuuu
994065d8-1e09-488f-aa1a-90ca1457609a  2023-03-17 00:22:38  v45       4.0K  /scratch/v45/dr4292/tmp/tmpfg8_ecs4
2fb7c18c-d415-47d5-a127-690a6a02d929  2023-03-17 00:22:38  v45       4.0K  /scratch/v45/dr4292/tmp/tmpgqno3ul4
ca649efd-ecbb-4225-b5cd-0432d916f280  2023-03-17 00:22:38  v45       4.0K  /scratch/v45/dr4292/tmp/tmp2qa21xk1
856bf663-8a4c-44bd-a1cf-8aeed3bbf639  2023-03-17 00:22:38  v45     512.0K  /scratch/v45/dr4292/tmp/tmp7mt1e23p
...

and nci-file-expiry recover expects this:

$ nci-file-expiry recover 856bf663-8a4c-44bd-a1cf-8aeed3bbf639 /scratch/v45/dr4292/tmp/tmp7mt1e23p

and nci-file-expiry batch-recover expects a file.

Fortunately, with a bit of bash trickery, it is possible to feed the output of one command to another as if it were a file. In this case, we can feed nci-file-expiry list-quarantined output back into nci-file-expiry batch-recover. Let’s imagine we’re looking for files that match the pattern *.ice_daily.nc. Here is the command that does this:

$ nci-file-expiry batch-recover <( nci-file-expiry list-quarantined | grep .ice_daily.nc | while read uuid a b c d path; do echo $uuid $path; done )

It’s fairly complicated, so let’s break it down, starting with the commands in brackets:

nci-file-expiry list-quarantined | grep .ice_daily.nc | while read uuid a b c d path; do echo $uuid $path; done

Piping output to grep is pretty standard; what is less common is piping the results into a loop afterwards. Bash treats the entire loop construct as a single command, so you can pipe command output or redirect files into it as you would with any other command. The loop itself:

while read uuid a b c d path; do echo $uuid $path; done

list-quarantined output has 6 columns, but batch-recover is expecting a file with 2 columns, which correspond to the first and last columns of the list-quarantined output. This loop reads the grep’d list-quarantined output line by line, saves each column into its own variable, and echoes the two we need (uuid and path). The four middle columns land in the throwaway variables a, b, c and d and are discarded. So running those commands connected with pipes gives us this:

$ nci-file-expiry list-quarantined | grep .ice_daily.nc | while read uuid a b c d path; do echo $uuid $path; done
813ff7b2-381b-430b-a423-f32b449bf710 /scratch/v45/dr4292/20200222.ice_daily.nc
9aa317e3-7763-4af5-a19b-d07a7d6c2d90 /scratch/v45/dr4292/20200223.ice_daily.nc
28b0d00c-e9fc-4e5e-adae-39a817d0fb51 /scratch/v45/dr4292/20200224.ice_daily.nc
8a17868a-25ec-40bf-a41d-264eb3505da8 /scratch/v45/dr4292/20200225.ice_daily.nc
...
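If you want to experiment with the loop away from Gadi, here is a minimal sketch that runs it over two rows copied from the sample listing at the top of this post. The function names sample_listing and uuid_and_path are made up for illustration, and read -r plus the quotes around the echo argument are small safety additions that are not part of the command above.

```bash
# Two rows copied from the list-quarantined sample at the top of the post.
sample_listing() {
    cat <<'EOF'
8a17868a-25ec-40bf-a41d-264eb3505da8  2023-03-17 00:22:38  v45     528.0K  /scratch/v45/dr4292/tmp/tmpsuh6nuuu
994065d8-1e09-488f-aa1a-90ca1457609a  2023-03-17 00:22:38  v45       4.0K  /scratch/v45/dr4292/tmp/tmpfg8_ecs4
EOF
}

# Keep the first and last of the six columns; a, b, c and d are throwaways.
uuid_and_path() {
    while read -r uuid a b c d path; do
        echo "$uuid $path"
    done
}

sample_listing | uuid_and_path
```

Because path is the last variable given to read, it receives everything left on the line after the first five fields.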

Note

There are many ways to reorganise column-formatted data in bash. The while read ... echo variant above is my preference. If you prefer piping to awk or cut, substitute that instead. As long as the output looks like the output above, the next bit will work.
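For comparison, here is a rough sketch of what an awk variant could look like. It assumes that paths contain no spaces, because $NF only picks up the last whitespace-separated field. sample_listing is a hypothetical stand-in for the real list-quarantined output, using two rows from the sample at the top of this post.

```bash
# Two rows copied from the list-quarantined sample at the top of the post.
sample_listing() {
    cat <<'EOF'
2fb7c18c-d415-47d5-a127-690a6a02d929  2023-03-17 00:22:38  v45       4.0K  /scratch/v45/dr4292/tmp/tmpgqno3ul4
856bf663-8a4c-44bd-a1cf-8aeed3bbf639  2023-03-17 00:22:38  v45     512.0K  /scratch/v45/dr4292/tmp/tmp7mt1e23p
EOF
}

# Print the first field (uuid) and the last field (path) of every row.
# Assumption: no spaces in the path, otherwise $NF is only its final word.
uuid_and_path_awk() {
    awk '{ print $1, $NF }'
}

sample_listing | uuid_and_path_awk
```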

To turn this into a “file” that batch-recover is happy to deal with, we can take advantage of process substitution. This tricks batch-recover into treating the output of the above command as if it were a file, even though nothing is ever written to disk. So by wrapping the above command in <( ... ), its output becomes, for all intents and purposes, the contents of a file.
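To see what the receiving command is actually handed, the sketch below prints the word that Bash substitutes for <( ... ) and then reads from it with cat. show_substituted_name is a made-up helper for illustration. On Linux the printed name is typically something like /dev/fd/63, but the exact value depends on the system.

```bash
# Print the argument as-is, so we see the path Bash passes to the command.
show_substituted_name() {
    echo "$1"
}
show_substituted_name <( echo "hello" )

# Any command that accepts a filename argument can read from that path.
cat <( echo "hello" )
```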

Process substitution enables a few neat tricks. For instance, if you need to diff the output of two commands, you do not need to write the output to temporary files first. You can simply run:

$ diff <( command_1 ) <( command_2 )
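A self-contained version of the diff trick might look like the following. Here command_1 and command_2 are defined as placeholder functions that print three lines each; substitute whatever commands you want to compare.

```bash
# Placeholder commands standing in for command_1 and command_2.
command_1() { printf '%s\n' alpha beta gamma; }
command_2() { printf '%s\n' alpha delta gamma; }

# diff exits with status 1 when its inputs differ, so "|| true" keeps
# this sketch safe to run in a script that uses "set -e".
diff <( command_1 ) <( command_2 ) || true
```

No temporary files are created; diff reads both outputs through the paths that Bash substitutes for the two <( ... ) expressions.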

You can also add an additional < operator to redirect this output to stdin. The original version of this command posted on Slack and the ACCESS-Hive used this to get the output of list-quarantined into the loop.

while read uuid a b c d path; do echo $uuid $path; done < <( nci-file-expiry list-quarantined | grep .ice_daily.nc )
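One practical difference between piping into a loop and redirecting into it, which the recovery command doesn't rely on but is worth knowing, is where the loop runs. In Bash's default configuration every part of a pipeline runs in a subshell, so variables set inside a piped loop are gone afterwards, while a loop fed by < <( ... ) runs in the current shell. The sketch below illustrates this with a hypothetical lines function.

```bash
# Hypothetical command that prints three lines.
lines() { printf '%s\n' one two three; }

# Piped: the loop runs in a subshell, so the updated count is lost.
piped=0
lines | while read -r line; do piped=$((piped + 1)); done
echo "piped: $piped"

# Redirected from a process substitution: the loop runs in the current
# shell, so the count survives after the loop finishes.
redirected=0
while read -r line; do redirected=$((redirected + 1)); done < <( lines )
echo "redirected: $redirected"
```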

However, there are restrictions on using process substitution. The “file” is really a pipe, so it can only be read once from start to end and programs cannot seek within it. This means it can’t stand in for files that need random access, such as structured data formats (e.g. netCDF). For simple things like this though, that isn’t relevant.
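If you are ever unsure whether a program is being handed a real file or a pipe, a quick check along these lines can help. describe is a made-up helper for illustration; it uses the standard -p test, which is true when the path refers to a pipe.

```bash
# Report whether the given path refers to a pipe.
describe() {
    if [ -p "$1" ]; then
        echo "pipe"
    else
        echo "not a pipe"
    fi
}

describe <( echo "data" )   # the path from process substitution
describe /dev/null          # a device file, for contrast
```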

This is a lot to remember, so we recommend placing the following in your ~/.bashrc file:

# Recover every quarantined file whose list-quarantined line matches the grep pattern in $1.
function recover_pattern () {
    nci-file-expiry batch-recover <( nci-file-expiry list-quarantined | grep "${1}" | while read -r uuid a b c d path; do echo "$uuid $path"; done )
}

And when you log into Gadi again you’ll be able to run:

$ recover_pattern .ice_daily.nc

for the same effect. As the function argument is passed straight to grep, regular expressions can be used to search for files, e.g. to recover all .ice_daily.nc files whose names start with 2020, 2021 or 2022:

$ recover_pattern "202[012].\+ice_daily.nc"

Note that the pattern must be wrapped in quotes in this case, so that the shell passes the square brackets and backslash to grep unchanged. It can also be used to recover files on a specific path. For instance, to recover files in your current directory:

$ recover_pattern "${PWD}"
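One thing to keep in mind is that grep matches against the whole list-quarantined line, not just the path, so a loose pattern can also match text in other columns such as the date. The sketch below uses two hypothetical rows (the UUIDs and the 2019 file are invented for illustration) to compare the year pattern from earlier with a tighter one anchored to the file name. Running nci-file-expiry list-quarantined | grep with your pattern first is a cheap way to preview what would be recovered.

```bash
# Hypothetical rows in the six-column list-quarantined layout.
sample_listing() {
    cat <<'EOF'
11111111-aaaa-4bbb-8ccc-000000000001  2022-03-17 00:22:38  v45       4.0K  /scratch/v45/dr4292/20190101.ice_daily.nc
11111111-aaaa-4bbb-8ccc-000000000002  2023-03-17 00:22:38  v45       4.0K  /scratch/v45/dr4292/20200222.ice_daily.nc
EOF
}

# Loose pattern: the 2019 file is also counted, because "2022" appears
# in the date column of its row.
sample_listing | grep -c "202[012].\+ice_daily.nc"

# Tighter pattern: the year must directly follow the last "/" in the path,
# and the line must end in .ice_daily.nc.
sample_listing | grep -c '/202[012][^/]*\.ice_daily\.nc$'
```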