Adding/Removing Files and Getting File Status with sdf add
, sdf rm
, and sdf status
Adding files with sdf add
To add a file to the data_manifest.yml
, we use sdf add
:
$ sdf add data/population_sizes.tsv
Added 1 file.
This will add an entry into the files
section in the
data_manifest.yml
with this file, including its MD5 hash.
Checking for file modifications with sdf status
To check if a file has changed, use sdf status
. Let's look at the
status of the file we just added with sdf add
:
$ sdf status
Project data status:
0 files on local and remotes (1 file only local, 0 files only remote), 1 file total.
[data]
population_sizes.tsv current 3fba1fc3 2023-09-01 10:38AM (53 seconds ago)
Since this file has not been modified since its creation, its status is current.
Now, let's imagine a pipeline runs and changes this file:
$ bash tools/computational_pipeline.sh # changes data
$ sdf status
Project data status:
0 files on local and remotes (1 file only local, 0 files only remote), 1 file total.
[data]
population_sizes.tsv changed 3fba1fc3 → 8cb9d10b 2023-09-01 10:48AM (1 second ago)
Now, the status indicates that this file has been changed.
Adding a Modified Version of Data to the Data Manifest with sdf update
If these changes are good, we can tell the Data Manifest it should update its record of this version:
$ sdf update data/population_sizes.tsv
$ sdf status
Project data status:
0 files on local and remotes (1 file only local, 0 files only remote), 1 file total.
[data]
population_sizes.tsv current 8cb9d10b 2023-09-01 10:48AM (6 minutes ago)
Removing a File from the Manifest with sdf rm
One can remove a data file entry from the Data Manifest with sdf rm
. Note that this does not remove the file; you can do this
separately with the Unix rm
command.
$ sdf rm data/population_sizes.tsv
Moving a File with sdf mv
Much like the Git subcommand git mv
, sdf
has sdf mv
subcommand
that can be used to move a file's location in the manifest and on
the file system:
$ sdf mv data/population_sizes.tsv old_data/population_sizes.tsv