
data.webtools.download_tarball

Function · Source

mdnc.data.webtools.download_tarball(
    user, repo, tag, asset,
    path='.', mode='auto', token=None, verbose=False
)

Download a tarball from a GitHub release asset and extract it automatically.

This tool downloads assets from GitHub repositories. It works as follows:

  1. Try to detect the data info in public mode.
  2. If that fails (the GitHub repository cannot be accessed), switch to private downloading mode. The private mode requires a GitHub OAuth token to get access to the file.
  3. The tarball is sent through a pipe and extracted on the fly; the archive itself is not stored.
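
The two access modes can be sketched as plain URL and header builders. This is a minimal illustration based on the public GitHub release URL layout and the GitHub REST API, not the code that `download_tarball` runs; the helper names `public_asset_url`, `release_api_url`, `asset_api_url` and `private_headers` are hypothetical.

```python
from urllib.parse import quote


def _q(part):
    # Percent-encode a single path component (hypothetical helper).
    return quote(str(part), safe='')


def public_asset_url(user, repo, tag, asset):
    """Direct download URL of a release asset in a public repository."""
    return 'https://github.com/{0}/{1}/releases/download/{2}/{3}'.format(
        _q(user), _q(repo), _q(tag), _q(asset)
    )


def release_api_url(user, repo, tag):
    """REST API endpoint that describes a release (including its assets) by tag."""
    return 'https://api.github.com/repos/{0}/{1}/releases/tags/{2}'.format(
        _q(user), _q(repo), _q(tag)
    )


def asset_api_url(user, repo, asset_id):
    """REST API endpoint of one asset; the id comes from the release description."""
    return 'https://api.github.com/repos/{0}/{1}/releases/assets/{2}'.format(
        _q(user), _q(repo), _q(asset_id)
    )


def private_headers(token):
    """Headers for fetching the binary asset through the API with a token."""
    return {
        'Authorization': 'token {0}'.format(token),
        # Ask the API for the file content instead of the JSON description.
        'Accept': 'application/octet-stream',
    }


print(public_asset_url('cainmagi', 'Dockerfiles', 'xubuntu-v1.5-u20.04',
                       'xconfigs-u20-04.tar.xz'))
```

In such a scheme, the public URL would be tried first; if the request is rejected, the release description is queried with the token to find the asset id, and the asset is then fetched from the API endpoint.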

The gz, bz2 and xz formats are currently supported; see the tarfile module for details.
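
Extracting a tarball without storing it relies on the stream modes of the standard `tarfile` module (`'r|gz'`, `'r|bz2'`, `'r|xz'`). The sketch below shows the idea on an in-memory archive; `extract_stream`, `make_demo_tarball` and `roundtrip_demo` are hypothetical names, and in a real download the file object would be the HTTP response body instead of a `BytesIO` buffer.

```python
import io
import os
import tarfile
import tempfile


def extract_stream(fileobj, path='.', mode='xz'):
    """Extract a tarball from a sequential stream; the archive is never saved."""
    # The '|' (instead of ':') tells tarfile to read the stream block by block
    # without seeking, which is what a network response requires.
    with tarfile.open(fileobj=fileobj, mode='r|{0}'.format(mode)) as tar:
        tar.extractall(path)


def make_demo_tarball():
    """Build a tiny .tar.xz archive in memory, standing in for a download."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w:xz') as tar:
        data = b'hello'
        info = tarfile.TarInfo(name='demo/hello.txt')
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def roundtrip_demo():
    """Extract the demo archive into a temporary folder and read the file back."""
    with tempfile.TemporaryDirectory() as tmp:
        extract_stream(make_demo_tarball(), path=tmp, mode='xz')
        with open(os.path.join(tmp, 'demo', 'hello.txt'), 'rb') as fobj:
            return fobj.read()


print(roundtrip_demo())
```

Note that `extractall` trusts the member paths inside the archive, so this pattern is only appropriate for archives from a trusted source.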

Tip

The mechanics of this function are a little complicated. They are mainly inspired by the following code:

Arguments

Requires

| Argument | Type | Description |
| :------- | :--- | :---------- |
| user | str | The GitHub owner name of the repository; may be an organization. |
| repo | str | The GitHub repository name. |
| tag | str | The GitHub release tag where the data is uploaded. |
| asset | str | The GitHub asset (tarball) name to be downloaded, including the file name suffix. |
| path | str | The root path of the extracted data. Should be a folder path. |
| mode | str | The mode of extraction. Could be 'gz', 'bz2', 'xz' or 'auto'. When using 'auto', the format is guessed from the suffix of the file name in the link. |
| token | str | A given OAuth token. If this argument is not set, the program tries to find a token in the environment variables. To learn how to set the token, please refer to mdnc.data.webtools.get_token. |
| verbose | bool | A flag indicating whether to show the downloaded size during the web request. |
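
The behavior of `mode='auto'` and of an unset `token` can be illustrated with two small helpers. This is a hedged sketch, not the library's implementation: `guess_mode` and `find_token` are hypothetical names, the accepted suffix list is an assumption, and the order in which `GITTOKEN` and `GITHUB_API_TOKEN` are checked is also an assumption (the two variable names are the ones mentioned in the tips printed by the function).

```python
import os


def guess_mode(asset, mode='auto'):
    """Return 'gz', 'bz2' or 'xz', guessing from the file name when mode is 'auto'."""
    if mode != 'auto':
        if mode not in ('gz', 'bz2', 'xz'):
            raise ValueError('Unsupported mode: {0!r}'.format(mode))
        return mode
    name = asset.lower()
    # Assumed suffix table; the library may recognize a different set.
    table = (
        (('.tar.gz', '.tgz'), 'gz'),
        (('.tar.bz2', '.tbz2'), 'bz2'),
        (('.tar.xz', '.txz'), 'xz'),
    )
    for suffixes, fmt in table:
        if name.endswith(suffixes):
            return fmt
    raise ValueError('Cannot guess the format of {0!r}'.format(asset))


def find_token(token=None, environ=None):
    """Prefer an explicit token, then fall back to the environment variables."""
    environ = os.environ if environ is None else environ
    if token:
        return token
    # Assumed lookup order; an empty value is treated as unset in this sketch.
    for key in ('GITTOKEN', 'GITHUB_API_TOKEN'):
        value = environ.get(key)
        if value:
            return value
    return None


print(guess_mode('xconfigs-u20-04.tar.xz'))
```

When `find_token` returns `None` in such a scheme, the caller would have to ask the user for a token interactively, as Example 2 shows.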

Examples

Example 1
import mdnc

mdnc.data.webtools.download_tarball(user='cainmagi', repo='Dockerfiles', tag='xubuntu-v1.5-u20.04', asset='xconfigs-u20-04.tar.xz', path='./downloads', verbose=True)
Get xconfigs-u20-04.tar.xz: 3.06kB [00:00, 263kB/s]
Example 2
import mdnc

mdnc.data.webtools.download_tarball(user='cainmagi', repo='React-builder-for-static-sites', tag='0.1', asset='test-datasets-1.tar.xz', path='./downloads', token='', verbose=True)
data.webtools: A Github OAuth token is required for downloading the data in private repository. Please provide your OAuth token:
Token:****************************************
data.webtools: Tips: specify the environment variable $GITTOKEN or $GITHUB_API_TOKEN could help you skip this step.
Get test-datasets-1.tar.xz: 216B [00:00, 217kB/s]

Last update: March 14, 2021
