Can we download large TRTH VBD files in small chunks to avoid network contention?

Some of the TRTH VBD files (raw files) are more than 5 GB in size. Downloading these files causes network bandwidth contention.

1. Is it possible to download the file in small chunks (500 MB each) with the curl command?

or

2. Can TRTH split the large files into chunks of at most 500 MB each?

tick-history-rest-api

Accepted

Refer to Key Mechanisms regarding Streaming: you can use a range request (the HTTP Range header) to download the file in small chunks.

For example, suppose we would like to download this file:

{
    "PackageDeliveryId": "0x05d0f50a992b2f96",
    "UserPackageId": "0x04f21a8d26059cb1",
    "SubscriptionId": "0x0400dc1d24a00cb4",
    "Name": "NAS-2017-07-31-NORMALIZEDMP-Report-1-of-1.csv.gz",
    "ReleaseDateTime": "2017-08-01T04:00:00.000Z",
    "FileSizeBytes": 776125,
    "Frequency": "Daily",
    "ContentMd5": ""
}

Its size is 776,125 bytes (about 776 KB). Using roughly 200 KB per chunk, we need to send four range requests. The following are the four curl commands.

curl -k -X GET -H "Range: bytes=0-200000" -H "Authorization: Token <token>" -o part1 https://hosted.datascopeapi.reuters.com/RestApi/v1/StandardExtractions/UserPackageDeliveries%28%270x05d0f50a992b2f96%27%29/%24value

curl -k -X GET -H "Range: bytes=200001-400000" -H "Authorization: Token <token>" -o part2 https://hosted.datascopeapi.reuters.com/RestApi/v1/StandardExtractions/UserPackageDeliveries%28%270x05d0f50a992b2f96%27%29/%24value

curl -k -X GET -H "Range: bytes=400001-600000" -H "Authorization: Token <token>" -o part3 https://hosted.datascopeapi.reuters.com/RestApi/v1/StandardExtractions/UserPackageDeliveries%28%270x05d0f50a992b2f96%27%29/%24value

curl -k -X GET -H "Range: bytes=600001-" -H "Authorization: Token <token>" -o part4 https://hosted.datascopeapi.reuters.com/RestApi/v1/StandardExtractions/UserPackageDeliveries%28%270x05d0f50a992b2f96%27%29/%24value
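Rather than computing the ranges by hand, a small shell loop can generate one contiguous Range header per chunk. This is only a sketch: FILE_SIZE and CHUNK are illustrative values (in practice, take FileSizeBytes from the package delivery metadata), and the commented-out curl line shows where the real request with your token and delivery URL would go.

```shell
# Compute contiguous inclusive byte ranges for a file of FILE_SIZE bytes,
# CHUNK bytes per part; the final part is left open-ended (bytes=N-).
FILE_SIZE=776125   # from FileSizeBytes in the delivery metadata
CHUNK=200000       # ~200 KB per chunk

start=0
part=1
while [ "$start" -lt "$FILE_SIZE" ]; do
  end=$((start + CHUNK - 1))
  if [ "$end" -ge "$((FILE_SIZE - 1))" ]; then
    range="bytes=${start}-"          # open-ended final chunk
  else
    range="bytes=${start}-${end}"
  fi
  echo "part${part}: Range: ${range}"
  # curl -k -X GET -H "Range: ${range}" -H "Authorization: Token <token>" \
  #   -o "part${part}" "<delivery URL>"
  start=$((end + 1))
  part=$((part + 1))
done
```

For the 776,125-byte example this prints four ranges, matching the four manual commands above.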

After that, we need to merge all four parts into a single file.

$ cat part1 part2 part3 part4 > NAS-2017-07-31-NORMALIZEDMP-Report-1-of-1.csv.gz


Does the final merged file pass the MD5 check?


Hi @Ayan

Yes, it should.

HTTP range requests do not alter the content of the file, so the merged file should be exactly the same as the original file.
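You can verify this behavior locally without touching the network. The sketch below (assuming standard Unix tools: dd, cat, md5sum; the file names are made up for the demo) emulates range requests by copying byte ranges with dd and shows that concatenating the parts reproduces the original bytes, and therefore the original MD5.

```shell
# Create a small sample file to stand in for a downloaded package.
printf 'some sample payload bytes for the demo\n' > original.bin

# Emulate three range requests by copying inclusive byte ranges with dd.
dd if=original.bin of=part1 bs=1 skip=0  count=10 2>/dev/null   # bytes 0-9
dd if=original.bin of=part2 bs=1 skip=10 count=10 2>/dev/null   # bytes 10-19
dd if=original.bin of=part3 bs=1 skip=20          2>/dev/null   # bytes 20-end

# Merge the parts, then compare checksums; the digests should be identical.
cat part1 part2 part3 > merged.bin
md5sum original.bin merged.bin
```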


We are facing an issue when trying to gunzip the files: we get an error stating "not a gz file". However, if we merge them into a single file using the cat command, we are able to unzip that single file. Is there any way to download the files in chunks such that each chunk can be unzipped on its own?


Hi @Ayan

I believe you misunderstand how range requests work.
A range request (byte serving) does not repackage the content into smaller chunks; it simply sends the specific part of the file that was requested. It does not alter or recompress the file, which means that if the original file is compressed, you have to combine every part of the file before you can decompress it.

As for the question: no, it is currently not possible to unzip part of a VBD file.
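The "not a gz file" error can be reproduced locally. In this sketch (assuming standard Unix tools: gzip, dd, cat, cmp; the file names are made up for the demo), a compressed file is sliced into two byte ranges: the second slice lacks the gzip header and cannot be decompressed on its own, but the rejoined file decompresses cleanly.

```shell
# Build a sample compressed file to stand in for a VBD .csv.gz delivery.
seq 1 1000 > data.txt
gzip -c data.txt > data.txt.gz

# Emulate two range requests by slicing the compressed file in half.
size=$(wc -c < data.txt.gz)
half=$((size / 2))
dd if=data.txt.gz of=part1 bs=1 count="$half" 2>/dev/null   # first half
dd if=data.txt.gz of=part2 bs=1 skip="$half"  2>/dev/null   # second half

# part2 starts mid-stream, so decompressing it alone fails.
gzip -cd < part2 > /dev/null 2>&1 || echo "part2: not in gzip format"

# Rejoining the slices restores the original stream, which decompresses fine.
cat part1 part2 > rejoined.gz
gzip -dc rejoined.gz > rejoined.txt
```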
