question


How to Get Data from AWS instead of TRTH server using REST API

aws-request-code.txt

I am using the attached code to request Tick History Raw data. I have specified "X-Direct-Download":"true" and "X-Client-Session-Id":"Direct AWS" in the request headers, but the data is still being downloaded from the TRTH server and not from AWS.

Could anyone help me identify what I am doing wrong in the attached code?

tick-history-rest-api

21 Answers


@pj4, I see you are using our TRTH_OnDemand_IntradayBars Python sample, which you have modified. That code will download from AWS, if variable useAws is set to True:

useAws = True

In your code the value for variable useAws is not set, but the variable is used to define if the download is to be from AWS. Please try setting useAws = True at the start of your code.

AWS download: when is the additional header required?

If you want to download the data from AWS, the request to download it requires an additional header:

"X-Direct-Download":"true"

This header is only used by the call that retrieves the data; it is ignored in all other calls.

Therefore it can be removed from calls r1 (line 4, token request), r2 (line 31, extraction request) and r3 (line 71, status poll). The only place where it is useful is call r5 (line 112, data retrieval request).

See this article for more details on AWS download.
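As a minimal sketch of the point above, the headers for the data retrieval call could be built by a small helper, so that X-Direct-Download is added only there and only when AWS download is wanted. The helper name build_download_headers and its parameters are hypothetical and not part of the TRTH sample; the base headers are the ones shown for the other calls in this thread, and your script may set more.

```python
def build_download_headers(token, use_aws):
    """Return headers for the data retrieval request (r5 in the sample).

    Hypothetical helper: the name and signature are illustrative only.
    """
    headers = {
        "Prefer": "respond-async",
        "Authorization": "token " + token,
    }
    if use_aws:
        # Only the data retrieval call needs this header
        headers["X-Direct-Download"] = "true"
    return headers


# Typical use (not executed here):
# r5 = requests.get(requestUrl, headers=build_download_headers(token, useAws), stream=True)
```

Passing the flag as an explicit argument also makes it harder for a stray assignment elsewhere in the script to change it unnoticed.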

A few other comments on your code

  • X-Client-Session-Id lets us trace specific calls when debugging. It has no influence on AWS download and, to be useful, should have a new unique value every time it is used.
  • Why set the user-agent in the headers (lines 113 & 123)? It is not required.
  • time.sleep(3) is not required (line 129).
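One possible way to get a unique X-Client-Session-Id per call is to generate it with the standard uuid module. This is only a sketch: the helper name and the prefix are hypothetical, and any scheme that yields a fresh value each time would serve the same purpose.

```python
import uuid


def new_session_id(prefix="my-script"):
    """Return a fresh identifier for the X-Client-Session-Id header.

    The prefix is a hypothetical label that helps recognise your own calls.
    """
    return prefix + "-" + str(uuid.uuid4())


# Typical use (not executed here):
# requestHeaders["X-Client-Session-Id"] = new_session_id()
```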

Thanks for your response. I was actually setting "useAWS = True" at the start of the code (outside of any function, such as the data request function), and I have now also removed the unnecessary lines as you suggested. However, while downloading data I still get the message below in the console, which shows the data was downloaded from TRTH and not AWS:

"Content response headers (TRTH server): type: text/plain - encoding: gzip"

Further, I have encountered one more issue (it might be specific to the instrument): when I tried to download FID 70 (settlement data) for Futures "ED" (EuroDollar) from 1 Jan 2000 to 31 May 2018, I got errors in the Python console (attached for reference).

error-while-downloading-data-for-ed.txt
erro-while-extracting-gzip-file.png

The gzip file was created, but judging by its size it does not seem to contain the complete data, and it cannot be extracted with the Zip Genius tool (snapshot of the error attached).



Hi @pj4,

Variable names in Python are case sensitive. Please ensure that the setting is "useAws = True", not "useAWS = True".

@pj4, looking at the if condition in the last lines of the code you posted: if you see "Content response headers (TRTH server): ..." instead of "Content response headers (AWS server): ...", it means that the value of useAws is False at that point, and it probably was already False when the headers were set for the last call (r5).
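Since that console message only reflects the value of useAws, a rough cross-check is to look at where the response finally came from. The sketch below assumes (this is an assumption, not something confirmed in this thread) that a direct download is redirected to a host under amazonaws.com; with the requests library the final URL after redirects is available as r5.url. The helper name is hypothetical, and the import shown is for Python 3 (on Python 2 urlparse lives in the urlparse module).

```python
from urllib.parse import urlparse


def served_from_aws(final_url):
    """Return True if the final URL points at an amazonaws.com host.

    Assumption: AWS direct downloads end up on a host under amazonaws.com.
    """
    host = urlparse(final_url).hostname or ""
    return host == "amazonaws.com" or host.endswith(".amazonaws.com")


# Typical use (not executed here):
# print("Served from AWS: " + str(served_from_aws(r5.url)))
```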


Yes, I am already using "useAws = True", but the issue persists.


Can you share the whole code, so I can try it? Please ensure that your DSS username and password are removed from the code.


Yes Christiaan, you were right: useAws was being set to False before r5. Now I am able to get data from AWS.

Thanks for your help in sorting this out.

However, do you have any idea about the other issue/error that I got while downloading data for Futures "ED" (details in my previous responses)?


@pj4, to test I would need the latest version of your code, with the exact request you made for ED.


thr-timezonemapped-integrated-dataonly-aws1.txt
thr-timezonemapped-integrated-modules-aws1.txt

Attached are the two code files. The file ending with "Data Only-aws1" is the main file, which calls the data request function in the other file.

The loop on i in the "Data Only" file iterates over instruments; the first instrument was "ED".

Do let me know if you need any other clarification regarding the code, since it is quite clunky. Thanks for your help.



@pj4, 2 files are missing for me to run the code: FilePath&Names.xlsx and TradingHours.csv (those you used when you ran into the error shown in error-while-downloading-data-for-ed.txt and erro-while-extracting-gzip-file.png).


excel-files-used-in-code-content-pasted-in-this-te.txt

Apologies for that. I can't upload Excel files, which is why I attached a text file with the same content as the Excel files.

Hope this helps.

Otherwise, could you just try to download data for ED (all 120 contracts) from 1 Jan 2000 to 31 May 2018 for FID 70 from Tick History Raw at your end and see whether it works?



@pj4, I'm struggling to run your code. I manually recreated the CSV and XLS files, and had to guess that paths had to be entered in one of the XLS files. I also changed lines 375 & 376 of the "Modules" code (missing parenthesis).

Now I am stuck: file MappingTables.xlsx is missing.

Can you post it (zipped)? Ideally, post all required XLSX and CSV files in a single ZIP.


@pj4, studying your code, I see that at the end you attempt to decompress all the data and save it into a single string. Considering the number of lines you mention (more than a million), this is bound to fail.

Please set maxLines = 10 and run it again; does it still fail?
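If only a preview of the data is needed, one alternative to decompressing everything into a string is to read the saved gzip file line by line and stop after a fixed number of lines. This is a sketch under the assumption that the file was already saved to disk as a .csv.gz; the function name and its max_lines parameter are hypothetical and only mirror the maxLines idea from the sample.

```python
import gzip
from itertools import islice


def preview_gzip_lines(path, max_lines=10):
    """Return at most max_lines decoded lines from a gzip file.

    The file is decompressed lazily, so memory use stays small
    even when the archive holds millions of lines.
    """
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in islice(fh, max_lines)]
```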


filesusedincode.zip

Attached are the files used in the code. Apologies for missing some files earlier.

I will try changing maxLines to 0 and let you know if it works for ED.



Yes, with maxLines set to 10, the code worked for ED. Thanks for the help.

However, I ran the code for multiple instruments in a loop. After running for 3 instruments, the code threw an error on the last FID (FID 825) of the 4th instrument (AUL). The error message is attached, in case you have any idea about it. Is it linked to the "location" in the request headers?

error-message.txt



The error suggests that it cannot get "location" from the header.

Have you checked the response status code? Is it 202?

Could you try printing the entire response?
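A small sketch of those checks: read the location header only when the status is 202, and otherwise print what was actually received. The function name is hypothetical; it takes plain values so it can be tried without a network call. Note that a plain dict is case sensitive, whereas the headers object of the requests library is not.

```python
def location_or_report(status_code, headers, body_text):
    """Return the polling URL for a 202 response, else print diagnostics.

    Returns None when there is no location to poll.
    """
    if status_code == 202 and "location" in headers:
        return headers["location"]
    print("HTTP status: " + str(status_code))
    print("Headers: " + str(dict(headers)))
    print("Body (first 500 characters): " + body_text[:500])
    return None


# Typical use (not executed here):
# requestUrl = location_or_report(r2.status_code, r2.headers, r2.text)
```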


@pj4, glad we found the issue related to the maxLines parameter.

Regarding the error message:

Our Python sample was updated a month ago, with slightly modified code in the section labeled "Step 3" (lines 132 and following), to cater for the (rarely occurring) case where data is returned directly instead of an HTTP status 202 with a location URL. The change also caters for possible errors. It looks like you started off with the older version of this sample.

In your code, try replacing your lines 339 - 360:

  requestUrl = r2.headers["location"]
  requestHeaders={
      "Prefer":"respond-async",
      "Content-Type":"application/json",
      "Authorization":"token " + token
  }
  r3 = requests.get(requestUrl,headers=requestHeaders)

  while (r3.status_code == 202):
      print ('As we received a 202, we wait 900 seconds, then poll again (until we receive a 200)')
      time.sleep(900)
      r3 = requests.get(requestUrl,headers=requestHeaders)
  
  if r3.status_code == 200 :
      r3Json = json.loads(r3.text.encode('ascii', 'ignore'))
      jobId = r3Json["JobId"]
  
  if r3.status_code != 200 :
      print ('An error occured. Try to run this cell again. If it fails, re-run the previous cell.\n')

with the following:

  status_code = r2.status_code
  print ("HTTP status of the response: " + str(status_code))
  # If r2 already returned 200, it plays the role of the poll response below
  r3 = r2
  if status_code == 202 :
      requestUrl = r2.headers["location"]
      requestHeaders={
          "Prefer":"respond-async",
          "Content-Type":"application/json",
          "Authorization":"token " + token
      }

  # Poll the location URL for as long as the extraction is not complete
  while (status_code == 202):
      print ('As we received a 202, we wait 900 seconds, then poll again (until we receive a 200)')
      time.sleep(900)
      r3 = requests.get(requestUrl,headers=requestHeaders)
      status_code = r3.status_code
      print ("HTTP status of the response: " + str(status_code))

  if status_code == 200 :
      r3Json = json.loads(r3.text.encode('ascii', 'ignore'))
      jobId = r3Json["JobId"]

  if status_code != 200 :
      print ('An error occurred. Try to run this cell again. If it fails, re-run the previous cell.\n')

This will make your code more robust, as it also caters for rare cases. The error you observed should no longer occur.


Hi Christiaan, the update you shared worked well. However, while iterating over instruments, on the 4th instrument I got the attached error while downloading the 2nd field (FID 64) of that instrument. Surprisingly, the code worked fine for the 1st field (FID 70) of the same instrument.

Any idea about this "SSL error"?

ssl-error.txt



@pj4, are you using Python 2.6, 2.7 or 3? This issue seems unrelated to our API and more related to the Python requests library. Please see this thread and especially this (long) thread, which might help. If you post the latest version of your code, I could test it here and see if I run into the same error.


I am using Python 2.7.13. In addition, I got one more technical issue, "Connection Aborted", while downloading data. I don't know if it is a coincidence, but I got the error while the code was downloading data for the 4th instrument in the list (and at exactly the same FID 64), the same position at which I got the Bad Handshake error yesterday. However, the instrument was different on the two days, as I had changed the instrument list.

Attached is the error message that I got today: connection-aborted-error.txt

Is it also related to the requests library?



@pj4, the error message was generated by the requests library, but it can have many causes, as a quick search on the net reveals. I suggest you change the order of the instruments and FIDs to see if there is a correlation between the error and a specific instrument, FID or position in the list.

Maybe also upgrade to Python 3, based on the info in the threads I sent in my previous comment?


Hi Christiaan, the issue was not linked to a particular instrument or FID sequence, as I was able to download many more instruments successfully (though I am not sure why it happened). However, I am currently stuck on the issue below.

Any idea when we get the following?

"HTTP status of the response: 400"

"local variable 'jobId' referenced before assignment"


@pj4, HTTP status codes returned by our servers are documented on this page. An HTTP status 400 indicates a bad request. Can you post the request that generated the error?


Attached is the latest request code I am using: data-request.txt. I get this error even when I try to download different instruments (LH, FC). Do let me know if you need anything else.



@pj4,

Analysis of the messages you get:

  • "HTTP status of the response: 400"
  • "local variable 'jobId' referenced before assignment"

The r2 request generated an HTTP status 400 (bad request). Due to that, the JobId variable was not assigned, but later on was referred to in request r5.

The HTTP 400 is due to a badly formatted r2 request body. This is probably caused by the content of the XLS parameter files used by the code to generate the request body. To analyze what went wrong here, you need to save the request body, and then we can analyze the one that generated the HTTP 400 error.

Looking at the code I see the final request r5 for data retrieval is made in all cases. This requires correcting, because it should only be done if the preceding HTTP status was 200 and the JobId variable assigned.

I have inserted the latest code you sent into the master code on my PC, and am now running it with a few debugging traces added. If I manage to reproduce the error maybe I'll find the culprit of the 400.
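The guard described above could look roughly like the following sketch: extract the JobId only when the status is 200, and skip the data retrieval otherwise. The function name is hypothetical; "JobId" is the field name used in the code earlier in this thread.

```python
import json


def job_id_from_response(status_code, body_text):
    """Return the JobId from a completed extraction response, or None.

    None is returned for any status other than 200, or when the
    response body carries no JobId field.
    """
    if status_code != 200:
        return None
    return json.loads(body_text).get("JobId")


# Typical use (not executed here):
# jobId = job_id_from_response(status_code, r3.text)
# if jobId is None:
#     print("No JobId, skipping data retrieval for this request")
# else:
#     ...  # build the r5 request and download the data
```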


Thanks Christiaan for looking into this. Really appreciate it.

@pj4, as an additional suggestion, considering the time it typically takes to retrieve the type of data you are requesting, you could reduce the polling time from 900 (15 minutes) to 300 (5 minutes). The entire run time of the whole program will benefit from that, without unduly loading the servers.


I always get a 400 error while downloading data for base RICs LH and LC, so could you try these RICs at your end?


@pj4, your code is still running on my PC, I cannot change anything now, and considering how long it takes to run, will not be able to change anything more today.

I suggest you edit your code to save each r2 request body, and post the request body that generated the HTTP 400 error, so we can analyze it.
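One simple way to do this is to write each request body to its own JSON file just before the r2 call, so that the body behind an HTTP 400 can be inspected afterwards. This is a sketch: the function name, the file naming scheme and the label argument are all hypothetical.

```python
import json
import os


def save_request_body(body, directory, label):
    """Write a request body (a dict) to <directory>/r2_body_<label>.json.

    Returns the path of the file that was written.
    """
    path = os.path.join(directory, "r2_body_" + label + ".json")
    with open(path, "w") as fh:
        json.dump(body, fh, indent=2, sort_keys=True)
    return path


# Typical use (not executed here), with a label built from the RIC and FID:
# save_request_body(requestBody, "debug", "LH_fid70")
```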


Sure Christiaan, I will change the wait time from 900 to 300 seconds. I was under the impression that polling to check the request status also loads the servers, which is why I changed it to 900 earlier.

Also, by checking the request body of r2, I found the issue with the LC, LH and FC RICs, so I am now able to download their data as well. Thank you very much for all your help.


@pj4, glad to hear you solved it :-)

I agree fully with your comment on the polling wait time, and appreciate the fact that you are considerate about the server load. It is true that every single request is a load on the servers.

There is a trade-off to make between placing an unnecessary burden on the server and optimizing run times. I ran your sample with the parameter files you sent, for 5 FIDs and 10 base RICs. In all it took more than 15 hours to run. The average time was 11.6 minutes, the fastest was 5 minutes and the slowest was 30 minutes. 50% of all extractions took between 6 and 8 minutes. In these circumstances, a polling time of 5 minutes seems more appropriate than one of 15 minutes: it allows your program to terminate significantly faster without adding undue burden on our servers.
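The effect of the polling interval can be illustrated with a deliberately simplified model: assume polls happen at fixed multiples of the interval after the request is submitted, and ignore the initial wait and the download time. Under that assumption, completion is noticed at the first poll after the extraction finishes. The function name is hypothetical.

```python
import math


def detection_delay(extraction_seconds, poll_seconds):
    """Seconds until the first poll that sees the finished extraction.

    Simplified model: polls occur at poll_seconds, 2 * poll_seconds, ...
    """
    polls = max(1, int(math.ceil(extraction_seconds / float(poll_seconds))))
    return polls * poll_seconds


# A 7 minute extraction (inside the 6 to 8 minute range quoted above):
print(detection_delay(7 * 60, 900))   # 900 seconds with 15 minute polling
print(detection_delay(7 * 60, 300))   # 600 seconds with 5 minute polling
```

In this model a 7 minute extraction is picked up 5 minutes sooner with the shorter interval, while a 30 minute extraction is picked up at the same moment with either setting.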


Thanks for analyzing the code, requests and time taken in this much detail, and for providing insights from it.

One more issue that I have faced many times is below:

Error Message:
  File "THR_TimeZoneMapped_Integrated_Modules_AWS1.py", line 458, in requestCode
    shutil.copyfileobj(rr, fd, chunk_size)
  IOError: [Errno 22] Invalid argument

Line 458 is: shutil.copyfileobj(rr, fd, chunk_size)

It seems that the issue occurs while writing the data to file. Any idea why we get it?

When I tried to extract the downloaded gzip file, it was extracted but with an error, and in the extracted file the last element seems to be an incorrect timestamp (snapshot attached: incorrect-timestamp.png).



@pj4, I just looked at 10 (out of 50) of the files I downloaded using your code. All zips opened correctly, and all CSVs inside ended correctly, no issues. I also did not see that error when running the code.

The code I have to retrieve and save the gzipped files is this (and useAws is true):

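  r5 = requests.get(requestUrl,headers=requestHeaders,stream=True)
  # Keep the raw gzipped bytes: do not let the content be decompressed on read
  r5.raw.decode_content = False
...
  fileName = filePath0 + dateforfilename+".csv.gz"
  chunk_size = 1024
  rr = r5.raw
  # Stream the response to disk; the with block closes the file on exit
  with open(fileName, 'wb') as fd:
      shutil.copyfileobj(rr, fd, chunk_size)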

I have fewer lines of code than you, so there must be a difference between our code. Can you please post your latest THR_TimeZoneMapped_Integrated_Modules_AWS1.py code, with the associated params.xlsx?

Please also post a GZIP file that exhibits the issue. Attachments are limited to 500KB; if your file is larger (but smaller than 10MB), send it to me (christiaan.meihsl@tr.com) directly.


I tried to send you the file by mail since the gzip file was larger than 500KB, but it was not delivered due to a "file size violation" issue.

However, I have attached the latest code and params files here for your reference: 2415-code-and-params.zip



@pj4, I will try running your new code and see what happens.

@pj4, looking at the latest code you sent I see the final request r5 for data retrieval is still made in all cases. This requires correcting, because it should only be done if the preceding HTTP status was 200 and the JobId variable assigned.

@pj4, I have been running your latest code with the latest params file on my PC since midday, with a few debugging traces added. Up to now 4 data files were produced (LC fids 70, 64, 54, 15). There were no errors, the gzip files are healthy, the CSV extracts fine and its last timestamp is correct. I will be out of office tomorrow, so I'll come back to you with the other results on Monday.

@pj4, I ran your latest code with your params file. My PC rebooted for an update in my absence, so I could not see the console output, but it seems to have worked fine: all gzips were created and can be opened, CSV files can be extracted, their content looks fine. I opened the gzip files using winrar.
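The check that a gzip file "can be opened" can also be done programmatically after each download, which may help to spot truncated files early. The sketch below simply reads the whole archive in chunks and reports whether decompression reached the end without an error; the function name is hypothetical, and the code targets Python 3.

```python
import gzip
import zlib


def gzip_is_readable(path, chunk_size=1024 * 1024):
    """Return True if the gzip file at path decompresses to the end.

    A truncated or corrupt file makes the gzip module raise an error,
    which is reported here as False.
    """
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        return False
```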
