Manage Data With Pandas Dataframe

Question

question

Aleniles

1 ●0 ●1 ●2

Manage Data With Pandas Dataframe

Hello, would be possible for you to create an example in python that populates a Pandas dataframe with the following data (4 columns, same order below) for a custom list of tickers (let's say for example 10 tickers of your choice), in a given time frame (i.e last 2 years) using DS data:

ticker, date in yyyymmdd format, last price, total return (percent change vs previous period) lagged backward one period.

The script should work on whatever timeframe is selected by the user (daily, weekly, monthly). The dataframe will be indexed on the tickers so to have all data for all tickers in one dataframe only.

The output (example with daily data) should be something like in the gif attached (so you could understand what I mean for "lagged" one period) ... here Capture.PNG, and then should continue below with the data of the second ticker, the third ... etc

I need this as a start point to understand how to download the data as well as how to handle their order (columns) and then carry out other studies.

Many thanks!

datastream-api pandas

capture.png (14.3 KiB)

Nov 02, 2021 at 08:43 PM

10 |1500

Attachments: Up to 2 attachments (including images) can be used with a maximum of 5.0 MiB each and 10.0 MiB total.

Answer 1 · 2021-11-03T04:59:53Z

Jirapongse

38.1k ●71 ●35 ●53

@Aleniles

I have created an example to get the Total Return Index (RI) and Price - Trade (P) data types of the following items.

'@AAPL,@MSFT,@AMZN,@TSLA,@GOOGL'

You can use the Datastream navigator to search for items and data types.

I used the DatastreamDSWS python package and the get_data method to get historical data.

import DatastreamDSWS as DSWS
username = "username"
password = "password"
ds = DSWS.Datastream(username = username, password = password)

data = ds.get_data('@AAPL,@MSFT,@AMZN,@TSLA,@GOOGL', ['P','RI'], start='-2Y', end='0D', kind=1, freq='D')

df = data.iloc[: , data.columns.get_level_values(1) == 'P'].melt(col_level=0, ignore_index=False, value_name='P')

tmp_df = data.iloc[: , data.columns.get_level_values(1) == 'RI'].melt(col_level=0, ignore_index=False, value_name='RI')

df["RI"] = tmp_df["RI"]
df.reset_index(inplace=True)
df.Dates = df.Dates.apply(lambda x: x.replace('-',''))
df[["Instrument","Dates","P","RI"]]

The output is:

1635915400405.png (19.0 KiB)

Nov 03, 2021 at 04:59 AM

10 |1500

Attachments: Up to 2 attachments (including images) can be used with a maximum of 5.0 MiB each and 10.0 MiB total.

Answer 2 · 2021-11-03T08:58:55Z

Aleniles

1 ●0 ●1 ●2

Fantastic!

I just amended a little bit the code so to have also the shift I need in the return column (last column).

Many thanks!

data = ds.get_data('@AAPL,@MSFT,@AMZN,@TSLA,@GOOGL', ['P','RI'], start='-2Y', end='0D', kind=1, freq='D')

df = data.iloc[: , data.columns.get_level_values(1) == 'P'].melt(col_level=0, ignore_index=False, value_name='P')
tmp_df = data.iloc[: , data.columns.get_level_values(1) == 'RI'].melt(col_level=0, ignore_index=False, value_name='RI')
df["RI"] = tmp_df["RI"]

# shift total return backward one period
df["CHG"]=(100*(df['RI']/df['RI'].shift(1)-1).shift(-1))

df.reset_index(inplace=True)
df.Dates = df.Dates.apply(lambda x: x.replace('-',''))
df[["Instrument","Dates","P","RI"]]
# set instrument as an index
df.set_index('Instrument',inplace=True)

df.head(10)

Nov 03, 2021 at 08:58 AM

10 |1500

Attachments: Up to 2 attachments (including images) can be used with a maximum of 5.0 MiB each and 10.0 MiB total.

Answer 3 · 2021-11-03T09:18:14Z

Aleniles

1 ●0 ●1 ●2

One last question, how to adjust the code for tickers like these one ?

LHUSFRN is the price index

LHUSFRN(IN)+100 is the total return index

or

MSUSAML(MSPI) Price Index

MSUSAML(MSRI) Total Return Index

I mean how to rewrite the part below for those types of tickers ?

data = ds.get_data('LHUSFRN,@MSFT,@AMZN,@TSLA,@GOOGL', ['P','RI'], start='-2Y', end='0D', kind=1, freq='D')

Nov 03, 2021 at 09:18 AM

10 |1500

Attachments: Up to 2 attachments (including images) can be used with a maximum of 5.0 MiB each and 10.0 MiB total.

Answer 4 · 2021-11-03T10:39:24Z

Jirapongse

38.1k ●71 ●35 ●53

@Aleniles

It uses different data types to get the Price Index and Total Return Index values so I use a dictionary to map fields.

fields_map = {'P':"PI", "IN+100":"RI","MSPI":"PI","MSRI":"RI"}

The code looks like this:

fields_map = {'P':"PI", "IN+100":"RI","MSPI":"PI","MSRI":"RI"}


data = ds.get_data('LHUSFRN(P), LHUSFRN(IN)+100,MSUSAML(MSPI),MSUSAML(MSRI), @AAPL(P), @AAPL(RI)', 
                   start='-2Y', end='0D', kind=1, freq='D')

#Rename the multi-index columns
column_index0 = [x.split('(')[0] for x in data.columns.get_level_values(0)]
column_index1 = [x.split('(')[1].replace(')','') for x in data.columns.get_level_values(0)]

#use the dictionary to rename data types
column_index1_rename = []
for dt in column_index1:
    if dt in fields_map:
        column_index1_rename.append(fields_map[dt])
    else:
        column_index1_rename.append(dt)


data.columns = data.columns.from_tuples(list(zip(column_index0,column_index1_rename)),names=['Instrument', 'Field'])

df = data.iloc[: , data.columns.get_level_values(1) == 'PI'].melt(col_level=0, ignore_index=False, value_name='PI')

tmp_df = data.iloc[: , data.columns.get_level_values(1) == 'RI'].melt(col_level=0, ignore_index=False, value_name='RI')

df["RI"] = tmp_df["RI"]

df.reset_index(inplace=True)

df.Dates = df.Dates.apply(lambda x: x.replace('-',''))

df[["Instrument","Dates","PI","RI"]]

The output is:

1635935833473.png (18.8 KiB)

Nov 03, 2021 at 10:39 AM

10 |1500

Attachments: Up to 2 attachments (including images) can be used with a maximum of 5.0 MiB each and 10.0 MiB total.

Answer 5 · 2021-11-04T08:48:50Z

Aleniles

1 ●0 ●1 ●2

Very kind of you!

One more question: what if I need to retrieve same kind of output for a couple of formulas instead? Like the following (even by writing another separate code if it is not possible to integrate it into the previous one):

(0.6*REBE#(S&PCOMP(RI)))+(0.4*(REBE#(LHAGGBD(IN)+100)))

(REBE#(CPRD#(1+((PCH#(MSACWF$(RI),1M)*0.006)+((PCH#(LHAGGBD(IN)+100,1M))*0.004)))))-100

I mean do you think is possibile to pull out data from a formula as well or do I need to convert it into python code?

Thanks!

Nov 04, 2021 at 08:48 AM

10 |1500

Attachments: Up to 2 attachments (including images) can be used with a maximum of 5.0 MiB each and 10.0 MiB total.

Jirapongse Nov 04, 2021 at 09:17 AM

@Aleniles

Those formulas return the following data.

ds.get_data('(0.6*REBE#(S&PCOMP(RI)))+(0.4*(REBE#(LHAGGBD(IN)+100))),(REBE#(CPRD#(1+((PCH#(MSACWF$(RI),1M)*0.006)+((PCH#(LHAGGBD(IN)+100,1M))*0.004)))))-100', start='-2Y', end='0D', kind=1, freq='D')

What output would you like to see?

1636017279559.png (29.4 KiB)

Q&A Forum

question

Manage Data With Pandas Dataframe

5 Answers

question

Manage Data With Pandas Dataframe

5 Answers

Related Questions