DSS API - News Sentiment - duplicate news?

Guys,

I'm using the following sample request body message to get news sentiment for Apple:

{
  "ExtractionRequest": {
    "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.NewsAnalyticsExtractionRequest",
    "ContentFieldNames":
      ["Headline", "Story Body", "Story Date Time", "Take Date Time", "Created Date", "Novelty Timestamp",
       "Attribution", "Products", "Topics", "Language",
       "Relevance", "Sentiment", "Sentiment - Negative", "Sentiment - Neutral", "Sentiment - Positive"],
    "IdentifierList": {
      "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.InstrumentIdentifierList", 
      "InstrumentIdentifiers": [
      { "Identifier": "AAPL.O", "IdentifierType": "Ric" }
      ],
      "ValidationOptions": {"AllowHistoricalInstruments": true},
      "UseUserPreferencesForValidationOptions": false
    },
    "Condition": {
      "ReportDateRange": "Range",
      "QueryStartDate": "2019-01-01",
      "QueryEndDate": "2019-04-30",
      "NewsAnalyticsRelevanceOperator": "GreaterThanOrEqualTo",
      "NewsRelevanceValue": 0.5,
      "NewsAnalyticsPrevailingSentiment": "Positive",
      "NewsAnalyticsNovelty": "Novelty7Day",
      "NewsFilterNoveltyOperator": "GreaterThanOrEqualTo",
      "NewsNoveltyValue": 10,
      "NewsTopicsCodes":[ "CMPNY","TECH" ],
      "IncludeImbalace": true,
      "NewsAnalyticsSource": "ArticlesAndAlerts",
      "NewsItemsSource":"Selected",
      "NewsAttributionsCodes": [ "RTRS", "BSW" ]
    }
  }
}

However, some of the news records appear to be identical. Is there a way to distinguish or filter out duplicate news records?

For example, the two records below from the response appear to be duplicates. How can such duplicates be identified? The headline and sentiment are the same, but the date and time differ. Is that because the news content is the same?

{
            "IdentifierType": "Ric",
            "Identifier": "AAPL.O",
            "Headline": "U.S. RESEARCH ROUNDUP- Apple, Merck & Co, Voya Financial",
            "Story Body": null,
            "Story Date Time": "03/26/2019 18:15:45",
            "Take Date Time": "03/26/2019 18:15:45",
            "Created Date": "03/26/2019 18:15:58",
            "Novelty Timestamp": "03/26/2019 18:15:45",
            "Attribution": "RTRS",
            "Products": "E U NAW C SOF PSC",
            "Topics": "RCH US REP BLR CMPNY RETE FOOD1 AMED MRCH PHAG COMS SOFW MDIA RSPC CYCS SHOP SHOPAL NCYC FOBE FOTB HECA HLTH HPRD PHMR PHAR TECH TEEQ TMT SWIT CCOS AMERS LEN RTRS",
            "Language": "EN",
            "Relevance": 1,
            "Sentiment": 1,
            "Sentiment - Negative": 0.0581011,
            "Sentiment - Neutral": 0.152388,
            "Sentiment - Positive": 0.789511
        },
        {
            "IdentifierType": "Ric",
            "Identifier": "AAPL.O",
            "Headline": "U.S. RESEARCH ROUNDUP- Apple, Merck & Co, Voya Financial",
            "Story Body": null,
            "Story Date Time": "03/26/2019 22:08:27",
            "Take Date Time": "03/26/2019 22:08:27",
            "Created Date": "03/26/2019 22:08:41",
            "Novelty Timestamp": "03/26/2019 22:08:27",
            "Attribution": "RTRS",
            "Products": "E U NAW C SOF PSC",
            "Topics": "RCH US REP BLR CMPNY RETE FOOD1 AMED MRCH PHAG COMS SOFW MDIA RSPC CYCS SHOP SHOPAL NCYC FOBE FOTB HECA HLTH HPRD PHMR PHAR TECH TEEQ TMT SWIT CCOS AMERS LEN RTRS",
            "Language": "EN",
            "Relevance": 1,
            "Sentiment": 1,
            "Sentiment - Negative": 0.0581011,
            "Sentiment - Neutral": 0.152388,
            "Sentiment - Positive": 0.789511
        }

@Liang.Xue,

There are 2 possible causes for what appears to be a duplicate result:

  1. Your request does not retrieve all available content fields. One or more fields that are not part of your request probably changed between 18:15 and 22:08 on that day. If that is the case, the records look like duplicates in your output even though they differ.
  2. It could also be that a specific alert was published twice.
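
Before deciding which cause applies, it can help to list the records that look alike. The following is a minimal Python sketch, assuming the response records have already been parsed into a list of dictionaries. The function name `group_apparent_duplicates` and the choice of key fields (Identifier, Headline, Attribution) are assumptions made for illustration, not something DSS defines.

```python
from collections import defaultdict


def group_apparent_duplicates(records, key_fields=("Identifier", "Headline", "Attribution")):
    """Group records that share the same values for key_fields.

    The default key_fields are an illustrative assumption; pick whatever
    combination defines "the same news item" for your use case.
    """
    groups = defaultdict(list)
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        groups[key].append(record)
    # Keep only the keys that occur more than once.
    return {key: items for key, items in groups.items() if len(items) > 1}


# Two records reduced to a few fields from the response shown in the question.
records = [
    {
        "Identifier": "AAPL.O",
        "Headline": "U.S. RESEARCH ROUNDUP- Apple, Merck & Co, Voya Financial",
        "Attribution": "RTRS",
        "Story Date Time": "03/26/2019 18:15:45",
    },
    {
        "Identifier": "AAPL.O",
        "Headline": "U.S. RESEARCH ROUNDUP- Apple, Merck & Co, Voya Financial",
        "Attribution": "RTRS",
        "Story Date Time": "03/26/2019 22:08:27",
    },
]

for key, items in group_apparent_duplicates(records).items():
    print(key[1], [item["Story Date Time"] for item in items])
```

This only flags candidates. Whether a group is a true duplicate or a later update of the same story still depends on the fields that were not requested.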

To verify this hypothesis I made the same request as you, but limited the date range to 26-27 March 2019 and retrieved all available content fields. I then identified and compared those 2 records. The content of the following fields differs between them:

Body Size, Created Date, Item Count N - Same Feed, Item ID, Linked Count N - Same Feed, Linked Id Prv N - Same Feed, Novelty Timestamp, Related RICs, Story Date Time, Take Date Time, Total Sentences, Total Words, Update Size

This shows these records are not duplicates.
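
The comparison itself is easy to script. Here is a minimal Python sketch, assuming each record is a dictionary parsed from the JSON response. The helper name `changed_fields` is hypothetical, and the sample values are taken only from the two records in the question, so the output covers fewer fields than the full list above.

```python
def changed_fields(first, second):
    """Return a sorted list of field names whose values differ between two records."""
    all_fields = set(first) | set(second)
    return sorted(name for name in all_fields if first.get(name) != second.get(name))


# A subset of the fields from the two records quoted in the question.
record_a = {
    "Headline": "U.S. RESEARCH ROUNDUP- Apple, Merck & Co, Voya Financial",
    "Story Date Time": "03/26/2019 18:15:45",
    "Take Date Time": "03/26/2019 18:15:45",
    "Created Date": "03/26/2019 18:15:58",
    "Novelty Timestamp": "03/26/2019 18:15:45",
    "Sentiment": 1,
}
record_b = {
    "Headline": "U.S. RESEARCH ROUNDUP- Apple, Merck & Co, Voya Financial",
    "Story Date Time": "03/26/2019 22:08:27",
    "Take Date Time": "03/26/2019 22:08:27",
    "Created Date": "03/26/2019 22:08:41",
    "Novelty Timestamp": "03/26/2019 22:08:27",
    "Sentiment": 1,
}

print(changed_fields(record_a, record_b))
# ['Created Date', 'Novelty Timestamp', 'Story Date Time', 'Take Date Time']
```

Running the same helper on records extracted with all content fields would surface the remaining differences, such as Item ID or Body Size.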

That said, your query is very much data related. This forum is aimed at software developers using Refinitiv APIs. The moderators on this forum do not have the deep expertise in all the content sets available through our products required to answer your question. I advise you to call the Refinitiv Helpdesk number in your country. They will either have the answer for you right away, or will reach out to the content experts who can provide the answer you're looking for.

PS: here is the request body I used for my test:

{
  "ExtractionRequest": {
    "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.NewsAnalyticsExtractionRequest",
    "ContentFieldNames":
      ["Asset ID", "Asset Store Version", "Attribution", "Author", "Body Size", "Broker Action", "Broker Action Code",
       "CIN Code", "Common Code", "Created Date", "Cross Reference", "CUSIP", "Engine Version", "Event Type Code",
       "Exchange Code", "Exchange Description", "Feed ID", "File Code", "First Mention", "Fitch Issuer ID",
       "Headline", "Headline SubType", "Headline SubType Code", "Instrument ID", "Instrument ID Type", "Is Breaking",
       "ISIN", "Issue PermID", "Issuer OrgID", "Issuer PermID",
       "Item Count 1 - Same Feed", "Item Count 2 - Same Feed", "Item Count 3 - Same Feed", "Item Count 4 - Same Feed", "Item Count 5 - Same Feed",
       "Item Genre", "Item ID", "Item Type", "Language",
       "Linked Count 1 - Same Feed", "Linked Count 2 - Same Feed", "Linked Count 3 - Same Feed", "Linked Count 4 - Same Feed", "Linked Count 5 - Same Feed",
       "Linked Id Last 1 - Same Feed", "Linked Id Last 2 - Same Feed", "Linked Id Last 3 - Same Feed", "Linked Id Last 4 - Same Feed", "Linked Id Last 5 - Same Feed",
       "Linked Id Prv 1 - Same Feed", "Linked Id Prv 2 - Same Feed", "Linked Id Prv 3 - Same Feed", "Linked Id Prv 4 - Same Feed", "Linked Id Prv 5 - Same Feed",
       "Market Commentary", "Market Commentary Code", "Metadata", "Moodys Issuer ID", "More News", "Named Items",
       "News Source", "Novelty Timestamp", "Number of Companies", "OPOL", "PE Code", "PILC", "PNAC",
       "Product Permission", "Products", "Quote ID", "Quote PermID", "RCP ID", "Reference", "Related RICs",
       "Relevance", "Reuters Editorial RIC", "RIC", "S&P Issuer ID", "SEDOL",
       "Sentiment", "Sentiment - Negative", "Sentiment - Neutral", "Sentiment - Positive", "Sentiment Bearing", "Sentiment Words",
       "SICC", "Stock RIC", "Story Body", "Story Date Time", "Story Type Code", "Take Date Time", "Take Sequence Number",
       "Ticker", "Topics", "Total Sentences", "Total Words",
       "TRBC Activity Code", "TRBC Activity Code Description", "TRBC Business Sector Code", "TRBC Business Sector Code Description",
       "TRBC Economic Sector Code", "TRBC Economic Sector Code Description", "TRBC Industry Code", "TRBC Industry Code Description",
       "TRBC Industry Group Code", "TRBC Industry Group Code Description", "Update Size",
       "User Defined Identifier", "User Defined Identifier2", "User Defined Identifier3", "User Defined Identifier4", "User Defined Identifier5", "User Defined Identifier6",
       "Valoren", "Wertpapier" ],
    "IdentifierList": {
      "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.InstrumentIdentifierList", 
      "InstrumentIdentifiers": [
      { "Identifier": "AAPL.O", "IdentifierType": "Ric" }
      ],
      "ValidationOptions": {"AllowHistoricalInstruments": true},
      "UseUserPreferencesForValidationOptions": false
    },
    "Condition": {
      "ReportDateRange": "Range",
      "QueryStartDate": "2019-03-26",
      "QueryEndDate": "2019-03-27",
      "NewsAnalyticsRelevanceOperator": "GreaterThanOrEqualTo",
      "NewsRelevanceValue": 0.5,
      "NewsAnalyticsPrevailingSentiment": "Positive",
      "NewsAnalyticsNovelty": "Novelty7Day",
      "NewsFilterNoveltyOperator": "GreaterThanOrEqualTo",
      "NewsNoveltyValue": 10,
      "NewsTopicsCodes":[ "CMPNY", "TECH" ],
      "IncludeImbalace": true,
      "NewsAnalyticsSource": "ArticlesAndAlerts",
      "NewsItemsSource":"Selected",
      "NewsAttributionsCodes": [ "RTRS", "BSW" ]
    }
  }
}


Thanks, Christiaan!

Good to have your support here, after the training webex last week hosted by Richard.
