question


No "exact" and "offset" keys for _typeGroup : "entities"

Several types of named entities (specifically, organizations and companies) get tagged as belonging to _typeGroup : "socialTag" rather than _typeGroup : "entities". Members of the "socialTag" group are linked to URLs instead of being given an exact position in the text:

_typeGroup : "socialTag"
id : "http://d.opencalais.com/..."
socialTag : "http://d.opencalais.com/..."
forenduserdisplay : "true"
name : "Goodwill Industries"
importance : "1"
originalValue : "Goodwill Industries"

Because this output format specifies no offsets, the extracted entity cannot be mapped back to the text.

Do you happen to know if there is a way to get offsets for such entities?

intelligent-tagging-api intelligent-tagging open-calais-api semantic-metadata-tagging


There's no offset because social, topic, and industry tags describe what the input document is about as a whole rather than identifying specific entities in text.

From the API user guide:

A Social Tag is an association of the submitted text to related Wikipedia categories, or articles. Social tags attempt to emulate how a person would tag a specific piece of content.

For example, if you submit a story about President Barack Obama and a piece of legislation, at least one reasonable tag would be “U.S. Legislation.” A story about the relative merits of BMWs, Ferraris, and Porsches would probably be tagged with “sports cars,” “luxury makes,” “auto racing,” and “motorsport.” The story about the Apple Watch Launch generated the following social tags: IOS, Smartwatches, Wearable Computers, Human-computer interaction, Ubiquitous computing, Consumer electronics, Apple Inc., Wearable Technology, and Apple system on a chip.

The SocialTag function does not identify individual items within the text, but rather attempts to provide common sense tags for the piece of content as a whole. Social tags are derived from the Wikipedia folksonomy.


Dear @Tomasz Adamusiak, thank you for your answer. I understand why there are no offsets for _typeGroup "socialTag" and the like.

My reasoning is that the entities that fall under such categories also belong to _typeGroup "entities". Without offsets, these entities cannot be located in the text, which results in false negatives in cases where we need such tokens (i.e., the ones from _typeGroup "socialTag") to be detected and tagged as named entities.

Do you happen to have an idea of how this issue can be resolved?

@tetiana.myronivska I'm not sure I understand. A social tag and an instance/entity tag are two separate things even if there's an overlap in token/label.

Could you provide an example of the false negatives?


@Tomasz Adamusiak, here is an example. For the input sentence

"We want you to know why your support of Goodwill is so important . " we have "Goodwill" detected by OpenCalais in the following way:

http://d.opencalais.com/dochash-1/bc75a003-4d8d-3215-b5ed-881cb2dfac96/SocialTag/1
_typeGroup : "socialTag"
id : "http://d.opencalais.com/dochash-1/bc75a003-4d8d-3215-b5ed-881cb2dfac96/SocialTag/1"
socialTag : "http://d.opencalais.com/genericHasher-1/501ba8d5-c75c-3e13-bdfd-4a76ac225f73"
forenduserdisplay : "true"
name : "Goodwill Industries"
importance : "1"
originalValue : "Goodwill Industries"

This is the only mention of "Goodwill" I get in the JSON response.


For all intents and purposes, no instance of Goodwill Industries was identified in this example. If you're looking for named entity recognition, you can ignore all social/aboutness tagging.
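
In practice, that could mean filtering the parsed JSON response down to _typeGroup "entities" and reading positions from each item's instances. The following is only a rough sketch: it assumes the response is a dictionary keyed by URI and that each entity carries an "instances" list whose members hold the "exact" and "offset" keys mentioned in the question. The function name entity_mentions and the sample data (including "Acme Corp") are made up for illustration and are not real API output.

```python
from typing import Any, Dict, Iterator, Tuple


def entity_mentions(response: Dict[str, Any]) -> Iterator[Tuple[str, str, int]]:
    """Yield (entity name, exact text, offset) for every entity instance.

    Assumes `response` is the parsed JSON output: a dict keyed by URI whose
    values may carry "_typeGroup" and, for entities, an "instances" list.
    """
    for item in response.values():
        # Skip social/topic/industry tags and any non-dict metadata values.
        if not isinstance(item, dict) or item.get("_typeGroup") != "entities":
            continue
        for instance in item.get("instances", []):
            # Only report instances that actually carry position information.
            if "exact" in instance and "offset" in instance:
                yield item.get("name", ""), instance["exact"], instance["offset"]


# Hypothetical response for illustration only (not real API output).
sample = {
    "tag-1": {
        "_typeGroup": "socialTag",
        "name": "Goodwill Industries",
        "importance": "1",
    },
    "entity-1": {
        "_typeGroup": "entities",
        "name": "Acme Corp",
        "instances": [{"exact": "Acme Corp", "offset": 0, "length": 9}],
    },
}

for name, exact, offset in entity_mentions(sample):
    print(name, exact, offset)
```

The social tag is skipped because it has no instances to report, so only items that the service actually located in the text come back with positions.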

Oh, I see. Thank you, Tomasz.
