How to create a named-entity mapping?

littelG009

I use spaCy to extract named entities from texts. It extracts many entities that are worded differently but actually refer to the same real-world entity. For example: President of the USA and POTUS, Great Britain and the UK, International Monetary Fund and IMF, etc. Later I need to process these entities, so it would be good to merge them. As a result, I need a single word (or phrase) that represents each real-world entity. How can this be done?

Answers

ShaDow_

This is a complex task. One idea: vectorize the entity words or phrases and then cluster the vectors (for example, using KMeans). Each cluster should represent a group of entities that are worded differently but refer to the same real-world object. Then compute the centroid of each cluster and find the entity closest to it. That word or phrase should be the best descriptor of the whole cluster, and you can replace all other entities in the cluster with it. With this approach you can probably build the mapping you need.
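A minimal sketch of that idea, in plain Python so it runs without extra packages. The vectors in `ENTITY_VECTORS` are made-up toy values standing in for real embeddings, and `kmeans` and `build_mapping` are hypothetical helper names, not part of spaCy or any library. In practice you would take the vectors from your model and probably use a library KMeans instead of this hand-rolled one.

```python
import math

# Hypothetical toy vectors; real ones would come from an embedding model.
ENTITY_VECTORS = {
    "President of the USA": (0.9, 0.1),
    "POTUS": (1.0, 0.0),
    "US President": (0.95, 0.05),
    "Great Britain": (0.0, 1.0),
    "the UK": (0.1, 0.9),
    "United Kingdom": (0.05, 0.95),
}


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(component) / n for component in zip(*points))


def assign(vectors, centroids):
    """Return, for each vector, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(v, centroids[i]))
        for v in vectors
    ]


def kmeans(vectors, k, iterations=20):
    """Tiny k-means with deterministic farthest-point initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Pick the point farthest from all centroids chosen so far.
        centroids.append(
            max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        )
    for _ in range(iterations):
        labels = assign(vectors, centroids)
        for i in range(k):
            members = [v for v, label in zip(vectors, labels) if label == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = mean(members)
    return assign(vectors, centroids), centroids


def build_mapping(entity_vectors, k):
    """Map every entity to the member of its cluster nearest the centroid."""
    names = list(entity_vectors)
    vectors = [entity_vectors[name] for name in names]
    labels, centroids = kmeans(vectors, k)
    mapping = {}
    for i in range(k):
        members = [name for name, label in zip(names, labels) if label == i]
        if not members:
            continue
        canonical = min(
            members, key=lambda name: math.dist(entity_vectors[name], centroids[i])
        )
        for name in members:
            mapping[name] = canonical
    return mapping


mapping = build_mapping(ENTITY_VECTORS, k=2)
```

Note that you have to choose `k` (the number of real-world entities) up front, and the result is only as good as the vectors: if the embeddings of two wordings are not close, they will not end up in the same cluster.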

KoloNuto

I'm afraid the approach described by @ShaDow_ is applicable only when you have just a few entities (probably no more than 5) and not too many different wordings for each real-world entity.

ShaDow_

This is just an idea, I'm also not sure if it will help...

sArmzeam11

I have a lot of text and a lot of entities. Any other ideas?

XbiTake

This task is known as entity linking, and it is actually significantly harder than entity extraction. spaCy recently started to support entity linking, but it doesn't ship a pre-trained model, so you have to train your own custom entity linking model: https://spacy.io/usage/linguistic-features#entity-linking
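If training is not an option, a much simpler stand-in is a hand-built alias table that maps each known wording to one canonical name. This is only a sketch and is not spaCy's entity linker: `ALIASES`, `build_lookup` and `canonicalize` are hypothetical names, the table follows the examples from the question, and the lookup is exact-match after lowercasing, so it cannot disambiguate by context the way a trained model could.

```python
# Hypothetical alias table: canonical name -> other known wordings.
ALIASES = {
    "President of the United States": ["President of the USA", "POTUS"],
    "United Kingdom": ["Great Britain", "the UK", "UK"],
    "International Monetary Fund": ["IMF"],
}


def normalize(text):
    """Lowercase and collapse whitespace so lookups are case-insensitive."""
    return " ".join(text.lower().split())


def build_lookup(aliases):
    """Flatten the alias table into {normalized wording: canonical name}."""
    lookup = {}
    for canonical, forms in aliases.items():
        for form in [canonical, *forms]:
            lookup[normalize(form)] = canonical
    return lookup


def canonicalize(entity_text, lookup):
    """Return the canonical name, or the original text if it is unknown."""
    return lookup.get(normalize(entity_text), entity_text)


lookup = build_lookup(ALIASES)
extracted = ["POTUS", "imf", "Great  Britain", "Some Unknown Org"]
resolved = [canonicalize(text, lookup) for text in extracted]
```

You would apply `canonicalize` to the text of each entity returned by the extractor. Unknown wordings pass through unchanged, so you can log them and extend the table over time.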

Ameli1323

I hope spaCy will release a pre-trained model soon, since training requires a lot of data and people report that it takes a very long time (more than a week!).

sArmzeam11

Thanks guys, will try to train my model, or think about how to proceed without this...
