information analyst
KM, EDMS, workflow, collaboration
Rescooped by michel verstrepen from Robótica Educativa!

Operational Semantics - From Text Mining to Triplestores – The Full Semantic Circle

In the not-too-distant past, analysts were all searching for a “360 degree view” of their data. Most of the time this phrase referred to integrated RDBMS data, analytics interfaces and customers. But with the onslaught…

Via Tony Agresta, Edward Chenard, juandoming
Tony Agresta's curator insight, February 15, 2015 6:22 PM

Semantic pipelines allow for the identification, extraction, classification and storage of semantic knowledge, creating a knowledge base of all your data. Most organizations have struggled to create these pipelines, primarily because the plumbing hasn't existed. But now it does.


This post discusses how free-flowing text streams into graph databases using concept extraction processes. A well-coordinated feed of data is written to the underlying graph database, while updates are tracked on a continuous basis to ensure database integrity.


Other important pipeline plumbing includes tools for disambiguation (used to resolve which real-world entity a mention in the text refers to), classification of the entities, structuring relationships between entities and determining sentiment.
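To make that plumbing concrete, here is a minimal sketch of how such stages might compose. Every function name and toy value below is an illustrative assumption, not part of any specific product.

```python
# Illustrative sketch of semantic-pipeline stages composed end to end.
# All names and toy values are hypothetical placeholders.

def extract_entities(text: str) -> list[dict]:
    # Concept extraction: locate candidate entity mentions in the text.
    return [{"mention": "GraphDB", "span": (35, 42)}]

def disambiguate(entity: dict) -> dict:
    # Disambiguation: resolve which real-world entity the mention refers to.
    entity["iri"] = "http://example.org/entity/GraphDB"
    return entity

def classify(entity: dict) -> dict:
    # Classification: assign an entity type (person, organization, product...).
    entity["type"] = "Product"
    return entity

def run_pipeline(text: str) -> dict:
    # Compose the stages; relation extraction and sentiment are stubbed out.
    entities = [classify(disambiguate(e)) for e in extract_entities(text)]
    return {
        "entities": entities,
        "relations": [("Ontotext", "develops", "GraphDB")],  # toy relation
        "sentiment": "neutral",                               # toy sentiment
    }

print(run_pipeline("Ontotext released a new version of GraphDB."))
```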


Organizations that deploy well-functioning semantic pipelines have an added advantage over their competitors. They have instant access to a complete knowledge base of their data. Research functions spend less time searching and more time analyzing. Alerting notifies critical business functions to take immediate action. Service levels are improved using accurate, well-structured responses. Sentiment is detected, allowing more time to react to changing market conditions.



In general, the REST Client API calls out to a GATE-based annotation pipeline and sends back enriched data in RDF form. Organizations typically customize these pipelines, which can consist of any GATE-developed set of text mining algorithms for scoring, machine learning, disambiguation or any of a wide range of other text mining techniques.
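As a rough sketch, a client call might look like the following. The endpoint URL, headers and response format are illustrative assumptions, since the post does not document the API surface itself.

```python
# Hypothetical sketch: send plain text to a GATE-based annotation
# pipeline over REST and receive the enrichment back as RDF.
# The endpoint URL and headers are assumptions, not a documented API.
import requests

SERVICE_URL = "http://localhost:8080/extractor/extract"  # assumed endpoint

document = "Ontotext released a new version of GraphDB in Sofia today."

resp = requests.post(
    SERVICE_URL,
    data=document.encode("utf-8"),
    headers={
        "Content-Type": "text/plain; charset=utf-8",
        "Accept": "text/turtle",  # ask for the enriched output as RDF
    },
    timeout=30,
)
resp.raise_for_status()
rdf_annotations = resp.text  # RDF statements, ready to load into GraphDB
print(rdf_annotations)
```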

It is important to note that these text mining pipelines create RDF in a linear fashion and feed GraphDB™. Once the RDF has been enriched and stored in the database, the annotations can be modified, edited or removed. This is particularly useful when integrating with Linked Open Data (LOD) sources: updates to the database are populated automatically when the source information changes.

For example, let’s say your text mining pipeline references Freebase as its Linked Open Data source for organization names. If an organization name changes or a new subsidiary is announced in Freebase, this information will be updated as referenceable metadata in GraphDB™.
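For illustration, an update of that kind could be expressed as a SPARQL 1.1 DELETE/INSERT against the repository's update endpoint. The repository name, entity IRI and label predicate below are assumptions; the endpoint path follows the RDF4J-style layout that GraphDB exposes.

```python
# Hypothetical sketch: reflect an organization rename in GraphDB via
# SPARQL 1.1 Update. The repository name and IRIs are placeholders;
# /repositories/<repo>/statements is the RDF4J-style update endpoint.
import requests

UPDATE_ENDPOINT = "http://localhost:7200/repositories/news/statements"

update = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
DELETE { <http://example.org/org/Acme> skos:prefLabel ?old }
INSERT { <http://example.org/org/Acme> skos:prefLabel "Acme Holdings"@en }
WHERE  { <http://example.org/org/Acme> skos:prefLabel ?old }
"""

resp = requests.post(UPDATE_ENDPOINT, data={"update": update}, timeout=30)
resp.raise_for_status()  # 204 No Content signals a successful update
```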

In addition, this tightly-coupled integration includes a suite of enterprise-grade APIs, the core of which is the Concept Extraction API. This API consists of a Coordinator and Entity Update Feed. Here’s what they do:

  • The Concept Extraction API Coordinator module accepts annotation requests and dispatches them to a group of Concept Extraction Workers. The Coordinator communicates with GraphDB™ in order to track changes leading to updates in each worker’s entity extractor. The API Coordinator acts as a traffic cop, allowing approved, unique entities to be inserted into GraphDB™ while preventing duplicates from taking up valuable real estate.
  • The Entity Update Feed (EUF) plugin is responsible for tracking and reporting on updates to every entity (concept) within the database that has been modified in any way (added, removed or edited). This information is stored in the graph database and is queryable via SPARQL, as sketched after this list. Reports can be run to notify a user of any and all changes.
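A query against that feed might look like the following. Since the post does not show the EUF vocabulary, the prefix and predicate names here are invented placeholders; only the SPARQL-over-HTTP mechanics are standard.

```python
# Hypothetical sketch: report recently changed entities via SPARQL.
# The euf: prefix and predicates are invented placeholders; the query
# endpoint follows the RDF4J-style layout that GraphDB exposes.
import requests

QUERY_ENDPOINT = "http://localhost:7200/repositories/news"

query = """
PREFIX euf: <http://example.org/entity-update-feed#>
SELECT ?entity ?change ?when
WHERE {
  ?entity euf:changeType ?change ;   # e.g. added / removed / edited
          euf:changedAt  ?when .
}
ORDER BY DESC(?when)
LIMIT 20
"""

resp = requests.get(
    QUERY_ENDPOINT,
    params={"query": query},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["entity"]["value"], row["change"]["value"], row["when"]["value"])
```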

 

Other APIs include Document Classification, Disambiguation, Machine Learning, Sentiment Analysis & Relation Extraction. Together, this set of technologies allows for tight integration and accurate processing of text while efficiently storing the resulting RDF statements in GraphDB™.

As mentioned, the value of this tightly-coupled integration is in the rich metadata and relationships which can now be derived from the underlying RDF database. It’s this metadata that powers high-performance search and discovery or website applications – results are complete, accurate and instantaneous.

- See more at: http://www.ontotext.com/text-mining-triplestores-full-semantic-circle/

Rescooped by michel verstrepen from visual data

Lessons from Visualized: Cutting Through Hyperbole With Data Visualization


At the Visualized conference on November 9th, Neil Halloran posed an interesting question: Can DataViz lead to a data-savvy society in the same way that the printing press led to a literate one? One that is prepared to make tough decisions on complex issues?

Neil Halloran thinks so. That’s why he created VisualBudget.org to cut through the hyperbole surrounding what may be the most frequently misunderstood and pressing issue facing Americans today: our massive $16 trillion deficit.

But how is a modern citizen supposed to make an informed decision on issues of tremendous scope and complexity, such as the fiscal cliff or the growing budget deficit, without falling back on sound bites and punditry? Neil Halloran’s solution is to tell a story. Rather than simply presenting a static infographic or a set of tabular data on federal receipts and expenditures, VisualBudget.org takes you on an interactive tour...


Via Lauren Moss
Rescooped by michel verstrepen from Presentation Tools

Time Series Visualization: Cube


Cube is an open-source system for visualizing time series data, built on MongoDB, Node and D3.

If you send Cube timestamped events (with optional structured data), you can easily build realtime visualizations of aggregate metrics for internal dashboards. (Cube has a handful of chart types that you can assemble into dashboards.)

For example, you might use Cube to monitor traffic to your website, counting the number of requests in 5-minute intervals.
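A minimal sketch of that request-counting example follows, based on the HTTP API described in the Cube wiki (collector on port 1080, evaluator on port 1081 by default). Treat the paths and parameters as assumptions to verify against the documentation linked below.

```python
# Minimal sketch of Cube's request-counting example over HTTP.
# Ports and paths follow Cube's documented defaults (collector :1080,
# evaluator :1081); verify against the wiki before relying on them.
from datetime import datetime, timezone
import requests

COLLECTOR = "http://localhost:1080/1.0/event/put"
EVALUATOR = "http://localhost:1081/1.0/metric"

# 1. Send a timestamped "request" event with optional structured data.
event = [{
    "type": "request",
    "time": datetime.now(timezone.utc).isoformat(),
    "data": {"path": "/", "duration_ms": 124},
}]
requests.post(COLLECTOR, json=event, timeout=10).raise_for_status()

# 2. Ask the evaluator for request counts in 5-minute intervals
#    (step is in milliseconds: 300000 ms = 5 minutes).
params = {
    "expression": "sum(request)",
    "start": "2013-11-18T00:00:00Z",
    "stop": "2013-11-18T06:00:00Z",
    "step": 300000,
}
for point in requests.get(EVALUATOR, params=params, timeout=10).json():
    print(point["time"], point.get("value"))
```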

 

Video demo (60"): http://www.youtube.com/watch?feature=player_embedded&v=oq0qEu1dDdA

Documentation: https://github.com/square/cube/wiki

Source code: https://github.com/square/cube

Find out more: http://square.github.com/cube/


Via Robin Good
Rescooped by michel verstrepen from visual data

How Does Data Visualization Work?


Data visualization is an amazing tool. The data we deal with daily would be almost entirely inaccessible when locked up in numerical formats. Luckily, data visualization can help us to extract information, insights, or even knowledge from that data. It relies on the remarkable human visual system that turns visible light into meaningful semantics that inform our decisions.

 

More details on the human visual system, visual metaphors, visual context, exploration and presentation at the article link.


Via Lauren Moss
Gordon Shupe's curator insight, November 18, 2013 9:53 AM

Excellent overview - this is a resource to save and use!

Rescooped by michel verstrepen from visual data

Interaction Design for Data Visualizations

Interactive data visualizations are an exciting way to engage and inform large audiences. They enable users to focus on interesting parts and details, to customize the content and even the graphical form, and to explore large amounts of data. At their best, they facilitate a playful experience that is way more engaging than static infographics or videos.

Several ideas and concepts of interaction design for data visualizations are presented in this post, using 11 examples from the web. The overall concepts featured include:

  • The Basics: Highlighting and Details on Demand
  • Making More Data Accessible: User-driven Content Selection
  • Showing Data in Different Ways: Multiple Coordinated Visualizations
  • Showing Data in Different Ways: User-driven Visual Mapping Changes
  • Integrating Users’ Viewpoints and Opinions

Visit the complete article for numerous links, useful visuals and specific details on how to understand, implement and evaluate interactive design elements used in data visualization design.


Via Lauren Moss
Hans's comment, October 2, 2012 5:09 AM
Great article, I really like the idea of interactive information with details on demand. As an interaction designer I always try to balance content management and usability. Here are some interesting examples that made me consider a complete information surface vs. a deep, level-on-demand architecture.
JongWon Kim's comment, October 9, 2012 10:39 PM
really beautiful!!