Introduction to semantics

Inside an enterprise, information that is missing or ineffectively managed can be costly, resulting in lost business opportunities or time wasted on the wrong activities. Here, semantics can play a key role in making all enterprise information available, especially unstructured data.

What is semantics? Introduction to the topic

Semantics is the study of the meaning of words and sentences; at its simplest, it concerns the relation of linguistic forms to non-linguistic concepts and mental representations in order to explain how sentences are understood by the speakers of a language.

We can start by thinking of semantics as the “magic” that happens when people communicate and, most importantly, when they understand each other. This magic is actually a well-balanced combination of linguistic forms and knowledge of the world.

To make sense out of a work of art, you need to combine the objective representation with your knowledge of the world; when you consider words within a context, you are able to understand the meaning and the message. That’s semantics!

Now that we have defined what semantics is, we can understand why semantic technology is relevant for some of the most critical business activities.

Semantic technology is a way of processing content that relies on a variety of linguistic techniques, including text mining, entity extraction, concept analysis, natural language processing, categorization, normalization and sentiment analysis. Compared to traditional technologies that process content as data, semantic technology is based not just on data, but on the relationships between pieces of data. When it comes to analyzing text, this network of relationships enables both high precision and recall in search, as well as automatic categorization and tagging.

Thanks to the ability to understand the meaning of words in context the way humans do, semantic technology can manage a huge knowledge base, integrate information and data, and allow organizations to find the information they need to make decisions.

Information growth in terms of volume, velocity, variety and complexity, as well as in the range of ways information is being used, makes its management more difficult than ever before. Here, semantics plays a key role in extracting meaning from unstructured data, transforming it into ready-to-use information for knowledge management, customer service, operational risk management and social media monitoring.

Semantics for Knowledge Management

Semantic technology helps organizations manage unstructured information and transform it into usable, searchable and actionable intelligence. It uncovers data from within the organization and from the web to provide valuable insight.

Semantics for Customer Service

Managing customer experience today requires streamlining interactions with customers, maintaining a high level of customer satisfaction and hearing the Voice of the Customer. Semantic technologies support the implementation of advanced listening platforms and streamline access to support, whether it is delivered directly to customers or to the staff who assist customers needing additional help. The key to providing efficient automated customer support is understanding the customer’s request and ensuring access to the information they need at the right time.

Semantics for Operational Risk Management

External and internal sources are important resources that contain insight valuable for identifying risks and mitigating threats. To minimize operational risks and threats hiding in the supply chain and within an organization’s ecosystem, semantics can be used to support analysts in making the vast amount of content they acquire available to fuel the risk assessment process with actionable insight and intelligence. Semantic technology allows organizations to minimize their exposure to risks, and provides early identification and analysis of consumer sentiment, market trends and competitor information.

Corriere della Sera

– The most talked-about sport on Twitter is swimming, followed by the other aquatic disciplines (sailing, water polo and rowing). Tennis comes in fifth. The analysis by Expert System.

At the Rio Olympics, some won gold medals, some silver and some bronze. But who won the medal for tweets?

Read the article on Corriere della Sera

Which athletes were talked about most during the Rio 2016 Olympics? Which sports did users mention most? And what was the prevailing sentiment?

These are some of the questions Expert System, a leading provider of semantic software for the strategic management of information and big data, listed on the AIM market of Borsa Italiana, set out to answer in an analysis of the main topics discussed by online users on Twitter about the 2016 Olympics. The analysis, based on the Cogito cognitive technology for content comprehension, covered more than 430,000 English-language tweets posted between August 5 and 22.

The most mentioned: Michael Phelps and Serena Williams

Michael Phelps is the most mentioned athlete, followed by the Nigerian table tennis player Aruna Quadri. In third place is Usain Bolt, followed in turn by two badminton players, Lee Chong Wei and Lin Dan, then by the famous tennis player Andy Murray, and then Or Sasson, Teddy Riner, Michael Jung and Joseph Schooling.

Among the ten most mentioned female athletes, the podium goes to tennis player Serena Williams, fencer Ibtihaj Muhammad and gymnast Simone Biles, followed by Annalise Murphy, Simone Manuel, Katie Ledecky, Monica Puig, Rafaela Silva, Ginny Thrasher and Danielle Prince.
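As a purely illustrative aside, a ranking like this can be approximated by tallying athlete mentions across tweets. The sketch below uses plain Python with a hypothetical athlete list and sample tweets; it is not Expert System’s actual methodology, which relies on semantic analysis rather than string matching.

```python
# Hypothetical sketch: tallying athlete mentions in a tweet collection.
# The athlete list and tweets are illustrative placeholders.
from collections import Counter

ATHLETES = ["Michael Phelps", "Aruna Quadri", "Usain Bolt", "Serena Williams"]

def count_mentions(tweets):
    counts = Counter()
    for tweet in tweets:
        text = tweet.lower()
        for athlete in ATHLETES:
            if athlete.lower() in text:
                counts[athlete] += 1
    return counts

tweets = [
    "Michael Phelps takes another gold! #Rio2016",
    "What a match from Serena Williams today",
    "Usain Bolt is simply the fastest man alive",
]
for athlete, n in count_mentions(tweets).most_common():
    print(athlete, n)
```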

What is the sentiment?

User comments about the athletes show a prevailing positive mood for Simone Biles (96%), Simone Manuel (86%) and Katie Ledecky (86%); among the male athletes, Justin Rose (95%) is the most loved, followed by Mo Farah (84%), with Michael Jung and Michael Phelps tied for third (78%).

Swimming is the top sport on social media

The analysis shows that the sport Twitter users talked about most is swimming, immediately followed by the other main aquatic sports (sailing, water polo and rowing). Finally, among the most mentioned sports, tennis confirms its fifth place.

Cogito technology identifies the meanings of words and phrases in order to determine whether the context of words and syntactic constructions is positive (cool) or negative (bad), and to relate it to particular moods. Overall, sentiment is predominantly positive (75%), though there is no shortage of verbs and words that evoke negative sentiment (unacceptable, lose, fail, pain, penalty, sad, miss, bad, useless).
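As a toy illustration of the word-level polarity idea described above, the sketch below scores text against small positive and negative word lists; the lists are hypothetical, and Cogito’s actual semantic analysis is far more sophisticated.

```python
# Toy lexicon-based polarity scoring: count positive vs. negative words.
# Word lists are illustrative, drawn from examples mentioned in the article.
POSITIVE = {"cool", "win", "gold", "congrats", "champion", "good"}
NEGATIVE = {"unacceptable", "lose", "fail", "pain", "penalty",
            "sad", "miss", "bad", "useless"}

def polarity(text):
    words = [w.strip(".,!?") for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(polarity("Congrats on the gold, what a win"))  # positive
print(polarity("Sad to miss the final, bad day"))    # negative
```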

Not just athletes…

Naturally, verbs and nouns specific to the world of sports prevail (win, gold, medal, final, team, silver, bronze, watch, play, match, game, athletics), and many lemmas refer to competition outcomes (congrats, congratulations, thanks, champion, round, good luck, result); but users talk above all about woman, girl and man, and not only about athletes, perhaps because of the remarkable human dimension these Olympics displayed (the first refugee team, athletes with meaningful stories, Muslim women).

The complete infographic with the results of the Cogito analysis is available here

Today, organizations and enterprises of almost every kind must be able to tap into what is now an essential, yet challenging resource: big data. Composed of internally stored organizational information such as customer and sales information, transactional data and research, as well as external open source information and social media, this big data is largely unstructured and in a state of constant growth. It is also mostly text. Natural language processing (NLP) of big data is the next great opportunity.

What is big data?

No longer just a buzzword, the phrase “big data” describes the growing volume of structured and unstructured, multi-source information that is too large for traditional applications to handle. In terms of its usefulness, the 2013 book “Big Data: A Revolution That Will Transform How We Live, Work, and Think,” by Viktor Mayer-Schönberger and Kenneth Cukier refers to big data as “the ability of society to harness massive amounts of information in novel ways to produce useful insights or goods or services of significant value.”

What is NLP?

Natural language processing (NLP) is a form of artificial intelligence that helps machines “read” text by simulating the human ability to understand language. NLP techniques incorporate a variety of methods, including linguistics, semantics, statistics and machine learning, to extract entities and relationships and to understand context, which enables a comprehensive understanding of what’s being said or written. Rather than understanding single words or combinations of them, NLP helps computers understand sentences as they are spoken or written by a human. It uses a number of methodologies to decipher ambiguities in language, including automatic summarization, part-of-speech tagging, disambiguation, entity extraction and relation extraction, as well as natural language understanding and recognition.
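As a minimal illustration of a few of these steps (tokenization, part-of-speech tagging and entity extraction), the sketch below uses the open-source spaCy library; this is one possible toolkit among many, not a specific product’s implementation.

```python
# A minimal NLP sketch using spaCy (an illustrative open-source library).
# Setup assumed: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is opening a new office in Berlin in 2025.")

# Part-of-speech tags and syntactic dependencies for each token
for token in doc:
    print(token.text, token.pos_, token.dep_)

# Named entities with their types (e.g., ORG, GPE, DATE)
for ent in doc.ents:
    print(ent.text, ent.label_)
```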

What problems can natural language processing for big data solve?

Regardless of the sector, every business today relies on large volumes of text information. For example, a law firm works with large amounts of research, past and ongoing legal transaction documents, notes, email correspondence as well as large volumes of governmental and specialized reference information. A pharmaceutical company will have large volumes of clinical trial information and data, doctor notes, patient information and data, patent and regulatory information as well as the latest research on competitors.

Because these types of information are largely made up of language, natural language processing for big data presents an opportunity to take advantage of what is contained in especially large and growing stores of content to reveal patterns, connections and trends across disparate sources of data.

Interactions: Today, natural language processing technologies are already at work in a variety of commonly used interactive applications such as smartphone assistants like Apple’s Siri, in online banking and retail self-service tools and in some automatic translation programs. Users ask questions in everyday language and receive immediate, accurate answers. It’s a win-win for both customers, who can easily communicate with companies they do business with whenever and wherever they want, and for companies who increasingly realize savings by reducing the number of calls handled by traditional live assistance.

Business Intelligence: Natural language processing for big data can be leveraged to automatically find relevant information and/or summarize the content of documents across large volumes of information for collective insight. Users are no longer limited by having to choose or know the “right” keywords to retrieve what they’re looking for, but can interact with the content via search using queries in their own words. Faster, more thorough access to information speeds up all downstream processes that depend on timely information and enables its use for real-time, actionable business intelligence.
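A rough sense of how retrieval can rank documents rather than demand a single exact keyword is given by the classic TF-IDF baseline below, sketched with scikit-learn; semantic technology goes much further by matching meaning rather than overlapping terms, so treat this only as a simplified stand-in.

```python
# A simplified TF-IDF retrieval baseline using scikit-learn (illustrative
# only; semantic search matches meaning, not just overlapping terms).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Quarterly sales rose sharply in the European market.",
    "The clinical trial reported positive patient outcomes.",
    "New regulations affect cross-border financial transactions.",
]
query = "sales performance in the European market"

vectorizer = TfidfVectorizer(stop_words="english")
doc_vectors = vectorizer.fit_transform(documents)
query_vector = vectorizer.transform([query])

# Rank documents by cosine similarity to the query
scores = cosine_similarity(query_vector, doc_vectors).ravel()
for doc, score in sorted(zip(documents, scores), key=lambda p: -p[1]):
    print(f"{score:.2f}  {doc}")
```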

Sentiment analysis: With an increasingly online customer base, social channels are a rich, if noisy, source of invaluable information. Using natural language processing for sentiment analysis, organizations can understand what is being said about their brand and products, as well as “how” it’s being talked about: how users feel about a service, product or concept/idea. This is a powerful way to discover information about the market and about current and potential customers (opinions, but also information about customer habits, preferences and needs/wants, as well as demographic information) that would otherwise remain out of reach. This information can then be applied to product development, business intelligence and market research.
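For a hands-on flavor of sentiment scoring on short social posts, the sketch below uses NLTK’s VADER analyzer, an open-source stand-in rather than the technology discussed here; the posts are made-up examples.

```python
# Sentiment scoring of short social posts with NLTK's VADER analyzer
# (an open-source stand-in; compound scores range from -1 to +1).
# Setup assumed: pip install nltk
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

posts = [
    "Love the new release, support was super helpful!",
    "Still waiting on my refund. Useless service.",
]
for post in posts:
    scores = analyzer.polarity_scores(post)
    print(f"{scores['compound']:+.2f}  {post}")
```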

If estimates by IDC come true, by 2020 we’ll be looking at around 44 trillion gigabytes of digital data worldwide (an IDC Digital Universe Study projects that, by 2020, approximately 1.7 megabytes of new information will be created every second for every human in the world). Forty-four trillion gigabytes is a lot of potential. No matter where you apply it, natural language processing for big data will be an essential feature to build into your analysis pipeline in order to capture the value of this information for insight, reduced costs and increased productivity.

KDnuggets

– While I’m not a Facebook user or big fan of the platform, there is at least one area where Mark Zuckerberg and I are on the same page: Artificial Intelligence (for reference www.theinquirer.net/inquirer/news/2448174/mwc-2016-mark-zuckerberg-fears-no-killer-ais).

While it’s always inspired a bit of controversy, I think that the negative views about AI being a dangerous technology are more grounded in Hollywood than in reality, and not just because we’ve seen so many positive examples of its application (especially in biomedicine). Instead, it’s a complete contradiction to think that a computer can master something that even humans don’t quite understand. Despite decades of work by some of our most brilliant minds, we still don’t know exactly how the brain functions, not to mention all of the mechanisms at the base of thought and comprehension.

So, Bravo, Mark!

Read the article by Marco Varone, CTO and Founder, Expert System

How was Ferragosto (the Italian mid-August holiday) perceived 140 years ago? Which themes and concepts has it been linked to in the collective imagination over time? A study of the evolution of this phenomenon through the analysis of articles published in the historical archive of Corriere della Sera

Using its Cogito semantic technology, Expert System analyzed all of the articles about Ferragosto published by Corriere della Sera from 1876 to 2015, to try to understand how this phenomenon has evolved over time. The idea is that a study of the news from Italy’s leading daily can reveal trends and characteristics of the phenomenon itself: in a sense, by looking at the words, one can reconstruct the facts.

The analysis covered about 22,000 articles containing the term Ferragosto, selected from the more than 7.5 million articles in the historical archive of news from Corriere della Sera (1876-2015), Corriere del pomeriggio (1892-1944) and Corriere dell’Informazione (1945-1981).
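To make the approach concrete, the sketch below shows one simple way such a trend could be tallied: counting articles that mention the term, grouped by decade. The records and the access to the archive are hypothetical stand-ins, not the actual Cogito pipeline.

```python
# Illustrative trend tally: count articles mentioning a term, by decade.
# Hypothetical records of the form (year, article_text).
from collections import Counter

archive = [
    (1887, "Ferragosto in the city: markets and processions..."),
    (1954, "Ferragosto traffic on the new autostrada..."),
    (2009, "Ferragosto getaways: Italians head to the coast..."),
]

by_decade = Counter()
for year, text in archive:
    if "ferragosto" in text.lower():
        by_decade[(year // 10) * 10] += 1

for decade in sorted(by_decade):
    print(f"{decade}s: {by_decade[decade]} article(s)")
```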

Download the report

Text mining helps companies discover and generate new information through deep analysis and examination of large amounts of text using a variety of methodologies. The combination of text mining and visualization tools can make this process even more effective.

Text mining

Text mining essentially transforms unstructured information into structured data that can be further explored and analyzed to support a number of downstream business applications. For the typical high-volume repositories that businesses rely on, one of the most challenging aspects is knowing what type of information a repository contains, what it’s “about.” Text mining solutions address this through entity extraction, which extracts entities from any type of text content and identifies the connections that exist between them. This includes entities such as people, cities, countries, businesses, government organizations and more.
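A minimal sketch of the idea, using spaCy as an illustrative extractor: entities that appear in the same sentence are treated as connected. Both the sample text and the co-occurrence heuristic are assumptions for demonstration, not how any particular product works.

```python
# Entity extraction plus a simple connection heuristic: entities that
# co-occur in a sentence are linked. spaCy is an illustrative choice.
# Setup assumed: pip install spacy && python -m spacy download en_core_web_sm
from itertools import combinations
import spacy

nlp = spacy.load("en_core_web_sm")
text = ("Acme Corp opened a plant in Detroit. "
        "Jane Smith, CEO of Acme Corp, met officials in Washington.")

edges = set()
for sent in nlp(text).sents:
    entities = [ent.text for ent in sent.ents]
    for a, b in combinations(sorted(set(entities)), 2):
        edges.add((a, b))

for a, b in sorted(edges):
    print(a, "<->", b)
```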

Combining text mining and visualization tools takes this idea further, representing information with even greater clarity.

Visualization tools

Reading through a long list of elements or browsing a large collection of documents means a long time to value for the intelligence they contain. Intuitive, interactive data visualization instead allows decision makers to immediately grasp what the analysis reveals, and then drill down into the areas of greatest interest.

Text mining and visualization tools convert documents, spreadsheets, reports, etc. into clear charts or graphs, allowing analysts to easily explore and work with data and content.
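As a small illustration of this conversion, the sketch below draws a graph of hypothetical entity connections with the open-source networkx and matplotlib libraries; the edges are made-up examples, such as might come from an entity-extraction step.

```python
# Turning extracted relationships into a simple graph visualization
# with networkx and matplotlib (illustrative libraries, not Cogito).
import matplotlib.pyplot as plt
import networkx as nx

# Hypothetical entity connections, e.g. from an entity-extraction step
edges = [
    ("Acme Corp", "Detroit"),
    ("Acme Corp", "Jane Smith"),
    ("Jane Smith", "Washington"),
]

G = nx.Graph()
G.add_edges_from(edges)
nx.draw(G, with_labels=True, node_color="lightblue", node_size=1500)
plt.show()
```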

Visualization tools help companies explore analysis results at a glance, spot patterns and connections, and share findings across teams.

Text mining and visualization tools: The semantic technology advantage

Cogito technology applies semantic analysis to support business activities, enabling the discovery of information with a variety of views and graphic options for visualizing, correlating and interacting with information.

The following screenshots demonstrate how a combined approach of text mining and visualization tools can transform the analysis process. Deep semantic analysis ensures a complete understanding of the text to exploit the data, revealing hidden relationships and capturing even the weakest signals present in information.
[Screenshots: Cogito text mining and visualization tools]

Contact us for a customized demonstration of Cogito’s text mining and visualization tools.

Research Information

– The need to organise information efficiently and reliably is more important than ever, argues Allan Gajadhar

Libraries and academies have existed since ancient times to promote and order our understanding of our world. From earliest history, myths and legends arose that provided early humans with some explanation of the natural world and its forces. Sages and thinkers in ancient cultures around the globe all produced attempts at understanding the natural and supernatural with a variety of ontological concepts. The word ontology itself has its origins in ancient philosophy as the study of being. Early Western philosophers such as Aristotle made some of the earliest attempts at categorising knowledge, as well as some of the earliest analyses of the notion of ontology itself.

Read the article by Allan Gajadhar

What is the semantic web? The semantic web was an idea of World Wide Web inventor Tim Berners-Lee, who wanted to create a more intelligent and intuitive web. The idea was to turn the web into a single repository of information, not just a vast collection of disconnected web pages.

As Berners-Lee wrote in the May 2001 issue of Scientific American, “The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”

For the semantic web, meaning and understanding are foundational for enabling the sharing of data and information across businesses, scientific communities and, ultimately, anyone.

Today, if we want to find something on the web, we perform a search to retrieve pages or documents that match our keywords. As we know, this is not always a 100% effective process; not only do we receive more results than we can reasonably manage or evaluate, but not all of the results match our search. This is because the markup language (typically HTML) describes documents in a way that, to a computer, is just text without meaning.

The challenge that the semantic web seeks to address is to provide a format or structure that can help machines understand the web page data in a way that encompasses an understanding of meaning in terms of what is on a page. The semantic web applies a framework that includes a data-centric publishing language such as RDF, OWL or XML that allows meaning and structure (through new data and metadata) to be added to content in a way that is machine readable. In this way, computers can do more of the heavy lifting when it comes to search and aggregation of web information.
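For a tiny, concrete sense of what machine-readable statements look like, the sketch below uses the Python rdflib library to build and serialize a few RDF triples; the URIs and property names are illustrative placeholders, not a real vocabulary.

```python
# Building and publishing machine-readable RDF statements with rdflib.
# The example.org URIs and properties are hypothetical placeholders.
from rdflib import Graph, Literal, Namespace, RDF, URIRef

EX = Namespace("http://example.org/")
g = Graph()

page_topic = URIRef("http://example.org/people/ada-lovelace")
g.add((page_topic, RDF.type, EX.Person))
g.add((page_topic, EX.name, Literal("Ada Lovelace")))
g.add((page_topic, EX.bornIn, Literal(1815)))

# Serialize to Turtle, one common data-centric publishing syntax
print(g.serialize(format="turtle"))
```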

How is the “Semantic Web” Different?

Think of it this way: a typical web page contains a lot of unstructured data: text, images and links to other pages. Within the text itself are numbers, dates, names, locations and facts. Search of pages that do not have semantic markup relies solely on the HTML description of the page. This means the page cannot tell the search engine what it contains: not the names it mentions, nor a date in a sentence.

Therefore, the information on a web page cannot be automatically related to information on another page, nor can it correlate different pieces of data about the same person across many pages. The mission of the semantic web is to make these capabilities feasible.

Searching a page with semantic markup results in a completely different experience. Here, search can access metadata descriptions to understand what a page is about, rather than simply matching keywords, and retrieve results based on the information the page actually contains.

Slator

– Google recently unveiled what it calls a “Cloud Natural Language API”: basically a cloud-based natural language processing service. The launch was announced on July 20, 2016 on the search giant’s blog.

The service comprises entity recognition (e.g., categorizing words as names, locations, expressions), sentiment analysis (e.g., categorizing opinions as positive, negative, or neutral), and syntax analysis. We covered what Google was doing in the area of syntactic parsing back in May.

Read the article