Demo Systems

Mission

The Web is a new frontier for humanity in the 21st century. Web mining differs from traditional information retrieval in that Web data has link structure, is semi-structured, and changes dynamically. Our goal is to analyze this huge amount of Web data in depth and to develop practical systems.

High-quality content mining using link information and the semi-structure of Web data

The purpose of this project is to construct a knowledge base from the Web semi-automatically. To this end, we consider four steps: discovery, collection, extraction, and integration.

Results

[Discovery]

We proposed algorithms for discovering repetition patterns, and a method for generating wrappers automatically from the discovered patterns.

[Collection]

We developed a smart topic crawler. It implements a smart link selection strategy to reach topic pages as quickly as possible, and replaceable topic judge modules to increase precision. Furthermore, the crawler can learn topic words and link structure.
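
The link selection idea can be illustrated with a minimal sketch. It assumes a best-first strategy in which links found on higher-scoring pages are followed first, and it runs over an in-memory link graph instead of the live Web. The names `topic_score`, `crawl`, `judge`, `threshold`, and `max_pages` are hypothetical and are not taken from the actual crawler, and the learning of topic words and link structure is not modeled.

```python
import heapq

def topic_score(text, topic_words):
    """Stand-in topic judge: fraction of topic words that occur in the text."""
    words = set(text.lower().split())
    return sum(w in words for w in topic_words) / len(topic_words)

def crawl(pages, links, seed, topic_words, judge=topic_score,
          threshold=0.5, max_pages=10):
    """Best-first crawl over an in-memory graph (pages: url -> text)."""
    frontier = [(-1.0, seed)]   # min-heap on negated priority = max-heap
    queued = {seed}
    on_topic = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        fetched += 1
        score = judge(pages[url], topic_words)
        if score >= threshold:
            on_topic.append(url)
        # Links inherit the score of the page they were found on.
        for target in links.get(url, []):
            if target not in queued:
                queued.add(target)
                heapq.heappush(frontier, (-score, target))
    return on_topic
```

Because the judge is passed in as a parameter, it can be swapped for a stricter one without touching the crawl loop, which is one simple way to read "replaceable topic judge modules".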

[Extraction]

We developed algorithms for extracting records from a series of web documents. Here, a series of web documents means pages that have the same appearance when a user views them in a browser, such as pages for recipes, real estate listings, or hotels.

[Integration]

We developed an XML database system that can manage various kinds of structured data.

[Application]

As a concrete application, we are constructing a database of Japanese web syllabi.

demo systems:

Data Analysis Tools : Data Matrix

Data Matrix is a facet analysis tool for structured documents such as XML data. A traditional information retrieval (IR) system lists results along only one axis; for example, Google returns a ranked list of Web pages for the user's query words. In contrast, this matrix system lists results along two axes. The user not only enters query words but also selects two attributes from the structural elements of the data. The system retrieves the documents related to the query words, searching only the selected attribute parts of each document rather than the whole document. It then clusters the retrieved documents along the two attributes and maps them into a matrix. Furthermore, the system implements a query expansion function: by clicking the [zoom] button, you can easily zoom in on your query. This system is powerful for complicated information retrieval tasks such as patent search and financial analysis.
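
The two-axis listing can be sketched in a few lines. This is only an illustration under stated assumptions: documents are plain dictionaries, matching is a simple substring test, and grouping by exact attribute value stands in for the clustering step. It does not use GETA, and the function name `data_matrix` and the field names in the test data are hypothetical.

```python
from collections import defaultdict

def data_matrix(docs, query, row_attr, col_attr):
    """Map matching documents into cells keyed by (row value, column value)."""
    terms = query.lower().split()
    cells = defaultdict(list)
    for doc in docs:
        # Search only the two selected attributes, not the whole document.
        text = " ".join(str(doc.get(a, "")) for a in (row_attr, col_attr)).lower()
        if all(t in text for t in terms):
            cells[(doc.get(row_attr), doc.get(col_attr))].append(doc["id"])
    return dict(cells)
```

Each key of the returned dictionary is one cell of the matrix, and its value is the list of document ids that fall into that cell. A document that mentions the query words only in an unselected attribute is not retrieved.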

(This system is powered by GETA. http://geta.ex.nii.ac.jp/)

Data Analysis Tools : Concept Graph

The Concept Graph shows the dependencies between terms in a given document set. The system first applies traditional IR techniques: given a document set D, it scans all documents, picks up the terms in each document, and counts the frequency of each term. It then calculates the dependency between terms according to a dependency definition. The Concept Graph is powerful for understanding relations between terms and for discovering hidden relations...
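
The text does not state the dependency definition, so the following sketch substitutes an assumed one: the dependency of term a on term b is the fraction of documents containing a that also contain b. The function name `concept_graph`, the use of document frequency, and the `min_dependency` cutoff are all illustrative choices, not the system's actual definition.

```python
from collections import Counter
from itertools import permutations

def concept_graph(documents, min_dependency=0.6):
    """Return directed edges (a, b) -> dependency of term a on term b."""
    doc_freq = Counter()    # number of documents containing each term
    pair_freq = Counter()   # number of documents containing both terms
    for text in documents:
        terms = set(text.lower().split())
        doc_freq.update(terms)
        pair_freq.update(permutations(sorted(terms), 2))
    edges = {}
    for (a, b), n_ab in pair_freq.items():
        # Assumed definition: share of a's documents that also contain b.
        dependency = n_ab / doc_freq[a]
        if dependency >= min_dependency:
            edges[(a, b)] = dependency
    return edges
```

Under this definition the graph is directed: a rare term can depend strongly on a frequent one while the reverse dependency is weak.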

A music recommendation system using playlists

We can assume that the songs or artists in a playlist are closely related, because the playlist creator probably selected the songs according to a theme. We focus on playlists and use them as a data mining resource for a music recommendation system. We retrieved about 13,000 playlists and analyzed the frequency and co-occurrence of artists and songs in them. We developed a prototype music recommendation system as an application of the Concept Graph. The Concept Graph shows the dependencies between terms in documents; in this case, it illustrates the dependencies between songs or artists, which can then be used to recommend songs or artists.
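
A minimal sketch of the co-occurrence analysis follows. It assumes that each playlist is a list of artist names and that recommendations are ranked by raw co-occurrence count, which is a simplification of the graph-based approach described above. The function names `build_cooccurrence` and `recommend` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_cooccurrence(playlists):
    """Count how often each pair of artists appears in the same playlist."""
    co = defaultdict(Counter)
    for playlist in playlists:
        # Each artist counts once per playlist, however many songs they have.
        for a, b in combinations(sorted(set(playlist)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co

def recommend(co, artist, k=3):
    """Return up to k artists that co-occur most often with the given one."""
    neighbours = co.get(artist, Counter())
    return [name for name, _ in neighbours.most_common(k)]
```

Raw counts favor artists that appear in many playlists overall; normalizing by each artist's frequency would be one way to counter that bias.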

XDES : An eXtensible Data Entry System

XDES is a heterogeneous eXtensible Data Entry System for XML data. Operational flexibility in adding, deleting, and modifying the data schema is achieved by separating the CGI programs from the data schema, which is described as a "data macro". Any combination of data items is allowed, which enables faceted classification; these combinations are described in "content macro" files, from which the data entry web pages are generated automatically. XDES has been in use at Kyushu University since 2003 as a data entry system for 2,300 researchers, who are required to keep filling out 753 kinds of forms for the university evaluation. ( http://hyoka.ofc.kyushu-u.ac.jp/search/ )

demo system

A trial for ID Federation

We are implementing a prototype ID federation system that uses XDES as the backend DB system.

  • The User Agent attempts to access a secured resource at the Service Provider.
  • At the Service Provider, the Access Controller performs a security check. If no valid security context exists at the Service Provider, the Access Controller redirects the client to the IdP Discovery Service.
  • Through the IdP Discovery Service, the client informs the Service Provider of the location of an endpoint at an Identity Provider.
  • The IdP Discovery Service redirects the client to the SSO/SLO Service at the Identity Provider.
  • The SSO/SLO Service identifies the user by some means.
  • The Identity Provider creates a "security pass" and transfers it to the Service Provider. Based on the "security pass" identifying the user, the Service Provider returns the resource.
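
The flow above can be simulated in-process with a short sketch. It makes several assumptions that are not part of the prototype: the "security pass" is modeled as a user name plus an HMAC signature, the Identity Provider and Service Provider share a secret key, and the user is identified by a password. The IdP Discovery Service is reduced to a redirect marker, and all class and method names are hypothetical.

```python
import hashlib
import hmac

SHARED_KEY = b"demo-key"  # hypothetical secret shared by the IdP and the SP

def sign(user):
    """Signature over the user name, used to protect the security pass."""
    return hmac.new(SHARED_KEY, user.encode(), hashlib.sha256).hexdigest()

class IdentityProvider:
    def __init__(self, accounts):
        self.accounts = accounts  # user -> password

    def sso(self, user, password):
        """Identify the user and issue a security pass, or None on failure."""
        if self.accounts.get(user) != password:
            return None
        return {"user": user, "signature": sign(user)}

class ServiceProvider:
    def __init__(self, resources):
        self.resources = resources
        self.sessions = set()  # users with a valid security context

    def access(self, user, path):
        """Security check: serve the resource or redirect to IdP discovery."""
        if user not in self.sessions:
            return ("redirect", "idp-discovery")
        return ("ok", self.resources[path])

    def accept_pass(self, security_pass):
        """Verify the pass and, if valid, establish a security context."""
        if security_pass is None:
            return False
        expected = sign(security_pass["user"])
        if not hmac.compare_digest(expected, security_pass["signature"]):
            return False
        self.sessions.add(security_pass["user"])
        return True
```

In this sketch a first call to `access` returns the redirect, a successful `sso` call followed by `accept_pass` creates the security context, and a second call to `access` returns the resource. A real federation deployment would typically use signed assertions with asymmetric keys and expiry times rather than a shared secret.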