Abstract
The Web contains a huge amount of data in both structured and unstructured form. One form of structured data is the HTML table, for which metadata are not explicitly stored or available. As a result, data in such tables cannot be queried accurately, and users cannot get exact answers to their queries through search engines. Schema extraction establishes a schema for data found in different forms of web tables. Once the schema of a web table has been extracted, a table can be created and populated against this schema and then queried using SQL, giving much better results than traditional search engines. Schema matching determines a number of correspondences that identify similar elements in two different schemas. Columns and data values are compared one after the other to match schemas. In this paper, different ways of extracting data from tables are described, and different tools used for schema extraction are named. Two techniques for schema matching are briefly explained.
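The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it is not one of the tools or techniques named in the paper. It assumes a well-formed HTML table whose first row is the header. The type-inference rule, the helper names (`TableParser`, `infer_type`, `extract_schema`, `load`, `match_columns`), the similarity score, the 0.5 threshold, and the sample tables are all hypothetical choices made for illustration.

```python
import sqlite3
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect cell text row by row (assumes one simple, well-formed table)."""

    def __init__(self):
        super().__init__()
        self.rows = []          # list of rows, each a list of cell strings
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td"):
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def infer_type(values):
    """Guess a SQL column type from the data values (illustrative rule only)."""
    for sql_type, cast in (("INTEGER", int), ("REAL", float)):
        try:
            for v in values:
                cast(v)
            return sql_type
        except ValueError:
            continue
    return "TEXT"


def extract_schema(html):
    """Return (schema, data_rows); the first row is assumed to be the header."""
    parser = TableParser()
    parser.feed(html)
    header, data = parser.rows[0], parser.rows[1:]
    schema = [(name, infer_type([r[i] for r in data]))
              for i, name in enumerate(header)]
    return schema, data


def load(conn, table, schema, rows):
    """Create a table from the extracted schema and populate it."""
    cols = ", ".join(f'"{name}" {sql_type}' for name, sql_type in schema)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    marks = ", ".join("?" for _ in schema)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', rows)


def match_columns(schema_a, rows_a, schema_b, rows_b, threshold=0.5):
    """Pair columns by averaging name similarity and value overlap."""
    matches = []
    for i, (name_a, _) in enumerate(schema_a):
        vals_a = {r[i] for r in rows_a}
        for j, (name_b, _) in enumerate(schema_b):
            vals_b = {r[j] for r in rows_b}
            name_sim = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
            union = vals_a | vals_b
            value_sim = len(vals_a & vals_b) / len(union) if union else 0.0
            if (name_sim + value_sim) / 2 >= threshold:
                matches.append((name_a, name_b))
    return matches


# Made-up sample tables, used only to exercise the sketch.
HTML_A = """<table>
<tr><th>name</th><th>price</th></tr>
<tr><td>pen</td><td>1.5</td></tr>
<tr><td>book</td><td>12</td></tr>
</table>"""
HTML_B = """<table>
<tr><th>Name</th><th>cost</th></tr>
<tr><td>pen</td><td>1.5</td></tr>
<tr><td>lamp</td><td>20</td></tr>
</table>"""

schema_a, rows_a = extract_schema(HTML_A)
schema_b, rows_b = extract_schema(HTML_B)

conn = sqlite3.connect(":memory:")
load(conn, "products", schema_a, rows_a)
expensive = conn.execute("SELECT name FROM products WHERE price > 5").fetchall()

matches = match_columns(schema_a, rows_a, schema_b, rows_b)
```

Once the table is loaded, a precise SQL predicate such as `price > 5` can be evaluated directly, which is the advantage over keyword search that the abstract points to. The matching step here compares every column pair by name and by shared values; a real matcher would likely use richer evidence.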
Original language | English (US) |
---|---|
Title of host publication | Proceedings of 2018 2nd IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference, IMCEC 2018 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 1290-1294 |
Number of pages | 5 |
ISBN (Print) | 9781538618035 |
State | Published - Sep 20 2018 |
Externally published | Yes |