FINDING SETS OF HIGH-FREQUENCY QUERIES FOR HIGH-FREQUENCY-QUERY-BASED FILTER FOR SIMILARITY JOIN




Details

Title	:	FINDING SETS OF HIGH-FREQUENCY QUERIES FOR HIGH-FREQUENCY-QUERY-BASED FILTER FOR SIMILARITY JOIN
Researcher	:	Kamolwan Kunanusont
Keywords	:	-
Organization	:	Chulalongkorn University
Contributors	:	Chulalongkorn University. Faculty of Science, Jaruloj Chongstitvatana
Publication year	:	2557 B.E. (2014 CE)
Reference	:	http://cuir.car.chula.ac.th/handle/123456789/45852
Source	:	-
Expertise	:	-
Relation	:	-
Coverage	:	-
Abstract/Description	:	Similarity search and similarity join are important operations in text databases. Similarity search finds all records that are similar to a given text query, while similarity join matches pairs of similar records from two relations. In some situations, similar queries are repeated over a period of time. These queries are called high-frequency queries. A high-frequency-query-based filter is used to support this type of query. This method uses an index structure called a similarity table to prune unrelated text records in the relations. A similarity table is created from a chosen high-frequency query obtained from the query set, so the performance of the filter depends mostly on which queries are chosen. This thesis proposes a method to find high-frequency queries for the high-frequency-query-based filter. The proposed method is based on a density-based cluster analysis method, DBSCAN, which captures the main characteristics of the query set by grouping the queries and finding representative points from each group. Two methods, DBRAN and DBSM, are proposed to deal with redundant high-frequency queries. DBRAN finds clusters of high-frequency queries using DBSCAN and randomly chooses one high-frequency query from each cluster as its representative. DBSM also uses DBSCAN to find clusters, then repeatedly merges the queries in these clusters until merging no longer improves the similarity tables. For evaluation, the proposed method is applied to various sets of queries to find high-frequency queries for three datasets. DBSM is found to perform better than DBRAN when the similarity between high-frequency queries is low. However, when the similarity between high-frequency queries is high, the performance of DBRAN and DBSM is about the same. Thesis (M.Sc.)--Chulalongkorn University, 2014
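The DBRAN step described in the abstract (cluster the queries with DBSCAN, then pick one random query per cluster) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the Jaccard distance over word sets, the `eps` and `min_pts` values, the sample queries, and the function names `jaccard_distance`, `dbscan`, and `pick_representatives`. The DBSM merging step is not shown, because the abstract does not state how queries are merged.

```python
import random

def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over word sets (an assumed stand-in metric)."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)

def dbscan(items, eps, min_pts, dist):
    """Plain DBSCAN; returns one cluster label per item (-1 marks noise)."""
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(items)

    def neighbors(i):
        # All items within eps of item i, including i itself.
        return [j for j in range(len(items)) if dist(items[i], items[j]) <= eps]

    cluster = -1
    for i in range(len(items)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins, does not expand
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # core point: keep expanding
    return labels

def pick_representatives(queries, eps=0.5, min_pts=2, seed=0):
    """DBRAN-style selection: one randomly chosen query per DBSCAN cluster."""
    labels = dbscan(queries, eps, min_pts, jaccard_distance)
    clusters = {}
    for query, label in zip(queries, labels):
        if label != -1:                # noise queries get no representative
            clusters.setdefault(label, []).append(query)
    rng = random.Random(seed)
    return {label: rng.choice(members) for label, members in clusters.items()}

# Hypothetical query log, made up for this example.
queries = [
    "database similarity join",
    "similarity join database index",
    "database similarity join filter",
    "text cluster analysis",
    "cluster analysis text density",
    "weather forecast bangkok",
]

if __name__ == "__main__":
    print(pick_representatives(queries))
```

With these sample queries, the first three fall into one cluster, the next two into a second, and the last is treated as noise, so two representatives are returned. A similarity table would then be built per representative; that part depends on details the abstract does not give.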
Bibliography	:
APA: Kamolwan Kunanusont. (2557 [2014]). FINDING SETS OF HIGH-FREQUENCY QUERIES FOR HIGH-FREQUENCY-QUERY-BASED FILTER FOR SIMILARITY JOIN. Bangkok: Chulalongkorn University.
Chicago: Kamolwan Kunanusont. 2557 [2014]. "FINDING SETS OF HIGH-FREQUENCY QUERIES FOR HIGH-FREQUENCY-QUERY-BASED FILTER FOR SIMILARITY JOIN". Bangkok: Chulalongkorn University.
MLA: Kamolwan Kunanusont. "FINDING SETS OF HIGH-FREQUENCY QUERIES FOR HIGH-FREQUENCY-QUERY-BASED FILTER FOR SIMILARITY JOIN." Bangkok: Chulalongkorn University, 2557 [2014]. Print.
Vancouver: Kamolwan Kunanusont. FINDING SETS OF HIGH-FREQUENCY QUERIES FOR HIGH-FREQUENCY-QUERY-BASED FILTER FOR SIMILARITY JOIN. Bangkok: Chulalongkorn University; 2557 [2014].