Crowdsourced Data Analytics: A Machine-Human Solution for Big Data Analytics
Speaker: Associate Professor Ju FAN (范举)
Host: Laizhong Cui (崔来中)
Date: April 22, 2016
Time: 14:30
Venue: Conference Room 623, College of Computer Science and Software Engineering (计软学院)
BIOGRAPHY
Ju FAN is an associate professor at Renmin University of China. His research interests are in the general area of data management, with an emphasis on crowdsourced data analytics, data integration, and big data. He received his Ph.D. from Tsinghua University in 2012 and worked as a research fellow at the School of Computing, National University of Singapore, from 2012 to 2015.
ABSTRACT
Crowdsourcing outsources tasks to an unknown group of people (aka workers) and is useful for many problems that are difficult for machines, such as image recognition, sentiment analysis, and entity resolution. In this talk, I will give an overview of my work on crowdsourced data analytics, which integrates human intelligence with machine algorithms for big data analytics. The talk covers the following aspects:
(1) Quality control: Because it is open to anyone, crowdsourcing yields relatively low-quality or even noisy results when there is no proper quality control. I will introduce an adaptive quality control approach that accounts for workers' diverse accuracies across different tasks and judiciously assigns each task to the workers who are well acquainted with it.
(2) Cost-based crowdsourcing query optimization: A given crowdsourcing query may have many candidate execution plans in a crowdsourcing marketplace, and the crowdsourcing cost of the best and the worst plans may differ by several orders of magnitude. I will present a cost-based query optimization approach that strikes a good balance between monetary cost and latency.
(3) Hybrid machine-crowdsourcing data analytics: To make data analytics effective and scalable, I will present a hybrid approach that integrates machine algorithms with crowdsourcing and calls on crowdsourcing assistance only when necessary, i.e., within user-specified budgets. I will also introduce its applications to web data integration and healthcare data analytics.