26692 |
Creator |
8f39a563a9cf1106f50ca74735fa31ed |
26692 |
Creator |
ext-62d553d2cd6fbcccbdab1af03dc419f2 |
26692 |
Date |
2010 |
26692 |
Is Part Of |
repository |
26692 |
Is Part Of |
p15708268 |
26692 |
abstract |
As web users disseminate more of their personal information on the web, the possibility
of these users becoming victims of lateral surveillance and identity theft increases.
Therefore, web resources containing this personal information, which we refer to as
identity web references, must be found and disambiguated to produce a unary set of
web resources that refer to a given person. Such is the scale of the web that forcing
web users to monitor their identity web references is not feasible; therefore, automated
approaches are required. However, automated approaches require background knowledge
about the person whose identity web references are to be disambiguated. Within this
paper we present a detailed approach to monitor the web presence of a given individual
by obtaining background knowledge from Web 2.0 platforms to support automated disambiguation
processes. We present a methodology for generating this background knowledge by exporting
data from multiple Web 2.0 platforms as RDF data models and combining these models
together for use as seed data. We present two disambiguation techniques: the first
uses a semi-supervised machine learning technique known as Self-training and the
second uses a graph-based technique known as Random Walks. We explain how the semantics
of data supports the intrinsic functionalities of these techniques. We compare the
performance of our presented disambiguation techniques against several baseline measures
including human processing of the same data. We achieve an average precision level
of 0.935 for Self-training and an average f-measure level of 0.705 for Random Walks,
in both cases outperforming several baseline measures. |
26692 |
authorList |
authors |
26692 |
issue |
2-3 |
26692 |
status |
peerReviewed |
26692 |
volume |
8 |
26692 |
type |
AcademicArticle |
26692 |
type |
Article |
26692 |
label |
Rowe, Matthew and Ciravegna, Fabio (2010). Disambiguating identity web references
using Web 2.0 data and semantics. Journal of Web Semantics: Science, Services and
Agents on the World Wide Web, 8(2-3) pp. 125–142. |
26692 |
Title |
Disambiguating identity web references using Web 2.0 data and semantics |
26692 |
in dataset |
oro |