Data and Knowledge Quality 2010
January 26, 2010
DKQ 2010 proceedings
The workshop proceedings are available in PDF format: actes-qdc-2010.pdf
Towards an informed use of geographic data (Vers un usage éclairé de la donnée géographique)
Access to geographic information has been made easier by the emergence of web services, and spatial
data are now increasingly used by people with different backgrounds, objectives and skills.
To make the best use of this information, these consumers of spatial data must be able to assess
the quality of the available data. One way to assess quality is to use metadata. The
ISO 19115 standard provides fields describing data quality, but these fields were specified
from the producer's point of view. However, the required quality differs from one user to another,
because it depends on the user's needs, the application and the final use.
In this paper, we present the existing quality criteria defined in ISO 19115 and propose
ways to take quality into account according to the application context, the user's needs and
fitness for use.
Properties of interestingness measures for rule extraction (Propriété des mesures d'intérêt pour l'extraction des règles)
Sylvie Guillaume, Dhouha Grissa, Engelbert Mephu Nguifo
Finding interesting association rules is an important and active research area in data mining.
Algorithms of the Apriori family rely on two measures, support and confidence, to extract rules.
Although these two measures have algorithmic properties that speed up extraction, they generate a
prohibitive number of rules, most of which are redundant or irrelevant. Additional measures are
therefore needed to filter out uninteresting rules. This article synthesizes previous work to identify
the properties that make a measure "good" for extracting the rules that interest the user. These
properties are then formalized and evaluated on more than sixty measures.
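To make the two Apriori measures concrete, here is a minimal Python sketch of support, confidence and one additional filtering measure, lift. Lift is chosen purely as an illustration: it is a standard interestingness measure, but the abstract does not say which of the sixty-plus measures the article favours. The transactions and item names are hypothetical.

```python
from fractions import Fraction

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return Fraction(sum(1 for t in transactions if items <= t), len(transactions))

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): supp(A u B) / supp(A)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence divided by supp(B); independence of A and B would give 1."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

# Hypothetical market-basket data (one frozenset per transaction).
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
)]

rule = ({"bread"}, {"milk"})  # bread -> milk
s = support(rule[0] | rule[1], transactions)
c = confidence(*rule, transactions)
l = lift(*rule, transactions)
print(f"support={float(s):.2f} confidence={float(c):.2f} lift={float(l):.4f}")
```

On this toy data the rule bread → milk has support 3/5 and confidence 3/4, so it would survive typical Apriori thresholds, yet its lift is 15/16, below 1: the rule holds slightly less often than if bread and milk were independent. Exact fractions are used so that the ratios are not blurred by floating-point rounding.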
Measuring the robustness of association rules (Mesure de la robustesse de règles d'association)
Yannick Le Bras, Patrick Meyer, Philippe Lenca, Stéphane Lallich
In this article, we give a formal definition of the robustness of association rules, based on a
model from our previous work. We consider it a central concept in rule evaluation, and one that has
so far been studied only unsatisfactorily. It is crucial because we have observed that a rule rated
good by a given quality measure may turn out to be very fragile with respect to small variations in
the data. The robustness measure proposed here depends on the selected quality measure, the value
the rule takes for that measure and the minimum acceptance threshold chosen by the user. We present
some properties of this robustness, detail its practical use and report the results of several
experiments. Overall, we offer a new perspective on the evaluation of association rules.
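The abstract does not give the formal definition, so the following Python sketch rests on an assumption: robustness is taken as the signed Euclidean distance between the point representing a rule, here its frequencies (p_a, p_ab), and the acceptance boundary where the chosen measure equals the user's threshold. Confidence is used because its boundary p_ab = min_conf · p_a is a plane, which reduces the distance to a one-line formula. The function name and both rules are hypothetical.

```python
import math

def confidence_robustness(p_a, p_ab, min_conf):
    """Signed distance from the rule's point (p_a, p_ab) to the plane
    p_ab = min_conf * p_a where confidence equals the threshold.

    Assumption of this sketch, not necessarily the paper's definition.
    Positive: the rule is accepted, and larger values mean its frequencies
    can shift further before it is rejected. Negative: already rejected.
    """
    return (p_ab - min_conf * p_a) / math.sqrt(1 + min_conf ** 2)

# Two hypothetical rules with the same confidence (0.8) but
# different antecedent frequencies p_a.
rules = {"frequent": (0.50, 0.40), "rare": (0.05, 0.04)}  # (p_a, p_ab)

for name, (p_a, p_ab) in rules.items():
    rob = confidence_robustness(p_a, p_ab, min_conf=0.75)
    print(f"{name}: confidence={p_ab / p_a:.2f}, robustness={rob:.4f}")
```

Both rules have confidence 0.8 and pass a 0.75 threshold, but the frequent rule's robustness (0.02) is ten times that of the rare rule (0.002): a much smaller shift in the rare rule's frequencies pushes it below the threshold, which is the kind of fragility the abstract describes.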
An aggregation-based approach for better intrusion detection (Une approche basée agrégation pour une meilleure détection d'intrusions)
Emna Bahri, Harbi Nouria
Data mining research currently gives considerable attention to intrusion detection in computer
systems. Detecting anomalies in computer networks is regarded as a data classification problem,
to which data mining techniques can be applied. Several machine learning approaches are applied to
large, complex and dynamic volumes of data in order to build an effective intrusion detection
system (IDS). However, these approaches suffer from precision problems and, above all, from an
increase in false alarms (false positives) when they handle complex, unbalanced data from
high-speed network traffic.
To improve the detection of these attacks and thus reduce the classification error rate, this
paper proposes a new approach based on classifier aggregation, which aims to improve a
classifier's performance through voting. The approach is hybrid in that it classifies normal
connections and attacks at the same time, using a version of boosting adapted to network data
("screens"). Its main benefit is a reduction in false positives.
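The voting idea behind boosting can be sketched with textbook discrete AdaBoost over decision stumps in plain Python. This is the standard algorithm, not the authors' network-adapted variant, whose details the abstract does not give; the two connection features and the four labelled connections are hypothetical. Each round picks the stump with the lowest weighted error, gives it a vote weight alpha, and increases the weight of the connections it misclassified.

```python
import math

# Hypothetical connection features: (failed_logins, connections_per_second).
# Label +1 = attack, -1 = normal.
X = [(0, 2), (0, 2), (6, 2), (0, 80)]
y = [-1, -1, 1, 1]

def stump_predict(stump, x):
    """Decision stump: predict `sign` if x[feature] > threshold, else -sign."""
    feature, threshold, sign = stump
    return sign if x[feature] > threshold else -sign

def best_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        values = sorted({x[feature] for x in X})
        # One threshold below every value (a constant stump) plus each observed value.
        for threshold in [values[0] - 1] + values:
            for sign in (1, -1):
                stump = (feature, threshold, sign)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err

def adaboost(X, y, rounds):
    """Discrete AdaBoost: returns a list of (vote weight, stump) pairs."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        if err >= 0.5:          # no stump beats chance: stop
            break
        err = max(err, 1e-10)   # guard against log(1/0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified examples, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps in the ensemble."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

model = adaboost(X, y, rounds=3)
print([predict(model, x) for x in X])
```

Because the attacks come in two kinds, no single stump classifies all four connections correctly (the best one misclassifies one attack); after three rounds the weighted vote labels all four correctly. The exhaustive stump search is fine for a toy example but would be far too slow for real traffic volumes.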
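A new exact concise representation of frequent correlated patterns based on a simultaneous exploration of the conjunctive and disjunctive search spaces (Nouvelle représentation concise exacte des motifs corrélés fréquents basée sur une exploration simultanée des espaces de recherche conjonctif et disjonctif)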
Nassima Ben Younes, Tarek Hamrouni, Sadok Ben Yahia
Data mining is the process of extracting manageably sized knowledge from very large data
sets. The mined knowledge can take the form of association rules, which express correlations
between attributes. However, because the number of attributes is very high, research has focused
mainly on frequently occurring data, and the mined patterns are hence called frequent patterns.
Unfortunately, this class of patterns gives no information about the degree of correlation among
the items that make up a given pattern. We therefore propose a new concise representation of
frequent correlated patterns and define the closure operator associated with the correlation
measure bond. This reduced set makes it possible not only to derive all frequent correlated
patterns without information loss, but also to efficiently derive the conjunctive, disjunctive
and negative supports of each pattern.
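The supports and the bond measure can be illustrated with a short Python sketch, assuming the usual definitions: the conjunctive support counts transactions containing all items of a pattern, the disjunctive support those containing at least one, the negative support those containing none, and bond is the ratio of conjunctive to disjunctive support. The closure operator and the concise representation itself are not reproduced; the helper names, thresholds and transactions are hypothetical.

```python
from fractions import Fraction

def supports(itemset, transactions):
    """Return (conjunctive, disjunctive, negative) support counts of `itemset`."""
    items = frozenset(itemset)
    conjunctive = sum(1 for t in transactions if items <= t)  # all items present
    disjunctive = sum(1 for t in transactions if items & t)   # at least one item present
    negative = len(transactions) - disjunctive                # none of the items present
    return conjunctive, disjunctive, negative

def bond(itemset, transactions):
    """Conjunctive support / disjunctive support. Returning 0 when no
    transaction contains any item is an assumption of this sketch."""
    conjunctive, disjunctive, _ = supports(itemset, transactions)
    return Fraction(conjunctive, disjunctive) if disjunctive else Fraction(0)

def is_frequent_correlated(itemset, transactions, min_supp, min_bond):
    """Frequent (conjunctive support >= min_supp) and correlated (bond >= min_bond)."""
    conjunctive, _, _ = supports(itemset, transactions)
    return conjunctive >= min_supp and bond(itemset, transactions) >= min_bond

# Hypothetical transactions; frozenset("ABC") is the set {"A", "B", "C"}.
transactions = [frozenset(t) for t in ("ABC", "AB", "AC", "B", "D")]

for itemset in ({"A", "B"}, {"A", "C"}):
    print(sorted(itemset), supports(itemset, transactions),
          bond(itemset, transactions),
          is_frequent_correlated(itemset, transactions, min_supp=2, min_bond=0.6))
```

For {A, B} the sketch gives supports (2, 4, 1) and bond 1/2; for {A, C} it gives (2, 3, 2) and bond 2/3, so only {A, C} passes a bond threshold of 0.6 even though both pairs have the same conjunctive support. For a pair, the disjunctive support also follows from conjunctive supports by inclusion-exclusion, e.g. 3 + 3 - 2 = 4 for {A, B}, one standard identity linking the conjunctive and disjunctive search spaces.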
Webmaster: Jérôme Azé (Bioinformatics Team, LRI)