JVM-by-LP
Verification of Java Bytecode using Transformation and Analysis Tools for Logic Programming
Version 0.3 --- December 2006
Contents:
Introduction
In this web, we introduce our approach (and provide the tools
required) to verify Java Bytecode in the (Constraint) Logic Programming
paradigm --- (C)LP. In order to achieve this goal, we focus
on the techniques of meta-programming, program specialisation and
static analysis that together support the use of CLP tools to analyse
JVML programs (i.e., programs written in the Java Virtual Machine
Language).
A complete description of our approach can be found in [AGHP-PADL07].
The whole verification process is split in three main parts:
- Meta-programming. We use CLP as a language
for representing and manipulating JVML programs. We have implemented
an automatic translator, called Class Reader,
which given a set of .class files {Class 1,...,Class n}
returns an CLP representation of them in
JVMLr (a representative subset of JVML),
i.e., a JVMLr program denoted by
P. Furthermore, we have also implemented in LP an
interpreter Interpreter_JVMLr,
which captures the JVM semantics.
- Partial evaluation. The development of
partial evaluation techniques has allowed the so-called
"interpretative approach" to compilation. We use an existing Partial Evaluator for CLP in order to specialize the
aforementioned interpreter w.r.t. the LP representation P
of {Class 1,...,Class n}. As a result, we obtain
Ip, an LP residual program which can be seen as a
decompiled and translated version of P into LP.
- Verification of Java bytecode. The final
goal is that the JVML program can be analyzed and verified by analysing the
residual program Ip with state-of-the-art analyzers developed for CLP.
The Class Reader for Ciao (Download)
- The Class Reader takes a set of .class files {Class
1,...,Class n} and returns a CLP program which
contains all the information of {Class 1,...,Class n} in
JVMLr .
- JVMLr is a representative subset of the JVML language
which is able to handle its main features for our purposes.
- The Class Reader for Ciao consists of the following modules:
- A set of auxiliary modules in charge of the representation of the
translations (data_reader, descriptors_grammar, flags_parser,
opcodes, factorized_opcodes and
mis_grammar). For instance,
the descriptors_grammar module helps in the parsing of the
field and method descriptors.
- The classfile_reader module which is able to
parse an individual
.class file, allowing the user
specifying different representation options
(factorization of instructions, cons-pool
references resolution, etc). Thus, the classfile_reader could be
used as a general tool for parsing .class files in different contexts.
- The program_loader module which generates the LP program
following the
JVMLr syntax, basically by calling the
classfile_reader for each .class file
with the appropiate options.
- The LP program generated by the Class Reader
contains, on one hand, the bytecode instructions for all methods in
{Class 1,...,Class n}, represented as "bytecode"
facts; and on the other hand, a single fact "program" obtained
by putting together the "class" terms which store all
required information for each class (except the bytecode instructions which
appear separately in the above facts). All such facts
("bytecode" and "program") are structured as specified
by the JVMLr syntax.
The files can be found
here.
The JVMLr Interpreter in Ciao (Download)
Our JVMLr Interpreter expresses the JVM semantics and
it is written in Ciao Prolog. The formal
JVML specification chosen for our work is
Bicolano, which is a superset of JVMLr. Bicolano is
written with the Coq Proof Assistant which allows checking that the
specification is consistent and also proving properties on the
behavior of some programs.
The current files of the interpreter can be downloaded. This code can be
structured in four main parts:
- the definition of the JVMLr with the access
functions to the source programs,
- the representation of the domains as the heap, the
JVMLr types or the machine states,
- the implementation of the dynamic semantics which uses these
domains and
- the interface with the partial evaluator.
Transformation and Analysis in the CiaoPP System
You can use the partial evaluator integrated in CiaoPP to
produce the residual programs and to analyze them later. CiaoPP is the
precompiler of the CIAO Prolog
development environment. It can perform a number of program debugging,
analysis and source-to-source transformation (including partial
evaluation) tasks on (CIAO) Prolog programs. CiaoPP is distributed
under the GNU general public license. At this site, you will
find all the information about CiaoPP (instructions for downloading,
manuals, tutorials, etc.).
Alternatively, you can also use other available partial evaluators
developed for Logic Programs (e.g., the
ECCE system or
MIXTUS) to
generate the residual programs and existing analyzers for LP to
analyze them later (e.g., the BU tools
or the Polymorphic type inference
tool
).
Some Examples
The following examples have been generated using the Class_Reader, the JVMLr Interpreter and the CiaoPP system, as explained above.
- The MIS is a high level description specifying the method we
want to decompile and its arguments values, and it is used to build the initial state of the
interpreter being partial evaluated.
- We show three different residual program versions for each
example:
- In the first one, we have used the
standard homeomorphic embedding relation.
- In the second, we use a new embedding relation enriched
with pe-types, as proposed in [AGP-BC07].
- In the last column, we combine the new embedding
enriched with pe-types with the use of a new
abstraction operator for reducing multivariance
as proposed in [AGP-BC07].
- Note that, although we show the source Java file (.java file)
for each example,
it is not part of the process as in our approach we take as
input the bytecode program (set of .class files). However,
sometimes it turns out to be interesting to compare the
obtained LP program w.r.t. such source file.
References
[AGHP-PADL07]: E. Albert, M. Gomez-Zamalloa, L. Hubert
and G. Puebla. Verification of Java Bytecode using Transformation and
Analysis Tools for Logic Programming.
Ninth International Symposium on
Practical Aspects of Declarative Languages
(PADL 07), Nice, France
January 14-15, 2007.
PDF
[AGP-BC07]: M. Gomez-Zamalloa, E. Albert
and G. Puebla. Effective and Efficient Decompilation of Java Bytecode
to Prolog by Partial Evaluation.
Submitted to Second Workshop on Bytecode Semantics, Verification,
Analysis and Transformation
(Bytecode 07), Braga, Portugal
March 31, 2007.
PDF
Contact
The approach has been proposed and implemented by Elvira Albert, Miguel
Gomez-Zamalloa, Laurent Hubert and
German Puebla.
Please report any bug or comment to
mzamalloa@clip.dia.fi.upm.es.
Last update December 2006 #