WALA 101




Julian Dolby

IBM Thomas J. Watson Research Center




Outline


  • Introduction
  • WALA JavaScript
  • Running example
  • WALA JavaScript IR
  • Call graph construction
  • Taint analysis


WALA Everywhere

WALA Everywhere

World of WALA

WALA

  • Open source program analysis toolbox
  • Academic community supported code base
    • started at Thomas J. Watson Research Center
    • selected contributors (of many):
      • Karlsruher Institut für Technologie
      • Technischen Universität Darmstadt
      • 한국과학기술원(KAIST) 전산학부
      • University of California, Berkeley
      • Univeristy of Wisconsin, Madison

WALA

  • Over 100 publications use WALA, e.g.

    • S. Huang, J. Huang, Speeding Up Maximal Causality Reduction with Static Dependency Analysis, ECOOP 17

    • E. Wittern et al., Statically checking web API requests in JavaScript, ICSE 2017

    • M. Rapoport et al., Who you gonna call? Analyzing Web Requests in Android Applications, MSR 2017

    • S. Lee et al., HybriDroid: static analysis framework for Android hybrid applications, ASE 2016

    • W. Huang et al., Scalable and precise taint analysis for Android, ISSTA 2015

    • Y. Ko et al., Practically Tunable Static Analysis Framework for Large-Scale JavaScript Applications, ASE 2015

WALA

  • Code base used in IBM products
    • AppScan
    • API Harmony
    • Watson Conversations (experimental)
  • Multiple uses within AppScan
    • JavaScript analysis
      • Web scanning: AppScan Standard
      • Source scanning: AppScan Source
    • Java and .NET
      • Framework models: F4F

Selected History

2004 ASTk J2EE optimization
2005 ITM Tivoli JavaScript to DB queries
2006   Open source release
2008 GBS ABAP analysis + tooling
2010 AppScan JSA security analysis
2012 ECOOP Correlation tracking
2013 ICSE Approximate call graphs
2013   Android analysis support
2014 AppScan Approximate call graphs in AppScan
2016 ICSE+API Harmony Web API bug detection
2017 WCS Dialog bug finding analysis

WALA JavaScript



  1. Third party parser translates JavaScript to an AST
  2. WALA translates that AST to its Common AST (CAst)
  3. WALA translates CAst to the WALA IR
  4. Analyses are performed on the WALA IR


WALA Running Example

var document = { URL: "whatever",
  write: function Document_prototype_write(x) { } };
var id = function _id(x) { return x; };
function Id() { this.id = id; }
function SubId() { }; SubId.prototype = new Id();

if (Math.random.call(null) > 0) {
    var id1 = new Id();
    var text = id1.id.call(document, document.URL);
} else {
    var id2 = new SubId();
    var text = id2.id("not a url");
}
document.write(text);

WALA Common AST (CAst)

  • Abstract Syntax Tree (AST) specialized for WALA
    • cross-language AST
    • first part of translation to IR
    • WALA supports rewrites at the CAst level
    • hide details of third-party front ends
    • make language details explicit
  • CAst for JavaScript
    • lift local variable declarations
    • expand shorthands like object literals
    • explicit handling of “method” calls

WALA Running - CAst

CAst Example

BLOCK
  BLOCK
    ...
    FUNCTION_STMT
      "CAst: Id"
    ...
    DECL_STMT
      "text"
      VAR "undefined"
  BLOCK
    ASSIGN
      VAR "document"
      OBJECT_LITERAL
        CALL
          VAR "Object"
          "ctor"
        "URL"
        "whatever"
        "write"
        FUNCTION_EXPR
          "CAst: ..._write"
  ...
  IF_STMT
    BINARY_EXPR
      ">"
      SCOPE
        ...
          BLOCK_EXPR
            ASSIGN
              VAR "$$destructure$rcvr1"
              OBJECT_REF
                VAR "Math"
                "random"
            ASSIGN
              VAR "$$destructure$elt1"
              "call"
          CALL
            VAR "$$destructure$elt1"
            "dispatch"
            VAR "$$destructure$rcvr1"
            CONSTANT
      "0.0"
   ...

CAst translation to IR

  • Create WALA IR from CAst
    • cross-language core functionality
    • construct Control Flow Graph (CFG)
    • convert to simple SSAInstructions
    • convert to Static Single Assignment (SSA) form
  • JavaScript-specific extension
    • same extension handles both JavaScript parsers
    • translate JavaScript idiosyncracies
      • e.g. weird “method” calls
      • e.g. non-existent variables are globals

Background: SSA


  • Static Single Assignment (SSA) form
    • every variable has one definition
    • “𝜙 nodes” used where definitions merge
    • variables become values
  • Heart of WALA IR
    • SSA form is the only form used in analysis
    • eliminates “reaching definitions” analysis
    • simplifies expressing dataflow analyses


Background: SSA

if (Math.random.call(null) > 0) {
    var id1 = new Id();
    var text = id1.id.call(document, document.URL);
} else {
    var id2 = new Id();
    var text = id2.id("not a url");
}
document.write(text);
if (Math.random.call(null) > 0) {
    var id1 = new Id();
    var text1 = id1.id.call(document, document.URL);
} else {
    var id2 = new Id();
    var text2 = id2.id("not a url");
}
text3 = 𝜙(text1, text2);
document.write(text3);
  • Illustrative only: WALA does SSA on the IR

WALA IR Example

BB1
0   v2 = new <JavaScriptLoader,LArray>@0     [1:4]->[14:19] [2=[arguments]]
...
4   lexical:id@Ldemo2.js = v6                [1:4]->[14:19]
...
8   v16 = global:global Function             [1:4]->[14:19]
9   v13 = construct v16@9 v15:#Ldemo2.js/SubId exception:v14[1:4]->[14:19]
10   global:global SubId = v13               [1:4]->[14:19]
...
19   v27 = global:global Object              [1:22]->[2:41]
...
21   v25 = construct v27@21 exception:v28    [1:22]->[2:41] [25=[document]]
BB3
22   fieldref v25.v29:#URL = v30:#whatever   [1:22]->[2:41] [25=[document]]
...
28   v36 = construct v39@28 v38:#Ldemo2.js/_id exception:v37[3:18]->[3:20]
29   lexical:id@Ldemo2.js = v36              [3:4]->[3:5]
...
35   fieldref v44.v45:#prototype = v41       [5:38]->[5:38]

WALA IR Example

38   v51 = global:global Math                [7:4]->[7:7]
...
41   v5 = prototype_values(v51)              [7:4]->[7:14]
42   v49 = getfield ...random... v5          [7:4]->[7:14]
BB11
45   v55 = dispatch v54:#call@45 v49,v40:#null exception:v56[7:4]->[7:25]
BB12
46   v46 = binaryop(gt) v55 , v57:#0.0       [7:4]->[7:29]
47   conditional branch(eq, to iindex=65) v46,v58:#0[7:0]->[7:1]
...
65   v72 = global:global SubId               [11:18]->[11:22]
...
67   v71 = construct v72@67 exception:v73    [11:18]->[11:22] [71=[id2]]
BB21
73   v76 = dispatch v65:#id@73 v71,v77:#not a url exception:v78[12:15]->[12:33] [76=[text]71=[id2]]
BB23
           v23 = phi  v67,v76
79   v81 = dispatch v31:#write@79 v25,v23 exception:v82[14:0]->[14:19] [25=[document]23=[text]]

WALA Running - IR

IR Features


  1. Object Creation
  2. Lexical scoping
  3. Prototype chain
  4. “Method” calls
  5. SSA


Object Creation

  • new takes an expression, not type
    var x = (...)? Object: Array;
    var y = new x();
    
  • new has diverse semantics

    new Object() fresh object
    new Object(5) the value 5
    new Array(3) array of length 3
    new Array(1,2,3) array containing [1, 2, 3]


Object Creation


  • Model new expressions as special calls
    • interpret new expressions as call expressions
    • type-dependent dispatch
    • argument-count-dependent dispatch
  • JavaScriptConstructorFunctions
    • model constructor behavior based on spec
    • primitive and user “types”
    • generate WALA IR explicitly


Object Creation

  • JavaScriptConstructorFunctions
private IMethod makeUnaryObjectConstructor(IClass cls) {
  JSInstructionFactory insts = (JSInstructionFactory)
    cls.getClassLoader().getInstructionFactory();
  MethodReference ref =
    JavaScriptMethods
      .makeCtorReference(JavaScriptTypes.Object);
  JavaScriptSummary S = new JavaScriptSummary(ref, 2);
  S.addStatement(
    insts.ReturnInstruction(
      S.getNumberOfStatements(), 2, false));
  S.getNextProgramCounter(); 
  return new JavaScriptConstructor(ref, S, cls,
    cha.lookupClass(JavaScriptTypes.Object));
}


User Object Creation


0   v2 = new <JavaScriptLoader,LArray>@0     
BB1
1   v4 = getfield ...prototype... v1
BB2
2   v5 = new <JavaScriptLoader,LObject>@2
BB3
3   set_prototype(v5, v4)          
4   v7 = invoke v1@4 v5 exception:v8         
BB4
5   conditional branch(eq, to iindex=7) v7,v9
BB5
6   return v7                                
BB6
7   return v5

Lexical Scoping

  • Access to enclosing function state
    • read and write support, unlike Java
    • allows “upward funargs”:
      function Obj(x) {
      var state = x;
      this.get = function() { return state; } ;
      };
      var obj = new Obj(function(){return 3;});
      var test1 = ( obj.get() )();
      
  • WALA models as heap locations
    • reads and writes are flow insensitive
    • does not do SSA renaming for them

Lexical Scoping

  • IR of function Id
    • uses ‘id’ value from script demo2.js
    • specified as a name and creator
    • value modeled in function object
BB0
BB1
0   v3 = new <JavaScriptLoader,LArray>@0  [4:9]->[4:10] [3=[arguments]]
1   v5 = lexical:id@Ldemo2.js             [4:26]->[4:27]
2   check v5                              [4:26]->[4:27]
BB2
3   fieldref v2.v6:#id = v5               [4:24]->[4:24] [2=[this]]
BB3

Prototype chain

  • JavaScript uses prototype-based inheritance
    • objects point a ‘prototype’
    • properties can be found in prototype
      function Id() { this.id = id; }
      function SubId() { };
      SubId.prototype = new Id();
      
  • Flow-insensitive model of all prototypes
    • conservative model of inheritance
    • no model for must-override
      if (...) { SubId.prototype = new Id(); }
      

Prototype chain

30   v42 = global:global Id                  [5:44]->[5:45]
...
32   v41 = construct v42@32 exception:v43    [5:44]->[5:45]
...
33   v44 = global:global SubId               [5:22]->[5:26]
...
35   fieldref v44.v45:#prototype = v41       [5:38]->[5:38]
...
65   v72 = global:global SubId               [11:18]->[11:22]
...
67   v71 = construct v72@67 exception:v73    [11:18]->[11:22] [71=[id2]]
BB21
73   v76 = dispatch v65:#id@73 v71,v77:#not a url exception:v78[12:15]->[12:33] [76=[text]71=[id2]]
  1. id assigned as property of Id
  2. an Id prototype of SubId
  3. id called as method on a SubId

"Method" calls


  • method call
     var x = new SubId();
     var y = x.id("not a url");
    
  • this
     x
    
  • function call
     var x = new SubId();
     var f = x.id;
     var y = f("not a url");
    
  • this
     Global object
    


  • WALA represents methods explicitly for precision

"Method" call imprecision


  • method call
    if (...) {
     var x = new Id();
    } else {
     var x = new SubId();
    }
    var y = x.id("not a url");
    
  • function call
    if (...) {
     var x = new Id();
    } else {
     var x = new SubId();
    }
    var f = x.id;
    var y = f.call(x, "not a url");
    


  • function call could have 4 possibilities
    • 2 values for ‘this’
    • 2 values for ‘f’

"Method" call implementation


65   v72 = global:global SubId               [11:18]->[11:22]
...
67   v71 = construct v72@67 exception:v73    [11:18]->[11:22] [71=[id2]]
BB21
73   v76 = dispatch v65:#id@73 v71,v77:#not a url exception:v78[12:15]->[12:33] [76=[text]71=[id2]]


  • dispatch instruction links this and method
    • avoid imprecision of reading, calling function
    • only calls each method with corresponding this

SSA


var id2 = new SubId();
var text = id2.id("not a url");
var x = new SubId();
var id2 = x;
var text = id2.id("not a url");
  • WALA SSA implements copy propagation
    • removes unnecessary assignments
    • replaces uses with right hand side
var x = new SubId();
// var id2 = x;
var text = x.id("not a url");
  • Illustrative only: WALA operates on IR


Call Graph Construction


  • Call graphs denote what functions calls call
    • e.g. the _id function is called here:
      var text = id2.id("not a url");
      
  • Key question is estimating targets
    • languages like Java have types to help
    • JavaScript does not
    • Must track functions from creations to calls


Tracking Functions

...

var id = function _id(x) {
  return x;
};
function Id() {
  this.id = id;
}
function SubId() { };
SubId.prototype = new Id();

...

var id2 = new SubId();
var text = id2.id("not a url");
  1. _id created
  • Track propagation of functions to call sites

Tracking Functions

...

var id = function _id(x) {
  return x;
};
function Id() {
  this.id = id;
}
function SubId() { };
SubId.prototype = new Id();

...

var id2 = new SubId();
var text = id2.id("not a url");
  1. _id created
  2. Id created
  • Track propagation of functions to call sites

Tracking Functions

...

var id = function _id(x) {
  return x;
};
function Id() {
  this.id = id;
}
function SubId() { };
SubId.prototype = new Id();

...

var id2 = new SubId();
var text = id2.id("not a url");
  1. _id created
  2. Id created
  3. SubId created
  • Track propagation of functions to call sites

Tracking Functions

...

var id = function _id(x) {
  return x;
};
function Id() {
  this.id = id;
}
function SubId() { };
SubId.prototype = new Id();

...

var id2 = new SubId();
var text = id2.id("not a url");
  1. _id created
  2. Id created
  3. SubId created
  4. Id constructed
  • Track propagation of functions to call sites

Tracking Functions

...

var id = function _id(x) {
  return x;
};
function Id() {
  this.id = id;
}
function SubId() { };
SubId.prototype = new Id();

...

var id2 = new SubId();
var text = id2.id("not a url");
  1. _id created
  2. Id created
  3. SubId created
  4. Id constructed
  5. id assigned
  • Track propagation of functions to call sites

Tracking Functions

...

var id = function _id(x) {
  return x;
};
function Id() {
  this.id = id;
}
function SubId() { };
SubId.prototype = new Id();

...

var id2 = new SubId();
var text = id2.id("not a url");
  1. _id created
  2. Id created
  3. SubId created
  4. Id constructed
  5. id assigned
  6. prototype set
  • Track propagation of functions to call sites

Tracking Functions

...

var id = function _id(x) {
  return x;
};
function Id() {
  this.id = id;</span>
}
function SubId() { };
SubId.prototype = new Id();

...

var id2 = new SubId();
var text = id2.id("not a url");
  1. _id created
  2. Id created
  3. SubId created
  4. Id constructed
  5. id assigned
  6. prototype set
  7. SubId constructed
  • Track propagation of functions to call sites

Tracking Functions

...

var id = function _id(x) {
  return x;
};
function Id() {
  this.id = id;
}
function SubId() { };
SubId.prototype = new Id();

...

var id2 = new SubId();
var text = id2.id("not a url");
  1. _id created
  2. Id created
  3. SubId created
  4. Id constructed
  5. id assigned
  6. prototype set
  7. SubId constructed
  8. id read
  • Track propagation of functions to call sites

WALA Call Graphs


  • “Propagation-based” call graph construction
    • track data flow in the program
    • language semantics determines calls
  • “Field-based” approximate call graphs
    • approximate data flow for scalability
    • currently used in product

Propagation-Based Call Graphs


JSCFABuilder B =
  JSCallGraphBuilderUtil
    .makeScriptCGBuilder(..., file);
CallGraph CG =
  B.makeCallGraph(B.getOptions());
  • JavaScript traditional call graphs
    • tracks propagation of data through the program
    • flexible control of context sensitivity


"Call" Handling


  • Function.call call on function objects
    • “first class” function call
    • analogous to Method.call in Java
    • common in JavaScript frameworks
      var id1 = new Id();
      var text = id1.id.call(document, document.URL);
      
  • Modeled in WALA with trampoline function
    • natural to handle with data flow
    • this object in trampoline is function to call

Context Sensitivity


  • Separate anaylses of different calls to same function
    • potentially better precision
    • unpredictable impact on cost
      if (Math.random.call(null) > 0) {
      var id1 = new Id();
      var text = id1.id.call(document, document.URL);
      } else {
      var id2 = new SubId();
      var text = id2.id("not a url");
      }
      
    • one or two analyses of _id

Framework Scalability Issues

  • the extend idiom
    var o1 = { a: 1, b: 2 };
    var o2 = { }
    for(var p in o1) {
    var x = o1[p];
    o2[p] = x;
    }
    
  • value estimates by data flow
var values
o1 o1
o2 o2
p “a”, “b”
x 1, 2
  • imprecise estimates for $o2$: $o2.p \subseteq {1, 2}$

Framework Scalability Issues

  • the extend idiom
    var o1 = { a: 1, b: 2 };
    var o2 = { }
    for(var p in o1) {
    var x = o1[p];
    // o2[p] = x;
    }
    
  • value estimates by data flow
var values
o1 o
o2 o
p “a”, “b”
x 1, 2
  • precise estimates for $o$: $o.p = 1$

WALA Running - Call Graph

Taint Analysis Example


  • Analysis to find bad dataflow
    • start from arbitrary “sources”
    • end at arbitrary “sinks”
  • Simplified version of product
    • illustrate WALA data structures
    • illustrate data flow mechanisms

WALA Taint Example

var document = { URL: "whatever",
  write: function Document_prototype_write(x) { } };
var id = function _id(x) { return x; };
function Id() { this.id = id; }
function SubId() { }; SubId.prototype = new Id();

if (Math.random.call(null) > 0) {
    var id1 = new Id();
    var text = id1.id.call(document, document.URL);
} else {
    var id2 = new SubId();
    var text = id2.id("not a url");
}
document.write(text);

System Dependence Graph (SDG)


  • Dependence between program points
    • control, data, or both
    • WALA SDG built from call graph
  • Abstracts (most) language details
    • generic notion of Statement
    • normal notion of graph edges for dependence
  • com.ibm.wala.ipa.slicer
    • SDG
    • Statement

Creating the SDG


final JSCallGraph cg =
 cgBuilder.extract(interpreter, flowGraph, eps, monitor);
...
PointerAnalysis<ObjectVertex> ptrs =
 flowGraph.getPointerAnalysis(cg, cache, monitor);

SDG<ObjectVertex> sdg =
 new SDG<ObjectVertex>(cg, ptrs,
  DataDependenceOptions.NO_BASE_NO_HEAP_NO_EXCEPTIONS,
  ControlDependenceOptions.NONE);

Finding Tainted Sources


public static EndpointFinder<Statement> documentUrlSource =
 (Statement s) -> {
 if (s.getKind()==Kind.NORMAL) {
  NormalStatement ns = (NormalStatement) s;
  SSAInstruction inst = ns.getInstruction();
  if (inst instanceof SSAGetInstruction) {
   if (((SSAGetInstruction)inst)
        .getDeclaredField().getName()
	  .toString().equals("URL")) {
    return true;
 } } }
 return false;
};
    var text = id1.id.call(document, document.URL);
  • Field reads of the URL property

Finding Tainted Sinks


public static EndpointFinder<Statement> documentWriteSink =
 (Statement s) -> {
 if (s.getKind()==Kind.PARAM_CALLEE) {
  String ref = ((ParamCallee)s).getNode()
   .getMethod().toString();
  if (ref.contains("Document_prototype_write")) {
   return true; } }
 return false;
};
document.write(text);
  • Calls to document.write

Finding Tainted Paths


public static <T> Set<List<T>> getPaths(Graph<T> G, EndpointFinder<T> sources, EndpointFinder<T> sinks) {
 Set<List<T>> result = HashSetFactory.make();
 for(T src : G) {
  if (sources.endpoint(src)) {
   for(final T dst : G) {
    if (sinks.endpoint(dst)) {
     BFSPathFinder<T> paths = 
      new BFSPathFinder<T>(G, new NonNullSingletonIterator<T>(src), new Predicate<T>() {
       public boolean test(T t) {
        return t.equals(dst); }});
     List<T> path;
     if ((path = paths.find()) != null) {
      result.add(path); }}}}}
 return result; }
  • Search over SDG for paths from source to sink

WALA Running - Taint Analysis