Friday, April 15, 2022

Exploring a Webpack stats file with Prolog

A couple of days ago I was reading about the Webpack statistics file created using the following command line options:

npx webpack --profile --json

This file contains a lot of information collected by Webpack about the project being processed. It is used by visualization tools such as Webpack Bundle Analyzer.

The file includes the dependency graph, that is, all the dependencies between the project's modules. Being able to run queries on this data could give useful insights into the code.

There are many tools to process JSON, but I wanted to try SWI-Prolog to see whether I could extract information from this file.

The information I am looking for is the module dependency information. Looking at the Module Object, we can get it from the reasons property.

We can start by parsing the stats.json file with SWI-Prolog's built-in library for reading JSON:

:- use_module(library(http/json)).

% Parse a JSON file into json([Key=Value, ...]) terms and close the stream.
read_json_file(FileName, Terms) :-
    open(FileName, read, Stream),
    json_read(Stream, Terms),
    close(Stream).

For convenience, I'm adding the loaded file to the Prolog database using assert/1:

?- read_json_file('c:\\smallexample\\stats.json',F), assert(testfile(F)).
F = json([hash='71029f66a2fb5a3de779', version='5.70.0', time=3694, builtAt=1649775116882, publicPath=auto, outputPath='C:\\smallexample\\public', assetsByChunkName=json(...), ... = ...|...]).
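To see the shape of the terms json_read/2 produces without loading a whole stats file, here is a minimal sketch that parses a small, made-up JSON string through open_string/2. The helper name json_from_string/2 is hypothetical, not part of the library.

```prolog
:- use_module(library(http/json)).

% Hypothetical helper: parse JSON held in a string instead of a file.
% json_read/2 maps JSON objects to json([Key=Value, ...]) terms, arrays to
% lists and, by default, JSON strings to atoms.
json_from_string(Text, Term) :-
    setup_call_cleanup(
        open_string(Text, Stream),
        json_read(Stream, Term),
        close(Stream)).
```

For example:

?- json_from_string("{\"name\": \"./src/index.ts\", \"size\": 120}", T).
T = json([name='./src/index.ts', size=120]).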

Now that we can load the stats file, we can perform simple queries. For example, we can look at the top-level properties:

?- testfile(json(Contents)), member(Name=_,Contents).
Contents = [hash='71029f66a2fb5a3de779', version='5.70.0', time=3694, builtAt=1649775116882, publicPath=auto, outputPath='C:\\smallexample\\public', assetsByChunkName=json([...]), assets=[...], ... = ...|...],
Name = hash ;
Contents = [hash='71029f66a2fb5a3de779', version='5.70.0', time=3694, builtAt=1649775116882, publicPath=auto, outputPath='C:\\smallexample\\public', assetsByChunkName=json([...]), assets=[...], ... = ...|...],
Name = version ;
Contents = [hash='71029f66a2fb5a3de779', version='5.70.0', time=3694, builtAt=1649775116882, publicPath=auto, outputPath='C:\\smallexample\\public', assetsByChunkName=json([...]), assets=[...], ... = ...|...],
Name = time ;
Contents = [hash='71029f66a2fb5a3de779', version='5.70.0', time=3694, builtAt=1649775116882, publicPath=auto, outputPath='C:\\smallexample\\public', assetsByChunkName=json([...]), assets=[...], ... = ...|...],
Name = builtAt ;
Contents = [hash='71029f66a2fb5a3de779', version='5.70.0', time=3694, builtAt=1649775116882, publicPath=auto, outputPath='C:\\smallexample\\public', assetsByChunkName=json([...]), assets=[...], ... = ...|...],
Name = publicPath ;
Contents = [hash='71029f66a2fb5a3de779', version='5.70.0', time=3694, builtAt=1649775116882, publicPath=auto, outputPath='C:\\smallexample\\public', assetsByChunkName=json([...]), assets=[...], ... = ...|...],

Notice that I'm using member/2 to get the names of the top-level properties of the stats file.
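Since the contents of a json(...) term are just a list of Name=Value pairs, the same member/2 pattern can collect all the top-level keys in one list with findall/3. A minimal sketch; top_level_keys/2 is a hypothetical helper name:

```prolog
% Hypothetical helper: collect the keys of a json([Key=Value, ...]) term.
% member(Key=_, Props) unifies each Name=Value pair with Key=_, and
% findall/3 gathers the names in the order they appear in the file.
top_level_keys(json(Props), Keys) :-
    findall(Key, member(Key=_, Props), Keys).
```

With the stats file loaded as above, ?- testfile(F), top_level_keys(F, Keys). returns the same names as the backtracking query, but as a single list.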

As a side note, yesterday I learned from a Stack Overflow question that you can hide variables from Prolog's toplevel answers with the following goal:

set_prolog_flag(toplevel_print_anon, false).

With this flag set to false, variables whose names start with an underscore are no longer printed in the results:

?- testfile(json(_Contents)), member(Name=_, _Contents).
Name = hash ;
Name = version ;
Name = time ;
Name = builtAt ;
Name = publicPath ;
Name = outputPath ;
Name = assetsByChunkName ;
Name = assets ;
Name = chunks ;
Name = modules ;
Name = entrypoints ;
Name = namedChunkGroups ;
Name = errors ;
Name = errorsCount ;
Name = warnings ;
Name = warningsCount ;
Name = children.

Now I can access the modules section to extract the reasons property. This property holds information about the modules that depend on the current module. For example, say that we have a small TypeScript program with the following structure:


We can start exploring this project by listing the names of the module objects:

?- testfile(json(_Contents)),
|    member( ('modules'=_Modules), _Contents),
|    member( json(_ModulePropsList), _Modules),
|    member( ('name'=ModuleName), _ModulePropsList).
ModuleName = './src/index.ts' ;
ModuleName = './src/parser.ts' ;
ModuleName = './src/FuncApply.ts' ;
ModuleName = './src/NumLiteral.ts' ;
ModuleName = './src/SymbolObj.ts' ;
ModuleName = './src/BaseObject.ts' ;
ModuleName = 'webpack/runtime/define property getters' ;
ModuleName = 'webpack/runtime/hasOwnProperty shorthand' ;
ModuleName = 'webpack/runtime/make namespace object' ;

We can turn the query above into predicates that we can reuse later:

% Enumerate, on backtracking, the names of all modules in the stats term.
module_name(json(ContentsList), Name) :-
    member(('modules'=Modules), ContentsList),
    member(json(ModulePropertiesList), Modules),
    member('name'=Name, ModulePropertiesList).

% Find the property list of the module called Name.
module_properties_by_name(json(ContentsList), Name, ModulePropertiesList) :-
    member(('modules'=Modules), ContentsList),
    member(json(ModulePropertiesList), Modules),
    member('name'=Name, ModulePropertiesList).
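The module names listed above fall into two groups: the project's own sources (./src/...) and Webpack runtime modules (webpack/runtime/...). As a sketch of how these predicates compose with other built-ins, the following keeps only the source modules using sub_atom/5. The sample_stats/1 fact is a hand-written, hypothetical miniature of a stats term (not real Webpack output), and source_module/2 is a hypothetical name.

```prolog
% module_name/2 as defined above, repeated so this snippet runs on its own.
module_name(json(ContentsList), Name) :-
    member(modules=Modules, ContentsList),
    member(json(ModulePropertiesList), Modules),
    member(name=Name, ModulePropertiesList).

% Hypothetical miniature of a stats term (not real Webpack output).
sample_stats(json([
    modules = [
        json([name = './src/index.ts']),
        json([name = './src/parser.ts']),
        json([name = 'webpack/runtime/make namespace object'])
    ]
])).

% Hypothetical helper: succeeds for modules whose name starts with './src/'.
source_module(Stats, Name) :-
    module_name(Stats, Name),
    sub_atom(Name, 0, _, _, './src/').
```

Running ?- sample_stats(_S), findall(N, source_module(_S, N), Ns). collects './src/index.ts' and './src/parser.ts' and leaves out the runtime module.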

Now that we can locate the modules, we can get the contents of the reasons property:

?- testfile(_Json),
|    module_properties_by_name(_Json, './src/BaseObject.ts', _Props),
|    member((reasons=_Reasons), _Props),
|    member(json([_|[RefModName|_]]), _Reasons).
RefModName =  (module='./src/FuncApply.ts') ;
RefModName =  (module='./src/FuncApply.ts') ;
RefModName =  (module='./src/NumLiteral.ts') ;
RefModName =  (module='./src/NumLiteral.ts') ;
RefModName =  (module='./src/SymbolObj.ts') ;
RefModName =  (module='./src/SymbolObj.ts') ;

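(The repeated results seem to come from distinct entries in the reasons list: each reason has its own type, so one module can reference another for more than one reason.)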
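The query above relies on module=... being the second property of each reason object. A sketch that looks the keys up by name instead, and also returns the reason's type, which may help explain the repeated results. It assumes each reason object has module and type keys; referenced_by/4 and the sample data (including the type values) are hypothetical illustrations.

```prolog
% Hypothetical miniature of one module entry with two reasons.
sample_stats(json([
    modules = [
        json([ name = './src/BaseObject.ts',
               reasons = [ json([ type = 'harmony side effect evaluation',
                                  module = './src/FuncApply.ts' ]),
                           json([ type = 'harmony import specifier',
                                  module = './src/FuncApply.ts' ])
                         ]
             ])
    ]
])).

% referenced_by(+Stats, ?Name, ?Referencer, ?Type): Referencer appears in the
% reasons list of module Name with the given reason Type. Keys are matched by
% name, so the order of the properties inside each reason does not matter.
referenced_by(json(Contents), Name, Referencer, Type) :-
    member(modules=Modules, Contents),
    member(json(ModuleProps), Modules),
    member(name=Name, ModuleProps),
    member(reasons=Reasons, ModuleProps),
    member(json(ReasonProps), Reasons),
    member(module=Referencer, ReasonProps),
    member(type=Type, ReasonProps).
```

On this sample, ?- sample_stats(_S), referenced_by(_S, './src/BaseObject.ts', Ref, Type). yields './src/FuncApply.ts' twice, once per reason type.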

With this data we can generate a Graphviz representation of the dependency graph (for example, the one used for the graph above).

:- use_module(library(assoc)).

% Assign a Graphviz node id ("M0", "M1", ...) to each module name.
name_modules([], [], _).
name_modules([ModName|Rest], [ModNamePair|RestResult], Counter) :-
    number_string(Counter, CounterStr),
    string_concat('M', CounterStr, ModuleId),
    ModNamePair = ModName - ModuleId,
    NewCounter is Counter + 1,
    name_modules(Rest, RestResult, NewCounter).

% Name-Referencer: module Referencer appears in the reasons of module Name,
% that is, Referencer depends on Name.
module_dependencies_by_reason(File, (Name-Referencer)) :-
    module_name(File, Name),
    module_properties_by_name(File, Name, Props),
    member((reasons=Reasons), Props),
    member(json(ReasonProps), Reasons),
    member((module=Referencer), ReasonProps).

% Emit one 'Id[label="Name"];' line per module.
generate_node_descriptions([], Result, Result).
generate_node_descriptions([(Name-Id)|Rest], TmpResult, OutStr) :-
    format(atom(NodeStr), '~a[label="~a"];\n', [Id, Name]),
    string_concat(TmpResult, NodeStr, OutStrTmp),
    generate_node_descriptions(Rest, OutStrTmp, OutStr).

% Emit one 'Src -> Target;' edge per dependency pair. Pairs that cannot be
% mapped to node ids (for example, when the referencer is null) are skipped.
generate_node_relations([], _, Result, Result).
generate_node_relations([(Target-Src)|Rest], NodeIds, TmpResult, Result) :-
    (   get_assoc(Src, NodeIds, SrcCode),
        get_assoc(Target, NodeIds, TargetCode)
    ->  format(atom(RelationStr), '~a -> ~a;\n', [SrcCode, TargetCode]),
        string_concat(TmpResult, RelationStr, NewTmpResult)
    ;   NewTmpResult = TmpResult
    ),
    generate_node_relations(Rest, NodeIds, NewTmpResult, Result).

dot_file_from_reasons(File, DotFileStr) :-
    findall(Name, module_name(File, Name), NameList),
    name_modules(NameList, CodedList, 0),
    list_to_assoc(CodedList, AssocNameModList), !,
    % setof/3 sorts the pairs and drops duplicates (repeated reasons).
    setof(Pairs, module_dependencies_by_reason(File, Pairs), PairList),
    generate_node_descriptions(CodedList, 'digraph G {\n', DotFileStrTmp1),
    generate_node_relations(PairList, AssocNameModList, DotFileStrTmp1, DotFileStrTmp2),
    string_concat(DotFileStrTmp2, '}', DotFileStr).
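dot_file_from_reasons/2 builds the DOT text as a string; to render it with Graphviz it has to be written to a file first. A minimal sketch; write_dot_file/2 and the file names in the usage example are hypothetical:

```prolog
% Hypothetical helper: write a DOT string to a file so Graphviz can render it.
% setup_call_cleanup/3 closes the stream even if writing fails.
write_dot_file(Path, DotStr) :-
    setup_call_cleanup(
        open(Path, write, Stream),
        write(Stream, DotStr),
        close(Stream)).
```

For example, ?- testfile(F), dot_file_from_reasons(F, Dot), write_dot_file('deps.dot', Dot). followed by dot -Tsvg deps.dot -o deps.svg on the command line should produce an SVG of the graph.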

I am impressed by the power of Prolog. I have always admired how it behaves differently depending on how you use it. Take the way member/2 was used above to extract internal elements from terms: one would assume that this predicate is only used to test list membership, but thanks to unification and backtracking we can also use it to explore the contents of a list.
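A small illustration of that point: the same member/2 predicate can test membership, enumerate elements on backtracking, or, combined with a Key=Value pattern, pull a value out of a property list. The demo_* predicate names and list contents are made up for illustration.

```prolog
% 1. Test: is there a version=... pair in the list?  (succeeds)
demo_test :- member(version=_, [hash=abc, version='5.70.0']).

% 2. Enumerate: backtracking over member/2 yields every element.
demo_enumerate(Xs) :- findall(X, member(X, [a, b, c]), Xs).

% 3. Extract: unification binds V to the value paired with version.
demo_extract(V) :- member(version=V, [hash=abc, version='5.70.0']).
```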