Monday 31 March 2014

GSoC 2014 Blog Post - Incremental mark and copy garbage collector for PyPy


Project Proposal Information:
Proposal Title: Incremental mark and copy garbage collector for PyPy
Proposal Abstract:
Python is an important platform for memory-constrained mobile devices such as mobile phones and PDAs. PyPy is a fast python interpreter written in RPython. However the current GCs still suffer high space overhead .This project proposed incremental mark and copy garbage collector to reduce the current GCs  memory consumption and make a new garbage collector suitable for mobile devices.

Motivation:
Smart phones and other mobile devices are increasingly popular these days.  A fairly large number of new mobile applications launched everyday. The main constraint for this kind of devices is the limited memory and power.  And a considerable fraction of total power is consumed by the memory subsystem.
Thus how to minimize memory consumption during application's execution has become an extremely heated topic.

Python is a widely used dynamic programming language and it is widely used in different areas. Compared to static type languages such as C, C++, it provides a strong and flexible computation model to enable developers to quickly produce applications for different needs. However, the average execution speed and memory consumption of Python Interpreters is also significant worse than other static type language due to lack of optimization from the compilation stage. To improve this problem, Python developers proposed other implementations of Python interpreters. There are two important alternative Python interpreters which aimed to provide better performance for Python language: PyPy and Cpython.  Pypy is known for its superior performance compared to other Python interpreter implementation. It make use of JIT compilation to improve its runtime speed and various garbage collections algorithms to reduce its memory consumption. The PyPy interpreter is implemented in RPython which is a statically typed subset of Python. This project will focus on studying and improving the garbage collection mechanism in PyPy.

PyPy and Cpython adopted very different memory management strategies. To reduce the runtime memory consumption, CPython uses reference counting. The main disadvantage is that it cannot collect unreachable, cyclic data structures and cyclic data structures are quite plausible when many data structures points to their parents and points to each other as cross references. PyPy uses traced-based collector which is run periodically to find unreachable objects and reclaim memory space. We will list some main GC algorithms below with their implementation in PyPy:

Mark-sweep collectors can provide good throughput and pause time but it may cause fragmentation and high space overhead. The current garbage collector in PyPy used the mark-sweep algorithms are minimark and  incminimark. minimark.py is generational stop-the-world GC.  incminimark.py is incremental and used tri-color marking.

Mark-compact collectors can prevent fragmentation. As the result of this, it has low space overhead. It also keeps the order of object allocation thus it can yield better locality.  The main problem of mark-compact is that its compaction phrase is non-incremental. Its current implementation was removed due to lack of maintenance as in the commit 4bf9cf2f18a1 message.

Purely copying collectors areused  in the Semispace copying collector (rpython/memory/gc/semispace.py). It requires minimum space twice the size of live data of a program.
Generational collectors are used in the(rpython/memory/gc/generation.py)

The GCs in Pypy tend to have shorter pause time than Cpython. However, they have higher memory usage compared to CPython[1]. As a result, we believed that we propose alternative garbage collection mechanism for better performance and lower the overall memory consumption.

Project goals:
This goal of this proposal is to add a high performance garbage collector to PyPy to solve the high memory usage issue as well as gains benefits of shorter pauses .

Proposed Mark Copy Algorithm:
This project proposes the new mark copy algorithm[2] that can be used to implement an additional GC for memory constrained environments.

As in the generational algorithm, the new mark copy algorithm also divides the heap into two regions, a nursery and an old generation.
All allocations are performed in the nursery and when the nursery fills up it will copy the survivors(remains reachable objects) into the old generation.

The old generation is further divided into a number of equal size subregions called windows. The windows are numbered from 1 to n (indirect addressing). And lower number windows are collected before higher numbered windows. Each window size
is equal to the size of nursery. The windows themselves don’t need to be contiguous but the space in each window should be contiguous.

MC performs a full heap collection in mark phases and copy phase. And the two phases both are incremental. The mark phase there is a full collection marks live object in the following copy phase the live data is copied and compacted.

During the mark phase, MC traverses live objects in the heap and records the total live objects’ volume in each window. It will also constructs the remembered sets for each window. The marking is triggered when the occupancy of old generation exceeds a predefined threshold(80%). During the collection each window will be assigned a bitmap to record information(already marked, volume of live objects).
The marking phase is triggered earlier than wait until the old generation is full, so the marking work is triggered, nursery allocation can resume. The same write barrier used in incminimark.py will be used two to solve the problem with incremental marking.

During the following copy phase: 1)Grouping the windows. At the end of the mark phase, MC already knows the total volume of marked objects in each window. Then if the percentage of each window that needs to be copied reach one threshold, the window will be marked as high occupancy. High occupancy windows won't be copied and their remember sets will also be discarded since to pointers points to them need to be updated.
2) Copying data. In the copying data phase, the high occupancy window will be placed in the end of the collected list(It works well for large long live objects). The copy  is broken into a number of passes. Each pass live objects of  several windows from old generation will be copied and compacted.

Implementation:
The new mark-copy algorithm will be implemented in RPython and will be added into the module of current GCs(under rpython/memory/gc directory).

As the new GC algorithm shares some steps with current GC, the incremental marking phase and the nursery/old generation division, it can borrow the debug and test methods from current GC.

In the last step, the new GC will be evaluated using the performance metrics: overall execution time, space usage, pause time and program locality.

Timeline:
From Today to April 21st:
Read full pypy documentation and play around with code base. Try to solve the issue for _ast module and issues on bug tracker.

April 22nd to May 18th:
Investigate current available implementations: read generation.py, minipark.py
increminipark.py and all corrent GC collectors

Week 1:
Settled the details of the algorithm for implementation. Contact and discuss  with Mentor.

Week 2-4:
Retrieve the old abandoned mark-and -compact garbage collector for memory usage compare with the new mark-copy algorithm.

Week 5-7:
Implement the memory allocation and incremental collection phrase.
Write tests to test all the functionality.

Week 8-10:
Implement the memory allocation and incremental copy phrase.
Write tests to test all the functionality.

Week 11-13:
Combine the two phrase together and test the functionality of them.

Week 14-15 (August 4th-August 17th)Benchmark and investigate improvement.
Run on benchmarks(translating thetopaz ruby interpreter using various versions of PyPy and CPython), comparing memory usage, execution time, and pause times.
and test the benmarks inhttp://speed.pypy.org/

August 18th - Firm “pencils down” date

I can fully commit 40 hours(or more) per week for the project

Patches/Code samples/Pull requests:
My github profile:https://github.com/wenzhuman
Additional information:
About me:
• Proficient in C++, Java and Python
• Proficient in web development technologies including JavaScript, ActionScript, HTML
• Experience with compiler frameworks including LLVM, Clang and Soot
• Experience with Linux and OS X

In the master phase, my research area is in program analysis and I have implemented different kinds of program analysis tools in Clang and Soot. Last year, I have developed a Joos compiler(without using any framework) in group of three.  We implemented the compiler in Java which follows the modern compiler structure including DFA Scanner, LR1 Parser, AST Constructor, Symbol Table Generator, Type Checker, Static Analyzer and Code Generator.

Citation:


[2]http://people.cs.umass.edu/~emery/pubs/04-15.pdf