Monday, 11 August 2014

only zero gc pointers not the whole nursery

As mentioned in the earlier post, our original idea is divided nursery into two parts that growing in opposite directions.

However, after I implemented this idea and adopted GC I started to think that maybe it's not the best solution. As instead of simplifies the nursery, we made it more complicated.
As the problem is following the uninitialized Gc Pointers with crash program, we can just zero the fields whose type is GcPtr. 

So I proposed this solution and after discussion with my mentors, we chose this solution instead of the two-end nursery.

After coding one month and a half,  here is my progress.
1. The GC is fully adopted and tested that new object are all allocated  in uninitialized memory.
This simplified the c code that is generated and there is only one end of nursery(used two have two ends-top/real top).
 Here is the generated c code:
v1(old):

v2(new):


As shown above, the new generated c code gets rid of the expensive OP_RAW_MEMCLEAR
and the c code is simplified.
 
2.As now the malloc is non_clear, all the clear operation and methods and fully refactored e.g, malloc_fixed_size_clear to malloc_fixed_size

3.Rewrite the logic in gcwrapper of how to choose the right malloc function between different gc.

4. For GcStruct and GcArray, we insert an special operation after malloc-  zero_gc_pointers_inside,now
 this operation is implemented but doesn't fully work(tons of debugging:(  )

Friday, 13 June 2014

two end nursery GC

So things have been kind of complicated in the past four weeks.
And my original project changed to a new one as someone in pypy have been working on  the original project(add pinning to GC) for months, but none of my mentors know until two weeks passed.

So my new project is :
two end nursery GC.
As mentioned in previous post, current incremen
tal minimark GC in pypy has two generations: nursery generation and old generation. There is also a external_malloc() to raw-allocate objects(these objects are out of GC control)

The current nursery allocation logic is clear: there is three pointers :nursery-free/nursery-top/nursery-real-top, between nursery-free and nursery-top there is a piece of zeroed memory (grey area below)
,between nursery-top and nursery-real-top  there is non-zeroed( contains random garbage) memory( blue area below).


The nursery is cleaned up step by step, when the nursery-free reaches nursery-top, the nursery-top will move forward and zero the memory for one more step(defined by nursery-cleanup).

From this mechanism  we can see every object in allocation in memory that is allocated in memory full of zeros.
This is very useful for some objects.  Take an object that contains a field which is a GC pointer.  It needs to be zero-initialized, because otherwise, if the program doesn't quickly fill the pointer, garbage is left in this field.  At the next minor or major collection, trying to follow this garbage-valued pointer would
crash.
However, not all objects need such zero-allocation.  For any object not containing any GC pointers, it is not useful from the point of view of the GC.   The point is that we would save part of the time needed to zero out the nursery if we could choose to allocate zeroed or non-zeroed objects.

So the idea of the project is to allocation objects(don't have GC pointers) from nursery-real-top and grows towards nursery-top. So the nursery can allocate objects from two end.

These will save the time that instead of there are two writes(zeroing and allocation) for every object there will be just one write for some of the objects.

The current allocation function are malloc_fixedsize_clear(), malloc_varsize_clear(), I added two new pointers nursery-second-part-free(grows from nursery-real-top), nursery-second-part-end(points to nursery-top).
I also implemented malloc_fixedsize() as the third allocation function in GC to allocate objects in opposite direction from the other end of nursery.

 The next step will be modify the GCtransformer to replace malloc_fixedsize_clear() with malloc_fixedsize() when appropriate.







Sunday, 25 May 2014

Introduce pinned objects to Pypy

Before the GSOC started, as my proposal plan is kind of huge for 3 month work, so my mentor talked with me about the idea to introduce pinned objects in pypy.

It's the first time I ever heard about the concept so I did some research in this topic(especially in different GC in other languages) and read the current code base before implementation.

I think it's good to wrap up what I have found so far:

1. what's is pinned object?

a pinned object is an object that is not allowed to move.
Sometime GC will compact the memory to create large chunk of free space, so if an object is pinned, it means it tells GC don't move this object.
It's only useful when working with pointers

2.why should pypy add pinned objects?
It's out of performance concern.  For example, without pinning, pypy does a copy each time there is a os.write/read

3.as far as I know every cffi object should be already pinned, what's the implementation of non-movable object in pypy now?

cffi object is now raw malloced. raw malloced object is out of GC control.
so there is a copy version of object that is raw malloced(the so called non-movable object).

the goal is to created a GC manageable object at the creation time instead copying raw-malloced object around.

Things are getting more clear :)  :
Pypy is using a generational GC that heap is divided into two parts -young nursery and old generation.
After minor collection objects in young nursery are all moved in to old generation. But if the object is pinned, we won't move the object and the object will stay in the young generation.

Problems need to be solved:
fragmentation? What if there are too many pinned object in young generation that cause the young generation too fragmented?


As pypy is a test-driven project, I am writing the test cases in the first week and introduce a GC manageable object involves a lot of different layers of pypy so I need.






Monday, 31 March 2014

GSoC 2014 Blog Post - Incremental mark and copy garbage collector for PyPy


Project Proposal Information:
Proposal Title: Incremental mark and copy garbage collector for PyPy
Proposal Abstract:
Python is an important platform for memory-constrained mobile devices such as mobile phones and PDAs. PyPy is a fast python interpreter written in RPython. However the current GCs still suffer high space overhead .This project proposed incremental mark and copy garbage collector to reduce the current GCs  memory consumption and make a new garbage collector suitable for mobile devices.

Motivation:
Smart phones and other mobile devices are increasingly popular these days.  A fairly large number of new mobile applications launched everyday. The main constraint for this kind of devices is the limited memory and power.  And a considerable fraction of total power is consumed by the memory subsystem.
Thus how to minimize memory consumption during application's execution has become an extremely heated topic.

Python is a widely used dynamic programming language and it is widely used in different areas. Compared to static type languages such as C, C++, it provides a strong and flexible computation model to enable developers to quickly produce applications for different needs. However, the average execution speed and memory consumption of Python Interpreters is also significant worse than other static type language due to lack of optimization from the compilation stage. To improve this problem, Python developers proposed other implementations of Python interpreters. There are two important alternative Python interpreters which aimed to provide better performance for Python language: PyPy and Cpython.  Pypy is known for its superior performance compared to other Python interpreter implementation. It make use of JIT compilation to improve its runtime speed and various garbage collections algorithms to reduce its memory consumption. The PyPy interpreter is implemented in RPython which is a statically typed subset of Python. This project will focus on studying and improving the garbage collection mechanism in PyPy.

PyPy and Cpython adopted very different memory management strategies. To reduce the runtime memory consumption, CPython uses reference counting. The main disadvantage is that it cannot collect unreachable, cyclic data structures and cyclic data structures are quite plausible when many data structures points to their parents and points to each other as cross references. PyPy uses traced-based collector which is run periodically to find unreachable objects and reclaim memory space. We will list some main GC algorithms below with their implementation in PyPy:

Mark-sweep collectors can provide good throughput and pause time but it may cause fragmentation and high space overhead. The current garbage collector in PyPy used the mark-sweep algorithms are minimark and  incminimark. minimark.py is generational stop-the-world GC.  incminimark.py is incremental and used tri-color marking.

Mark-compact collectors can prevent fragmentation. As the result of this, it has low space overhead. It also keeps the order of object allocation thus it can yield better locality.  The main problem of mark-compact is that its compaction phrase is non-incremental. Its current implementation was removed due to lack of maintenance as in the commit 4bf9cf2f18a1 message.

Purely copying collectors areused  in the Semispace copying collector (rpython/memory/gc/semispace.py). It requires minimum space twice the size of live data of a program.
Generational collectors are used in the(rpython/memory/gc/generation.py)

The GCs in Pypy tend to have shorter pause time than Cpython. However, they have higher memory usage compared to CPython[1]. As a result, we believed that we propose alternative garbage collection mechanism for better performance and lower the overall memory consumption.

Project goals:
This goal of this proposal is to add a high performance garbage collector to PyPy to solve the high memory usage issue as well as gains benefits of shorter pauses .

Proposed Mark Copy Algorithm:
This project proposes the new mark copy algorithm[2] that can be used to implement an additional GC for memory constrained environments.

As in the generational algorithm, the new mark copy algorithm also divides the heap into two regions, a nursery and an old generation.
All allocations are performed in the nursery and when the nursery fills up it will copy the survivors(remains reachable objects) into the old generation.

The old generation is further divided into a number of equal size subregions called windows. The windows are numbered from 1 to n (indirect addressing). And lower number windows are collected before higher numbered windows. Each window size
is equal to the size of nursery. The windows themselves don’t need to be contiguous but the space in each window should be contiguous.

MC performs a full heap collection in mark phases and copy phase. And the two phases both are incremental. The mark phase there is a full collection marks live object in the following copy phase the live data is copied and compacted.

During the mark phase, MC traverses live objects in the heap and records the total live objects’ volume in each window. It will also constructs the remembered sets for each window. The marking is triggered when the occupancy of old generation exceeds a predefined threshold(80%). During the collection each window will be assigned a bitmap to record information(already marked, volume of live objects).
The marking phase is triggered earlier than wait until the old generation is full, so the marking work is triggered, nursery allocation can resume. The same write barrier used in incminimark.py will be used two to solve the problem with incremental marking.

During the following copy phase: 1)Grouping the windows. At the end of the mark phase, MC already knows the total volume of marked objects in each window. Then if the percentage of each window that needs to be copied reach one threshold, the window will be marked as high occupancy. High occupancy windows won't be copied and their remember sets will also be discarded since to pointers points to them need to be updated.
2) Copying data. In the copying data phase, the high occupancy window will be placed in the end of the collected list(It works well for large long live objects). The copy  is broken into a number of passes. Each pass live objects of  several windows from old generation will be copied and compacted.

Implementation:
The new mark-copy algorithm will be implemented in RPython and will be added into the module of current GCs(under rpython/memory/gc directory).

As the new GC algorithm shares some steps with current GC, the incremental marking phase and the nursery/old generation division, it can borrow the debug and test methods from current GC.

In the last step, the new GC will be evaluated using the performance metrics: overall execution time, space usage, pause time and program locality.

Timeline:
From Today to April 21st:
Read full pypy documentation and play around with code base. Try to solve the issue for _ast module and issues on bug tracker.

April 22nd to May 18th:
Investigate current available implementations: read generation.py, minipark.py
increminipark.py and all corrent GC collectors

Week 1:
Settled the details of the algorithm for implementation. Contact and discuss  with Mentor.

Week 2-4:
Retrieve the old abandoned mark-and -compact garbage collector for memory usage compare with the new mark-copy algorithm.

Week 5-7:
Implement the memory allocation and incremental collection phrase.
Write tests to test all the functionality.

Week 8-10:
Implement the memory allocation and incremental copy phrase.
Write tests to test all the functionality.

Week 11-13:
Combine the two phrase together and test the functionality of them.

Week 14-15 (August 4th-August 17th)Benchmark and investigate improvement.
Run on benchmarks(translating thetopaz ruby interpreter using various versions of PyPy and CPython), comparing memory usage, execution time, and pause times.
and test the benmarks inhttp://speed.pypy.org/

August 18th - Firm “pencils down” date

I can fully commit 40 hours(or more) per week for the project

Patches/Code samples/Pull requests:
My github profile:https://github.com/wenzhuman
Additional information:
About me:
• Proficient in C++, Java and Python
• Proficient in web development technologies including JavaScript, ActionScript, HTML
• Experience with compiler frameworks including LLVM, Clang and Soot
• Experience with Linux and OS X

In the master phase, my research area is in program analysis and I have implemented different kinds of program analysis tools in Clang and Soot. Last year, I have developed a Joos compiler(without using any framework) in group of three.  We implemented the compiler in Java which follows the modern compiler structure including DFA Scanner, LR1 Parser, AST Constructor, Symbol Table Generator, Type Checker, Static Analyzer and Code Generator.

Citation:


[2]http://people.cs.umass.edu/~emery/pubs/04-15.pdf