Parallel programming for compute clusters with multiple processing cores is an emerging trend, not only in the scientific community but also among research students and in industry. The Message Passing Interface (MPI) has become the de facto standard for writing parallel applications: a programming-language-independent communication protocol that defines the procedures and rules for passing messages. MPI has been implemented in many high-level languages, including Fortran, C, and Java. One such Java implementation is MPJ Express, an open-source Java message-passing library. Parallel execution of programs, especially under the distributed-memory model, requires data exchange among processors through collective communication operations. These operations are responsible for data distribution, consolidation, and computation, and are therefore critical to the performance of parallel applications. The current version (0.38) of MPJ Express implements linear algorithms for collective communication operations, which are a major source of overhead in parallel program execution. This work optimizes the existing collective communication operations, evaluates them using established benchmarks, and integrates them into MPJ Express. We report performance improvements ranging from 1% to 90% for the newly implemented collective primitives.