Hi All,
 
There is an ongoing performance issue for one customer who recently moved from ganesha 2.3 to ganesha 2.7. They say they saw better performance with ganesha 2.3.
To measure performance they wrote the program below, which writes 50GB of data: multiple threads run in parallel, and each thread opens the same file and writes data to it in 1MB chunks.
They ran this program from two places: 1) directly on the GPFS node, writing straight to the file system, and 2) from an old RHEL 5.5 NFS client.
 
The function which gets invoked inside each thread is:
 
void *write_data(void *data) {
        int i;
        struct thread_data *tdata = (struct thread_data *)data;
        for (i = 0; i < tdata->mb; i++) {
                lseek(tdata->fd, tdata->offset, SEEK_SET);
                write(tdata->fd, tdata->data, SIZE);
                fsync(tdata->fd);
                tdata->offset += SIZE;
        }
        return NULL;
}

int main(int argc, char *argv[]) {
        for (i = 0; i < NUM_THREAD; i++) {
                ....
                // open output file (each thread gets its own fd to the same file)
                char filepath[256];
                tdata[i].mb = atoi(argv[2]) / NUM_THREAD;
                tdata[i].offset = tdata[i].mb * SIZE * (off_t)i;
                sprintf(filepath, "%s/dac.bin", argv[1]);
                tdata[i].fd = open(filepath, O_CREAT|O_RDWR|O_DIRECT|O_SYNC|O_LARGEFILE, 0666);
                if (tdata[i].fd < 0) {
                        return -1;
                }
                // write data
                pthread_create(&pthread[i], NULL, &write_data, &tdata[i]);
        }
        for (i = 0; i < NUM_THREAD; i++) {
                pthread_join(pthread[i], NULL);
                close(tdata[i].fd);
        }
        return 0;
}
 
Does the above program used to measure performance look problematic? In it, multiple threads run in parallel, each opening the same file and doing write() followed by fsync() on it.
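
For comparison, I am also thinking of asking them to run a variant that keeps the same per-thread region layout but drops O_SYNC/O_DIRECT and the per-write fsync(), syncing once per thread at the end; if that variant no longer shows the increasing write times over NFS, it would point at the per-write sync semantics rather than the data path itself. Below is a minimal sketch of such a variant (my own sketch, not the customer's program; SIZE, NUM_THREAD and the fill pattern are assumptions):

/*
 * Sketch only: same per-thread region layout as the customer's test, but
 * without O_SYNC/O_DIRECT and with a single fsync() per thread, to check
 * whether the per-write sync is what degrades over NFS.
 * Build with: cc -O2 -pthread -D_FILE_OFFSET_BITS=64 writetest.c
 */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SIZE (1024 * 1024)      /* 1MB per write, as in the original test */
#define NUM_THREAD 8            /* assumed thread count */

struct thread_data {
        int fd;
        int mb;
        off_t offset;
        char *data;
};

static void *write_data(void *arg) {
        struct thread_data *tdata = arg;
        int i;

        for (i = 0; i < tdata->mb; i++) {
                /* pwrite() takes the offset explicitly, so no lseek() is needed */
                if (pwrite(tdata->fd, tdata->data, SIZE, tdata->offset) != SIZE)
                        perror("pwrite");
                tdata->offset += SIZE;
        }
        fsync(tdata->fd);       /* one sync per thread instead of one per 1MB write */
        return NULL;
}

int main(int argc, char *argv[]) {
        pthread_t threads[NUM_THREAD];
        struct thread_data tdata[NUM_THREAD];
        char filepath[256];
        int i;

        if (argc < 3) {
                fprintf(stderr, "usage: %s <dir> <total MB>\n", argv[0]);
                return 1;
        }
        snprintf(filepath, sizeof(filepath), "%s/dac.bin", argv[1]);

        for (i = 0; i < NUM_THREAD; i++) {
                tdata[i].mb = atoi(argv[2]) / NUM_THREAD;
                tdata[i].offset = (off_t)tdata[i].mb * SIZE * i;
                tdata[i].data = malloc(SIZE);
                memset(tdata[i].data, 'a', SIZE);
                /* each thread opens its own descriptor to the same file */
                tdata[i].fd = open(filepath, O_CREAT | O_RDWR, 0666);
                if (tdata[i].fd < 0) {
                        perror("open");
                        return 1;
                }
                pthread_create(&threads[i], NULL, write_data, &tdata[i]);
        }
        for (i = 0; i < NUM_THREAD; i++) {
                pthread_join(threads[i], NULL);
                close(tdata[i].fd);
                free(tdata[i].data);
        }
        return 0;
}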
 
They collected GPFS (file system) statistics roughly every 2 minutes while the program ran.
In the first case, with the program running directly on the GPFS node, the average time taken to write data to the file system stayed almost constant across the 2-minute statistics intervals.
But in the second case, with the program running from the NFS client, the average time taken to write data to the file system kept increasing from one 2-minute interval to the next. So their view is that when ganesha is used, the average time taken to write data to disk keeps growing.
 
We checked with our GPFS file system team, who said "this is happening because there are multiple threads performing writes to the same file with DSYNC options" and suggested checking whether Ganesha has config parameters to avoid DSYNC on writes and to limit the number of concurrent writes to the same file.
 
About avoiding DSYNC: they are already using NFS_Commit=TRUE with ganesha 2.7 (which has the fix for NFS_Commit), so I don't understand what avoiding writes with DSYNC from the Ganesha side means here. Any thoughts?
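
For reference, this is roughly how that setting looks in their ganesha.conf (a sketch from memory; I am assuming NFS_Commit is read from the GPFS FSAL block, please correct me if it belongs elsewhere):

GPFS {
        NFS_Commit = TRUE;      # as per the NFS_Commit fix available in 2.7
}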
 
And what do you think about the suggestion of adding a config parameter in Ganesha to limit concurrent writes to the same file?
 
Frank, Dan,
I had asked a similar question on IRC yesterday. Please let me know if you have any other thoughts given the additional information above.

Thanks,
Madhu Thorat.