[Pkg-gauche-devel] threads and fork on machine with VIPT-WB cache

James Bottomley James.Bottomley at HansenPartnership.com
Mon Apr 5 16:18:38 UTC 2010


On Sun, 2010-04-04 at 22:51 -0400, John David Anglin wrote:
> > Thanks a lot for the discussion.
> > 
> > James Bottomley wrote:
> > > So your theory is that the data the kernel sees doing the page copy can
> > > be stale because of dirty cache lines in userspace (which is certainly
> > > possible in the ordinary way)?
> > 
> > Yes.
> > 
> > > By design that shouldn't happen: the idea behind COW breaking is
> > > that before it breaks, the page is read only ... this means that
> > > processes can have clean cache copies of it, but never dirty cache
> > > copies (because writes are forbidden).
> > 
> > That must be design, I agree.
> > 
> > To keep this condition (no dirty cache for COW page), we need to flush
> > cache before ptep_set_wrprotect.  That's my point.
> > 
> > Please look at the code path:
> >    (kernel/fork.c)
> >    do_fork -> copy_process -> copy_mm -> dup_mm -> dup_mmap ->
> >    (mm/memory.c)
> >    copy_page_range -> copy_p*d_range -> copy_one_pte -> ptep_set_wrprotect
> > 
> > The function flush_cache_dup_mm is called from dup_mmap, that's enough
> > for a case of a process with single thread.
> > I think that:
> > We need to flush cache before ptep_set_wrprotect for a process with
> > multiple threads.  Other threads may change memory after a thread
> > invokes do_fork and before calling ptep_set_wrprotect.  Specifically,
> > a process may sleep at pte_alloc function to get a page.
> 
> I agree.  It is interesting that in the case of the Debian bug that
> a thread of the parent process causes the COW break and thereby corrupts
> its own memory.  As far as I can tell, the fork'd child never writes
> to the memory that causes the fault.
> 
> My testing indicates that your suggested change fixes the Debian
> bug.  I've attached below my latest test version.  This seems to fix
> the bug on both SMP and UP kernels.
> 
> However, it doesn't fix all page/cache related issues on parisc
> SMP kernels that I commonly see.
> 
> My first inclination after even before reading your analysis was
> to assume that copy_user_page was broken (i.e, that even if a
> processor cache was dirty when the COW page was write protected,
> it should be possible to do the flush before the page is copied).
> However, this didn't seem to work...  Possibly, there are issues
> with aliased addresses.
> 
> I note that sparc flushes the entire cache and purges the entire
> tlb in kmap_atomic/kunmap_atomic for highmem.  Although the breakage
> that I see is not limited to PA8800/PA8900, I'm not convinced
> that we maintain coherency that is required for these processors
> in copy_user_page when we have multiple threads.
> 
> As a side note, kmap_atomic/kunmap_atomic seem to lack calls to
> pagefault_disable()/pagefault_enable() on PA8800.
> 
> Dave
> -- 
> J. David Anglin                                  dave.anglin at nrc-cnrc.gc.ca
> National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)
> 
> diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h
> index a27d2e2..b140d5c 100644
> --- a/arch/parisc/include/asm/pgtable.h
> +++ b/arch/parisc/include/asm/pgtable.h
> @@ -14,6 +14,7 @@
>  #include <linux/bitops.h>
>  #include <asm/processor.h>
>  #include <asm/cache.h>
> +extern void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr, unsigned long pfn);
>  
>  /*
>   * kern_addr_valid(ADDR) tests if ADDR is pointing to valid kernel
> @@ -456,17 +457,22 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
>  	return old_pte;
>  }
>  
> -static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
> +static inline void ptep_set_wrprotect(struct vm_area_struct *vma, struct mm_struct *mm, unsigned long addr, pte_t *ptep)
>  {
>  #ifdef CONFIG_SMP
>  	unsigned long new, old;
> +#endif
> +	pte_t old_pte = *ptep;
> +
> +	if (atomic_read(&mm->mm_users) > 1)

Just to verify there's nothing this is hiding, can you make this 

	if (pte_dirty(old_pte))

and reverify?  The if clause should only trip on the case where the
parent has dirtied the line between flush and now.

> +		flush_cache_page(vma, addr, pte_pfn(old_pte));
>  
> +#ifdef CONFIG_SMP
>  	do {
>  		old = pte_val(*ptep);
>  		new = pte_val(pte_wrprotect(__pte (old)));
>  	} while (cmpxchg((unsigned long *) ptep, old, new) != old);
>  #else
> -	pte_t old_pte = *ptep;
>  	set_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
>  #endif
>  }
> diff --git a/mm/memory.c b/mm/memory.c
> index 09e4b1b..21c2916 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -616,7 +616,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>  	 * in the parent and the child
>  	 */
>  	if (is_cow_mapping(vm_flags)) {
> -		ptep_set_wrprotect(src_mm, addr, src_pte);
> +		ptep_set_wrprotect(vma, src_mm, addr, src_pte);

So this is going to be a hard sell because of the arch churn. There are,
however, three ways to do it with the original signature.

     1. implement copy_user_highpage ... this allows us to copy through
        the child's page cache (which is coherent with the parent's
        before the cow) and thus pick up any cache changes without a
        flush
     2. use the mm identically to flush_user_cache_page_noncurrent.  The
        only reason that needs the vma is for the icache check ... but
        that shouldn't happen here (if the parent is actually doing a
        self modifying exec region, it needs to manage coherency
        itself).
     3. Flush in kmap ... this is something that's been worrying me
        since the flamewars over kmap for pio.

James





More information about the Pkg-gauche-devel mailing list