Ignore missing data in a table group
    14 visualizaciones (últimos 30 días)
  
       Mostrar comentarios más antiguos
    
    Marcus Glover
      
 el 27 de Oct. de 2021
  
    
    
    
    
    Editada: Marcus Glover
      
 el 27 de Oct. de 2021
            I have a table with some missing data. I am using findgroups and splitapply to do some calculations on columns of the table, but when a group has a missing value the calculation returns missing. 
I would like to ignore the missing value in these calculations but without removing the enitre row- other colums have valid data. 
>> T = readtable('messy.csv','TreatAsEmpty',{'.','NA'})
T =
  21×5 table
       A         B          C          D       E  
    ________    ____    __________    ____    ____
    {'afe1'}       3    {'yes'   }       3       3
    {'egh3'}     NaN    {'no'    }       7       7
    {'wth4'}       3    {'yes'   }       3       3
    {'atn2'}      23    {'no'    }      23      23
    {'arg1'}       5    {'yes'   }       5       5
    {'jre3'}    34.6    {'yes'   }    34.6    34.6
    {'wen9'}     234    {'yes'   }     234     234
    {'ple2'}       2    {'no'    }       2       2
    {'dbo8'}       5    {'no'    }       5       5
    {'oii4'}       5    {'yes'   }       5       5
    {'wnk3'}     245    {'yes'   }     245     245
    {'abk6'}     563    {0×0 char}     563     563
    {'pnj5'}     463    {'no'    }     463     463
    {'wnn3'}       6    {'no'    }       6       6
    {'oks9'}      23    {'yes'   }      23      23
    {'wba3'}     NaN    {'yes'   }     NaN      14
    {'pkn4'}       2    {'no'    }       2       2
    {'adw3'}      22    {'no'    }      22      22
    {'poj2'}     -99    {'yes'   }     -99     -99
    {'bas8'}      23    {'no'    }      23      23
    {'gry5'}     NaN    {'yes'   }     NaN      21
    >> [G,gen]=findgroups(T(:,[3])); %find groups based on column C
    >> gen
gen =
  2×1 table
       C   
    _______
    {'no' }
    {'yes'}
    %% find mean of columns E and D based on Groups in column C ('no' or 'yes')
    >> MeanE=splitapply(@mean,T(:,5),G)
MeanE =
          61.4444444444444
          44.4181818181818
    % works 
    >> MeanD=splitapply(@mean,T(:,4),G) 
MeanD =
          61.4444444444444
                       NaN
    % Does not work for 'yes' group due to NaN values in column 4
    >> MeanD=splitapply(@mean,rmmissing(T(:,4)),G);
    %fails because rmmissing(T(:,4)) and G are differnet sizes
I would like to be able to ignore the NaN values in column D when calcualting the mean. I can't seem to make it work with ismissing or rmmissing, and feel like this should be simpler than splitting them manually. 
2 comentarios
  Johan
      
 el 27 de Oct. de 2021
				I'm not used to working with table but maybe instead of using @mean you can define a function and use this in your splitapply call.
mymean = @(x) mean(x,'omitnan');
splitapply(mymean,T(:,4),G) 
Respuesta aceptada
  Ive J
      
 el 27 de Oct. de 2021
        As Johan also suggested you may use omitnant flag. Also, consider using groupsummary and groupfilter :
m = groupsummary(T, 'C', @(x)mean(x, 'omitnan'), {'D', 'E'})
        C         GroupCount    fun1_D    fun1_E
    __________    __________    ______    ______
    {0×0 char}         1           563       563
    {'no'    }         9        61.444    61.444
    {'yes'   }        11          50.4    44.418
1 comentario
Más respuestas (0)
Ver también
Categorías
				Más información sobre Data Type Identification en Help Center y File Exchange.
			
	Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!


