Implementation Description
General ML Functions
Provide several general ML functions.
This module allows the user to reuse common functions across ML projects.
The module contains the following functions:
- 'plot_boxplot_for_variables' - Plot a boxplot for all variables in variables_list.
- 'search_for_categorical_variables' - Identify how many unique values exist in each column of df.
- 'plot_frequencias_valores_atributos' - Plot the frequency chart of the attribute values for each variable in lista_atributos.
- 'plot_correlation_heatmap' - Plot the correlation between pairs of continuous variables.
- 'analyse_correlation_continuos_variables' - Analyse and plot the correlation between pairs of continuous variables.
- 'analyse_plot_correlation_categorical_variables' - Analyse and plot the correlation between pairs of categorical variables.
@author: ulf Bergmann
analyse_correlation_continuos_variables(df, lista_variaveis, quant_maximos)
Analyse and plot the correlation between pairs of continuous variables.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | DataFrame to be analysed | required |
lista_variaveis | list | Variable list | required |
quant_maximos | | Number of maximum values | required |
Returns:
Name | Type | Description |
---|---|---|
top_pairs_df | DataFrame | Sorted DataFrame with columns Variable1, Variable 2, Correlation |
corr_matrix | Array | Correlation matrix with p-values on the upper triangle |
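The ranking of correlated pairs described here can be sketched with pandas alone. This is a minimal illustration, not the code in funcoes_ulf.py: the helper name `top_correlated_pairs`, the choice of Pearson correlation, the ranking by absolute value, and the sample data are all assumptions, and the sketch omits both the plot and the p-values.

```python
from itertools import combinations

import pandas as pd


def top_correlated_pairs(df, columns, n_top):
    """Hypothetical helper (not the library function): rank column pairs
    by the absolute value of their Pearson correlation."""
    corr = df[columns].corr()  # pandas computes Pearson correlation by default
    rows = [(a, b, corr.loc[a, b]) for a, b in combinations(columns, 2)]
    pairs = pd.DataFrame(rows, columns=["Variable1", "Variable2", "Correlation"])
    # Strongest relationships first, regardless of sign.
    pairs = pairs.sort_values("Correlation", key=lambda s: s.abs(), ascending=False)
    return pairs.head(n_top).reset_index(drop=True), corr


# Illustrative data: "b" is an exact multiple of "a".
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],
})
top_pairs_df, corr_matrix = top_correlated_pairs(df, ["a", "b", "c"], 2)
print(top_pairs_df)
```

Sorting on the absolute value keeps strong negative correlations near the top, which a plain descending sort would push to the bottom.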
Source code in templates\lib\funcoes_ulf.py
analyse_plot_correlation_categorical_variables(df, lista_variaveis)
Analyse and plot the correlation between pairs of categorical variables. Variables must not be continuous (not float).
Uses the chi-squared test and its p-value, with
H0: the variables are independent. H1: the variables are dependent.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | DataFrame to be analysed | required |
lista_variaveis | list | Variable list | required |
Returns:
Name | Type | Description |
---|---|---|
resultant | DataFrame | DataFrame with all p-values |
lista_resultado_analise | DataFrame | DataFrame with columns Variable1, Variable 2, p-value |
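A pairwise chi-squared analysis of this kind can be sketched with `pandas.crosstab` and `scipy.stats.chi2_contingency`. The helper name `chi2_p_values`, the output column names, and the sample data are assumptions made for illustration; the actual function may build its results differently and also draws a plot, which is omitted here.

```python
from itertools import combinations

import pandas as pd
from scipy.stats import chi2_contingency


def chi2_p_values(df, columns):
    """Hypothetical helper: chi-squared p-value for every pair of columns."""
    rows = []
    for a, b in combinations(columns, 2):
        table = pd.crosstab(df[a], df[b])  # contingency table of observed counts
        _, p_value, _, _ = chi2_contingency(table)
        rows.append((a, b, p_value))
    return pd.DataFrame(rows, columns=["Variable1", "Variable2", "p-value"])


# Illustrative data: "size" is fully determined by "colour", "shop" is not.
df = pd.DataFrame({
    "colour": ["red", "red", "blue", "blue"] * 10,
    "size": ["S", "S", "L", "L"] * 10,
    "shop": ["x", "y", "x", "y"] * 10,
})
result = chi2_p_values(df, ["colour", "size", "shop"])
print(result)
```

A small p-value (for example below 0.05) is evidence against independence of the two variables; a large one means the test found no evidence of an association.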
Source code in templates\lib\funcoes_ulf.py
fill_categoric_field_with_value(serie, replace_nan)
Replace categorical values with integer values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
serie | Series | Data whose categorical values are to be replaced with integers | required |
replace_nan | Boolean | Flag to replace NaN with an index | required |
Returns:
Type | Description |
---|---|
Series | Replaced values |
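One way to map categories to integers is `pandas.factorize`, sketched below. The helper name `encode_categories` is hypothetical, and the behaviour when `replace_nan` is false (missing values stay missing) is an assumption; the source only states that the flag replaces NaN with an index.

```python
import pandas as pd


def encode_categories(series, replace_nan):
    """Hypothetical helper: map each distinct value to an integer code."""
    codes, uniques = pd.factorize(series)  # missing values receive code -1
    encoded = pd.Series(codes, index=series.index, name=series.name)
    if replace_nan:
        # Give missing values their own index, one past the last real category.
        return encoded.replace(-1, len(uniques))
    # Assumption: otherwise keep missing values as missing (nullable integers).
    return encoded.astype("Int64").mask(encoded == -1)


s = pd.Series(["low", "high", None, "low"])
with_index = encode_categories(s, replace_nan=True)
with_missing = encode_categories(s, replace_nan=False)
print(with_index.tolist())
print(with_missing.tolist())
```

Codes follow the order of first appearance, so they carry no ordinal meaning.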
Source code in templates\lib\funcoes_ulf.py
plot_boxplot_for_variables(df, variables_list)
Plot a boxplot for all variables in variables_list.
Can be used to verify whether the variables are on the same scale.
Examples:
>>> plot_boxplot_for_variables(df, ['var1', 'var2', 'var3'])
The call returns None.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | DataFrame to be analysed. | required |
variables_list | list | Variable list. | required |
Returns:
Type | Description |
---|---|
None | |
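Drawing several variables on one shared axis is what makes scale differences visible. The sketch below uses matplotlib directly; the helper name `boxplot_columns`, the sample data, and returning the figure (instead of None) are assumptions made so the result can be inspected.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def boxplot_columns(df, columns):
    """Hypothetical helper: one box per listed column, all on a shared y-axis."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.boxplot([df[c].dropna().to_numpy() for c in columns])
    # Boxes are drawn at positions 1..N; label them with the column names.
    ax.set_xticks(range(1, len(columns) + 1))
    ax.set_xticklabels(columns)
    ax.set_title("Value ranges per variable")
    return fig


# Illustrative data: var3 spans a much larger range than var1.
df = pd.DataFrame({
    "var1": [1, 2, 3, 4, 5],
    "var2": [10, 20, 30, 40, 50],
    "var3": [100, 250, 300, 450, 900],
})
fig = boxplot_columns(df, ["var1", "var2", "var3"])
```

If one box dwarfs the others, as var3 does here, the variables are on different scales and may need rescaling before being fed to scale-sensitive models.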
Source code in templates\lib\funcoes_ulf.py
plot_correlation_heatmap(df, lista_variaveis)
Plot the correlation between pairs of continuous variables.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | DataFrame to be analysed | required |
lista_variaveis | list | Continuous variable list | required |
Returns:
Type | Description |
---|---|
None | |
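A correlation heatmap can be sketched with plain matplotlib as shown below. The helper name `correlation_heatmap`, the colour map, the sample data, and returning the figure and matrix are assumptions; the actual function returns None and may rely on a different plotting library.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def correlation_heatmap(df, columns):
    """Hypothetical helper: colour-coded Pearson correlation matrix."""
    corr = df[columns].corr()
    fig, ax = plt.subplots(figsize=(5, 4))
    # Fix the colour range to [-1, 1] so colours are comparable across datasets.
    image = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
    ax.set_xticks(range(len(columns)))
    ax.set_xticklabels(columns, rotation=45, ha="right")
    ax.set_yticks(range(len(columns)))
    ax.set_yticklabels(columns)
    # Write each coefficient inside its cell.
    for i in range(len(columns)):
        for j in range(len(columns)):
            ax.text(j, i, f"{corr.iat[i, j]:.2f}", ha="center", va="center")
    fig.colorbar(image, ax=ax, label="Pearson correlation")
    return fig, corr


df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],
})
fig, corr = correlation_heatmap(df, ["a", "b", "c"])
```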
Source code in templates\lib\funcoes_ulf.py
plot_frequencias_valores_atributos(df, lista_atributos)
Plot the frequency chart of the attribute values for each variable in lista_atributos.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | DataFrame to be analysed | required |
lista_atributos | list | Variable list | required |
Returns:
Type | Description |
---|---|
None | |
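Frequency charts of this kind amount to one bar chart of `value_counts()` per variable. The sketch below assumes a side-by-side layout; the helper name `plot_value_frequencies`, the sample data, and returning the figure are illustrative choices, not the behaviour of the documented function.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_value_frequencies(df, columns):
    """Hypothetical helper: one bar chart of value counts per column."""
    fig, axes = plt.subplots(
        1, len(columns), figsize=(4 * len(columns), 3), squeeze=False
    )
    for ax, col in zip(axes[0], columns):
        counts = df[col].value_counts()  # sorted from most to least frequent
        ax.bar([str(v) for v in counts.index], counts.to_numpy())
        ax.set_title(col)
        ax.set_ylabel("count")
    fig.tight_layout()
    return fig


df = pd.DataFrame({
    "colour": ["red", "blue", "red", "red"],
    "size": ["S", "L", "L", "S"],
})
fig = plot_value_frequencies(df, ["colour", "size"])
```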
Source code in templates\lib\funcoes_ulf.py
print_count_cat_var_values(df, lista_atributos)
Print the attribute values for each categorical variable in lista_atributos.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | DataFrame to be analysed | required |
lista_atributos | list | Variable list | required |
Returns:
Type | Description |
---|---|
None | |
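Printing the value counts of each categorical variable takes only a short loop over `value_counts()`. The helper name `print_value_counts`, the output layout, and the decision to list missing values are assumptions for illustration.

```python
import pandas as pd


def print_value_counts(df, columns):
    """Hypothetical helper: print how often each value occurs, per column."""
    for col in columns:
        print(f"--- {col} ---")
        # dropna=False also lists missing values, which are easy to overlook.
        print(df[col].value_counts(dropna=False).to_string())


df = pd.DataFrame({
    "colour": ["red", "blue", "red", None, "red"],
    "size": ["S", "L", "L", "S", "S"],
})
print_value_counts(df, ["colour", "size"])
```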
Source code in templates\lib\funcoes_ulf.py
search_for_categorical_variables(df)
Identify how many unique values exist in each column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | DataFrame to be analysed. | required |
Returns:
Name | Type | Description |
---|---|---|
cat_stats | DataFrame | Result DataFrame with |
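Counting unique values per column is a common first step in spotting categorical candidates. The sketch below uses `DataFrame.nunique`; the helper name `unique_value_stats` and the result columns (`dtype`, `n_unique`, `pct_unique`) are assumptions, since the source does not list the columns of `cat_stats`.

```python
import pandas as pd


def unique_value_stats(df):
    """Hypothetical helper: unique-value count and share for every column."""
    n_unique = df.nunique()  # missing values are not counted
    stats = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_unique": n_unique,
        "pct_unique": n_unique / len(df) * 100,
    })
    # Columns with few distinct values are the likeliest categorical candidates.
    return stats.sort_values("n_unique")


df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "colour": ["red", "red", "blue", "blue"],
    "price": [9.5, 12.0, 9.5, 20.0],
})
cat_stats = unique_value_stats(df)
print(cat_stats)
```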
Source code in templates\lib\funcoes_ulf.py