In this version I have:
* Fixed some bugs in data_cleansing, plot_table, check_rules.

In this version I have:
* Fixed some bugs in check_rules, time_transfer.
# creditmodel-1.2.2
In this version I have:
* Fixed some bugs in get_ctree_rules, ks_plot, cross_table.
# creditmodel-1.2.1
In this version I have:
* Enhanced strategy analysis capabilities.
* New function rule_value_replace is for generating new variables by rules.
* Fixed some potential bugs in ks_plot, perf_table, training_model, process_nas.

In this version I have:
* Enhanced strategy analysis capabilities.
* New function replace_value is for replacing values of some variables.
* Fixed some potential bugs in check_rules, get_ctree_rules, rules_filter, %alike%.

In this version I have:
* New functions plot_distribution, plot_relative_freq_histogram, plot_box, plot_density, plot_bar are for data visualization.
* New function swap_analysis is for swap out/swap in analysis.
* New function rules_filter is used to filter or select samples by rules.
* Fixed some potential bugs in char_to_num, merge_category, check_rules, get_ctree_rules.

In this version I have:
* New function cross_table is for cross table analysis.
* Fixed some potential bugs in data_cleansing, low_variance_filter, time_variable, plot_vars.

In this version I have:
* New function entropy_weight is for calculating Entropy Weight.
* New function term_tfidf is for computing tf-idf of documents.
* New function plot_oot_perf is for plotting the performance of out-of-time (future) samples.
* Fixed some potential bugs in get_breaks, lift_plot, perf_table, model_result_plot.
* Added a parameter cut_bin to get_breaks for cutting breaks by equal depth or equal width.

In this version I have:
* split_bins, woe_transfer
* time_series_proc for time series data processing.
* ranking_percent_proc, ranking_percent_dict are for processing ranking percent variables and generating a ranking percent dictionary.
* Renamed read_dt to read_data and added a parameter pattern for matching files.
* traing_xgb, ‘xgb_params’
* Renamed save_dt to save_data; save_data also supports multiple data frames.

In this version I have:
* pred_xgb for using an xgboost model to predict new data.
* get_psi_plots, psi_plot to plot the PSI of your data.
* p_to_score for transforming probability to score.
* multi_left_jion for fast left joins across a list of datasets.
* read_data for loading csv or txt data fast.

In this version I have:
* xgb_filter, feature_selector, split_bins, ks_table_plot, ks_psi_plot, ks_value.
* pred_score for predicting new data using scorecard.
* lr_params_search, xgb_params_search for searching the optimal parameters. “random_search”, “grid_search”, “local_search” are available.
* partial_dependence_plot, get_partial_dependence_plots for generating partial dependence plots.
* cohort_analysis, cohort_table, cohort_plot for cohort (vintage) analysis and visualization.
* perf_table, roc_plot, ks_plot, lift_plot, psi_plot for model validation drawings.

In this version I have:
* Fixed some potential bugs in get_names, digits_num.

In this version I have:
* data_exploration for data exploration.
* missing_proc, outliers_proc, get_names
* In lasso_filter, AUC & K-S are added as criteria to select the best lambda. In this way, not only can the set of variables that maximizes the AUC or K-S be selected, but multicollinearity (which is difficult to eliminate by AIC in stepwise regression) can also be minimized. That means that, instead of stepwise regression, lasso can select the optimal combination of variables for the regression problem.
* K-S or AUC values corresponding to different lambda.
* auc_value, ks_value, which can calculate Kolmogorov-Smirnov (K-S) & AUC of multiple model results quickly.
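
The idea of scoring each lambda by K-S or AUC and keeping the best one can be illustrated with a minimal sketch. It is written in plain Python rather than R, and it does not use or reproduce the package's code. The names ks_sketch, auc_sketch and pick_best_lambda are hypothetical, and the toy scores stand in for predictions from models fitted at each lambda. It assumes 0/1 labels with both classes present.

```python
from typing import Callable, Dict, Sequence


def auc_sketch(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC from the rank-sum (Mann-Whitney) statistic; ties get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 1-based average rank of the tie group
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg_rank
        start = end + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def ks_sketch(labels: Sequence[int], scores: Sequence[float]) -> float:
    """K-S: largest gap between the cumulative rates of the two classes."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pairs = sorted(zip(scores, labels))
    cum_pos = cum_neg = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every observation tied at this score before comparing.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                cum_pos += 1
            else:
                cum_neg += 1
            i += 1
        best = max(best, abs(cum_pos / n_pos - cum_neg / n_neg))
    return best


def pick_best_lambda(
    labels: Sequence[int],
    scores_by_lambda: Dict[float, Sequence[float]],
    metric: Callable[[Sequence[int], Sequence[float]], float],
) -> float:
    """Return the lambda whose scores maximize the chosen metric."""
    return max(scores_by_lambda, key=lambda lam: metric(labels, scores_by_lambda[lam]))


# Toy data: hypothetical predictions for three candidate lambda values.
labels = [0, 0, 1, 1]
scores_by_lambda = {
    0.01: [0.1, 0.4, 0.35, 0.8],
    0.1: [0.2, 0.3, 0.6, 0.7],
    1.0: [0.5, 0.5, 0.5, 0.5],  # heavy penalty: constant prediction
}
best_by_ks = pick_best_lambda(labels, scores_by_lambda, ks_sketch)
best_by_auc = pick_best_lambda(labels, scores_by_lambda, auc_sketch)
```

In this toy data both criteria pick lambda = 0.1, whose scores separate the two classes perfectly. The constant prediction at lambda = 1.0 gives K-S 0 and AUC 0.5.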