[Data Analysis] Data Leakage Part 3. sklearn.pipeline


CocoJamjam 2023. 4. 21. 12:17


In the previous post, [Data Analysis] Data Leakage Part 2. Pipeline architecture (james-choi88.tistory.com), we looked at pipeline architecture, one of the ways to prevent data leakage.
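Pipelines were introduced as a fix for data leakage, so here is a minimal sketch of what that fix looks like in code. It assumes a synthetic dataset from make_classification, with StandardScaler and LogisticRegression as stand-in steps; none of these come from the original post. In the leaky version the scaler is fitted on every row before cross-validation, so each validation fold has already influenced the preprocessing. In the pipeline version, cross_val_score re-fits the scaler on each training fold only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Leaky: the scaler's mean/std are computed on ALL rows,
# including the rows that later serve as validation folds.
X_scaled = StandardScaler().fit_transform(X)
leaky_scores = cross_val_score(LogisticRegression(), X_scaled, y, cv=5)

# Leak-free: the whole pipeline is cloned and fitted per fold,
# so the scaler only ever sees that fold's training rows.
pipe = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression())])
safe_scores = cross_val_score(pipe, X, y, cv=5)

print("leaky:", leaky_scores.mean())
print("pipeline:", safe_scores.mean())
```

For a simple scaler the two scores are often close. The gap tends to matter more for steps that learn a lot from the data, such as feature selection.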

In this post, we'll look at the sklearn.pipeline module in Python, focusing on its Pipeline class.

Methods

decision_function(X)	Transform the data, and apply decision_function with the final estimator.
fit(X[, y])	Fit the model.
fit_predict(X[, y])	Transform the data, and apply fit_predict with the final estimator.
fit_transform(X[, y])	Fit the model and transform with the final estimator.
get_feature_names_out([input_features])	Get output feature names for transformation.
get_params([deep])	Get parameters for this estimator.
inverse_transform(Xt)	Apply inverse_transform for each step in a reverse order.
predict(X, **predict_params)	Transform the data, and apply predict with the final estimator.
predict_log_proba(X, **predict_log_proba_params)	Transform the data, and apply predict_log_proba with the final estimator.
predict_proba(X, **predict_proba_params)	Transform the data, and apply predict_proba with the final estimator.
score(X[, y, sample_weight])	Transform the data, and apply score with the final estimator.
score_samples(X)	Transform the data, and apply score_samples with the final estimator.
set_output(*[, transform])	Set the output container when "transform" and "fit_transform" are called.
set_params(**kwargs)	Set the parameters of this estimator.
transform(X)	Transform the data, and apply transform with the final estimator.

The methods above are quoted directly from the scikit-learn documentation for sklearn.pipeline.Pipeline.
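Most of these methods push the data through the transformers and then hand it to the final estimator. Which methods a pipeline exposes therefore depends on its last step. Below is a minimal sketch under a few assumptions: the data is synthetic, and LogisticRegression, LinearRegression and GaussianMixture are example final estimators picked for illustration, not taken from the post.

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.mixture import GaussianMixture
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# fit(): fit the scaler, transform X, then fit the classifier on the result.
clf_pipe = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression())])
clf_pipe.fit(X, y)

labels = clf_pipe.predict(X)               # scaler.transform -> clf.predict
proba = clf_pipe.predict_proba(X)          # scaler.transform -> clf.predict_proba
log_proba = clf_pipe.predict_log_proba(X)  # scaler.transform -> clf.predict_log_proba
margins = clf_pipe.decision_function(X)    # scaler.transform -> clf.decision_function
accuracy = clf_pipe.score(X, y)            # mean accuracy from LogisticRegression.score
print("proba rows sum to 1:", np.allclose(proba.sum(axis=1), 1.0))

# LinearRegression has no predict_proba, so this pipeline does not expose it.
Xr, yr = make_regression(n_samples=100, n_features=5, random_state=0)
reg_pipe = Pipeline([("scaler", StandardScaler()), ("reg", LinearRegression())]).fit(Xr, yr)
print(hasattr(clf_pipe, "predict_proba"))  # True
print(hasattr(reg_pipe, "predict_proba"))  # False

# Unsupervised final estimator: fit_predict and score_samples are delegated too.
gmm_pipe = Pipeline([("scaler", StandardScaler()),
                     ("gmm", GaussianMixture(n_components=2, random_state=0))])
clusters = gmm_pipe.fit_predict(X)          # one cluster label per sample
log_likelihood = gmm_pipe.score_samples(X)  # per-sample log-likelihood
```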

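A pipeline generally consists of one or more transformers followed by a final estimator.

Transformers transform or preprocess the input data (they implement fit and transform). The final estimator trains a model on the transformed data and makes predictions.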
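Here is a minimal sketch of that structure. It assumes StandardScaler as the transformer and LogisticRegression as the final estimator, both illustrative choices. Each step is a (name, object) pair. The step names are what get_params and set_params use to reach parameters inside the steps.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# All steps but the last must be transformers; the last is the final estimator.
pipe = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])

# Parameters inside steps are addressed as <step name>__<parameter>.
params = pipe.get_params()
print(params["clf__C"])  # 1.0, LogisticRegression's default

pipe.set_params(clf__C=0.1, scaler__with_mean=False)
print(pipe.named_steps["clf"].C)  # 0.1

# make_pipeline builds the same structure and names steps after the lowercased class names.
auto = make_pipeline(StandardScaler(), LogisticRegression())
print([name for name, _ in auto.steps])  # ['standardscaler', 'logisticregression']
```

make_pipeline is a shortcut that names the steps automatically. An explicit Pipeline gives you names of your choosing, which are easier to reference in set_params or in a GridSearchCV parameter grid.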

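1. decision_function(X) : Transforms the data with every transformer, then calls decision_function on the final estimator.
2. fit(X[, y]) : Fits each transformer in turn and transforms the data as it goes, then fits the final estimator on the transformed data.
3. fit_predict(X[, y]) : Calls fit_transform on each transformer, then fit_predict on the final estimator. Only available if the final estimator implements fit_predict.
4. fit_transform(X[, y]) : Fits every step and returns the transformed data. Only available if the final estimator implements fit_transform (or fit and transform).
5. get_feature_names_out([input_features]) : Returns the output feature names after passing them through every transformer.
6. get_params([deep]) : Returns the pipeline's parameters. With deep=True, it also returns each step's parameters as <step>__<param>.
7. inverse_transform(Xt) : Applies each step's inverse_transform in reverse order. Every step must support inverse_transform.
8. predict(X, **predict_params) : Transforms the data, then calls predict on the final estimator.
9. predict_log_proba(X, **predict_log_proba_params) : Transforms the data, then calls predict_log_proba on the final estimator.
10. predict_proba(X, **predict_proba_params) : Transforms the data, then calls predict_proba on the final estimator.
11. score(X[, y, sample_weight]) : Transforms the data, then calls score on the final estimator.
12. score_samples(X) : Transforms the data, then calls score_samples on the final estimator.
13. set_output(*[, transform]) : Sets the output container (for example a NumPy array or a pandas DataFrame) that every step returns when "transform" and "fit_transform" are called.
14. set_params(**kwargs) : Sets the pipeline's parameters, including step parameters written as <step>__<param>.
15. transform(X) : Transforms the data with every transformer, then calls transform on the final estimator. Only available if the final estimator implements transform.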

Sources:

sklearn.pipeline.Pipeline (scikit-learn.org): https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html#

A Simple Guide to Scikit-learn Pipelines (medium.com): https://medium.com/vickdata/a-simple-guide-to-scikit-learn-pipelines-4ac0d974bdcf

Scikit-Learn Pipeline Examples (queirozf.com): https://queirozf.com/entries/scikit-learn-pipeline-examples#pipeline-example

