RapidMiner and Weka are good tools for data mining, but they have some gaps when it comes to cleansing and preparing data before building models. R and Python with scikit-learn are excellent tools for data mining. R is more popular than Python, though I prefer using Python with scikit-learn. In general, if you are a good programmer, R or Python is preferable; if not, Weka or RapidMiner is fine.
Written in the Java programming language, RapidMiner offers advanced analytics through template-based frameworks. A bonus: users hardly have to write any code. Offered as a service rather than as a piece of local software, it holds the top position on the list of data mining tools.
In addition to data mining, RapidMiner also provides functionality like data preprocessing and visualization, predictive analytics and statistical modeling, evaluation, and deployment. What makes it even more powerful is that it provides learning schemes, models and algorithms from WEKA and R scripts.
The original non-Java version of WEKA was developed primarily for analyzing data from the agricultural domain. The Java-based version is very sophisticated and is used in many different applications, including visualization and algorithms for data analysis and predictive modeling. It is free under the GNU General Public License, which is a big plus compared to RapidMiner, because users can customize it however they please.
WEKA supports several standard data mining tasks, including data preprocessing, clustering, classification, regression, visualization and feature selection. WEKA would be more powerful with the addition of sequence modeling, which currently is not included.
SPSS Clementine is one of the best and most user-friendly data mining packages.
It depends strongly upon your needs.
If you'd like to do machine learning on a local machine with small amounts of data (less than 100,000 rows), Weka is an excellent tool with an accessible interface.
If you'd like to do something a little more advanced, R is a popular statistical package/programming language that is extremely flexible.
Finally, a number of software packages written in Python (a general-purpose programming language) are extremely popular for large-scale machine learning. It's what my company primarily uses: specifically scikit-learn, statsmodels, pandas, and numpy.
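To give a feel for that Python stack, here is a minimal sketch using pandas, numpy, and scikit-learn together (statsmodels is left out to keep it short). The data is synthetic and the column names (`feature_a`, `feature_b`, `label`) are made up for illustration; the choice of median imputation, standard scaling, and logistic regression is just one reasonable assumption, not a recommendation from the answer above.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, made-up data: two numeric features and a binary label.
rng = np.random.default_rng(0)
n_rows = 200
df = pd.DataFrame({
    "feature_a": rng.normal(size=n_rows),
    "feature_b": rng.normal(size=n_rows),
})
df["label"] = (df["feature_a"] + df["feature_b"] > 0).astype(int)

# Blank out every tenth value to mimic the missing data a cleansing step must handle.
df.loc[df.index[::10], "feature_a"] = np.nan

X = df[["feature_a", "feature_b"]]
y = df["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Cleansing (imputation), preparation (scaling), and the model in one pipeline,
# so the same steps are applied to training and held-out data.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    LogisticRegression(),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Keeping the cleansing steps inside the pipeline means they are fitted on the training split only, which is the kind of control over data preparation that code-based tools make easy.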
All of the tools discussed here are 'open source', meaning that they're free to use.
R is the best tool for data mining.
Weka is one of the best tools for data mining.
It's easy to use, effective and fast.