Table 3346. Software and interfaces used in data science and machine learning.
Software | System | Function | Reference |
Amazon Redshift ML | | ML models directly in a data warehouse | page3340 |
Apache HBase | | NoSQL database, data warehouse | page3347, page3393 |
Apache Flink | | Stream (data) processing; used in data lakes | page3347, page3335 |
Apache Hive | | Data warehouse. Hive and HDFS are part of the Hadoop ecosystem. | page3394, page3305 |
Apache Impala | | High-performance, low-latency SQL queries | page3337 |
Apache Kudu | | Fast analytics on rapidly changing data | page3336 |
Apache Spark | Built on Scala | Data processing | page3347 |
Apache Storm | | Data processing | page3347 |
Apache Superset | | Open-source tool for creating dashboards for data understanding | page3340 |
AWS Kinesis | | Data collection | page3347 |
AWS Machine Learning | | Machine learning API | page3340 |
AWS S3 | | Data lake | page3338 |
Azure Data Lake Storage | | Data lake | page3338 |
Catalyst | | Optimizes the logical and physical plans of SQL queries | page3331 |
BERT | | Natural language processing | page3344 |
BigQuery ML | | ML models directly in a data warehouse; evaluating an ML model | page3340, page3332 |
Cassandra | | Data storage | page3347 |
Dask | A Python library | Parallel computing; feature engineering at scale | page3348 |
Elastic Stack | | Monitoring and management | page3347 |
Elasticsearch | | Data storage | page3347 |
Facets Overview | | Open-source tool for visualizing and understanding machine learning datasets | page3345 |
Featuretools | A Python library | Automated feature engineering at scale | page3348 |
FlinkML | | Analytics and machine learning (distributed machine learning) | page3347 |
Google Cloud AI | | Machine learning API | page3340 |
Google AutoML | | A suite of machine learning products | page3751 |
Google Cloud Natural Language API | | Analyzing and understanding the content of text | page3342 |
Google Cloud Shell | | Free online development environment | page3365 |
Google Cloud Platform (GCP) | | A comprehensive cloud computing service | page3391 |
Google Kubernetes Engine (GKE) | | Managed Kubernetes environment | page3373 |
Google Vertex Vizier | | A hyperparameter tuning service | page3751 |
Grafana | | Data visualization and reporting | page3347 |
GPT | | Natural language processing | page3344 |
Hadoop | | Data lake; needed for integration with HDFS | page3338, page3319 |
Hadoop Distributed File System (HDFS) | | Data lake. Hive and HDFS are part of the Hadoop ecosystem. | page3395, page3305 |
LSTM networks | | Converting speech into text | page3344 |
Kafka | | Data collection | page3347 |
Keras | | Building and training deep learning models | page4243 |
Kibana | | Data visualization and reporting | page3347 |
MapReduce | | Framework for writing applications that process large datasets in parallel | page3400 |
Microsoft Azure Machine Learning | | Machine learning API | page3340 |
MLlib (Spark) | | Analytics and machine learning | page3347 |
Power BI | | Creating dashboards for data understanding and reporting | page3340 |
Prometheus | | Monitoring and management | page3347 |
PyArrow | | Apache data ingestion framework (ADIF) for CSV-to-DataFrame conversion | page3328 |
PyFlink | | Apache data ingestion framework (ADIF) for CSV-to-DataFrame conversion | page3328 |
PySpark | | Apache data ingestion framework (ADIF) for CSV-to-DataFrame conversion | page3328 |
PyTorch | | Analytics and machine learning (model training); developed by Facebook | page3347 |
RabbitMQ | | Data collection | page3347 |
R-CNN | | Object detection and image segmentation | page3344 |
Mask R-CNN | | Object detection and image segmentation | page3344 |
RNNs | | Converting speech into text | page3344 |
Redis | | Data storage | page3347 |
scikit-learn | | User-friendly machine learning library | page4312 |
Splunk | | Monitoring and management | page3347 |
Tableau | | Data visualization and reporting | page3347 |
TensorFlow | | Analytics and machine learning | page3347 |
TensorFlow Data Validation (TFDV) | | Analyzing data to find potential problems | page3349 |
Tungsten | | Improves memory and CPU efficiency for Spark applications | page3330 |
YOLO | | Object detection and image segmentation | page3344 |