Data Readiness Self-Check
Before you can use AI effectively, you need to understand the state of your data. AI systems learn from data, make predictions based on data, and are only as reliable as the data they consume. This self-check will help you evaluate your organization's data across four critical dimensions: quality, accessibility, volume, and governance.
Data Quality
Data quality is the foundation of any successful AI initiative. Poor-quality data leads to inaccurate predictions, unreliable automation, and wasted investment. Even simple AI applications like automated reporting will produce misleading results if the underlying data is inconsistent or incomplete.
- ☐ Your records are free of significant duplicate entries (e.g., the same patient or customer does not appear multiple times with slightly different names)
- ☐ Required fields are consistently populated across your key datasets (no critical blanks in records)
- ☐ Data formats are consistent (dates are all in the same format, phone numbers follow one pattern, addresses are standardized)
- ☐ Records are current and regularly updated (stale data is archived or flagged)
- ☐ You have a process for validating data at the point of entry (dropdown menus, required fields, format validation)
- ☐ Error rates in data entry are tracked and known (you can quantify how often mistakes occur)
If most boxes are unchecked: Invest in a data cleanup project before pursuing AI. This typically involves deduplication, standardization, and implementing data validation rules in your entry systems. The good news is that this work improves your operations immediately, even without AI.
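Several of the quality checks above (duplicates, blank required fields, inconsistent date formats) can be measured with a short script. The following is a minimal sketch, not a full cleanup tool: the sample records, the field names (`name`, `phone`, `visit_date`), and the choice of year-month-day as the target date format are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample records; field names are assumptions for illustration.
records = [
    {"name": "Ana Diaz", "phone": "555-0101", "visit_date": "2024-01-15"},
    {"name": "ana  diaz", "phone": "555-0101", "visit_date": "01/15/2024"},
    {"name": "Ben Okoye", "phone": "", "visit_date": "2024-02-03"},
]
REQUIRED_FIELDS = ("name", "phone", "visit_date")


def normalize_name(name):
    """Lowercase and collapse whitespace so near-identical names compare equal."""
    return " ".join(name.lower().split())


def find_duplicates(rows):
    """Return (name, phone) keys that occur more than once after normalization."""
    counts = Counter((normalize_name(r["name"]), r["phone"]) for r in rows)
    return [key for key, n in counts.items() if n > 1]


def missing_required(rows, fields=REQUIRED_FIELDS):
    """Return indexes of rows where any required field is blank or absent."""
    return [
        i for i, r in enumerate(rows)
        if any(not str(r.get(f, "")).strip() for f in fields)
    ]


def unparseable_dates(rows, field="visit_date"):
    """Return indexes of rows whose date cannot be parsed as year-month-day."""
    bad = []
    for i, r in enumerate(rows):
        try:
            datetime.strptime(r[field], "%Y-%m-%d")
        except ValueError:
            bad.append(i)
    return bad


print("Duplicate keys:", find_duplicates(records))
print("Rows with blank required fields:", missing_required(records))
print("Rows with non-standard dates:", unparseable_dates(records))
```

Matching on a normalized name plus phone number is only one possible duplicate rule. Real records usually need fuzzier matching, and the right key depends on which fields your systems populate reliably.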
Data Accessibility
Even high-quality data is useless for AI if it is trapped in silos, locked in proprietary formats, or scattered across disconnected systems. AI tools need to access your data programmatically, which means the data must be reachable through APIs, exports, or database connections.
- ☐ Your critical business data lives in structured systems (databases, EHR, CRM, accounting software) rather than spreadsheets and paper files
- ☐ Your key systems can export data in standard formats (CSV, JSON, XML) or offer API access
- ☐ Data that relates to the same entity (e.g., a patient) can be connected across systems using a common identifier
- ☐ You do not have critical data trapped in systems where the vendor will not allow export
- ☐ Staff across departments can access the data they need without relying on a single person to pull reports
- ☐ You have a centralized view of your key business metrics (even a simple dashboard counts)
If most boxes are unchecked: Focus on consolidating your data and eliminating silos. This might mean migrating from spreadsheets to a proper database, implementing integration between your key systems, or negotiating data export capabilities with your vendors.
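One way to test whether records can be connected across systems is to compare the identifiers found in two exports. The sketch below assumes two hypothetical exports (`scheduling` and `billing`) that share a `patient_id` field. The names and values are invented for illustration, and a real check would read from your own CSV or API exports.

```python
# Hypothetical exports from two systems; "patient_id" is an assumed shared key.
scheduling = [
    {"patient_id": "P001", "next_visit": "2024-03-01"},
    {"patient_id": "P002", "next_visit": "2024-03-04"},
    {"patient_id": "P004", "next_visit": "2024-03-09"},
]
billing = [
    {"patient_id": "P001", "balance": 120.0},
    {"patient_id": "P002", "balance": 0.0},
    {"patient_id": "P003", "balance": 45.5},
]


def link_report(left, right, key="patient_id"):
    """Summarize how well two datasets connect on a shared identifier."""
    left_ids = {row[key] for row in left}
    right_ids = {row[key] for row in right}
    matched = left_ids & right_ids
    all_ids = left_ids | right_ids
    return {
        "matched": sorted(matched),
        "only_left": sorted(left_ids - right_ids),
        "only_right": sorted(right_ids - left_ids),
        # Share of all known identifiers that appear in both systems.
        "match_rate": len(matched) / len(all_ids) if all_ids else 0.0,
    }


report = link_report(scheduling, billing)
print(report)
```

A low match rate does not always mean the data is bad. It can also mean the two systems use different identifiers for the same entity, which is itself a silo worth fixing.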
Data Volume
AI models need enough data to identify patterns. The amount required depends on the use case. A simple automation might work with a few hundred records, while a predictive model might need thousands or tens of thousands. Understanding whether you have enough data for your intended use case is essential.
- ☐ You have at least 12 months of historical data for the processes you want to apply AI to
- ☐ Your datasets contain enough records to represent the range of scenarios your AI will encounter (not just common cases, but edge cases too)
- ☐ You continue to generate new data at a pace that will keep AI models current and relevant
- ☐ You have data covering both positive and negative outcomes (for predictive use cases, you need examples of both success and failure)
- ☐ Your data volume is manageable with your current storage and processing infrastructure
If most boxes are unchecked: Consider starting with AI use cases that require less data, such as rule-based automation or pre-trained AI tools that do not require custom model training. As you accumulate more data over time, more sophisticated use cases become viable.
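Two of the volume checks above lend themselves to a quick calculation: how many months of history you hold, and whether both outcomes are represented. The sketch below uses made-up appointment data. The 12-month threshold comes from the checklist, while the 10% minimum share for the rarer outcome is an assumption chosen only to make the example concrete.

```python
from collections import Counter
from datetime import date

# Hypothetical history; each row is (record date, outcome label).
history = [
    (date(2023, 1, 10), "kept"),
    (date(2023, 6, 2), "no_show"),
    (date(2023, 11, 20), "kept"),
    (date(2024, 2, 14), "kept"),
]


def months_covered(rows):
    """Calendar-month difference between earliest and latest record (day ignored)."""
    dates = [d for d, _ in rows]
    first, last = min(dates), max(dates)
    return (last.year - first.year) * 12 + (last.month - first.month)


def outcome_shares(rows):
    """Fraction of records carrying each outcome label."""
    counts = Counter(label for _, label in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def volume_check(rows, min_months=12, min_share=0.1):
    """min_share is an assumed threshold, not a rule from the checklist."""
    shares = outcome_shares(rows)
    return {
        "enough_history": months_covered(rows) >= min_months,
        "both_outcomes_present": len(shares) >= 2
        and min(shares.values()) >= min_share,
    }


print(months_covered(history), outcome_shares(history))
print(volume_check(history))
```

The span of dates says nothing about gaps inside that span, so a fuller check would also count records per month.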
Data Governance
Data governance defines who owns your data, who can access it, how it is protected, and how long it is retained. For healthcare organizations, data governance is not optional; HIPAA requires it. But even non-healthcare businesses need governance to use AI responsibly and effectively.
- ☐ You have a clear data owner or steward for each major dataset (someone is accountable for its accuracy and maintenance)
- ☐ Access to sensitive data is controlled and logged (you know who accessed what and when)
- ☐ You have a documented data retention policy (how long you keep data and when it is deleted)
- ☐ You understand the privacy regulations that apply to your data (HIPAA, state privacy laws, industry requirements)
- ☐ You have a process for responding to data subject requests (access, deletion, correction)
- ☐ You have evaluated whether your data can legally and ethically be used for AI purposes (especially if it includes patient or customer personal information)
If most boxes are unchecked: Establish basic data governance before introducing AI. At minimum, assign data owners, implement access controls, and document a retention policy. For healthcare organizations, much of this work is already required by HIPAA regardless of your AI plans.
Interpreting Your Results
Review your checkmarks across all four sections (23 items in total: six each for quality, accessibility, and governance, and five for volume):
- 20+ checks: Your data foundation is strong. You are well-positioned to pursue AI projects with confidence. Focus on selecting the right use case and partner.
- 12-19 checks: You have a reasonable foundation but need to address gaps in specific areas. Prioritize the section with the lowest share of boxes checked before launching an AI project.
- Fewer than 12 checks: Your data needs significant work before AI can deliver reliable value. The good news: every improvement you make to your data benefits your business immediately through better reporting, fewer errors, and smoother operations.
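The scoring bands above can be expressed as a small function. This is a sketch under stated assumptions: the section sizes are counted from the checklists in this self-check, the band labels (`strong`, `reasonable`, `needs work`) are shorthand invented here, and the weakest section is picked by share of boxes checked rather than raw count because the sections are not all the same length.

```python
# Number of checklist items per section, counted from the self-check.
SECTION_SIZES = {"quality": 6, "accessibility": 6, "volume": 5, "governance": 6}


def interpret(checks):
    """Map per-section check counts to a total, a result band, and the weakest section."""
    for section, count in checks.items():
        if section not in SECTION_SIZES:
            raise ValueError(f"unknown section: {section}")
        if not 0 <= count <= SECTION_SIZES[section]:
            raise ValueError(f"{section} must be between 0 and {SECTION_SIZES[section]}")

    total = sum(checks.values())
    if total >= 20:
        band = "strong"
    elif total >= 12:
        band = "reasonable"
    else:
        band = "needs work"

    # Compare sections by share of boxes checked; a missing section counts as 0.
    weakest = min(
        SECTION_SIZES,
        key=lambda s: checks.get(s, 0) / SECTION_SIZES[s],
    )
    return total, band, weakest


# Hypothetical result for one organization.
result = interpret({"quality": 5, "accessibility": 3, "volume": 4, "governance": 6})
print(result)
```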
Remember that data readiness is not all-or-nothing. Some AI use cases (like chatbots or document summarization) require minimal internal data, while others (like predictive analytics) require extensive, high-quality datasets. Match your ambitions to your current readiness level, and build from there.
