# Cryptocurrency Data Aggregator - Complete Rewrite

A production-ready cryptocurrency data aggregation application with AI-powered analysis, real-time data collection, and an interactive Gradio dashboard.

## Features

### Core Capabilities
- **Real-time Price Tracking**: Monitor top 100 cryptocurrencies with live updates
- **AI-Powered Sentiment Analysis**: Using HuggingFace models for news sentiment
- **Market Analysis**: Technical indicators (MA, RSI), trend detection, predictions
- **News Aggregation**: RSS feeds from CoinDesk, Cointelegraph, Bitcoin Magazine, Decrypt, and Bitcoinist, plus Reddit posts
- **Interactive Dashboard**: 6-tab Gradio interface with auto-refresh
- **SQLite Database**: Persistent storage with full CRUD operations
- **No API Keys Required**: Uses only free data sources

### Data Sources (All Free, No Authentication)
- **CoinGecko API**: Market data, prices, rankings
- **CoinCap API**: Backup price data source
- **Binance Public API**: Real-time trading data
- **Alternative.me**: Fear & Greed Index
- **RSS Feeds**: CoinDesk, Cointelegraph, Bitcoin Magazine, Decrypt, Bitcoinist
- **Reddit**: r/cryptocurrency, r/bitcoin, r/ethtrader, r/cryptomarkets

### AI Models (HuggingFace - Local Inference)
- **cardiffnlp/twitter-roberta-base-sentiment-latest**: Social media sentiment
- **ProsusAI/finbert**: Financial news sentiment
- **facebook/bart-large-cnn**: News summarization

## Project Structure

```
crypto-dt-source/
├── config.py          # Configuration constants
├── database.py        # SQLite database with CRUD operations
├── collectors.py      # Data collection from all sources
├── ai_models.py       # HuggingFace model integration
├── utils.py           # Helper functions and utilities
├── app.py             # Main Gradio application
├── requirements.txt   # Python dependencies
├── README.md          # This file
├── data/
│   ├── database/      # SQLite database files
│   └── backups/       # Database backups
└── logs/
    └── crypto_aggregator.log  # Application logs
```

## Installation

### Prerequisites
- Python 3.8 or higher
- 4GB+ RAM (for AI models)
- Internet connection

### Step 1: Clone Repository
```bash
git clone <repository-url>
cd crypto-dt-source
```

### Step 2: Install Dependencies
```bash
pip install -r requirements.txt
```

This will install:
- Gradio (web interface)
- Pandas, NumPy (data processing)
- Transformers, PyTorch (AI models)
- Plotly (charts)
- BeautifulSoup4, Feedparser (web scraping)
- And more...

### Step 3: Run Application
```bash
python app.py
```

The application will:
1. Initialize the SQLite database
2. Load AI models (first run may take 2-3 minutes)
3. Start background data collection
4. Launch Gradio interface

Access the dashboard at: **http://localhost:7860**

## Gradio Dashboard

### Tab 1: Live Dashboard 📊
- Top 100 cryptocurrencies with real-time prices
- Columns: Rank, Name, Symbol, Price, 24h Change, Volume, Market Cap
- Auto-refresh every 30 seconds
- Search and filter functionality
- Color-coded price changes (green/red)

### Tab 2: Historical Charts 📈
- Select any cryptocurrency
- Choose timeframe: 1d, 7d, 30d, 90d, 1y, All
- Interactive Plotly charts with:
  - Price line chart
  - Volume bars
  - MA(7) and MA(30) overlays
  - RSI indicator
- Export charts as PNG
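
The MA(7), MA(30), and RSI overlays can be computed from a plain list of closing prices. The following is a minimal sketch, not the project's actual code: the function names `simple_moving_average` and `rsi` are hypothetical, and the RSI here uses a simple average of gains and losses over an assumed 14-change window rather than Wilder's smoothing.

```python
from typing import List, Optional


def simple_moving_average(prices: List[float], window: int) -> Optional[float]:
    """Return the mean of the last `window` prices, or None if there are too few."""
    if window <= 0 or len(prices) < window:
        return None
    return sum(prices[-window:]) / window


def rsi(prices: List[float], period: int = 14) -> Optional[float]:
    """Simple-average RSI over the last `period` price changes (range 0-100)."""
    if len(prices) < period + 1:
        return None
    recent = prices[-(period + 1):]
    changes = [b - a for a, b in zip(recent, recent[1:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        # No losses in the window: 100 if prices rose, 50 if they were flat.
        return 100.0 if avg_gain > 0 else 50.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)
```

Calling `simple_moving_average(closes, 7)` and `simple_moving_average(closes, 30)` on the same series gives the two overlay values for the most recent point.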

### Tab 3: News & Sentiment 📰
- Latest cryptocurrency news from 9+ sources
- Filter by sentiment: All, Positive, Neutral, Negative
- Filter by coin: BTC, ETH, etc.
- Each article shows:
  - Title (clickable link)
  - Source and date
  - AI-generated sentiment score
  - Summary
  - Related coins
- Market sentiment gauge (0-100 scale)
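
The `news` table stores `sentiment_score` as a float from -1 to 1, while the gauge uses a 0-100 scale. How the application converts between the two is not documented here; one plausible approach, sketched below with a hypothetical `sentiment_gauge` helper, is to average the per-article scores and rescale linearly.

```python
from typing import Iterable


def sentiment_gauge(scores: Iterable[float]) -> float:
    """Map the mean of per-article scores in [-1, 1] onto a 0-100 gauge."""
    values = [max(-1.0, min(1.0, s)) for s in scores]  # clamp out-of-range scores
    if not values:
        return 50.0  # assumption: no news is reported as a neutral reading
    mean = sum(values) / len(values)
    return round((mean + 1.0) * 50.0, 1)
```

Under this mapping a gauge value of 50 means neutral, values above 50 lean positive, and values below 50 lean negative.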

### Tab 4: AI Analysis 🤖
- Select cryptocurrency
- Generate AI-powered analysis:
  - Current trend (Bullish/Bearish/Neutral)
  - Support/Resistance levels
  - Technical indicators (RSI, MA7, MA30)
  - 24-72h prediction
  - Confidence score
- Analysis saved to database for history

### Tab 5: Database Explorer 🗄️
- Pre-built SQL queries:
  - Top 10 gainers in last 24h
  - All positive sentiment news
  - Price history for any coin
  - Database statistics
- Custom SQL query support (read-only for security)
- Export results to CSV
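
One way to enforce read-only custom queries is to combine a cheap statement check with a SQLite connection opened in read-only mode. This is a sketch under assumptions: `run_readonly_query` is a hypothetical helper, and the application may implement the restriction differently.

```python
import sqlite3
from typing import Any, List, Tuple


def run_readonly_query(db_path: str, sql: str, max_rows: int = 1000) -> List[Tuple[Any, ...]]:
    """Run a single query against a database opened in read-only mode."""
    if not sql.strip().lower().startswith(("select", "with")):
        raise ValueError("only SELECT queries are allowed")
    # mode=ro makes SQLite itself reject writes, even if the check above is fooled.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return conn.execute(sql).fetchmany(max_rows)
    finally:
        conn.close()
```

The prefix check gives users a clear error message early, while the read-only connection is the actual safety net: a statement that slips past the check still cannot modify the file.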

### Tab 6: Data Sources Status 🔍
- Real-time status monitoring:
  - CoinGecko API ✓
  - CoinCap API ✓
  - Binance API ✓
  - RSS feeds (5 sources) ✓
  - Reddit endpoints (4 subreddits) ✓
  - Database connection ✓
- Shows: Status (🟢/🔴), Last Update, Error Count
- Manual refresh and data collection controls
- Error log viewer

## Database Schema

### `prices` Table
- `id`: Primary key
- `symbol`: Coin symbol (e.g., "bitcoin")
- `name`: Full name (e.g., "Bitcoin")
- `price_usd`: Current price in USD
- `volume_24h`: 24-hour trading volume
- `market_cap`: Market capitalization
- `percent_change_1h`, `percent_change_24h`, `percent_change_7d`: Price changes
- `rank`: Market cap rank
- `timestamp`: Record timestamp

### `news` Table
- `id`: Primary key
- `title`: News article title
- `summary`: AI-generated summary
- `url`: Article URL (unique)
- `source`: Source name (e.g., "CoinDesk")
- `sentiment_score`: Float (-1 to 1)
- `sentiment_label`: Label (positive/negative/neutral)
- `related_coins`: JSON array of coin symbols
- `published_date`: Original publication date
- `timestamp`: Record timestamp

### `market_analysis` Table
- `id`: Primary key
- `symbol`: Coin symbol
- `timeframe`: Analysis period
- `trend`: Trend direction (Bullish/Bearish/Neutral)
- `support_level`, `resistance_level`: Price levels
- `prediction`: Text prediction
- `confidence`: Confidence score (0-1)
- `timestamp`: Analysis timestamp

### `user_queries` Table
- `id`: Primary key
- `query`: SQL query or search term
- `result_count`: Number of results
- `timestamp`: Query timestamp
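
As an illustration of how the column lists above translate into SQLite, here is a sketch of the `prices` and `news` tables. The column types, defaults, and index name are assumptions rather than the definitions in `database.py`; the remaining tables would follow the same pattern.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS prices (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    symbol TEXT NOT NULL,
    name TEXT,
    price_usd REAL,
    volume_24h REAL,
    market_cap REAL,
    percent_change_1h REAL,
    percent_change_24h REAL,
    percent_change_7d REAL,
    "rank" INTEGER,
    timestamp TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS news (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    summary TEXT,
    url TEXT UNIQUE,              -- unique URL prevents duplicate articles
    source TEXT,
    sentiment_score REAL,         -- -1 to 1
    sentiment_label TEXT,
    related_coins TEXT,           -- JSON array stored as text
    published_date TEXT,
    timestamp TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_prices_symbol_ts ON prices (symbol, timestamp);
"""


def init_schema(conn: sqlite3.Connection) -> None:
    """Create the tables and index if they do not exist yet."""
    conn.executescript(SCHEMA)
    conn.commit()
```

With a `UNIQUE` constraint on `url`, inserting news with `INSERT OR IGNORE` silently skips articles that were already collected.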

## Configuration

Edit `config.py` to customize:

```python
# Data collection intervals
COLLECTION_INTERVALS = {
    "price_data": 300,     # 5 minutes
    "news_data": 1800,     # 30 minutes
    "sentiment_data": 1800 # 30 minutes
}

# Number of coins to track
TOP_COINS_LIMIT = 100

# Gradio settings
GRADIO_SERVER_PORT = 7860
AUTO_REFRESH_INTERVAL = 30  # seconds

# Cache settings
CACHE_TTL = 300  # 5 minutes
CACHE_MAX_SIZE = 1000

# Logging
LOG_LEVEL = "INFO"
LOG_FILE = "logs/crypto_aggregator.log"
```

## API Usage Examples

### Collect Data Manually
```python
from collectors import collect_price_data, collect_news_data

# Collect latest prices
success, count = collect_price_data()
print(f"Collected {count} prices")

# Collect news
count = collect_news_data()
print(f"Collected {count} articles")
```

### Query Database
```python
from database import get_database

db = get_database()

# Get latest prices
prices = db.get_latest_prices(limit=10)

# Get news by coin
news = db.get_news_by_coin("bitcoin", limit=5)

# Get top gainers
gainers = db.get_top_gainers(limit=10)
```

### AI Analysis
```python
from ai_models import analyze_sentiment, analyze_market_trend
from database import get_database

# Analyze sentiment
result = analyze_sentiment("Bitcoin hits new all-time high!")
print(result)  # {'label': 'positive', 'score': 0.95, 'confidence': 0.92}

# Analyze market trend
db = get_database()
history = db.get_price_history("bitcoin", hours=168)
analysis = analyze_market_trend(history)
print(analysis)  # {'trend': 'Bullish', 'support_level': 50000, ...}
```

## Error Handling & Resilience

### Fallback Mechanisms
- If CoinGecko fails → CoinCap is used
- If both APIs fail → cached database data is used
- If AI models fail to load → keyword-based sentiment analysis
- All network requests have timeout and retry logic
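
The fallback order above (CoinGecko, then CoinCap, then cached rows) can be expressed as a list of callables tried in sequence. This is a simplified sketch with hypothetical names (`fetch_with_fallback`, `Fetcher`); it omits the timeouts and backoff a real collector would need.

```python
import logging
from typing import Callable, List, Optional, Sequence

logger = logging.getLogger(__name__)

Fetcher = Callable[[], List[dict]]


def fetch_with_fallback(sources: Sequence[Fetcher], retries: int = 2) -> Optional[List[dict]]:
    """Try each source in order, retrying on failure; return None if all fail."""
    for fetch in sources:
        name = getattr(fetch, "__name__", repr(fetch))
        for attempt in range(1, retries + 1):
            try:
                data = fetch()
                if data:
                    return data
            except Exception as exc:  # sketch only: real code should catch narrower errors
                logger.warning("%s failed (attempt %d): %s", name, attempt, exc)
    return None
```

A caller would pass the sources in priority order, for example `fetch_with_fallback([fetch_coingecko, fetch_coincap, load_cached_prices])`, where those three functions are placeholders for the project's own collectors.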

### Data Validation
- Price bounds checking (MIN_PRICE to MAX_PRICE)
- Volume and market cap validation
- Duplicate prevention (unique URLs for news)
- SQL injection prevention (read-only queries only)
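
A bounds check of the kind listed above might look like the following sketch. The `MIN_PRICE` and `MAX_PRICE` values shown are placeholders chosen for illustration, and `is_valid_price_record` is a hypothetical name.

```python
import math

# Placeholder bounds for illustration; the real values belong in config.py.
MIN_PRICE = 1e-8
MAX_PRICE = 1e7


def is_valid_price_record(record: dict) -> bool:
    """Return True if price, volume, and market cap look sane."""
    price = record.get("price_usd")
    if not isinstance(price, (int, float)) or not math.isfinite(price):
        return False
    if not (MIN_PRICE <= price <= MAX_PRICE):
        return False
    # Volume and market cap may be missing, but must be non-negative numbers if present.
    for key in ("volume_24h", "market_cap"):
        value = record.get(key)
        if value is not None and (not isinstance(value, (int, float)) or value < 0):
            return False
    return True
```

Records that fail the check can be dropped before insertion so that a single malformed API response does not pollute the price history.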

### Logging
All operations are logged to `logs/crypto_aggregator.log`:
- Info: Successful operations, data collection
- Warning: API failures, retries
- Error: Database errors, critical failures
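
A file-backed logger matching the `LOG_LEVEL` and `LOG_FILE` settings could be configured as in this sketch. The `setup_logging` helper, the logger name, and the message format are assumptions, not the application's actual setup.

```python
import logging
import os


def setup_logging(log_file: str = "logs/crypto_aggregator.log", level: str = "INFO") -> logging.Logger:
    """Create a file-backed logger; safe to call more than once."""
    os.makedirs(os.path.dirname(log_file) or ".", exist_ok=True)
    logger = logging.getLogger("crypto_aggregator")
    logger.setLevel(getattr(logging, level.upper(), logging.INFO))
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.FileHandler(log_file, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
        logger.addHandler(handler)
    return logger
```

With this in place, `logger.info(...)`, `logger.warning(...)`, and `logger.error(...)` correspond to the three severity levels listed above.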

## Performance Optimization

- **Async/Await**: All network requests use aiohttp
- **Connection Pooling**: Reused HTTP connections
- **Caching**: In-memory cache with 5-minute TTL
- **Batch Inserts**: Minimum 100 records per database insert
- **Indexed Queries**: Database indexes on symbol, timestamp, sentiment
- **Lazy Loading**: AI models load only when first used
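
To make the caching bullet concrete, here is a minimal in-memory cache with a time-to-live and a size cap, using the `CACHE_TTL` and `CACHE_MAX_SIZE` values from the configuration section as defaults. The `TTLCache` class is a hypothetical sketch, it is not thread-safe, and the eviction policy (oldest insertion first) is an assumption.

```python
import time
from collections import OrderedDict
from typing import Any, Hashable, Optional


class TTLCache:
    """Small in-memory cache with per-entry expiry and a size cap."""

    def __init__(self, ttl: float = 300.0, max_size: int = 1000) -> None:
        self.ttl = ttl
        self.max_size = max_size
        self._items: "OrderedDict[Hashable, tuple]" = OrderedDict()

    def get(self, key: Hashable, now: Optional[float] = None) -> Any:
        """Return the cached value, or None on a miss or after expiry."""
        now = time.monotonic() if now is None else now
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if now >= expires_at:
            del self._items[key]  # expired: drop the entry and report a miss
            return None
        return value

    def set(self, key: Hashable, value: Any, now: Optional[float] = None) -> None:
        """Store a value and evict the oldest entries beyond max_size."""
        now = time.monotonic() if now is None else now
        self._items.pop(key, None)
        self._items[key] = (now + self.ttl, value)
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)
```

The optional `now` argument exists only to make expiry easy to test; normal callers can omit it.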

## Troubleshooting

### Issue: Models won't load
**Solution**: Ensure you have 4GB+ RAM. Models download on first run (2-3 min).

### Issue: No data appearing
**Solution**: Wait 5 minutes for the initial data collection to finish, or click the "Refresh" buttons.

### Issue: Port 7860 already in use
**Solution**: Change `GRADIO_SERVER_PORT` in `config.py`, or stop the process that is already using the port.

### Issue: Database locked
**Solution**: Only one process can write at a time. Close other instances.

### Issue: RSS feeds failing
**Solution**: Some feeds may be temporarily down. Check Tab 6 for status.

## Development

### Running Tests
```bash
# Test data collection
python collectors.py

# Test AI models
python ai_models.py

# Test utilities
python utils.py

# Test database
python database.py
```

### Adding New Data Sources

Edit `collectors.py`:
```python
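def collect_new_source():
    """Collect data from a new source; return True on success."""
    try:
        # Assumes safe_api_call and logger are available in collectors.py
        response = safe_api_call("https://api.example.com/data")
        # Parse and save data
        return True
    except Exception as e:
        logger.error(f"Error collecting from new source: {e}")
        return False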
```

Add to scheduler in `collectors.py`:
```python
# In schedule_data_collection()
threading.Timer(interval, collect_new_source).start()
```
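
Note that `threading.Timer` runs its callback once, so a recurring collector needs to re-arm the timer or run in a loop. How `schedule_data_collection()` handles this is not shown here; the sketch below uses a hypothetical `run_periodically` helper built on `threading.Event` as one possible approach.

```python
import threading
from typing import Callable


def run_periodically(task: Callable[[], object], interval: float) -> threading.Event:
    """Run `task` every `interval` seconds on a daemon thread.

    Set the returned event to stop the loop.
    """
    stop = threading.Event()

    def loop() -> None:
        # Event.wait() returns False on timeout and True once stop is set.
        while not stop.wait(interval):
            try:
                task()
            except Exception:
                pass  # a real collector would log the error here

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

Calling `stop = run_periodically(collect_new_source, 300)` would run the collector every five minutes until `stop.set()` is called.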

## Validation Checklist

- [x] All 8 files complete
- [x] No TODO or FIXME comments
- [x] No placeholder functions
- [x] All imports in requirements.txt
- [x] Database schema matches specification
- [x] All 6 Gradio tabs implemented
- [x] All 3 AI models integrated
- [x] All 5+ data sources configured
- [x] Error handling in every network call
- [x] Logging for all major operations
- [x] No API keys in code
- [x] Comments in English
- [x] PEP 8 compliant

## License

MIT License - Free to use, modify, and distribute.

## Support

For issues or questions:
- Check logs: `logs/crypto_aggregator.log`
- Review error messages in Tab 6
- Ensure all dependencies are installed: `pip install -r requirements.txt`

## Credits

- **Data Sources**: CoinGecko, CoinCap, Binance, Alternative.me, CoinDesk, Cointelegraph, Reddit
- **AI Models**: HuggingFace (Cardiff NLP, ProsusAI, Facebook)
- **Framework**: Gradio

---

**Made with ❤️ for the Crypto Community**