Getting rid of serendipity_spamblocklog
This Serendipity installation has a Spamblock feature which is currently logging comment spam to a database. Over time, this database grew quite a bit and while I'd like to keep the data around I don't need the data right away. Since the machine the database is running on is low on memory anyway, I wanted to archive & then purge old records from the spamblock logging table.
This is where we are:
$ ls -lhgo serendipity_spamblocklog* -rw-rw----. 1 8.7K Nov 21 20:28 serendipity_spamblocklog.frm -rw-rw----. 1 1.2G Jan 25 07:29 serendipity_spamblocklog.MYD -rw-rw----. 1 41M Jan 25 07:39 serendipity_spamblocklog.MYI $ for a in {2008..2015}; do printf "year: $a " mysql -B -N -D s9y -e "select count(*) from serendipity_spamblocklog \ where year(from_unixtime(timestamp)) = $a;" done year: 2008 12 year: 2009 14901 year: 2010 93232 year: 2011 12332 year: 2012 4373 year: 2013 245002 year: 2014 1232742 year: 2015 131898Yeah, 2014 was really the year-of-the-spam :-)
Export those into CSV files:
$ for a in {2008..2014}; do echo "year: $a" mysql -D s9y -e "select * from serendipity_spamblocklog \ where year(from_unixtime(timestamp)) = \"$a\" into outfile \ \"spamblocklog-$a.csv\" fields terminated by ',' enclosed by '\"' \ lines terminated by '\n';" done year: 2008 year: 2009 year: 2010 year: 2011 year: 2012 year: 2013 year: 2014Which gives us (in datadir):
$ ls -hgo total 1.2G -r--------. 1 4.7K Jan 25 07:07 serendipity_spamblocklog-2008.csv -r--------. 1 4.2M Jan 25 07:07 serendipity_spamblocklog-2009.csv -r--------. 1 91M Jan 25 07:08 serendipity_spamblocklog-2010.csv -r--------. 1 5.4M Jan 25 07:08 serendipity_spamblocklog-2011.csv -r--------. 1 6.4M Jan 25 07:09 serendipity_spamblocklog-2012.csv -r--------. 1 146M Jan 25 07:09 serendipity_spamblocklog-2013.csv -r--------. 1 860M Jan 25 07:10 serendipity_spamblocklog-2014.csvTo count records, we can't use "wc -l" just like that because comments may contain newlines as well - so let's count timestamps instead:
$ grep -c '^\"1' * serendipity_spamblocklog-2008.csv:12 serendipity_spamblocklog-2009.csv:14901 serendipity_spamblocklog-2010.csv:93232 serendipity_spamblocklog-2011.csv:12332 serendipity_spamblocklog-2012.csv:4373 serendipity_spamblocklog-2013.csv:245002 serendipity_spamblocklog-2014.csv:1232742Delete the exported records:
$ for a in {2008..2014}; do echo "year: $a" mysql -D s9y -e "delete from serendipity_spamblocklog \ where year(from_unixtime(timestamp)) = $a;" doneThe size of the database file may not decrease until after we run OPTIMIZE TABLE on the table:
$ mysqlcheck --optimize s9y serendipity_spamblocklog $ ls -lhgo serendipity_spamblocklog.* -rw-rw----. 1 8.7K Nov 21 20:28 serendipity_spamblocklog.frm -rw-rw----. 1 88M Jan 25 08:15 serendipity_spamblocklog.MYD -rw-rw----. 1 3.2M Jan 25 08:15 serendipity_spamblocklog.MYIAnd we can still run some stats on the CSV files:
$ awk -F\",\" '/^\"1/ {print $2}' serendipity_spamblocklog-2013.csv | sort | uniq -c | sort -n 30 "API_ERROR" 119 "moderate" 98045 "REJECTED" 146808 "MODERATE" $ awk -F\",\" '/^\"1/ {print $8}' serendipity_spamblocklog-2014.csv | awk '{print $1}' | sort | uniq -c | sort -n | tail 251 PHP/5.3.89 252 PHP/5.3.59 252 PHP/5.3.94 261 PHP/5.2.19 270 PHP/5.2.62 1125 Opera/9.80 4595 30848 PHP/5.2.10 509019 Mozilla/4.0 646256 Mozilla/5.0