简体   繁体   中英

CSV file update using Shell script

I am quite new with these things and would really need some help with this.

Am trying to make a shell script that will extract data from one or multiple databases, export it to CSV, have that data merged into one file, and apply some formulas to the file like SUM or to check the difference between the numbers. I should be able to update or replace the file as long as the formulas will still get applied to the new file.

What I got so far:

mysql -h host -u user -ppassword -P port 
"query" |tee file1.csv
# I didn't know how to have multiple queries for the same DB
mysql -h host2 -u user2 -ppassword2 -P port 
"query2" |tee file2.csv

sed -i 'li\FILE1' file1.csv #just to add a title
echo '' >> file1.csv #just to add a space at the end
sed -i 'li\FILE2' file2.csv 
echo '' >> file2.csv 
cat file1.csv file2.csv > file.csv

The is an example of how my file.csv looks like but in fact contains more similar cells:

       A         B       C
1   C.Installs      
2   date        
3   2019-02-01  100 
4   2019-02-02  131 
5   2019-02-03  222 
6   2019-02-04  180 
7   2019-02-05  213 
8           
9   A.Installs      
10  Date        
11  2019-02-01  23  
12  2019-02-02  42  
13  2019-02-03  34  
14  2019-02-04  35  
15  2019-02-05  21  

Now everytime I run the shell command it should update/replace the file.csv while maintaining or re-adding the formulas for the specific cells. An example for BEFORE and AFTER:

First run of the shell script:

         A       B      C
1   C.Installs      
2   date        
3   2019-02-01  100 
4   2019-02-02  131 
5   2019-02-03  222 
6   2019-02-04  180 
7   2019-02-05  213 
8               846 #Formula of SUM for the 5 values
9   A.Installs      
10  Date        
11  2019-02-01  23  
12  2019-02-02  42  
13  2019-02-03  34  
14  2019-02-04  35  
15  2019-02-05  21  
16              155 #Formula of SUM for the 5 values
17          
18              691 #Formula of the difference between the two totals

Second run of the Shell script:

        A        B     C
1   C.Installs      
2   date        
3   2019-02-02  131 
4   2019-02-03  222 
5   2019-02-04  180 
6   2019-02-05  213 
7   2019-02-06  158 
8               904 #Formula of SUM for the 5 values
9   A.Installs      
10  Date        
11  2019-02-02  42  
12  2019-02-03  34  
13  2019-02-04  35  
14  2019-02-05  21  
15  2019-02-06  31  
16              163 #Formula of SUM for the 5 values
17          
18              741 #Formula of the difference between the two totals

So I would think that first step is to find a way to apply the formulas to the csv file

So I need to build on top of what I have, maybe something with awk am not sure how to proceed, to be honest totally new at this.

Please keep it simple.

Thanks

You could use csvkit https://csvkit.readthedocs.io/en/latest/scripts/csvsql.html

Starting from

$ cat one.csv
2019-02-01,100
2019-02-02,131
2019-02-03,222
2019-02-04,180
2019-02-05,213

$ cat two.csv
2019-02-01,23
2019-02-02,42
2019-02-03,34
2019-02-04,35
2019-02-05,21

you could run

#!/bin/bash

# add header
sed -i  '1s/^/data,value\n/' one.csv
sed -i  '1s/^/data,value\n/' two.csv

one=$(csvsql --query "select sum(value) as sumOne from one" one.csv | tail -n +2)

two=$(csvsql --query "select sum(value) as sumOne from two" two.csv | tail -n +2)

echo "$one-$two" | bc

to have 691

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM