On Linux, using a bash script, how do I rename an Excel file to include the row count at the end of the existing filename?

Question

First post so be gentle please.

I have a bash script running on a Linux server which does a daily sftp download of an Excel file. The file is then moved to a Windows share. A new requirement has come up: I'd like to add the number of rows to the filename, which is also timestamped and so differs each day, ideally at the end, just before the .xlsx extension. After some research, it seems I may be able to do it all from the same script if I use Python and one of the Excel modules. I'm a complete noob in Python, but I have done some experimenting and have some working code using the pandas module. Here's what I have working on a test spreadsheet with a worksheet named mysheet, counting a column named code.

>>> import pandas as pd
>>> excel_file = pd.ExcelFile(r'B:\PythonTest.xlsx')
>>> df = excel_file.parse('mysheet')
>>> df[['code']].count()
code    10
dtype: int64

>>> mycount = df[['code']].count()
>>> print(mycount)
code    10
dtype: int64
>>>

I have 2 questions please. First, how do I pass today's filename into the Python script to do the count on, and how do I return the result to bash? Second, how do I return just the count value, e.g. 10 in the above example? I don't want the column name or dtype passed back.

Thanks in advance.

Answer 1

Assuming we put your Python into a separate script file, something like:

# count_script.py
import sys
import pandas as pd

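# The filename is the first command-line argument
excel_file = pd.ExcelFile(sys.argv[1])
df = excel_file.parse('mysheet')
# Single brackets select a Series, so count() returns one number
# (the non-empty cells in 'code') with no column name or dtype attached
print(df['code'].count())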

We can then call that script from the bash script that downloads the file.

TODAYS_FILE="PythonTest.xlsx"

# ...
# Download the file
# ...

# Pass the file into your python script (manipulate the file name to include
# the correct path first, if necessary).
# Because the python script prints its result, the command substitution
# $(...) captures that output and stores it in the COUNT variable.
COUNT=$(python count_script.py "${TODAYS_FILE}")

# This strips the trailing ".xlsx" from $TODAYS_FILE, then appends an underscore,
# the count obtained via pandas, and ".xlsx" again at the end.
NEW_FILENAME="${TODAYS_FILE%.xlsx}_${COUNT}.xlsx"

# Then rename it
mv "${TODAYS_FILE}" "${NEW_FILENAME}"
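
If you would rather build the new name on the Python side, the same idea can be sketched with pathlib. The helper name name_with_count is made up for illustration and is not part of the script above; it only builds the string and does not rename anything.

```python
from pathlib import Path


def name_with_count(filename: str, count: int) -> str:
    """Return filename with _<count> inserted before the extension."""
    path = Path(filename)
    # path.stem is the name without its last suffix; path.suffix is ".xlsx"
    return str(path.with_name(f"{path.stem}_{count}{path.suffix}"))


print(name_with_count("PythonTest.xlsx", 10))
```

Because only the last suffix is split off, dots inside a timestamped name are left alone.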

Answer 2

You can pass command-line arguments to Python programs by invoking them like this:

python3 script.py argument1 argument2 ... argumentn

They can then be accessed within the script using sys.argv. You must import sys before using it. sys.argv[0] is the name of the Python script, and the rest are the additional command-line arguments.
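
As a rough sketch of that indexing, the snippet below picks out the first real argument and copes with it being missing. The function name filename_from_args and the usage message are assumptions made for this example.

```python
import sys


def filename_from_args(argv):
    """Return the first command-line argument, or None if it is missing."""
    # argv[0] is the script name, so the first real argument is argv[1]
    return argv[1] if len(argv) > 1 else None


filename = filename_from_args(sys.argv)
if filename is None:
    print("usage: python3 script.py FILE.xlsx", file=sys.stderr)
else:
    print(f"would count rows in {filename}")
```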

Alternatively, you may pass it on stdin, which can be read in Python using the normal standard input functions such as input(). To pass input on stdin, do this in bash (quote the variable so that spaces in the value survive):

echo "$data_to_pass" | python3 script.py
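
On the Python side, reading that piped value could look roughly like this. The helper read_filename is hypothetical, and io.StringIO stands in for the pipe so the sketch runs on its own; in the real script you would pass sys.stdin instead.

```python
import io


def read_filename(stream):
    """Read one line from a text stream and strip the trailing newline."""
    return stream.readline().strip()


# io.StringIO plays the role of stdin here; echo adds the trailing newline
print(read_filename(io.StringIO("PythonTest.xlsx\n")))
```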

To give output, you can write to stdout using print(). You can then redirect that output in bash to, say, a file, or capture it in a variable with command substitution, $(...):

echo "$data_to_pass" | python3 script.py > output.txt

To get just the count value within Python, select the column with single brackets so that count() returns a single number instead of a Series; that is:

df["code"].count()

You can then print() it to send it to bash.
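
One thing worth checking before relying on that number: count() only counts non-empty cells, while len(df) counts every data row. A small sketch with made-up data (not your spreadsheet) shows the difference:

```python
import pandas as pd

# Made-up data standing in for the 'mysheet' worksheet; one cell is empty
df = pd.DataFrame({"code": ["a", "b", None, "d"]})

non_empty = int(df["code"].count())  # counts only non-missing cells
all_rows = len(df)                   # counts every data row

print(non_empty)
print(all_rows)
```

If the code column can contain blanks and you want the total number of rows in the filename, len(df) may be the better choice.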
