How to load a zip file with pyscript and save into the virtual file system

I am trying to load a zip file and save it in the virtual file system for further processing with pyscript. In this example, I aim to open it and list its content.

As far as I got:

See the self-contained HTML code below, adapted from tutorials (with thanks to the author, btw).

It loads PyScript, lets the user select a file, and reads it (although apparently not in the right format). It also creates a dummy zip file, saves it to the virtual file system, and lists its contents. All of this works up front, and if I point the process_file function at that dummy zip file, it opens and lists it correctly.

The part that is NOT working is when I select any valid zip file from the local file system via the button/file selector. When the file is loaded into data, it is text (UTF-8), and I get this error:

File "/lib/python3.10/zipfile.py", line 1353, in _RealGetContents
    raise BadZipFile("Bad magic number for central directory")
zipfile.BadZipFile: Bad magic number for central directory

I have tried saving to a file and loading it instead of using BytesIO, and I have also tried variations using an ArrayBuffer or a Stream. I have also tried creating a FileReader and using readAsBinaryString() or readAsText() with various transformations, with the same result: either it fails to recognise the "magic number" or I get "not a zip file". When feeding in some streams or an ArrayBuffer, I get variations of:

 TypeError: a bytes-like object is required, not 'pyodide.JsProxy' 

At this point I suspect there is something embarrassingly obvious that I am unable to see, so a fresh pair of eyes and any advice on the best or simplest way to load a file would be much appreciated :) Many thanks in advance.

<!DOCTYPE html>
<html>

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">
    <link rel="stylesheet" href="https://pyscript.net/alpha/pyscript.css" />
    <script defer src="https://pyscript.net/alpha/pyscript.js"></script>
    <title>Example</title>
</head>

<body>

    <p>Example</p>
    <br />
    <label for="myfile">Select a file:</label>
    <input type="file" id="myfile" name="myfile">
    <br />
    <br />
    <div id="print_output"></div>
    <br />
    <p>File Content:</p>
    <div style="border:2px inset #AAA;cursor:text;height:120px;overflow:auto;width:600px; resize:both">
        <div id="content">
        </div>
    </div>

    <py-script output="print_output">
        import asyncio
        import zipfile
        from js import document, FileReader
        from pyodide import create_proxy
        import io

        async def process_file(event):
            fileList = event.target.files.to_py()
            for f in fileList:
                data= await f.text()
                mf=io.BytesIO(bytes(data,'utf-8'))

            with zipfile.ZipFile(mf,"r") as zf:
                nl=zf.namelist()
                nlf=" _ ".join(nl)
                document.getElementById("content").innerHTML=nlf

        def main():
            # Create a Python proxy for the callback function
            # process_file() is your function to process events from FileReader
            file_event = create_proxy(process_file)
            # Set the listener to the callback
            e = document.getElementById("myfile")
            e.addEventListener("change", file_event, False)

            mf = io.BytesIO()
            with zipfile.ZipFile(mf, mode="w",compression=zipfile.ZIP_DEFLATED) as zf:
                zf.writestr('file1.txt', b"hi")
                zf.writestr('file2.txt', str.encode("hi"))
                zf.writestr('file3.txt', str.encode("hi",'utf-8'))  
            with open("a.txt.zip", "wb") as f: # use `wb` mode
                f.write(mf.getvalue())
            
            with zipfile.ZipFile("a.txt.zip", "r") as zf:
                nl=zf.namelist()
                nlf=" ".join(nl)

            document.getElementById("content").innerHTML = nlf


        main()
    </py-script>

</body>

</html>

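You were very close with your code. The problem was in converting the file data to the correct data type. Reading the file with text() decodes its bytes as UTF-8, which corrupts binary data such as a zip archive. The requirement is to convert the arrayBuffer to a Uint8Array and then to a bytearray.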
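To see why the text route fails, here is a minimal sketch in plain Python (run outside the browser, so no PyScript is needed). It assumes that the text read behaves like a UTF-8 decode in which invalid byte sequences are replaced by U+FFFD, and that bytes(data, 'utf-8') then re-encodes that string. The variable names are only illustrative.

```python
import io
import zipfile

# Build a small, valid zip archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("file1.txt", b"hi")
original = buf.getvalue()

# Mimic a text read: decode as UTF-8 (invalid sequences become U+FFFD),
# then encode back to bytes, as bytes(data, 'utf-8') does.
as_text = original.decode("utf-8", errors="replace")
round_tripped = as_text.encode("utf-8")

print("bytes unchanged:", round_tripped == original)

# The damaged bytes no longer parse as a zip archive.
failed_to_open = False
try:
    zipfile.ZipFile(io.BytesIO(round_tripped), "r")
except zipfile.BadZipFile as exc:
    failed_to_open = True
    print("BadZipFile:", exc)
```

Because every replaced byte grows into a three-byte sequence, the offsets stored in the archive no longer point at the right places, which is consistent with the "Bad magic number" error in the question.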

Import the required function:

from js import Uint8Array

Read the file data into an arrayBuffer and copy it to a new Uint8Array:

data = Uint8Array.new(await f.arrayBuffer())

Convert the Uint8Array to a bytearray that BytesIO expects:

mf = io.BytesIO(bytearray(data))
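Once the data is a real bytes-like object, the zip handling itself is ordinary Python. As a rough sketch, the listing step could be wrapped in a small helper; list_zip_names is a hypothetical name, not part of PyScript or Pyodide, and the zipfile.is_zipfile check is an optional addition that gives a clearer error when the conversion went wrong.

```python
import io
import zipfile

def list_zip_names(raw):
    """Return the member names of a zip archive given as bytes or bytearray.

    Hypothetical helper for illustration only.
    """
    stream = io.BytesIO(bytes(raw))
    # Cheap sanity check: looks for the end-of-central-directory record.
    if not zipfile.is_zipfile(stream):
        raise ValueError("data does not look like a zip archive")
    stream.seek(0)
    with zipfile.ZipFile(stream, "r") as zf:
        return zf.namelist()

# Stand-in for the uploaded file: a small archive built in memory.
demo = io.BytesIO()
with zipfile.ZipFile(demo, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("file1.txt", b"hi")
    zf.writestr("file2.txt", b"hello")
payload = bytearray(demo.getvalue())

names = list_zip_names(payload)
print(names)
```

Inside the PyScript event handler, the call would presumably be list_zip_names(bytearray(data)), with data being the Uint8Array from the previous step.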

For reference, based on John Hanley's response (thanks again), here is the working code, with an added demonstration of saving the data as binary in the virtual file system and loading it back from that file:

<!DOCTYPE html>
<html>

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">
    <link rel="stylesheet" href="https://pyscript.net/alpha/pyscript.css" />
    <script defer src="https://pyscript.net/alpha/pyscript.js"></script>
    <title>File Example</title>
</head>

<body>

    <p>Example</p>
    <br />
    <label for="myfile">Select a file:</label>
    <input type="file" id="myfile" name="myfile">
    <br />
    <br />
    <div id="print_output"></div>
    <br />
    <p>File Content:</p>
    <div style="border:2px inset #AAA;cursor:text;height:120px;overflow:auto;width:600px; resize:both">
        <div id="content">
        </div>
    </div>

    <py-script output="print_output">
        import asyncio
        import zipfile
        from js import document, FileReader, Uint8Array
        from pyodide import create_proxy
        import io

        async def process_file(event):
            fileList = event.target.files.to_py()
            for f in fileList:
                data = Uint8Array.new(await f.arrayBuffer())
                mf = io.BytesIO(bytearray(data))
                with zipfile.ZipFile(mf,"r") as zf:
                    nl=zf.namelist()
                    nlf=" ".join(nl)
                    document.getElementById("content").innerText+= "\n Test 2: reading file from local file system: "+f.name+" content:"+nlf
                with open("b.zip","wb") as outb:
                    outb.write(bytearray(data))
                with zipfile.ZipFile("b.zip", "r") as zf:
                    nl=zf.namelist()
                    nlf=" ".join(nl)
                document.getElementById("content").innerText += "\n Test 3: reading the same file but first save it in virtual fs and read it: " + nlf
    
    

        def main():
            # Create a Python proxy for the callback function
            # process_file() is your function to process events from FileReader
            file_event = create_proxy(process_file)
            # Set the listener to the callback
            e = document.getElementById("myfile")
            e.addEventListener("change", file_event, False)

            mf = io.BytesIO()
            with zipfile.ZipFile(mf, mode="w",compression=zipfile.ZIP_DEFLATED) as zf:
                zf.writestr('file1.txt', b"hi")
                zf.writestr('file2.txt', str.encode("hi"))
                zf.writestr('file3.txt', str.encode("hi",'utf-8'))  
            with open("a.zip", "wb") as f: # use `wb` mode
                f.write(mf.getvalue())
            
            with zipfile.ZipFile("a.zip", "r") as zf:
                nl=zf.namelist()
                nlf=" ".join(nl)

            document.getElementById("content").innerText = "Test 1: reading a dummy zip from virtual file system: " + nlf


        main()
    </py-script>

</body>

</html>
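For further processing after the save, the member contents can be read back from the stored file as well as the names. The following is a minimal sketch in plain Python that uses a temporary directory as a stand-in for the virtual file system; save_and_read is a hypothetical helper name, and the archive is built in memory only to keep the example self-contained.

```python
import io
import os
import tempfile
import zipfile

def save_and_read(raw, path):
    """Write raw zip bytes to path in binary mode, then read every member back.

    Hypothetical helper for illustration only.
    """
    with open(path, "wb") as out:  # "wb" keeps the bytes untouched
        out.write(raw)
    with zipfile.ZipFile(path, "r") as zf:
        return {name: zf.read(name) for name in zf.namelist()}

# Stand-in for the uploaded file: a small archive built in memory.
demo = io.BytesIO()
with zipfile.ZipFile(demo, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("file1.txt", b"hi")
    zf.writestr("file2.txt", "hi again".encode("utf-8"))

with tempfile.TemporaryDirectory() as tmp:
    contents = save_and_read(demo.getvalue(), os.path.join(tmp, "b.zip"))

for name, body in contents.items():
    print(name, body)
```

In the page above, the same function would be called with bytearray(data) and a path such as "b.zip".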
