
Extracting a .7z File into a Pandas Data Frame

I am using a Jupyter notebook (Google Colab) to try to extract data from a .7z file into a pandas DataFrame, using Linux commands. The data is from http://untroubled.org/spam/ . I wish to extract only the data from the 2020-01.7z file. So far I have:

!wget http://untroubled.org/spam/2020-01.7z
!7z x 2020-01.7z
import pandas as pd
import py7zr     
archive = py7zr.SevenZipFile('2020-01.7z', mode='r')
archive.extractall(path="/tmp")
with open ('2020-01.7z', 'r') as myfile:
  myfile.read()

mydf = pd.DataFrame(myfile)
 


UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbc in position 2: invalid start byte

I'm not really sure what "/tmp" means. I know there is a way to do this; I just don't yet understand these commands well enough to use them. Any help is appreciated.

Just try

!7z e 2020-01.7z

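It works for me! The e command extracts every file into the current directory without recreating the archive's folder structure, whereas x keeps the full paths.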
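The UnicodeDecodeError in the question most likely comes from opening the compressed archive 2020-01.7z itself in text mode; the files to read are the ones produced by the extraction step. Here is a minimal sketch of loading them into a DataFrame, assuming the archive unpacks into one plain-text file per message (I have not verified the archive layout). The helper names read_messages and messages_to_dataframe and the filename/text columns are made up for illustration.

```python
from pathlib import Path


def read_messages(directory):
    """Return one record per regular file found under `directory`."""
    records = []
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue  # skip sub-directories created by the extraction
        # Spam messages come in many encodings, so undecodable bytes are
        # replaced rather than raising UnicodeDecodeError.
        text = path.read_text(encoding="utf-8", errors="replace")
        records.append({"filename": path.name, "text": text})
    return records


def messages_to_dataframe(directory):
    """Build a DataFrame with hypothetical `filename` and `text` columns."""
    import pandas as pd  # imported here so read_messages works without pandas

    return pd.DataFrame(read_messages(directory), columns=["filename", "text"])
```

As for "/tmp": in archive.extractall(path="/tmp"), the path argument is simply the directory the files are written to. It is probably easier to extract into a dedicated folder, for example archive.extractall(path="/tmp/spam") (the folder name is an assumption), and then call mydf = messages_to_dataframe("/tmp/spam") so that unrelated files in /tmp are not picked up.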

