Python - AttributeError: 'tuple' object has no attribute 'rstrip' while extracting tar.gz archive

Here is my code:

import pandas as pd
import numpy as np
import glob
import os, tarfile
import re
from datetime import date

file = tarfile.open(r'path_to_archive')

df = pd.DataFrame()
path = r'path_to_folder'

for file_name in glob.glob(path+'*.csv'):
    x = pd.read_csv(file_name, header=None)
    df = pd.concat([df,x],axis=0)
df = df.drop_duplicates()
df.columns = ["ID"]
df["ID"] = df["ID"].astype(str)
df["ID"] = "OUTPUT/DATA" + df["ID"] + ".xml"
df["ID"] = df["ID"].apply(lambda x: x.zfill(8))

for row in df.iterrows():
    member = file.getmember(row)
    member.name = os.path.basename(member.name)
    file.extract(member,r'output_folder')

Basically, the output folder already contains XML files whose names are 8 characters long (e.g. 0000131). The archive is too big (6.4 GB), so I want to update the folder by extracting only the files in my list of changes (df).

When I execute this code, I receive:

AttributeError: 'tuple' object has no attribute 'rstrip'

on this line:

member = file.getmember(row)

Could you please help me with it?

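df.iterrows() yields (index, row) tuples: the first element is the current index label and the second is the current row as a Series. TarFile.getmember(name) expects the member name as a string (the error shows it trying to call rstrip on the tuple it was given), so unpack the tuple and pass the value of the ID column. I think you need: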

for idx, row in df.iterrows():
    member = file.getmember(row['ID'])
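For reference, here is a minimal, self-contained sketch of the whole extraction step under a few assumptions. The helper name extract_listed_members, the demo/ archive layout, and the sample file names are hypothetical and only there to make the example runnable; the real archive and folder paths from the question are not known. It takes any iterable of member names, so no iterrows() is needed, and it collects names that are missing from the archive instead of stopping on the KeyError that getmember raises for them.

```python
import io
import os
import tarfile
import tempfile


def extract_listed_members(archive_path, names, output_dir):
    """Extract the named members flat into output_dir.

    Returns (extracted, missing): the base names written to disk and the
    requested names that were not found in the archive.
    """
    extracted, missing = [], []
    with tarfile.open(archive_path) as archive:
        for name in names:
            try:
                # getmember needs a plain string, not an (index, row) tuple
                member = archive.getmember(name)
            except KeyError:
                # getmember raises KeyError when the name is not in the archive
                missing.append(name)
                continue
            # Drop the directory part so the file lands directly in output_dir
            member.name = os.path.basename(member.name)
            archive.extract(member, output_dir)
            extracted.append(member.name)
    return extracted, missing


# Demo on a tiny throwaway archive (hypothetical layout and file names).
with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "demo.tar.gz")
    with tarfile.open(archive_path, "w:gz") as archive:
        for stem in ("00000131", "00000132"):
            payload = b"<doc/>"
            info = tarfile.TarInfo(name="demo/" + stem + ".xml")
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))

    out_dir = os.path.join(tmp, "out")
    os.makedirs(out_dir)

    # The third name is deliberately absent from the archive.
    names = ["demo/00000131.xml", "demo/00000132.xml", "demo/00000999.xml"]
    extracted, missing = extract_listed_members(archive_path, names, out_dir)
    on_disk = sorted(os.listdir(out_dir))

print(extracted)
print(missing)
```

With the DataFrame from the question, you could pass df["ID"] as names, since iterating over a Series yields its values directly. Also note that zfill(8) only pads the numeric part if it is applied before the "OUTPUT/DATA" prefix and ".xml" suffix are added; applied afterwards, the string is already longer than 8 characters and is left unchanged.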
