
Exporting Pandas output for multiple CSV files

I have many CSV files in the subdirectories of one folder. They all contain tweets and other metadata. I want to remove most of this metadata and keep only the tweets themselves and their time. I used glob to read the files, and the removal part seems to work. However, I am not sure how to save the output so that every file is saved under its original file name.

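import pandas as pd
import glob
import os

path = r'D:\tweets'
# Find every CSV file in path and all of its subdirectories
myfiles = glob.glob(os.path.join(path, '**', '*.csv'), recursive=True)
for f in myfiles:
    df = pd.read_csv(f)
    # Drop the metadata columns that are not needed
    df = df.drop(["name", "id", "conversation_id", "created_at", "date"], axis=1)
    # Remove rows whose language matches any of these codes
    df = df[df["language"].str.contains("bn|ca|ckbu|id|zh") == False]
    df.to_csv("output_filename.csv", index=False, encoding='utf8')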
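Because str.contains does regular-expression substring matching, a stray character in the pattern can change which rows are dropped. A minimal sketch of an exact-match alternative using Series.isin, with the code list copied from the question (drop_languages and EXCLUDED_LANGS are hypothetical names). Unlike the == False comparison above, this version keeps rows whose language value is missing.

```python
import pandas as pd

# Language codes copied from the question's pattern; adjust as needed.
EXCLUDED_LANGS = ["bn", "ca", "ckbu", "id", "zh"]


def drop_languages(df: pd.DataFrame, codes: list) -> pd.DataFrame:
    """Keep only rows whose 'language' value is not exactly one of `codes`."""
    return df[~df["language"].isin(codes)]


tweets = pd.DataFrame({"tweet": ["a", "b", "c", "d"],
                       "language": ["en", "zh", "ca", "und"]})
result = drop_languages(tweets, EXCLUDED_LANGS)
print(result["language"].tolist())  # → ['en', 'und']
```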

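If you do all the processing inside the loop and pass f (the path that was just read) to to_csv, each file is overwritten in place with its cleaned version, so every file keeps its original name: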

for f in myfiles:
    df = pd.read_csv(f)
    df = df.drop(["name", "id", "conversation_id", "created_at", "date"], axis=1)
    df = df[df["language"].str.contains("bn|ca|ckbu|id|zh") == False]
    # Writing back to f replaces the original file with the cleaned data
    df.to_csv(f, index=False, encoding='utf8')
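If you would rather keep the original files untouched, here is a minimal sketch (not part of the original answer) that writes each cleaned file to a separate output folder, recreating the subdirectory layout so each file keeps its original name. It uses pathlib instead of glob; clean_all and the output folder are hypothetical, and the column names and language pattern are copied from the question.

```python
from pathlib import Path

import pandas as pd

# Column names and language pattern copied from the question.
DROP_COLS = ["name", "id", "conversation_id", "created_at", "date"]
LANG_PATTERN = "bn|ca|ckbu|id|zh"


def clean_all(src_dir: Path, out_dir: Path) -> list:
    """Clean every CSV under src_dir and write it to the same relative path under out_dir."""
    written = []
    # sorted() takes a snapshot of the file list before any output is written
    for src in sorted(src_dir.rglob("*.csv")):
        df = pd.read_csv(src)
        df = df.drop(columns=DROP_COLS)
        # na=True treats a missing language as a match, so those rows are dropped,
        # which mirrors the question's "== False" comparison
        mask = df["language"].str.contains(LANG_PATTERN, na=True)
        df = df[~mask]
        # Same subfolder and file name, but under out_dir
        dest = out_dir / src.relative_to(src_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        df.to_csv(dest, index=False, encoding="utf8")
        written.append(dest)
    return written


# Example call (paths are hypothetical):
# clean_all(Path(r"D:\tweets"), Path(r"D:\tweets_clean"))
```

Keep the output folder outside the source folder; otherwise a second run would also pick up the already-cleaned files.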
