How to convert a folder of pickle files into a single CSV file

Question

I have a directory containing about 1,700 pickle files. Each file holds all of one user's Twitter posts. I want to convert the directory into a folder of CSV files, where each CSV file is named after its pickle file and each row contains one of the user's tweets. After that, I want to keep only the 20 CSV files with the most samples. How can I do that?

# khabarlist = open_file_linebyline(pkl_path)
def open_dir_in_dict(input_path):
    files = os.scandir(input_path)
    my_dict = {}
    for file in files:
        # if len(file.name.split()) > 1:
        #     continue
        # if file.split('.')[-1] != "pkl":

        with open(file, 'r', encoding='utf8') as f:
            items = [i.strip() for i in f.read().split(",")]
        my_dict[file.replace(".pkl", "")] = items
        df = pd.DataFrame(my_dict)
        df.to_excel(file.replace(".pkl", "") + "xlsx")


open_dir_in_dict("Raw/")

I wrote the sample code above for it, but it did not work.

Answer 1

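import os
import pandas as pd


def open_dir_in_dict(input_path):
    # Read every .pkl file in input_path as text and write one .xlsx per file.
    for entry in os.scandir(input_path):
        # Skip names that contain whitespace and anything that is not a .pkl file.
        if len(entry.name.split()) > 1:
            continue
        if entry.name.split('.')[-1] != "pkl":
            continue
        # This treats the file as comma-separated text; a binary pickle
        # needs pd.read_pickle instead.
        with open(entry.path, 'r', encoding='utf-8', errors='replace') as f:
            items = [i.strip() for i in f.read().split(",")]
        name = entry.name.replace(".pkl", "")
        # One column per file, so lists of different lengths never collide.
        df = pd.DataFrame({name: items})
        df.to_excel(entry.path.replace(".pkl", "") + ".xlsx", index=False)


# open_dir_in_dict("Raw/")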

Answer 2

And a better answer:

import os
import re

import pandas as pd

data_path = "/content/drive/My Drive/twint/Data/pkl/Data/"
out_path = "/content/drive/My Drive/twint/final.csv"

for path in os.listdir(data_path):
    # Each pickle is loaded as a DataFrame with a "tweet" column.
    df = pd.read_pickle(os.path.join(data_path, path))
    # Keep only the tweets that contain no URL.
    my_tweets = [tweet for tweet in df.tweet if not re.findall(r"http\S+", tweet)]
    new_df = pd.DataFrame({"tweets": my_tweets, "author": path.replace(".pkl", "")})  # path[:-4]
    # Append to a single CSV; write the header only when the file does not exist yet.
    new_df.to_csv(out_path, index=False, mode="a", header=not os.path.exists(out_path))
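
The snippet above merges everything into one CSV. The question also asks for one CSV per pickle file and then the 20 files with the most rows. A minimal sketch of that part follows, assuming each pickle loads as a DataFrame with a `tweet` column, as in the code above. The function names `pickles_to_csvs` and `top_n_by_rows` and their parameters are made up for illustration and are not part of pandas.

```python
import os

import pandas as pd


def pickles_to_csvs(pkl_dir, csv_dir, column="tweet"):
    """Write one CSV per pickle (one tweet per row); return {csv_path: row_count}."""
    os.makedirs(csv_dir, exist_ok=True)
    counts = {}
    for name in sorted(os.listdir(pkl_dir)):
        if not name.endswith(".pkl"):
            continue  # ignore anything that is not a pickle file
        # Assumption: the pickle holds a DataFrame with the given column.
        df = pd.read_pickle(os.path.join(pkl_dir, name))
        csv_path = os.path.join(csv_dir, name[:-4] + ".csv")
        df[[column]].to_csv(csv_path, index=False)
        counts[csv_path] = len(df)
    return counts


def top_n_by_rows(counts, n=20):
    """Return the n CSV paths with the most rows, largest first."""
    return sorted(counts, key=counts.get, reverse=True)[:n]
```

Calling `top_n_by_rows(pickles_to_csvs("Raw/", "csv/"))` would then give the paths of the 20 largest CSV files. The selected files could be copied elsewhere with `shutil.copy`, or the others deleted with `os.remove`.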

2 answers

Answer 1
0 2020-10-30 08:16:04

Answer 2
0 2020-10-30 08:52:01
