
Build DataFrame values from multiple dataframes Pandas

I am trying to build a dataframe whose data is pulled from multiple files. I have created an empty dataframe with the desired shape, but I am having trouble filling it with the data. I found this, but when I concat I still get NaN values. Edit2: I changed the order of df creation and put concat inside the for loop, with the same result (for obvious reasons).

import pandas as pd
import os
import glob

def daily_country_framer():
    # create assignments
    country_source = r"C:\Users\USER\PycharmProjects\Corona Stats\Country Series"
    list_of_files = glob.glob(country_source + r"\*.csv")
    latest_file = max(list_of_files, key=os.path.getctime)
    last_frame = pd.read_csv(latest_file)
    date_list = []
    label_list = []

    # build date_list values
    for file in os.listdir(country_source):
        file = file.replace('.csv', '')
        date_list.append(file)

    # build country_list values
    for country in last_frame['Country']:
        label_list.append(country)

    # create dataframe for each file in folder
    for filename in os.listdir(country_source):
        filepath = os.path.join(country_source, filename)
        if not os.path.isfile(filepath):
            continue
        df1 = pd.read_csv(filepath)
    df = pd.DataFrame(index=label_list, columns=date_list)
    df1 = pd.concat([df])
    print(df1)


daily_country_framer()

Two sample dataframes (notice the different shapes):

                Country  Confirmed  Deaths  Recovered
0                 World    1595350   95455     353975
1           Afghanistan        484      15         32
2               Albania        409      23        165
3               Algeria       1666     235        347
4               Andorra        583      25         58
..                  ...        ...     ...        ...
180             Vietnam        255       0        128
181  West Bank and Gaza        263       1         44
182      Western Sahara          4       0          0
183              Zambia         39       1         24
184            Zimbabwe         11       3          0

[185 rows x 4 columns]
                Country  Confirmed  Deaths  Recovered
0                 World    1691719  102525     376096
1           Afghanistan        521      15         32
2               Albania        416      23        182
3               Algeria       1761     256        405
4               Andorra        601      26         71
..                  ...        ...     ...        ...
181  West Bank and Gaza        267       2         45
182      Western Sahara          4       0          0
183               Yemen          1       0          0
184              Zambia         40       2         25
185            Zimbabwe         13       3          0

[186 rows x 4 columns]

Current output:

                   01-22-2020 01-23-2020  ... 04-09-2020 04-10-2020
World                     NaN        NaN  ...        NaN        NaN
Afghanistan               NaN        NaN  ...        NaN        NaN
Albania                   NaN        NaN  ...        NaN        NaN
Algeria                   NaN        NaN  ...        NaN        NaN
Andorra                   NaN        NaN  ...        NaN        NaN
...                       ...        ...  ...        ...        ...
West Bank and Gaza        NaN        NaN  ...        NaN        NaN
Western Sahara            NaN        NaN  ...        NaN        NaN
Yemen                     NaN        NaN  ...        NaN        NaN
Zambia                    NaN        NaN  ...        NaN        NaN
Zimbabwe                  NaN        NaN  ...        NaN        NaN

[186 rows x 80 columns]

Desired output: the same table, with each NaN replaced by the corresponding value from the target column, or by a list of the values from all columns. For example, if the target is ['Confirmed'] then 0, 1, 2, 3, 4; if it is all columns then [0,0,0], [1,0,0], [2,0,0].

Your code (with comments inline):

import pandas as pd
import os
import glob

def daily_country_framer():
    # create assignments
    country_source = r"C:\Users\USER\PycharmProjects\Corona Stats\Country Series"
    list_of_files = glob.glob(country_source + r"\*.csv")
    latest_file = max(list_of_files, key=os.path.getctime)
    last_frame = pd.read_csv(latest_file)
    date_list = []
    label_list = []

    # build date_list values
    for file in os.listdir(country_source):
        file = file.replace('.csv', '')
        date_list.append(file)

    # build country_list values
    for country in last_frame['Country']: # == last_frame['Country'].tolist()
        label_list.append(country)

    # create dataframe for each file in folder
    for filename in os.listdir(country_source):
        filepath = os.path.join(country_source, filename)
        if not os.path.isfile(filepath):
            continue
        df1 = pd.read_csv(filepath)
        # you redefine df1 for every file in the loop. So if there
        # are 10 files, only the last one is actually used anywhere
        # outside this loop.
    df = pd.DataFrame(index=label_list, columns=date_list)
    df1 = pd.concat([df])
    # here you just redefined df1 again as the concatenation of the
    # empty dataframe you just created in the line above.
    print(df1)


daily_country_framer()

So hopefully that illuminates why you were getting the results you were getting. It was doing exactly what you asked it to do.
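To see the effect in isolation, here is a minimal sketch. The two country labels and two date labels are stand-ins for label_list and date_list, not the real data. A DataFrame built from only an index and columns holds no values, and concatenating that single frame returns it unchanged:

```python
import pandas as pd

# Hypothetical stand-ins for label_list and date_list.
labels = ["World", "Albania"]
dates = ["01-22-2020", "01-23-2020"]

# No data is passed, so every cell starts out as NaN.
empty = pd.DataFrame(index=labels, columns=dates)

# Concatenating a list with one frame just gives that frame back.
same = pd.concat([empty])

all_nan = bool(same.isna().all().all())
print(same)
```

Nothing read from the CSV files ever reaches this frame, which is why the printed table is all NaN.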

What you want to do is build a dictionary with dates as keys and the associated dataframes as values, then concatenate that. This can be quite expensive because of some quirks in how pandas does concatenation, but if you concatenate along axis=0, you should be fine.
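As a rough illustration of what that concatenation produces, the sketch below uses two small in-memory frames in place of the CSV files. The values are taken from the sample output in the question, but the two-date dictionary itself is an assumption made for the example:

```python
import pandas as pd

# Hypothetical {date: dataframe} dictionary standing in for the files on disk.
frames = {
    "04-09-2020": pd.DataFrame(
        {"Country": ["World", "Albania"], "Confirmed": [1595350, 409]}
    ),
    "04-10-2020": pd.DataFrame(
        {"Country": ["World", "Albania", "Yemen"], "Confirmed": [1691719, 416, 1]}
    ),
}

# Passing a dict to pd.concat uses its keys as the outer index level,
# so every row stays tagged with the date it came from.
stacked = pd.concat(frames)

# All rows for one date can then be selected by that key.
one_day = stacked.loc["04-10-2020"]
print(stacked)
```

Because the frames are stacked along axis=0, the different row counts per date are not a problem.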

A better way to write the function might be the following:

import pandas as pd
import os


def daily_country_framer(country_source):
    accumulator = {}
    # read each daily CSV, keyed by the date taken from its filename
    for filename in os.listdir(country_source):
        if not filename.endswith('.csv'):
            continue  # skip anything that is not a CSV file
        date = filename.replace('.csv', '')
        filepath = os.path.join(country_source, filename)
        accumulator[date] = pd.read_csv(filepath)
    # now we have a dictionary of {date : data} -- perfect!
    df = pd.concat(accumulator)
    return df


# raw string, so that "\U" is not parsed as a unicode escape
daily_country_framer(r"C:\Users\USER\PycharmProjects\Corona Stats\Country Series")

Does that work?
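If the goal is the country-by-date table from the question, one possible follow-up is to tag each frame with its date, stack them, and pivot. This is only a sketch: the helper name pivot_by_country, the Date column, and the in-memory sample frames are assumptions made for illustration, and it assumes each country appears at most once per file.

```python
import pandas as pd


def pivot_by_country(frames, value_column="Confirmed"):
    """Turn a {date: dataframe} dict into a countries x dates table."""
    # Add a Date column to each frame, then stack them into one long frame.
    long_frame = pd.concat(
        [frame.assign(Date=date) for date, frame in frames.items()],
        ignore_index=True,
    )
    # One row per country, one column per date. A country that is missing
    # from a given date's file ends up as NaN in that column.
    return long_frame.pivot(index="Country", columns="Date", values=value_column)


# Hypothetical sample data in place of the CSV files.
frames = {
    "04-09-2020": pd.DataFrame(
        {"Country": ["World", "Albania"], "Confirmed": [1595350, 409]}
    ),
    "04-10-2020": pd.DataFrame(
        {"Country": ["World", "Albania", "Yemen"], "Confirmed": [1691719, 416, 1]}
    ),
}

table = pivot_by_country(frames)
print(table)
```

Here NaN only appears where a country really is absent from a file (Yemen on 04-09-2020 in this sample), rather than everywhere.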

