How to select a particular dataframe from a list of dataframes in Python equivalent to R?

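I have a list of dataframes in R, from which I'm trying to select a particular dataframe as follows:
x = listOfdf$df1$df2$df3
Now I'm struggling to find an equivalent way to do this in Python. That is, what is the syntax for selecting a particular dataframe from a list of dataframes in pandas?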

I found the solution for selecting a particular dataframe/dataframe column from a list of dataframes.
In R: x = listOfdf$df1$df2$df3
In Python: x = listOfdf['df1']['df2']['df3']

Thanks :)

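I see you've already answered your own question, and that's cool. However, as jezrael hints in his comment, you should really consider using a dictionary. That might sound a bit scary coming from R (I've been there myself; now I prefer Python in most ways), but it will be worth your effort.

First of all, a dictionary is a way of mapping a value or variable to a key (like a name). You use curly brackets { } to build the dictionary, and square brackets [ ] to index it.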
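
As a minimal sketch of that idea, the R expression from the question, listOfdf$df1$df2$df3, can be mirrored with nested dictionaries. The nesting and the small dataframe below are assumptions made up for illustration, since the question does not show how listOfdf was actually built:

```python
import pandas as pd

# Hypothetical data: a tiny dataframe stored three levels deep
df3 = pd.DataFrame({'value': [1, 2, 3]})
listOfdf = {'df1': {'df2': {'df3': df3}}}

# Each pair of square brackets looks up one key, one level further down
x = listOfdf['df1']['df2']['df3']
print(x)

# .get() returns None instead of raising KeyError when a key is missing
missing = listOfdf.get('df9')
print(missing)
```

Indexing returns the stored object itself, so x and df3 refer to the same dataframe.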

Let's say you have two dataframes like this:

import numpy as np
import pandas as pd

np.random.seed(123)

# Reproducible input - Dataframe 1
rows = 10
df_1 = pd.DataFrame(np.random.randint(90,110,size=(rows, 2)), columns=list('AB'))
# pd.datetime is not available in recent pandas versions,
# so the start date is passed to date_range as a string
df_1['dates'] = pd.date_range('2017-01-01', periods=rows)
df_1 = df_1.set_index('dates')

##%%

# Reproducible input - Dataframe 2
rows = 10
df_2 = pd.DataFrame(np.random.randint(10,20,size=(rows, 2)), columns=list('CD'))
df_2['dates'] = pd.date_range('2017-01-01', periods=rows)
df_2 = df_2.set_index('dates')

With a limited number of dataframes, you can easily organize them in a dictionary like this:

myFrames = {'df_1': df_1,
            'df_2': df_2} 

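Now you have a reference to each of your dataframes, along with names or keys that you have defined yourself. You'll find a more detailed explanation here.

Here's how you use it: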

print(myFrames['df_1'])

You can also use that reference to make changes to one of your dataframes, and add the result to your dictionary:

df_3 = myFrames['df_1']
df_3 = df_3*10
myFrames.update({'df_3': df_3})
print(myFrames)
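
One detail worth checking here is what happens to the original dataframe. The sketch below uses two small made-up dataframes (not the ones above) and suggests that indexing the dictionary returns the very same object, while arithmetic such as * 10 builds a new dataframe and leaves the original values as they were:

```python
import pandas as pd

# Made-up input, only for illustration
df_1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
myFrames = {'df_1': df_1}

# Indexing the dictionary returns the same object, not a copy
same = myFrames['df_1']
print(same is df_1)

# Multiplying creates a new dataframe, which is then stored under a new key
df_3 = myFrames['df_1'] * 10
myFrames['df_3'] = df_3

print(myFrames['df_1']['A'].tolist())  # original values are unchanged
print(myFrames['df_3']['A'].tolist())  # scaled values live under 'df_3'
```

Assigning with myFrames['df_3'] = df_3 has the same effect as myFrames.update({'df_3': df_3}) for a single key.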

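Now let's say that you have a whole bunch of dataframes that you'd like to organize the same way. You can make a list of the names of all available dataframes as described below. You should be aware, though, that using eval() is often not recommended, for a number of reasons.

Anyway, here we go: first, you get a list of strings with the names of all your dataframes, like this: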

alldfs = [var for var in dir() if isinstance(eval(var), pd.core.frame.DataFrame)]

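If you've got a lot going on at the same time, chances are you won't be interested in all of them. So let's say that the names of all the dataframes you're particularly interested in start with 'df_'. You can isolate them like this: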

dfNames = []
for elem in alldfs:
    # Keep only the names that start with the 'df_' prefix
    if elem.startswith('df_'):
        dfNames.append(elem)

Now you can use that list in combination with eval() to make a dictionary:

myFrames2 = {}
for dfName in dfNames:
    myFrames2[dfName] = eval(dfName)
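
If you would rather avoid eval(), one possible alternative is to read the objects straight out of the namespace dictionary returned by the built-in globals(). This is only a sketch, not part of the approach above: the helper name collect_frames and the sample dataframes are made up for illustration, and it assumes the dataframes are defined at module level (for example in a script or notebook cell).

```python
import pandas as pd

# Made-up input, only for illustration
df_1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df_2 = pd.DataFrame({'C': [5, 6], 'D': [7, 8]})
other = pd.DataFrame({'E': [9]})   # a dataframe without the 'df_' prefix
df_label = 'not a dataframe'       # has the prefix, but is a string


def collect_frames(namespace, prefix='df_'):
    """Return {name: dataframe} for each dataframe whose name starts with prefix."""
    return {
        name: obj
        for name, obj in namespace.items()
        if name.startswith(prefix) and isinstance(obj, pd.DataFrame)
    }


# globals() maps every module-level name to its object, so no eval() is needed
myFrames2 = collect_frames(globals())
print(sorted(myFrames2))
```

The name filter and the type check happen in one pass, so there is no intermediate list of names to evaluate.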

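Now you can loop through that dictionary and do something with each of the dataframes. For example, you could take the last column of each dataframe and use those values to build a new dataframe: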

j = 1
# Iterate over the same dictionary that is indexed below
for key in myFrames2.keys():

    # Build the new column name for your brand new df
    colName = 'column_' + str(j)

    if j == 1:
        # First, make a new df by referencing the dictionary.
        # Subset the last column and make sure it doesn't
        # turn into a pandas series instead of a dataframe in the process
        df_new = myFrames2[key].iloc[:,-1].to_frame()

        # Set the new column name
        df_new.columns = [colName]
    else:
        # df_new already exists, so you can add the last column
        # of each remaining dataframe under its new name
        df_new[colName] = myFrames2[key].iloc[:,-1]
    j = j + 1

print(df_new)
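
The same result can probably be reached more compactly with pd.concat, which places the selected columns side by side and aligns them on the index. The sketch below is an alternative to the loop, not a replacement the answer prescribes, and it uses two small made-up dataframes so that it can run on its own:

```python
import pandas as pd

# Made-up input, only for illustration
frames = {
    'df_1': pd.DataFrame({'A': [1, 2], 'B': [3, 4]}),
    'df_2': pd.DataFrame({'C': [5, 6], 'D': [7, 8]}),
}

# Take the last column of each dataframe (a Series), in dictionary order
last_columns = [df.iloc[:, -1] for df in frames.values()]

# Put the Series next to each other, aligned on the index
df_new = pd.concat(last_columns, axis=1)

# Rename the columns to column_1, column_2, ...
df_new.columns = [f'column_{j}' for j in range(1, len(last_columns) + 1)]
print(df_new)
```

Because the columns are collected first and combined once, there is no need for a special case on the first iteration.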

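I hope you'll find this useful!

By the way... for your next question, please provide some reproducible code as well as a few words about the solutions you have tried yourself. You can read more about how to ask an excellent question here.

Here's the whole thing for an easy copy and paste: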

#%%

# Imports
import pandas as pd
import numpy as np

np.random.seed(123)

# Reproducible input - Dataframe 1
rows = 10
df_1 = pd.DataFrame(np.random.randint(90,110,size=(rows, 2)), columns=list('AB'))
# pd.datetime is not available in recent pandas versions,
# so the start date is passed to date_range as a string
df_1['dates'] = pd.date_range('2017-01-01', periods=rows)
df_1 = df_1.set_index('dates')

##%%

# Reproducible input - Dataframe 2
rows = 10
df_2 = pd.DataFrame(np.random.randint(10,20,size=(rows, 2)), columns=list('CD'))
df_2['dates'] = pd.date_range('2017-01-01', periods=rows)
df_2 = df_2.set_index('dates')

print(df_1)
print(df_2)
##%%


# If you don't have that many dataframes, you can organize them in a dictionary like this:
myFrames = {'df_1': df_1,
            'df_2': df_2}  


# Now you can reference df_1 in that collection by using:
print(myFrames['df_1'])

# You can also use that reference to make changes to one of your dataframes,
# and add that to your dictionary
df_3 = myFrames['df_1']
df_3 = df_3*10
myFrames.update({'df_3': df_3})

# And now you have a happy little family of dataframes:
print(myFrames)
##%%

# Now let's say that you have a whole bunch of dataframes that you'd like to organize the same way.
# You can make a list of the names of all available dataframes like this:
alldfs = [var for var in dir() if isinstance(eval(var), pd.core.frame.DataFrame)]

##%%
# It's likely that you won't be interested in all of them if you've got a lot going on.
# Let's say that all your dataframes of interest start with 'df_'
# You get them like this:
dfNames = []
for elem in alldfs:
    if elem.startswith('df_'):
        dfNames.append(elem)

##%%
# Now you can use that list in combination with eval() to make a dictionary:
myFrames2 = {}
for dfName in dfNames:
    myFrames2[dfName] = eval(dfName)

##%%
# And now you can reference each dataframe by name in that new dictionary:
myFrames2['df_1']

##%%
# Loop through that dictionary and do something with each of them.

j = 1
# Iterate over the same dictionary that is indexed below
for key in myFrames2.keys():

    # Build the new column name for your brand new df
    colName = 'column_' + str(j)

    if j == 1:
        # First, make a new df by referencing the dictionary.
        # Subset the last column and make sure it doesn't
        # turn into a pandas series instead of a dataframe in the process
        df_new = myFrames2[key].iloc[:,-1].to_frame()

        # Set the new column name
        df_new.columns = [colName]
    else:
        # df_new already exists, so you can add the last column
        # of each remaining dataframe under its new name
        df_new[colName] = myFrames2[key].iloc[:,-1]
    j = j + 1

print(df_new)
