Creating a new pandas DataFrame from selected columns of an existing DataFrame

Question

I have read a CSV file into a pandas DataFrame and want to do some simple manipulations on it. I cannot figure out how to create a new DataFrame from selected columns of my original DataFrame. My attempt:

names = ['A','B','C','D']
dataset = pandas.read_csv('file.csv', names=names)
new_dataset = dataset['A','D']

I would like to create a new DataFrame with the columns A and D from the original DataFrame.

Answer 1

This is called subsetting: pass a list of column names inside []:

dataset = pandas.read_csv('file.csv', names=names)

new_dataset = dataset[['A','D']]

which is the same as:

new_dataset = dataset.loc[:, ['A','D']]
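
As a quick check of the two forms above, here is a minimal sketch. The small in-memory frame is a hypothetical stand-in for file.csv, which is not available here.

```python
import pandas as pd

# Hypothetical sample data standing in for the contents of file.csv
dataset = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6], "D": [7, 8]})

# Double brackets: the outer pair indexes, the inner pair is a list of column names
by_brackets = dataset[["A", "D"]]

# Equivalent .loc form: all rows, the listed columns
by_loc = dataset.loc[:, ["A", "D"]]

print(by_brackets.columns.tolist())
print(by_brackets.equals(by_loc))
```

Both results are DataFrames holding only columns A and D, in the order given in the list.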

If you only need the selected columns, pass the usecols parameter to read_csv so that only those columns are loaded:

new_dataset = pandas.read_csv('file.csv', names=names, usecols=['A','D'])
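
A minimal sketch of the usecols approach follows. It reads from an in-memory string rather than a real file, and the CSV contents are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical headerless CSV content standing in for file.csv
csv_text = "1,2,3,4\n5,6,7,8\n"
names = ["A", "B", "C", "D"]

# names assigns the column labels; usecols keeps only the listed ones
new_dataset = pd.read_csv(io.StringIO(csv_text), names=names, usecols=["A", "D"])

print(new_dataset.columns.tolist())
print(new_dataset.values.tolist())
```

Columns B and C are never materialized in the resulting frame, which can save memory on wide files.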

EDIT:

If you use only:

new_dataset = dataset[['A','D']]

and then modify the data, you may get:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

If you modify values in new_dataset later, you will find that the modifications do not propagate back to the original data (dataset), and pandas emits the warning above.

As EdChum pointed out, add copy to remove the warning:

new_dataset = dataset[['A','D']].copy()
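
The effect of the explicit copy can be sketched as follows, again with hypothetical sample data in place of file.csv: the new frame can be modified freely and the original stays unchanged.

```python
import pandas as pd

# Hypothetical sample data standing in for the contents of file.csv
dataset = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6], "D": [7, 8]})

# An explicit copy makes new_dataset an independent object
new_dataset = dataset[["A", "D"]].copy()

# Modify the copy in two different ways
new_dataset["A"] = new_dataset["A"] * 10
new_dataset.loc[0, "D"] = -1

print(new_dataset)
print(dataset)  # the original values are untouched
```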

Answer 2

You must pass a list of column names to select columns. Otherwise, the key is interpreted as a MultiIndex key; df['A','D'] would work only if df.columns were a MultiIndex.
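
To illustrate the difference, here is a small sketch with two made-up frames, one with flat columns and one with MultiIndex columns. The column labels are hypothetical.

```python
import pandas as pd

# Flat columns: the key ('A', 'D') is looked up as a single label and is not found
flat = pd.DataFrame({"A": [1, 2], "D": [3, 4]})
try:
    flat["A", "D"]
    raised_key_error = False
except KeyError:
    raised_key_error = True
print(raised_key_error)

# MultiIndex columns: ('A', 'D') is a valid two-level key and selects one column
multi = pd.DataFrame(
    [[1, 2], [3, 4]],
    columns=pd.MultiIndex.from_tuples([("A", "D"), ("A", "E")]),
)
selected = multi["A", "D"]
print(selected.tolist())
```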

The most obvious way is df.loc[:, ['A', 'B']], but there are other ways (note how all of them take lists):

df1 = df.filter(items=['A', 'D'])

df1 = df.reindex(columns=['A', 'D'])

df1 = df.get(['A', 'D']).copy()

NB: items is the first positional argument, so df.filter(['A', 'D']) also works.

Note that filter() and reindex() return a copy as well, so you don't need to worry about getting SettingWithCopyWarning later.
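
The alternatives in this answer can be compared side by side. The sketch below uses hypothetical sample data and also shows how each method reacts to a column name that does not exist ("Z" here), a point the answer does not cover and which may matter when choosing between them.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6], "D": [7, 8]})

via_filter = df.filter(items=["A", "D"])
via_filter_positional = df.filter(["A", "D"])  # items passed positionally
via_reindex = df.reindex(columns=["A", "D"])
via_get = df.get(["A", "D"]).copy()

# Behaviour with a missing column name "Z"
filter_missing = df.filter(items=["A", "Z"])      # keeps only the columns that exist
reindex_missing = df.reindex(columns=["A", "Z"])  # adds "Z" filled with NaN
get_missing = df.get(["A", "Z"])                  # falls back to the default, None

# Modifying the filtered result leaves the original frame unchanged
via_filter.loc[0, "A"] = 100

print(filter_missing.columns.tolist())
print(reindex_missing.columns.tolist())
print(get_missing)
print(df.loc[0, "A"])
```

With existing columns all four selections give the same result; they differ only in how strictly they treat names that are absent.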


Answer 1: score 7, accepted, 2017-07-11 13:28:39

Answer 2: score 0, 2023-02-02 05:48:01
