Creating a new pandas DataFrame from certain columns of an existing DataFrame
I have read a CSV file into a pandas DataFrame and want to do some simple manipulations on it. I cannot figure out how to create a new DataFrame based on selected columns from my original DataFrame. My attempt:
import pandas

names = ['A','B','C','D']
dataset = pandas.read_csv('file.csv', names=names)
new_dataset = dataset['A','D']  # raises KeyError: ('A', 'D')
I would like to create a new DataFrame with the columns A and D from the original DataFrame.
This is called subsetting: pass a list of columns inside []:
dataset = pandas.read_csv('file.csv', names=names)
new_dataset = dataset[['A','D']]
which is the same as:
new_dataset = dataset.loc[:, ['A','D']]
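For a quick check without file.csv, here is a minimal sketch that builds a small DataFrame in memory (the values are made up for illustration) and confirms that the bracket and .loc selections above give the same result:

```python
import pandas as pd

# Small stand-in for the CSV data; the values are illustrative only.
dataset = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9], "D": [10, 11, 12]}
)

# Double brackets: the inner list of labels selects several columns at once.
by_brackets = dataset[["A", "D"]]

# Same selection with .loc: all rows (:), then the list of column labels.
by_loc = dataset.loc[:, ["A", "D"]]

print(list(by_brackets.columns))  # ['A', 'D']
print(by_brackets.equals(by_loc))  # True
```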
If you only need the filtered output, add the usecols parameter to read_csv:
new_dataset = pandas.read_csv('file.csv', names=names, usecols=['A','D'])
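A minimal sketch of the usecols variant, assuming an in-memory CSV string (read through io.StringIO) in place of file.csv; the data is illustrative only. Because names is given, usecols is matched against those labels, so only A and D are ever loaded:

```python
import io

import pandas as pd

# In-memory stand-in for file.csv (no header row, as in the question).
csv_text = "1,4,7,10\n2,5,8,11\n3,6,9,12\n"

names = ["A", "B", "C", "D"]
# usecols refers to the labels supplied in `names`; B and C are skipped while parsing.
new_dataset = pd.read_csv(io.StringIO(csv_text), names=names, usecols=["A", "D"])

print(new_dataset)
```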
EDIT:
If you use only:
new_dataset = dataset[['A','D']]
and then manipulate the data, you may get this warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
If you later modify values in new_dataset, you will find that the modifications do not propagate back to the original data (dataset), and that pandas issues a warning.
As EdChum pointed out, add copy() to remove the warning:
new_dataset = dataset[['A','D']].copy()
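A small sketch of the behaviour described above, using an illustrative in-memory DataFrame: after .copy(), changes to new_dataset stay out of dataset, and no "copy of a slice" warning is recorded. Warning behaviour differs between pandas versions (with Copy-on-Write enabled, SettingWithCopyWarning should not be raised at all), so treat the warning check as version-dependent:

```python
import warnings

import pandas as pd

dataset = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "D": [7, 8, 9]})

# .copy() makes new_dataset an explicit, independent DataFrame.
new_dataset = dataset[["A", "D"]].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Modify the new frame; this does not touch `dataset`.
    new_dataset.loc[0, "A"] = 100

print(new_dataset.loc[0, "A"])  # 100
print(dataset.loc[0, "A"])      # 1
# Check that no "copy of a slice" warning was recorded for the explicit copy.
print(any("copy of a slice" in str(w.message) for w in caught))  # False
```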
You must pass a list of column names to select columns. Otherwise, the key is interpreted as a single MultiIndex label: df['A','D'] would work if df.columns were a MultiIndex.
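To illustrate the difference, here is a minimal sketch with made-up data: the tuple key fails on ordinary flat columns but selects a single column when the columns are a MultiIndex:

```python
import pandas as pd

# Flat column labels: the tuple ('A', 'D') is looked up as ONE label and is not found.
flat = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "D"])
error = None
try:
    flat["A", "D"]
except KeyError as exc:
    error = exc
    print("KeyError:", exc)

# MultiIndex columns: ('A', 'D') is a valid (level 0, level 1) label.
cols = pd.MultiIndex.from_tuples([("A", "D"), ("B", "C")])
multi = pd.DataFrame([[1, 2], [3, 4]], columns=cols)
print(multi["A", "D"].tolist())  # [1, 3]
```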
The most obvious way is df.loc[:, ['A', 'D']], but there are other ways (note that all of them take lists):
df1 = df.filter(items=['A', 'D'])
df1 = df.reindex(columns=['A', 'D'])
df1 = df.get(['A', 'D']).copy()
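A self-contained sketch comparing these alternatives on a small illustrative DataFrame, including the positional form of filter. The last three lines add one observation beyond the answer: the methods behave differently when a requested label is missing (filter drops it, reindex adds it filled with NaN, get returns None, whereas plain df[[...]] raises KeyError):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6], "D": [7, 8]})

# Each call below takes a list of column labels.
via_loc = df.loc[:, ["A", "D"]]
via_filter = df.filter(items=["A", "D"])
via_filter_pos = df.filter(["A", "D"])  # items is the first positional argument
via_reindex = df.reindex(columns=["A", "D"])
via_get = df.get(["A", "D"]).copy()

results = [via_filter, via_filter_pos, via_reindex, via_get]
print(all(r.equals(via_loc) for r in results))  # True

# Behaviour when a requested label ("Z") does not exist:
print(list(df.filter(items=["A", "Z"]).columns))  # ['A']  (missing label dropped)
print(df.reindex(columns=["A", "Z"])["Z"].isna().all())  # True  (new column of NaN)
print(df.get(["A", "Z"]))  # None  (get returns its default instead of raising)
```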
NB: items is the first positional argument, so df.filter(['A', 'D']) also works.
Note that filter() and reindex() also return a copy, so you don't need to worry about getting a SettingWithCopyWarning later.