pandas iterate rows to a new dataframe
How do I separate rows and form a new dataframe from the resulting series?
Suppose I have a dataframe df. I iterate over it as follows and try to append each row to an empty dataframe:
df = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 5)),
                  columns=['a', 'b', 'c', 'd', 'e'])
df1 = pd.DataFrame()
df2 = pd.DataFrame()
for index, row in df.iterrows():
    if (few conditions goes here):
        df1.append(row)
    else:
        df2.append(row)
The type of each row during iteration is a Series, but if I append it to an empty dataframe, the rows are appended as columns and the columns as rows. Is there a fix for this?
I think the best approach is to avoid iterating and use boolean indexing, with conditions chained by & for AND, | for OR, ~ for NOT and ^ for XOR:
#define all conditions
mask = (df['a'] > 2) & (df['b'] > 3)
#filter
df1 = df[mask]
#invert condition by ~
df2 = df[~mask]
Sample:
np.random.seed(125)
df = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 5)),
                  columns=['a', 'b', 'c', 'd', 'e'])
print (df)
   a  b  c  d  e
0  2  7  3  6  0
1  5  6  2  5  0
2  4  2  9  0  7
3  2  7  9  5  3
4  5  7  9  9  1
mask = (df['a'] > 2) & (df['b'] > 3)
print (mask)
0    False
1     True
2    False
3    False
4     True
df1 = df[mask]
print (df1)
   a  b  c  d  e
1  5  6  2  5  0
4  5  7  9  9  1
df2 = df[~mask]
print (df2)
   a  b  c  d  e
0  2  7  3  6  0
2  4  2  9  0  7
3  2  7  9  5  3
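The sample uses only & and ~. As a minimal sketch of the other two operators, the snippet below builds masks with | and ^. The small fixed dataframe and the names cond_a, cond_b, either, exactly_one and neither are assumptions made up for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical fixed data, so the result does not depend on a random seed
df = pd.DataFrame({'a': [2, 5, 4, 2, 5],
                   'b': [7, 6, 2, 7, 7]})

cond_a = df['a'] > 2   # True for rows 1, 2, 4
cond_b = df['b'] > 3   # True for rows 0, 1, 3, 4

either = df[cond_a | cond_b]        # OR: at least one condition holds
exactly_one = df[cond_a ^ cond_b]   # XOR: exactly one condition holds
neither = df[~(cond_a | cond_b)]    # NOT: no condition holds

print(exactly_one)
```

Each comparison needs its own parentheses when written inline, as in (df['a'] > 2) | (df['b'] > 3), because | and & bind more tightly than > in Python.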
EDIT:
Loop version; avoid it if possible because it is slow:
df1 = pd.DataFrame(columns=df.columns)
df2 = pd.DataFrame(columns=df.columns)
for index, row in df.iterrows():
    if (row['a'] > 2) and (row['b'] > 3):
        df1.loc[index] = row
    else:
        df2.loc[index] = row
print (df1)
   a  b  c  d  e
1  5  6  2  5  0
4  5  7  9  9  1
print (df2)
   a  b  c  d  e
0  2  7  3  6  0
2  4  2  9  0  7
3  2  7  9  5  3
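If the condition really cannot be written as a mask and a loop is unavoidable, one possible variation is to collect only the index labels during iteration and select the rows once at the end, instead of growing two dataframes row by row. This is a sketch under assumptions: the fixed data and the list names matched and unmatched are hypothetical, and the condition is the same one used above.

```python
import pandas as pd

# Hypothetical fixed data for illustration
df = pd.DataFrame({'a': [2, 5, 4, 2, 5],
                   'b': [7, 6, 2, 7, 7],
                   'c': [3, 2, 9, 9, 9]})

matched, unmatched = [], []
for index, row in df.iterrows():
    # Only record the label; do not modify any dataframe inside the loop
    if (row['a'] > 2) and (row['b'] > 3):
        matched.append(index)
    else:
        unmatched.append(index)

# Select each group of rows in a single step
df1 = df.loc[matched]
df2 = df.loc[unmatched]
```

Selecting with df.loc at the end also keeps the original column dtypes, whereas assigning rows one at a time into an empty dataframe tends to produce object columns.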
Try the query method:
df2 = df1.query('conditions go here')
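As a concrete sketch of that placeholder, the snippet below applies query with the same condition used in the other answer (a > 2 and b > 3). The fixed data is an assumption for illustration, and the query is run on the source dataframe df to produce both halves.

```python
import pandas as pd

# Hypothetical fixed data for illustration
df = pd.DataFrame({'a': [2, 5, 4, 2, 5],
                   'b': [7, 6, 2, 7, 7]})

# Column names are used directly inside the expression string
df1 = df.query('a > 2 and b > 3')

# Negate the whole expression to get the remaining rows
df2 = df.query('not (a > 2 and b > 3)')
```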