pandas iterate rows to a new dataframe

How do I separate rows and form a new dataframe from the row Series?

Suppose I have a dataframe df, and I am iterating over df with the following code, trying to append each row to an empty dataframe:

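import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 5)),
                  columns=['a', 'b', 'c', 'd', 'e'])

df1 = pd.DataFrame()
df2 = pd.DataFrame()

for index, row in df.iterrows():
    if (some conditions go here):
        df1.append(row)  # returns a new DataFrame; df1 itself is unchanged (append was removed in pandas 2.0)
    else:
        df2.append(row)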

The type of each row during iteration is a Series, but if I append it to an empty dataframe, the rows are appended as columns and the columns as rows. Is there a fix for this?
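One common way to end up with this transposed result, assuming the row Series is wrapped with `pd.DataFrame(row)` (or concatenated as-is), is that pandas treats a lone Series as a single column. The sketch below uses a small made-up frame to show the difference between that and `row.to_frame().T`, which yields a one-row frame:

```python
import pandas as pd

# Made-up two-column frame for illustration (not the question's data).
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
index, row = next(df.iterrows())  # row is a Series indexed by the column labels

# Wrapping the Series directly makes it ONE column: the original column
# labels become the index, i.e. the transposed shape.
as_column = pd.DataFrame(row)
print(as_column.shape)  # (2, 1)

# to_frame().T makes it ONE row, labelled by the Series name (the row's index).
as_row = row.to_frame().T
print(as_row.shape)          # (1, 2)
print(list(as_row.columns))  # ['a', 'b']
```

A one-row frame like `as_row` can then be combined with `pd.concat`, although growing a frame row by row is slow compared with the approaches in the answers.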

I think the best approach is to avoid iterating and to use boolean indexing, chaining the conditions with & for AND, | for OR, ~ for NOT and ^ for XOR:

# define all conditions
mask = (df['a'] > 2) & (df['b'] > 3)
# filter
df1 = df[mask]
# invert the condition with ~
df2 = df[~mask]
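The | and ^ operators combine conditions the same way as & above. A minimal sketch on a small made-up frame (not the question's data) shows all four operators side by side, with each comparison in its own parentheses:

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({'a': [1, 3, 5, 0], 'b': [4, 0, 6, 1]})

# Keep the parentheses: & | ^ bind more tightly than > in Python, so
# df['a'] > 2 & df['b'] > 3 would not be parsed as two separate comparisons.
both = df[(df['a'] > 2) & (df['b'] > 3)]         # AND: both conditions hold
either = df[(df['a'] > 2) | (df['b'] > 3)]       # OR: at least one holds
exactly_one = df[(df['a'] > 2) ^ (df['b'] > 3)]  # XOR: exactly one holds
neither = df[~((df['a'] > 2) | (df['b'] > 3))]   # NOT of the OR: neither holds

print(list(both.index))         # [2]
print(list(either.index))       # [0, 1, 2]
print(list(exactly_one.index))  # [0, 1]
print(list(neither.index))      # [3]
```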

Sample:

import numpy as np
import pandas as pd

np.random.seed(125)
df = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 5)),
                  columns=['a', 'b', 'c', 'd', 'e'])
print (df)
   a  b  c  d  e
0  2  7  3  6  0
1  5  6  2  5  0
2  4  2  9  0  7
3  2  7  9  5  3
4  5  7  9  9  1

mask = (df['a'] > 2) & (df['b'] > 3)
print (mask)
0    False
1     True
2    False
3    False
4     True
dtype: bool

df1 = df[mask]
print (df1)
   a  b  c  d  e
1  5  6  2  5  0
4  5  7  9  9  1

df2 = df[~mask]
print (df2)
   a  b  c  d  e
0  2  7  3  6  0
2  4  2  9  0  7
3  2  7  9  5  3

EDIT:

Loop version; if possible, don't use it, because it is slow:

df1 = pd.DataFrame(columns=df.columns)
df2 = pd.DataFrame(columns=df.columns)

for index, row in df.iterrows():
    if (row['a'] > 2) and (row['b'] > 3):
        df1.loc[index] = row  # assigning to a new label with .loc appends a row
    else:
        df2.loc[index] = row


print (df1)
   a  b  c  d  e
1  5  6  2  5  0
4  5  7  9  9  1

print (df2)
   a  b  c  d  e
0  2  7  3  6  0
2  4  2  9  0  7
3  2  7  9  5  3
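If the condition really cannot be vectorized and a loop is unavoidable, one alternative (a sketch, not the answer's method) is to collect only the index labels in plain lists inside the loop and build each frame once with `.loc` afterwards, which avoids enlarging a DataFrame on every iteration and keeps the original column dtypes. The list names `idx1`/`idx2` are hypothetical:

```python
import numpy as np
import pandas as pd

np.random.seed(125)
df = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 5)),
                  columns=['a', 'b', 'c', 'd', 'e'])

# Collect the index labels of each group while looping (cheap list appends)...
idx1, idx2 = [], []
for index, row in df.iterrows():
    if (row['a'] > 2) and (row['b'] > 3):
        idx1.append(index)
    else:
        idx2.append(index)

# ...then select each group with a single .loc lookup.
df1 = df.loc[idx1]
df2 = df.loc[idx2]
print(df1)
print(df2)
```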

Try the query method:

df1 = df.query('a > 2 and b > 3')  # your conditions go here
df2 = df.query('not (a > 2 and b > 3)')
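`DataFrame.query` takes the conditions as a string and selects the same rows as the equivalent boolean mask. A small sketch on made-up data, where `threshold` is a hypothetical local variable referenced inside the query string with `@`:

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({'a': [1, 3, 5], 'b': [4, 0, 6]})

threshold = 2  # hypothetical local Python variable, referenced with @ below
df1 = df.query('a > @threshold and b > 3')
df2 = df.query('not (a > @threshold and b > 3)')

# The same split written as explicit boolean indexing.
mask = (df['a'] > threshold) & (df['b'] > 3)
print(list(df1.index))  # [2]
print(list(df2.index))  # [0, 1]
```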