Python Pandas Dataframe - row level operations
I need to do a large number of row-level operations (a few pages of code) on a table of data, e.g.:
if row.Col_A == 'X': row.Col_B = 'Y'
I believe iterrows isn't appropriate for altering table values, so I've converted the table to a list of DotMap dictionaries. With this I can loop over the list and, for each dictionary (row), write code as above, and the alterations are saved.
Is it possible to do this with the data as a DataFrame?
There is a lot of logic and I think it's clearest written this way, so I'd prefer not to use map or apply functions.
Let's take the following example dataframe:
import pandas as pd
import numpy as np
some_data = pd.DataFrame({
    'col_a': [1, 2, 1, 2, 3, 4, 3, 4],
    'col_b': ['a', 'b', 'c', 'c', 'a', 'b', 'z', 'z']
})
We want to create a new column based on the values of one (or more) of the existing columns.
If you have only two options, I would suggest using numpy.where like this:
some_data['np_where_example'] = np.where(some_data.col_a < 3, 'less_than_3', 'greater_than_3')
print(some_data)
>>>
   col_a  col_b  np_where_example
0      1      a       less_than_3
1      2      b       less_than_3
2      1      c       less_than_3
3      2      c       less_than_3
4      3      a    greater_than_3
5      4      b    greater_than_3
6      3      z    greater_than_3
7      4      z    greater_than_3
# multiple conditions
some_data['np_where_multiple_conditions'] = np.where(
    (some_data.col_a >= 3) & (some_data.col_b == 'z'),
    'is_true',
    'is_false'
)
print(some_data)
>>>
   col_a  col_b  np_where_multiple_conditions
0      1      a                      is_false
1      2      b                      is_false
2      1      c                      is_false
3      2      c                      is_false
4      3      a                      is_false
5      4      b                      is_false
6      3      z                       is_true
7      4      z                       is_true
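When several conditions should each produce a different label, one option that the original answer does not mention is numpy.select. The following is only a sketch under that assumption; the column name np_select_example and the labels 'high_and_z', 'high' and 'other' are made up for illustration.

```python
import numpy as np
import pandas as pd

some_data = pd.DataFrame({
    'col_a': [1, 2, 1, 2, 3, 4, 3, 4],
    'col_b': ['a', 'b', 'c', 'c', 'a', 'b', 'z', 'z']
})

# Conditions are checked in order; for each row the first True one wins.
conditions = [
    (some_data.col_a >= 3) & (some_data.col_b == 'z'),
    some_data.col_a >= 3,
]
# One (hypothetical) label per condition, in the same order.
choices = ['high_and_z', 'high']

# Rows matching no condition get the default label.
some_data['np_select_example'] = np.select(conditions, choices, default='other')
print(some_data)
```

Rows 0 to 3 end up as 'other', rows 4 and 5 as 'high', and rows 6 and 7 as 'high_and_z'.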
If you have many options, then pandas.Series.map would be better:
some_data['map_example'] = some_data.col_b.map({
    'b': 'BBB',
    'z': 'ZZZ'
})
print(some_data)
>>>
   col_a  col_b  map_example
0      1      a          NaN
1      2      b          BBB
2      1      c          NaN
3      2      c          NaN
4      3      a          NaN
5      4      b          BBB
6      3      z          ZZZ
7      4      z          ZZZ
As you can see, map leaves NaN for every value that has no entry in the mapping.
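If those NaN values are unwanted, one possible follow-up is to chain fillna after map. This is a sketch, not part of the original answer; the column names map_keep_original and map_with_default and the placeholder 'unmapped' are hypothetical.

```python
import pandas as pd

some_data = pd.DataFrame({
    'col_b': ['a', 'b', 'c', 'c', 'a', 'b', 'z', 'z']
})
mapping = {'b': 'BBB', 'z': 'ZZZ'}

# Option 1: fall back to the original value where the mapping has no entry.
some_data['map_keep_original'] = some_data.col_b.map(mapping).fillna(some_data.col_b)

# Option 2: fall back to a fixed placeholder instead.
some_data['map_with_default'] = some_data.col_b.map(mapping).fillna('unmapped')

print(some_data)
```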
You can use the apply function with a lambda in the following way:
df['Col_B'] = df['Col_A'].apply(lambda a: 'Y' if a == 'X' else 'N')
This creates the column Col_B on the dataframe df by looking at Col_A, giving the value 'Y' if Col_A is 'X' and 'N' otherwise.
If your function is a bit more complex, you can define it beforehand and use it in apply as follows:
def yes_or_no(x):
    if x == 'X':
        return 'Y'
    else:
        return 'N'

# apply accepts the function directly; no lambda wrapper is needed
df['Col_B'] = df['Col_A'].apply(yes_or_no)
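When the logic needs to read or change several columns of the same row, a similar pattern is DataFrame.apply with axis=1, which passes each row to the function as a Series. The sketch below assumes a small made-up frame; the function name update_row, the column Col_C and the '_big' suffix are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Col_A': ['X', 'Q', 'X'],
    'Col_B': ['old', 'old', 'old'],
    'Col_C': [1, 5, 10],
})

def update_row(row):
    # Work on a copy so the function never touches the caller's data.
    row = row.copy()
    if row['Col_A'] == 'X':
        row['Col_B'] = 'Y'
    if row['Col_C'] > 4:
        row['Col_B'] = row['Col_B'] + '_big'
    # The changes only take effect because the row is returned.
    return row

# axis=1 calls update_row once per row and reassembles the results.
df = df.apply(update_row, axis=1)
print(df)
```

This keeps the "if row.Col_A == ...: row.Col_B = ..." style of the question, at the cost of building a Series per row.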
A possible way to iterate over a dataframe by rows and change column values is:
- make sure that there are no duplicated values in the index (if there are, just use reset_index to get an acceptable index)
- iterate over the index and access the individual values with at
for ix in df.index:
    if df.at[ix, 'A'] == ...:
        df.at[ix, 'B'] = z
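The two steps above can be sketched as a runnable example. The sample data, the duplicated index [7, 7, 9] and the values 'X' and 'Y' are assumptions chosen for illustration.

```python
import pandas as pd

# A frame whose index contains a duplicate label (7 appears twice).
df = pd.DataFrame(
    {'A': ['X', 'Q', 'X'], 'B': ['old', 'old', 'old']},
    index=[7, 7, 9],
)

# Step 1: replace the index with a unique 0..n-1 range.
df = df.reset_index(drop=True)

# Step 2: walk the index and read/write single cells with at.
for ix in df.index:
    if df.at[ix, 'A'] == 'X':
        df.at[ix, 'B'] = 'Y'

print(df)
```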
Alternatively, if you can access the columns by their positions instead of their names, you can use the even more efficient iat:
for i in range(len(df)):
    if df.iat[i, index_col_A] == ...:
        df.iat[i, index_col_B] = z
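One way to obtain the positions index_col_A and index_col_B is df.columns.get_loc, which returns an integer when the column names are unique. A minimal sketch, assuming a made-up frame with columns Col_A and Col_B:

```python
import pandas as pd

df = pd.DataFrame({
    'Col_A': ['X', 'Q', 'X'],
    'Col_B': ['old', 'old', 'old'],
})

# Look up the integer positions once, outside the loop.
index_col_A = df.columns.get_loc('Col_A')
index_col_B = df.columns.get_loc('Col_B')

# iat addresses cells purely by (row position, column position).
for i in range(len(df)):
    if df.iat[i, index_col_A] == 'X':
        df.iat[i, index_col_B] = 'Y'

print(df)
```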
Because you access the individual elements directly, you avoid the overhead of iterrows creating a Series per row, and you can make changes in place. AFAIK, it is the least bad way when you cannot use the vectorized Pandas or numpy methods.
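For contrast, when a rule can be vectorized, the example from the question fits in a single boolean-mask assignment with loc. This is a sketch on made-up data, not something the answer itself shows.

```python
import pandas as pd

df = pd.DataFrame({
    'Col_A': ['X', 'Q', 'X'],
    'Col_B': ['old', 'old', 'old'],
})

# Select the rows where Col_A is 'X' and overwrite Col_B for those rows only.
df.loc[df['Col_A'] == 'X', 'Col_B'] = 'Y'

print(df)
```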