Add new column in pandas dataframe using empty string or the value from column A depending on the value on column B
I have the following pandas DataFrame:
df['price_if_0005'] = df['price'] % Decimal('0.0005')
print(tabulate(df, headers='keys', tablefmt='psql'))
+-----+---------+-------------+-----------------+-----------------+
|     |   price |   tpo_count | tpo             |   price_if_0005 |
|-----+---------+-------------+-----------------+-----------------|
|   0 |  1.4334 |           1 | n               |          0.0004 |
|   1 |  1.4335 |           1 | n               |               0 |
|   2 |  1.4336 |           1 | n               |          0.0001 |
|   3 |  1.4337 |           1 | n               |          0.0002 |
|   4 |  1.4338 |           1 | n               |          0.0003 |
|   5 |  1.4339 |           1 | n               |          0.0004 |
|   6 |   1.434 |           1 | n               |               0 |
|   7 |  1.4341 |           1 | n               |          0.0001 |
|   8 |  1.4342 |           3 | noq             |          0.0002 |
|   9 |  1.4343 |           3 | noq             |          0.0003 |
|  10 |  1.4344 |           3 | noq             |          0.0004 |
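I want another column that holds either an empty string or, when 'price_if_0005' is 0, the value from the 'price' column. That is, this would be the desired result table: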
+-----+---------+-------------+-----------------+-----------------+--------+
|     |   price |   tpo_count | tpo             |   price_if_0005 | label  |
|-----+---------+-------------+-----------------+-----------------+--------|
|   0 |  1.4334 |           1 | n               |          0.0004 |        |
|   1 |  1.4335 |           1 | n               |               0 | 1.4335 |
|   2 |  1.4336 |           1 | n               |          0.0001 |        |
|   3 |  1.4337 |           1 | n               |          0.0002 |        |
|   4 |  1.4338 |           1 | n               |          0.0003 |        |
|   5 |  1.4339 |           1 | n               |          0.0004 |        |
|   6 |  1.4340 |           1 | n               |               0 | 1.4340 |
|   7 |  1.4341 |           1 | n               |          0.0001 |        |
|   8 |  1.4342 |           3 | noq             |          0.0002 |        |
|   9 |  1.4343 |           3 | noq             |          0.0003 |        |
|  10 |  1.4344 |           3 | noq             |          0.0004 |        |
I tried:
df['label'] = ['' if x == 0 else str(y) for x,y in df['price_if_0005'], df['price']]
But I get:
File "<ipython-input-67-90c17f2505bf>", line 3
df['label'] = ['' if x == 0 else str(y) for x,y in df['price_if_0005'], df['price']]
^
SyntaxError: invalid syntax
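On the error itself: in Python 3 an unparenthesized tuple such as `a, b` is not allowed after `in` inside a comprehension, which is what triggers the SyntaxError. Even with parentheses, the loop would iterate over the two columns rather than over row pairs, so `zip()` is needed. The attempt also has its condition reversed relative to the desired table (it yields '' when the remainder is 0). A minimal sketch of a corrected comprehension follows, assuming plain float columns in a small stand-in frame (the question's frame appears to hold Decimal values instead):

```python
import pandas as pd

# Stand-in for the question's frame (an assumption: floats, not Decimal).
df = pd.DataFrame({
    "price": [1.4334, 1.4335, 1.4336],
    "price_if_0005": [0.0004, 0.0, 0.0001],
})

# zip() pairs the two columns row by row; keep the price as a string where
# the remainder is 0 and use an empty string everywhere else.
df["label"] = [
    str(p) if r == 0 else ""
    for r, p in zip(df["price_if_0005"], df["price"])
]

print(df["label"].tolist())
```

This keeps the original list-comprehension style; for larger frames a vectorized approach is usually preferable.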
Just use .loc with a boolean condition to assign only to the rows you need:
df.loc[df['price_if_0005'] == 0, 'label'] = df['price']
Full example:
import pandas as pd
from io import StringIO
s = """
price | tpo_count | tpo | price_if_0005
0 | 1.4334 | 1 | n | 0.0004
1 | 1.4335 | 1 | n | 0
2 | 1.4336 | 1 | n | 0.0001
3 | 1.4337 | 1 | n | 0.0002
4 | 1.4338 | 1 | n | 0.0003
5 | 1.4339 | 1 | n | 0.0004
6 | 1.434 | 1 | n | 0
7 | 1.4341 | 1 | n | 0.0001
8 | 1.4342 | 3 | noq | 0.0002
9 | 1.4343 | 3 | noq | 0.0003
10 | 1.4344 | 3 | noq | 0.0004 """
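# A regex separator needs the Python parsing engine; a raw string avoids escape warnings
df = pd.read_csv(StringIO(s), sep=r"\s+\|\s+", engine="python")
# Rows that do not match the condition are left as NaN in the new column
df.loc[df['price_if_0005'] == 0, 'label'] = df['price']
# Assign the result back instead of calling fillna(inplace=True) on a column selection
df['label'] = df['label'].fillna('')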
print(df)
Output:
price tpo_count tpo price_if_0005 label
0 1.4334 1 n 0.0004
1 1.4335 1 n 0.0000 1.4335
2 1.4336 1 n 0.0001
3 1.4337 1 n 0.0002
4 1.4338 1 n 0.0003
5 1.4339 1 n 0.0004
6 1.4340 1 n 0.0000 1.434
7 1.4341 1 n 0.0001
8 1.4342 3 noq 0.0002
9 1.4343 3 noq 0.0003
10 1.4344 3 noq 0.0004
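Note that the output above prints 1.434 in the label column, whereas the desired table shows 1.4340. If the trailing zero matters, one possible variation is to format the prices as strings first and then blank out the non-matching rows with `Series.where`. This is only a sketch: the four-decimal format and the float stand-in data are assumptions, not part of the original answer.

```python
import pandas as pd

# Stand-in data (an assumption: floats rather than the question's Decimal values).
df = pd.DataFrame({
    "price": [1.4334, 1.4335, 1.434],
    "price_if_0005": [0.0004, 0.0, 0.0],
})

# Format every price with four decimals so 1.434 becomes '1.4340'.
formatted = df["price"].map("{:.4f}".format)

# Series.where keeps values where the condition is True and
# substitutes the second argument ('') where it is False.
df["label"] = formatted.where(df["price_if_0005"] == 0, "")

print(df)
```

Because every entry in the resulting column is a string, no separate fillna step is needed.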