Pandas to fill empty cells in column according to another column
My dataframe looks like this, and I want to fill the empty cells in the 'Date' column (when "Area" is West or North) with the content of the "Year" column plus "0601".
The desired result is as follows:
What I have tried:
from io import StringIO
import pandas as pd
csvfile = StringIO(
"""
Name Area Date Year
David West 2014
Mike North 20220919 2022
Kate West 2017
Lilly East 20221226 2022
Peter North 20221226 2022
Cara Middle 2016
""")
df = pd.read_csv(csvfile, sep = '\t', engine='python')
L1 = ['West','North']
m1 = df['Date'].isnull()
m2 = df['Area'].isin(L1)
df['Date'] = df['Date'].mask(m1 & m2, df['Year'] + '0601') # Try_1
df['Date'] = np.where(np.where(m1 & m2, df['Year'] + '0601')) # Try_2
Both Try_1 and Try_2 raise the same error.
What's the right way to write these lines? Thank you.
Traceback (most recent call last):
File "C:\Python38\lib\site-packages\pandas\core\ops\array_ops.py", line 142, in _na_arithmetic_op
result = expressions.evaluate(op, left, right)
File "C:\Python38\lib\site-packages\pandas\core\computation\expressions.py", line 235, in evaluate
return _evaluate(op, op_str, a, b) # type: ignore[misc]
File "C:\Python38\lib\site-packages\pandas\core\computation\expressions.py", line 69, in _evaluate_standard
return op(a, b)
numpy.core._exceptions.UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U21'), dtype('<U21')) -> dtype('<U21')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\My Documents\Scripts\(Desktop) WSS 20200323\GG.py", line 336, in <module>
df['Date'] = np.where(np.where(m1 & m2, df['Year'] + '0601')) # try 2
File "C:\Python38\lib\site-packages\pandas\core\ops\common.py", line 65, in new_method
return method(self, other)
File "C:\Python38\lib\site-packages\pandas\core\arraylike.py", line 89, in __add__
return self._arith_method(other, operator.add)
File "C:\Python38\lib\site-packages\pandas\core\series.py", line 4998, in _arith_method
result = ops.arithmetic_op(lvalues, rvalues, op)
File "C:\Python38\lib\site-packages\pandas\core\ops\array_ops.py", line 189, in arithmetic_op
res_values = _na_arithmetic_op(lvalues, rvalues, op)
File "C:\Python38\lib\site-packages\pandas\core\ops\array_ops.py", line 149, in _na_arithmetic_op
result = _masked_arith_op(left, right, op)
File "C:\Python38\lib\site-packages\pandas\core\ops\array_ops.py", line 111, in _masked_arith_op
result[mask] = op(xrav[mask], y)
numpy.core._exceptions.UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U21'), dtype('<U21')) -> dtype('<U21')
Your example works fine, provided you have strings:
csvfile = StringIO("""
Name Area Date Year
David West NaN 2014
Mike North 20220919 2022
Kate West NaN 2017
Lilly East 20221226 2022
Peter North 20221226 2022
Cara Middle NaN 2016
""")
df = pd.read_csv(csvfile, sep=r'\s+', engine='python', dtype='str')  # raw string avoids an invalid-escape warning
L1 = ['West','North']
m1 = df['Date'].isnull()
m2 = df['Area'].isin(L1)
df['Date'] = df['Date'].mask(m1 & m2, df['Year'] + '0601')
print(df)
If "Year" is not a string, convert it before concatenating:
df['Date'] = df['Date'].mask(m1 & m2, df['Year'].astype(str) + '0601')
Output:
Name Area Date Year
0 David West 20140601 2014
1 Mike North 20220919 2022
2 Kate West 20170601 2017
3 Lilly East 20221226 2022
4 Peter North 20221226 2022
5 Cara Middle NaN 2016
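Since the question also tried `np.where` (Try_2), here is a minimal sketch of how that variant could be written. `np.where` takes three arguments (condition, value where true, value where false), so the existing 'Date' column is passed as the fallback. The sample data mirrors the whitespace-separated table above; the `.astype(str)` call is redundant when everything is read as strings, but it is kept on the assumption that "Year" may arrive as an integer in real data.

```python
from io import StringIO

import numpy as np
import pandas as pd

csvfile = StringIO("""
Name Area Date Year
David West NaN 2014
Mike North 20220919 2022
Kate West NaN 2017
Lilly East 20221226 2022
Peter North 20221226 2022
Cara Middle NaN 2016
""")
# Read every column as a string; the literal "NaN" is still parsed as missing.
df = pd.read_csv(csvfile, sep=r"\s+", dtype="str")

m1 = df["Date"].isnull()                 # rows where Date is missing
m2 = df["Area"].isin(["West", "North"])  # rows in the target areas

# np.where(condition, value_if_true, value_if_false)
df["Date"] = np.where(m1 & m2, df["Year"].astype(str) + "0601", df["Date"])
print(df)
```

Rows outside West/North keep whatever 'Date' they already had, including missing values.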
If you have numeric data:
df['Date'] = df['Date'].mask(m1 & m2, df['Year'].mul(10000) + 601)
Output:
Name Area Date Year
0 David West 20140601.0 2014
1 Mike North 20220919.0 2022
2 Kate West 20170601.0 2017
3 Lilly East 20221226.0 2022
4 Peter North 20221226.0 2022
5 Cara Middle NaN 2016
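The trailing `.0` in the numeric output appears because a column containing NaN is stored as float. If integer dates are preferred, one option is pandas' nullable integer dtype `"Int64"`. The sketch below rebuilds the sample frame by hand (an assumption, since the original reads it from CSV) and is meant as an illustration rather than the only way to do it.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["David", "Mike", "Kate", "Lilly", "Peter", "Cara"],
    "Area": ["West", "North", "West", "East", "North", "Middle"],
    "Date": [None, 20220919, None, 20221226, 20221226, None],
    "Year": [2014, 2022, 2017, 2022, 2022, 2016],
})

# "Int64" (capital I) is the nullable integer dtype; missing values become <NA>
# instead of forcing the whole column to float.
df["Date"] = df["Date"].astype("Int64")

mask = df["Date"].isna() & df["Area"].isin(["West", "North"])

# Year * 10000 + 601 turns 2014 into 20140601.
df["Date"] = df["Date"].mask(mask, df["Year"] * 10000 + 601)
print(df)
```

With this dtype the filled values stay integers, and Cara's row (Area "Middle") remains missing.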