Remove NaN 'Cells' without dropping the entire ROW (Pandas, Python3)
I currently have a DataFrame like this:
Word    Word2         Word3
Hello   NaN           NaN
My      My Name       NaN
Yellow  Yellow Bee    Yellow Bee Hive
Golden  Golden Gates  NaN
Yellow  NaN           NaN
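What I want is to remove all the NaN cells from my DataFrame, so that in the end it looks like this, with "Yellow Bee Hive" moved up to row 1 (similar to what happens when you delete cells from a column in Excel):
   Word    Word2         Word3
1  Hello   My Name       Yellow Bee Hive
2  My      Yellow Bee
3  Yellow  Golden Gates
4  Golden
5  Yellow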
Unfortunately, neither of these works, because they drop entire rows!
df = df[pd.notnull(df['Word','Word2','Word3'])]
or
df = df.dropna()
Does anyone have a suggestion? Should I reindex the table?
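For context, here is a minimal sketch of why the row-wise approach cannot give the desired result. The frame is rebuilt by hand from the table in the question, so the construction itself is an assumption; `dropna()` with its defaults removes every row that contains at least one NaN.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the table from the question (an assumption for illustration).
df = pd.DataFrame({
    'Word': ['Hello', 'My', 'Yellow', 'Golden', 'Yellow'],
    'Word2': [np.nan, 'My Name', 'Yellow Bee', 'Golden Gates', np.nan],
    'Word3': [np.nan, np.nan, 'Yellow Bee Hive', np.nan, np.nan],
})

# Defaults are axis=0 and how='any': a row is dropped if any cell in it is NaN.
rows_kept = df.dropna()
print(rows_kept)  # only the row at index 2 has no NaN, so only it survives

# how='all' drops only rows that are entirely NaN, which removes nothing here.
print(len(df.dropna(how='all')))
```

Neither variant moves cells between rows, which is why the values have to be compacted column by column instead.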
I think you can use this:
df = df.apply(lambda x: pd.Series(x.dropna().values))
For example:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'Word': ['Hello', 'My', 'Yellow', 'Golden', 'Yellow'],
    'Word2': [np.nan, 'My Name', 'Yellow Bee', 'Golden Gates', np.nan],
    'Word3': [np.nan, np.nan, 'Yellow Bee Hive', np.nan, np.nan]
})
print(df)
Initial DataFrame:
     Word         Word2            Word3
0   Hello           NaN              NaN
1      My       My Name              NaN
2  Yellow    Yellow Bee  Yellow Bee Hive
3  Golden  Golden Gates              NaN
4  Yellow           NaN              NaN
Then applying this lambda function:
df = df.apply(lambda x: pd.Series(x.dropna().values))
print(df)
gives:
     Word         Word2            Word3
0   Hello       My Name  Yellow Bee Hive
1      My    Yellow Bee              NaN
2  Yellow  Golden Gates              NaN
3  Golden           NaN              NaN
4  Yellow           NaN              NaN
Then you can fill the remaining NaN values with empty strings:
df = df.fillna('')
print(df)
     Word         Word2            Word3
0   Hello       My Name  Yellow Bee Hive
1      My    Yellow Bee
2  Yellow  Golden Gates
3  Golden
4  Yellow
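To see why the one-liner works, it may help to unpack it for a single column. The sketch below uses the same example values; the intermediate names `kept`, `compact` and `shifted` are only illustrative. The key step is wrapping the surviving values in a fresh `pd.Series`, which discards the old row labels and renumbers from 0, so `apply` lines the columns up from the top.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Word': ['Hello', 'My', 'Yellow', 'Golden', 'Yellow'],
    'Word2': [np.nan, 'My Name', 'Yellow Bee', 'Golden Gates', np.nan],
    'Word3': [np.nan, np.nan, 'Yellow Bee Hive', np.nan, np.nan],
})

col = df['Word2']
kept = col.dropna()               # NaN removed, but row labels are still 1, 2, 3
compact = pd.Series(kept.values)  # same values, relabelled 0, 1, 2

# apply() runs the same steps per column and aligns the results on the new
# labels; shorter columns are padded with NaN at the bottom.
shifted = df.apply(lambda x: pd.Series(x.dropna().values))
print(shifted)
```

One thing to watch for: the result is only as long as the longest compacted column. Here 'Word' has no NaN, so all five rows remain; if every column contained at least one NaN, the result would have fewer rows than the original.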
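import functools

import numpy as np
import pandas as pd

def drop_and_roll(col, na_position='last', fillvalue=np.nan):
    # Start from a column-length array of fillvalue; this assumes col.dtype
    # is a NumPy dtype (such as object) that can hold fillvalue.
    result = np.full(len(col), fillvalue, dtype=col.dtype)
    mask = col.notnull()
    N = mask.sum()
    if na_position == 'last':
        # Pack the non-null values at the top.
        result[:N] = col.loc[mask]
    elif na_position == 'first':
        # Pack the non-null values at the bottom; skip when N == 0,
        # because result[-0:] would select the whole array.
        if N:
            result[-N:] = col.loc[mask]
    else:
        raise ValueError('na_position {!r} unrecognized'.format(na_position))
    return result

# 'data' is presumably a text file holding the table from the question, with
# columns separated by two or more spaces; a regex separator needs engine='python'.
df = pd.read_table('data', sep=r'\s{2,}', engine='python')
print(df.apply(functools.partial(drop_and_roll, fillvalue='')))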
yields
     Word         Word2            Word3
0   Hello       My Name  Yellow Bee Hive
1      My    Yellow Bee
2  Yellow  Golden Gates
3  Golden
4  Yellow
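The `na_position='first'` branch is not exercised above, so here is a small sketch of it. This is a simplified restatement of `drop_and_roll`, not the exact function: it uses `dtype=object` (an assumption, so that any fill value fits) and builds the frame from a hand-typed copy of the data rather than reading the 'data' file.

```python
import numpy as np
import pandas as pd

def drop_and_roll(col, na_position='last', fillvalue=np.nan):
    # dtype=object is an assumption made here so that any fill value
    # (NaN, '', ...) fits regardless of the column's own dtype.
    result = np.full(len(col), fillvalue, dtype=object)
    values = col.dropna().to_numpy()
    n = len(values)
    if na_position == 'last':
        result[:n] = values
    elif na_position == 'first':
        if n:  # result[-0:] would select the whole array
            result[-n:] = values
    else:
        raise ValueError('na_position {!r} unrecognized'.format(na_position))
    return result

df = pd.DataFrame({
    'Word': ['Hello', 'My', 'Yellow', 'Golden', 'Yellow'],
    'Word2': [np.nan, 'My Name', 'Yellow Bee', 'Golden Gates', np.nan],
    'Word3': [np.nan, np.nan, 'Yellow Bee Hive', np.nan, np.nan],
})

# Build the result column by column; the blanks now collect at the top.
rolled = pd.DataFrame(
    {name: drop_and_roll(df[name], na_position='first', fillvalue='')
     for name in df.columns},
    index=df.index,
)
print(rolled)
```

With 'first', the values are pushed down instead of up, so "Yellow Bee Hive" ends up in the last row of Word3.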
Since you want the values to shift up, you will have to build a new DataFrame.
Starting with -
     Word         Word2
0   Hello           NaN
1      My       My Name
2  Yellow    Yellow Bee
3  Golden  Golden Gates
4  Yellow           NaN
Use the following function -
import numpy as np
import pandas as pd

def get_column_array(df, column):
    expected_length = len(df)
    # Non-null values of the column, in their original order.
    current_array = df[column].dropna().values
    if len(current_array) < expected_length:
        # Pad the tail with empty strings up to the original length.
        current_array = np.append(current_array, [''] * (expected_length - len(current_array)))
    return current_array

pd.DataFrame({column: get_column_array(df, column) for column in df.columns})
gives -
     Word         Word2
0   Hello       My Name
1      My    Yellow Bee
2  Yellow  Golden Gates
3  Golden
4  Yellow
You can also modify the existing df in place with the same function -
for column in df.columns:
    df[column] = get_column_array(df, column)
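If the per-column Python loop becomes a bottleneck on a wide frame, a different technique is possible. It is not taken from any of the answers above: do a stable argsort on the null mask and reorder all columns at once with NumPy. The sketch below assumes the three-column example data; `order` and `compacted` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Word': ['Hello', 'My', 'Yellow', 'Golden', 'Yellow'],
    'Word2': [np.nan, 'My Name', 'Yellow Bee', 'Golden Gates', np.nan],
    'Word3': [np.nan, np.nan, 'Yellow Bee Hive', np.nan, np.nan],
})

values = df.to_numpy(dtype=object)
is_null = df.isna().to_numpy()

# A stable argsort of the boolean mask puts False (non-null) entries first
# in each column while preserving their original relative order.
order = np.argsort(is_null, axis=0, kind='stable')

compacted = pd.DataFrame(
    np.take_along_axis(values, order, axis=0),
    index=df.index,
    columns=df.columns,
).fillna('')
print(compacted)
```

The `kind='stable'` argument matters here: without it, the order of the non-null values within a column is not guaranteed to be kept.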