I want to check if a row value equals a column name and change the values in that row that come after the intersecting value
I have time series data, converted to a dataframe. It has multiple columns: the first column holds timestamps, and the rest of the column names are themselves timestamps, with prices as values.
Sample dataframe:
[1]: https://i.stack.imgur.com/D2OWF.png
The idea is to iterate over the rows and check whether the row value in the 'date' column matches any column name (highlighted in blue). If it does, the value at the intersection (highlighted in yellow) should stay, and all the values after it in that row (highlighted in grey) should be replaced with null or 0's.
For example: the value in the first column, "2022-01-02 00:00:00+01:00", matches the column with the same name, "2022-01-02 00:00:00+01:00". So the intersecting value, i.e. "80.82", should stay and the rest of the values in that row (highlighted in grey) should be replaced with null or 0's. I would really appreciate your help here.
I have tried the following, but this replaces the intersecting value itself:
for i in df.columns:
    df.loc[df['date'] == i, i] = None
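To see why the attempt above overwrites the wrong cell, here is a minimal sketch on a hypothetical two-row frame (the column labels and prices are invented for illustration and are not the asker's data). The selection `df.loc[mask, i]` addresses exactly the intersecting cell, so that is the one that gets blanked.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame(
    {
        "date": ["2022-01-02 00:00:00+01:00", "2022-01-02 00:05:00+01:00"],
        "2022-01-02 00:00:00+01:00": [80.82, 79.90],
        "2022-01-02 00:05:00+01:00": [81.00, 80.05],
    }
)

for i in df.columns:
    # Row selector: rows whose 'date' equals the column label i.
    # Column selector: the single column i, i.e. the intersection itself.
    df.loc[df["date"] == i, i] = None

print(df)
```

The intersecting cells end up missing while the cells to their right are untouched, which is the opposite of what was wanted.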
"The idea is to iterate over the rows and check if the row value in 'date' column matches with any column name (df['date'] == column); if it does, then the value at the intersection should stay and all the values after it (df.columns[(idx+1):]) should be replaced with null or 0's."
for idx, column in enumerate(df.columns):
    df.loc[df['date'] == column, df.columns[(idx+1):]] = None
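As a rough check, the loop can be run end to end on a small frame. The data below is hypothetical (labels and prices are made up, and the timestamps are plain strings, which is an assumption about how the real frame stores them).

```python
import pandas as pd

# Hypothetical column labels and prices, invented for illustration only.
cols = [
    "2022-01-01 23:55:00+01:00",
    "2022-01-02 00:00:00+01:00",
    "2022-01-02 00:05:00+01:00",
    "2022-01-02 00:10:00+01:00",
]
df = pd.DataFrame(
    {
        "date": [
            "2022-01-02 00:00:00+01:00",  # matches cols[1]
            "2022-01-02 00:05:00+01:00",  # matches cols[2]
            "2022-01-03 00:00:00+01:00",  # matches no column
        ],
        cols[0]: [80.10, 79.50, 78.00],
        cols[1]: [80.82, 79.90, 78.20],
        cols[2]: [81.00, 80.05, 78.40],
        cols[3]: [81.30, 80.40, 78.60],
    }
)

for idx, column in enumerate(df.columns):
    # For rows whose 'date' equals this column label, blank every column
    # to the right of it; the intersecting cell itself is left alone.
    df.loc[df["date"] == column, df.columns[idx + 1:]] = None

print(df)
```

Rows whose 'date' matches no column label are left unchanged. If the real 'date' values and column labels have different types (for example Timestamp objects versus strings), the comparison would never match, so both sides may need converting to the same type first.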
The enumerate() function adds a counter to an iterable.
Explanation:
df.columns = ["date", "2022-01-01 23:55:00+01:00", "2022-01-02 00:00:00+01:00", "2022-01-02 00:05:00+01:00", ...]
list(enumerate(df.columns))
# return
[(0, "date"), (1, "2022-01-01 23:55:00+01:00"), (2, "2022-01-02 00:00:00+01:00"), (3, "2022-01-02 00:05:00+01:00"), ...]
In the for loop, the first iteration (0, "date") and the second iteration (1, "2022-01-01 23:55:00+01:00") don't match any row. The third iteration matches the first row. Using this as an example:
idx, column = 2, "2022-01-02 00:00:00+01:00"
df['date'] == column returns a boolean Series, here [True, False, False, False, ...]
df.columns[(idx+1):] is df.columns[3:], which returns the column names after "2022-01-02 00:00:00+01:00", that is ["2022-01-02 00:05:00+01:00", "2022-01-02 00:10:00+01:00", ...]
df.loc[df['date'] == column, df.columns[(idx+1):]]=None
is equivalent to
df.loc[[True, False, False, ...], ["2022-01-02 00:05:00+01:00", "2022-01-02 00:10:00+01:00", ...]]=None
which sets the first row's values in the columns after "2022-01-02 00:00:00+01:00" to None (pandas stores this as NaN in numeric columns).
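The single iteration described above can be reproduced in isolation. This is only a sketch on hypothetical data (two invented rows, string labels); `mask` and `tail` are names introduced here for illustration.

```python
import pandas as pd

# Hypothetical column labels and prices, invented for illustration only.
cols = [
    "2022-01-01 23:55:00+01:00",
    "2022-01-02 00:00:00+01:00",
    "2022-01-02 00:05:00+01:00",
    "2022-01-02 00:10:00+01:00",
]
df = pd.DataFrame(
    {
        "date": ["2022-01-02 00:00:00+01:00", "2022-01-03 00:00:00+01:00"],
        cols[0]: [80.10, 79.50],
        cols[1]: [80.82, 79.90],
        cols[2]: [81.00, 80.05],
        cols[3]: [81.30, 80.40],
    }
)

# The third iteration of the loop: position 2 in df.columns.
idx, column = 2, "2022-01-02 00:00:00+01:00"

mask = df["date"] == column   # boolean Series selecting the matching rows
tail = df.columns[idx + 1:]   # labels of every column to the right

print(mask.tolist())  # [True, False]
print(list(tail))     # the two labels after "2022-01-02 00:00:00+01:00"

# Blank the cells to the right of the intersection in the matching rows.
df.loc[mask, tail] = None
print(df)
```

Only the first row changes: 80.82 at the intersection is kept, and the two cells after it become NaN.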