How to delete the lines in between the lines you want in Python?
I have a weird file format:
###########################################################
# Name of file#
# stuff[hh:mm:ss:ms] stuff[num] stuff[num] stuff[] stuff[]#
###########################################################
00:00:00.000 -1000 -1000 0.000001 20
00:00:00.001 -1000 -1000 0.000001 20
00:00:00.002 -1000 -1000 0.000001 20
00:00:00.003 -1000 -1000 0.000001 20
00:00:00.004 -1000 -1000 0.000001 20
00:00:00.005 -1000 -1000 0.000001 20
00:00:00.006 -1000 -1000 0.000001 20
00:00:00.007 -1000 -1000 0.000001 20
The problem is that I only need the data every 2 seconds, which means I need to drop the 1999 lines in between. (The separator is actually \t.) What is the best way of doing that? I would also like to have the numbers saved as numbers, not strings.
import pandas as pd

df = pd.read_csv('file.txt', sep="\t",
                 names=("time", "num1", "num2", "num3", "num4"), skiprows=4)
df["abs_time"] = df.index * 1e-3
I had to define time differently. I already have the code for that; I just need to save it properly.
def get_sec(time_str):
    m, s, ss = time_str.split(':')
    return int(m) * 60 + int(s) + 0.01*int(ss)
Any help is much appreciated.
Since you need data every 2 seconds, you want the rows whose seconds value is even and whose timestamp ends in ".000" (you could choose odd seconds just as well), assuming you have no missing data.
def is_select(time_str):
    # Keep whole seconds (".000") whose seconds field is even.
    return time_str.endswith(".000") and int(time_str[6:8]) % 2 == 0

df['even_seconds'] = df.apply(lambda x: is_select(x["time"]), axis=1)
select_data = df[df.even_seconds]
x["time"][6:8] gives you the seconds part of the timestamp (you can adjust the indices yourself).
Of course, you could modify the lambda function for other data selections.
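As one possible variation, the same selection can be written with pandas string methods instead of a row-wise lambda. This is only a sketch: the sample frame below is made up, and it assumes the time column holds strings in the hh:mm:ss.mmm layout from the question.

```python
import pandas as pd

# Made-up sample; only the "time" column matters for the selection.
df = pd.DataFrame({"time": ["00:00:00.000", "00:00:00.001", "00:00:01.000",
                            "00:00:02.000", "00:00:02.500", "00:00:04.000"]})

# Rows that sit exactly on a whole second.
whole_second = df["time"].str.endswith(".000")
# Rows whose seconds field (characters 6 and 7) is an even number.
even_second = df["time"].str[6:8].astype(int) % 2 == 0

select_data = df[whole_second & even_second]
print(select_data)
```

Working on the whole column at once avoids calling a Python function per row, which may matter for a file with millions of lines.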
You can use the skiprows parameter to read only the odd (or even) rows. From the documentation:
If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].
Here is an example CSV:
#
#
#
#
A,B
1,1
2,2
3,3
4,4
Then you can do:
pd.read_csv('test.csv', skiprows=lambda x: x < 4 or x % 2 == 1)
Output:
   A  B
0  2  2
1  4  4
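As you can see, you can read only the odd or only the even lines and thus get only every second row. Note, though, that this assumes evenly spaced rows with none missing; taking every other row yields 2-second steps only if consecutive rows are 1 second apart.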
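For the file in the question, where rows are 1 ms apart, the same idea might look like the sketch below. The constants HEADER_LINES and STEP and the helper skip_row are names I made up, the file is simulated with io.StringIO, and it still assumes that no rows are missing.

```python
import io
import pandas as pd

HEADER_LINES = 4   # comment lines at the top of the file
STEP = 2000        # keep one row out of every 2000 (2 s at 1 ms per row)

def skip_row(i):
    """Return True for header lines and for data rows not on a 2 s mark."""
    return i < HEADER_LINES or (i - HEADER_LINES) % STEP != 0

# Simulate a file in the question's layout: 4 header lines, then 1 ms rows.
lines = ["#####", "# Name of file#", "# stuff #", "#####"]
for ms in range(4001):
    sec, frac = divmod(ms, 1000)
    lines.append(f"00:00:{sec:02d}.{frac:03d}\t-1000\t-1000\t0.000001\t20")
fake_file = io.StringIO("\n".join(lines))

df_2s = pd.read_csv(fake_file, sep="\t", header=None,
                    names=("time", "num1", "num2", "num3", "num4"),
                    skiprows=skip_row)
print(df_2s)
print(df_2s.dtypes)  # the numeric columns are parsed as numbers, not strings
```

Because the skipped rows are never parsed into the DataFrame, this should also keep memory use low for long recordings.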
Convert the timestamps in the first column to elapsed milliseconds (assuming they are strings such as 00:00:00.002) and check whether each one is a multiple of 2000.
vector_bool = (pd.to_timedelta(df[df.columns[0]]) // pd.Timedelta(milliseconds=1)) % 2000 == 0
Then take only the rows that are True.
df_clean = df[vector_bool]
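To try a time-based mask of this kind on a small sample, here is a sketch that parses the timestamps with pd.to_timedelta (my choice for the parsing step) and also stores the time as a float number of seconds. The sample data and the names elapsed_ms and on_2s_mark are invented for illustration.

```python
import pandas as pd

# Invented sample in the question's layout: time strings plus a numeric column.
df = pd.DataFrame({
    "time": ["00:00:00.000", "00:00:00.001", "00:00:01.000",
             "00:00:02.000", "00:00:03.999", "00:01:00.000"],
    "num1": [-1000] * 6,
})

# Elapsed time since 00:00:00.000 as whole milliseconds (integer Series).
elapsed_ms = pd.to_timedelta(df["time"]) // pd.Timedelta(milliseconds=1)

# True where the timestamp falls exactly on a multiple of 2 seconds.
on_2s_mark = elapsed_ms % 2000 == 0

df_clean = df[on_2s_mark].copy()
df_clean["abs_time"] = elapsed_ms[on_2s_mark] / 1000  # seconds as float
print(df_clean)
```

Unlike picking rows by position, this looks at the timestamp itself, so it should still select the right rows if some lines are missing from the file.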