I would like to:
FileNo
Sample file:
ID FileNo Name A1 A2 A3
1 0 John a-b b-a a-a
2 0 Carol b-b a-b a-b
[...]
500 0 Steve a-a b-b a-b
501 0 Jack b-a b-a a-b
True dimension for each file: 2000x15000
Function: reverse the string.
flip_over = lambda x: x[::-1]
or
my_dict = {'a-b':'b-a', 'a-a':'a-a', 'b-b':'b-b', 'b-a':'a-b'}
map(my_dict)
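To make the two options concrete, here is a minimal sketch on a made-up three-value column (the Series `s` is hypothetical, not taken from the real files). It only illustrates that, for these four possible values, reversing the string and looking it up in `my_dict` should agree.

```python
import pandas as pd

flip_over = lambda x: x[::-1]
my_dict = {'a-b': 'b-a', 'a-a': 'a-a', 'b-b': 'b-b', 'b-a': 'a-b'}

# Hypothetical sample column, not taken from the real files.
s = pd.Series(['a-b', 'b-b', 'b-a'])

flipped_by_slice = s.map(flip_over)  # reverse each string character by character
flipped_by_dict = s.map(my_dict)     # look each value up in the dictionary

print(flipped_by_slice.tolist())
print(flipped_by_dict.tolist())
```

Note that the dictionary version returns NaN for any value that is not one of its keys, while the slicing version reverses whatever string it is given.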
What I currently have:
import pandas as pd

whether_to_flip = [7,15,23,36,48,85]
frames = []
base_path = "/home/user/file_"
for i in range(0, 100):
    path = base_path + str(i) + ".tsv"
    df = pd.read_csv(path, sep="\t", header=None)
    df['FileNo'] = str(i)
    if i in whether_to_flip:
        # columns 3 to 5 hold A1, A2, A3 when the file is read with header=None
        for j in range(3,6):
            df[j] = df[j].map(my_dict)
    frames.append(df)
combined = pd.concat(frames, axis=0, ignore_index=True)
This is currently taking hours to finish reading and processing, and I hit the memory limit when I need to increase the number of files to read.
I would appreciate any help to improve this code. In particular,
Thank you.
First, I would measure how much time goes into reading the CSV files versus flipping the strings, so that you know which of the two is worth optimizing.
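One rough way to get that split is to time the two steps separately with `time.perf_counter`. The sketch below is only an illustration: it uses a small in-memory stand-in (`fake_file`, 1000 made-up rows) instead of a real file, so the absolute numbers mean nothing; the same pattern would be applied to one of the real 2000x15000 files.

```python
import io
import time

import pandas as pd

my_dict = {'a-b': 'b-a', 'a-a': 'a-a', 'b-b': 'b-b', 'b-a': 'a-b'}

# Hypothetical stand-in for one file: 1000 tab-separated rows held in memory.
rows = ["{}\t0\tName{}\ta-b\tb-a\ta-a".format(n, n) for n in range(1000)]
fake_file = io.StringIO("\n".join(rows))

t0 = time.perf_counter()
df = pd.read_csv(fake_file, sep="\t", header=None)
t1 = time.perf_counter()
for j in range(3, 6):
    df[j] = df[j].map(my_dict)
t2 = time.perf_counter()

read_seconds = t1 - t0   # time spent parsing the file
flip_seconds = t2 - t1   # time spent flipping the three columns
print("read: {:.4f}s, flip: {:.4f}s".format(read_seconds, flip_seconds))
```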
I can see a couple of things that can speed up the program:
Avoid the loop over the columns
You can use replace with my_dict (ref):
    if i in whether_to_flip:
        df = df.replace(my_dict)
        # or restrict it to the three columns; with header=None they are labeled 3, 4, 5 rather than A1, A2, A3:
        # df = df.replace({3: my_dict, 4: my_dict, 5: my_dict})
I think this should improve performance, but it is worth timing it against the map version on one of your files, since replace is not guaranteed to be faster.
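As a sanity check before timing, the sketch below applies both variants to a made-up two-row frame (the data and the names `via_replace` and `via_map` are only for illustration) and compares the results. It assumes integer column labels, as produced by `header=None`.

```python
import pandas as pd

my_dict = {'a-b': 'b-a', 'a-a': 'a-a', 'b-b': 'b-b', 'b-a': 'a-b'}

# Hypothetical two-row frame with integer column labels, as header=None produces.
df = pd.DataFrame([[1, 0, 'John', 'a-b', 'b-a', 'a-a'],
                   [2, 0, 'Carol', 'b-b', 'a-b', 'a-b']])

cols = [3, 4, 5]

# Variant 1: one replace call over the three columns.
via_replace = df.copy()
via_replace[cols] = via_replace[cols].replace(my_dict)

# Variant 2: the original per-column map loop.
via_map = df.copy()
for j in cols:
    via_map[j] = via_map[j].map(my_dict)

print(via_replace[cols].values.tolist())
print(via_map[cols].values.tolist())
```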
List comprehension to avoid .append
This makes the syntax a bit more cumbersome, but could give a small efficiency gain.
    def do_path(i):
        return base_path + str(i) + ".tsv"

    frames = [
        pd.read_csv(do_path(i), sep="\t", header=None).assign(FileNo=str(i))
        if i not in whether_to_flip
        else pd.read_csv(do_path(i), sep="\t", header=None).assign(FileNo=str(i)).replace(my_dict)
        for i in range(0, 100)
    ]
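Putting the pieces together, a possible way to keep the comprehension readable is to move the per-file work into a helper. This is only a sketch: `load_one` and `load_all` are hypothetical names, the flip uses the per-column map from the question (replace could be swapped in), and the usage part writes two tiny made-up files to a temporary directory so the example runs without the real data.

```python
import os
import tempfile

import pandas as pd

my_dict = {'a-b': 'b-a', 'a-a': 'a-a', 'b-b': 'b-b', 'b-a': 'a-b'}


def load_one(base_path, i, flip_ids):
    """Read file i once, flip columns 3-5 if requested, and tag it with FileNo."""
    df = pd.read_csv(base_path + str(i) + ".tsv", sep="\t", header=None)
    if i in flip_ids:
        for j in range(3, 6):
            df[j] = df[j].map(my_dict)
    return df.assign(FileNo=str(i))


def load_all(base_path, n_files, flip_ids):
    """Concatenate all files into one frame with a fresh integer index."""
    flip_ids = set(flip_ids)  # set membership tests are constant time
    frames = [load_one(base_path, i, flip_ids) for i in range(n_files)]
    return pd.concat(frames, axis=0, ignore_index=True)


# Usage on two tiny made-up files in a temporary directory.
tmp_dir = tempfile.mkdtemp()
demo_base = os.path.join(tmp_dir, "file_")
for i in range(2):
    with open(demo_base + str(i) + ".tsv", "w") as fh:
        fh.write("1\t0\tJohn\ta-b\tb-a\ta-a\n2\t0\tCarol\tb-b\ta-b\ta-b\n")

combined = load_all(demo_base, n_files=2, flip_ids=[1])
print(combined)
```

With the real data the call would be something like `load_all("/home/user/file_", 100, whether_to_flip)`. This does not reduce peak memory by itself, since all frames are still held in a list before the concat.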