+-
如何根据通过熊猫从LAMMPS输出文件导入的键合数据将原子细分为三个组?

我是编程和分子动力学模拟的新手。我正在使用LAMMPS来模拟物理气相沉积(PVD)过程并确定不同时间步长中原子之间的相互作用。

在执行分子动力学模拟之后,LAMMPS为我提供了一个输出bonds文件,其中包含每个单独原子的记录(作为原子ID),它们的类型(往复到特定元素的数字)以及其他原子的信息与那些特定的原子键合。 A typical bonds file looks like this.

我的目标通过考虑来自[[bond输出文件的键合信息,并根据原子的类型(如第1组:氧-氢-氢)将原子按三组进行排序,并计算每个时间步的基团数。 我使用大熊猫,并为每个时间步创建一个数据框。

df = pd.read_table(directory, comment="#", delim_whitespace= True, header=None, usecols=[0,1,2,3,4,5,6] ) headers= ["ID","Type","NofB","bondID_1","bondID_2","bondID_3","bondID_4"] df.columns = headers df.fillna(0,inplace=True) df = df.astype(int) timestep = int(input("Number of Timesteps: ")) #To display desired number of timesteps. total_atom_number = 53924 #Total number of atoms in the simulation. t= 0 #code starts from 0th timestep. firstTime = [] while(t <= timestep): open('file.txt', 'w').close() #In while loop = displays every timestep individually, Out of the while loop = displays results cumulatively. i = 0 df_tablo =(df[total_atom_number*t:total_atom_number*(t+1)]) #Creates a new dataframe that inlucdes only t'th timestep. df_tablo.reset_index(inplace=True) print(df_tablo)
Please see this example that illustrates my algorithm to group 3 atoms。键列显示与行中的原子键合在一起的不同原子(按原子ID)。例如,通过使用该算法,我们可以将[1,2,5]和[1,2,6]分组,但不能将[1,2,1]分组,因为原子无法与其自身创建键。此外,我们可以在分组后将这些原子ID(第一列)转换为它们的原子类型(第二列),例如[1,3,7]到[1,1,3]。

通过如上所述的键,

1)

我可以根据原子的ID成功地对原子进行分组, 2)将其转换为原子类型,并且 3)计算存在的基团数在每个时间步分别。第一个 while循环(上方)为每个时间步计数组,而第二个while循环(下方)将每行中的原子(等于存在的每个原子ID)及其对应的键合原子分组来自数据框中的不同行。 while i < total_atom_number: atom1_ID = df_tablo["ID"][i] # atom ID of i'th row was defined. atom1_NB = df_tablo["NofB"][i] # number of bonds of the above atom ID was defined, but not used. atom1_bond1 = df_tablo["bondID_1"][i] #bond ID1 of above atom was defined. # bondIDs and atom types of 1,2,3 and 4 for atom1_bond1 were defined respectively. if atom1_bond1 != 0: atom2_index = (df_tablo.set_index('ID').index.get_loc(atom1_bond1)) atom2_ID = df_tablo["ID"][atom2_index] atom2_bond1 = df_tablo["bondID_1"][atom2_index] atom2_bond2 = df_tablo["bondID_2"][atom2_index] atom2_bond3 = df_tablo["bondID_3"][atom2_index] atom2_bond4 = df_tablo["bondID_4"][atom2_index] type_atom1 = df_tablo["Type"][i] type_atom2 = df_tablo["Type"][atom2_index] #If the desired conditions are satisfied, atom types are combined as [atom at i'th row, bondID1 at'ith row, 4 bondIDs respectively at the row which is equal to bondID1's row ] if atom1_ID != atom2_bond1 and atom2_bond1 != 0: set = [atom1_ID, atom2_ID, atom2_bond1] atom2_bond1_index = (df_tablo.set_index('ID').index.get_loc(atom2_bond1)) type_atom2_bond1 = df_tablo["Type"][atom2_bond1_index] print("{}{}{}".format(type_atom1, type_atom2, type_atom2_bond1), file=open("file.txt", "a")) # print(set) if atom1_ID != atom2_bond2 and atom2_bond2 != 0: set = [atom1_ID, atom2_ID, atom2_bond2] atom2_bond2_index = (df_tablo.set_index('ID').index.get_loc(atom2_bond2)) type_atom2_bond2 = df_tablo["Type"][atom2_bond2_index] print("{}{}{}".format(type_atom1, type_atom2, type_atom2_bond2), file=open("file.txt", "a")) # print(set) if atom1_ID != atom2_bond3 and atom2_bond3 != 0: set = [atom1_ID, atom2_ID, atom2_bond3] atom2_bond3_index = (df_tablo.set_index('ID').index.get_loc(atom2_bond3)) type_atom2_bond3 = df_tablo["Type"][atom2_bond3_index] print("{}{}{}".format(type_atom1, type_atom2, type_atom2_bond3), file=open("file.txt", "a")) # print(set) if atom1_ID != atom2_bond4 and atom2_bond4 != 0: set = [atom1_ID, atom2_ID, atom2_bond4] atom2_bond4_index = (df_tablo.set_index('ID').index.get_loc(atom2_bond4)) type_atom2_bond4 = df_tablo["Type"][atom2_bond4_index] print("{}{}{}".format(type_atom1, type_atom2, type_atom2_bond4), file=open("file.txt", "a")) # print(set) # bondIDs and atom types of 1,2,3 and 4 for atom1_bond2 were defined respectively. atom1_bond2 = df_tablo["bondID_2"][i] if atom1_bond2 != 0: atom1_bond2_index = (df_tablo.set_index('ID').index.get_loc(atom1_bond2) + 6) atom1_bond2_ID = df_tablo["ID"][atom1_bond2_index] atom1_bond2_bond1 = df_tablo["bondID_1"][atom1_bond2_index] atom1_bond2_bond2 = df_tablo["bondID_2"][atom1_bond2_index] atom1_bond2_bond3 = df_tablo["bondID_3"][atom1_bond2_index] atom1_bond2_bond4 = df_tablo["bondID_4"][atom1_bond2_index] type_atom1_bond2 = df_tablo["Type"][atom1_bond2_index] # If the desired conditions are satisfied, atom types are combined as [atom at i'th row, bondID2 at'ith row, and 4 bondIDs respectively at the row which is equal to bondID2's row ] if atom1_ID != atom1_bond2_bond1 and atom1_bond2_bond1 != 0: set = [atom1_ID, atom1_bond2, atom1_bond2_bond1] atom1_bond2_bond1_index = (df_tablo.set_index('ID').index.get_loc(atom1_bond2_bond1)) type_atom1_bond2_bond1 = df_tablo["Type"][atom1_bond2_bond1_index] print("{}{}{}".format(type_atom1, type_atom1_bond2, type_atom1_bond2_bond1), file=open("file.txt", "a")) # print(set) if atom1_ID != atom1_bond2_bond2 and atom1_bond2_bond2 != 0: set = [atom1_ID, atom1_bond2, atom1_bond2_bond2] atom1_bond2_bond2_index = (df_tablo.set_index('ID').index.get_loc(atom1_bond2_bond2)) type_atom1_bond2_bond2 = df_tablo["Type"][atom1_bond2_bond2_index] print("{}{}{}".format(type_atom1, type_atom1_bond2, type_atom1_bond2_bond2), file=open("file.txt", "a")) # print(set) if atom1_ID != atom1_bond2_bond3 and atom1_bond2_bond3 != 0: set = [atom1_ID, atom1_bond2, atom1_bond2_bond3] atom1_bond2_bond3_index = (df_tablo.set_index('ID').index.get_loc(atom1_bond2_bond3)) type_atom1_bond2_bond3 = df_tablo["Type"][atom1_bond2_bond3_index] print("{}{}{}".format(type_atom1, type_atom1_bond2, type_atom1_bond2_bond3), file=open("file.txt", "a")) # print(set) if atom1_ID != atom1_bond2_bond4 and atom1_bond2_bond4 != 0: set = [atom1_ID, atom1_bond2, atom1_bond2_bond4] atom1_bond2_bond4_index = (df_tablo.set_index('ID').index.get_loc(atom1_bond2_bond4)) type_atom1_bond2_bond4 = df_tablo["Type"][atom1_bond2_bond4_index] print("{}{}{}".format(type_atom1, type_atom1_bond2, type_atom1_bond2_bond4), file=open("file.txt", "a")) # print(set) # bondIDs and atom types of 1,2,3 and 4 for atom1_bond3 were defined respectively. atom1_bond3 = df_tablo["bondID_3"][i] if atom1_bond3 != 0: atom1_bond3_index = (df_tablo.set_index('ID').index.get_loc(atom1_bond3)) atom1_bond3_ID = df_tablo["ID"][atom1_bond3_index] atom1_bond3_bond1 = df_tablo["bondID_1"][atom1_bond3_index] atom1_bond3_bond2 = df_tablo["bondID_2"][atom1_bond3_index] atom1_bond3_bond3 = df_tablo["bondID_3"][atom1_bond3_index] atom1_bond3_bond4 = df_tablo["bondID_4"][atom1_bond3_index] type_atom1_bond3 = df_tablo["Type"][atom1_bond3_index] # If the desired conditions are satisfied, atom types are combined as [atom at i'th row, bondID3 at'ith row, and 4 bondIDs respectively at the row which is equal to bondID3's row ] if atom1_ID != atom1_bond3_bond1 and atom1_bond3_bond1 != 0: atom1_bond3_bond1_index = (df_tablo.set_index('ID').index.get_loc(atom1_bond3_bond1)) type_atom1_bond3_bond1 = df_tablo["Type"][atom1_bond3_bond1_index] print("{}{}{}".format(type_atom1, type_atom1_bond3, type_atom1_bond3_bond1), file=open("file.txt", "a")) set = [atom1_ID, atom1_bond3, atom1_bond3_bond1] # print(set) if atom1_ID != atom1_bond3_bond2 and atom1_bond3_bond2 != 0: set = [atom1_ID, atom1_bond3, atom1_bond3_bond2] atom1_bond3_bond2_index = (df_tablo.set_index('ID').index.get_loc(atom1_bond3_bond2)) type_atom1_bond3_bond2 = df_tablo["Type"][atom1_bond3_bond2_index] print("{}{}{}".format(type_atom1, type_atom1_bond3, type_atom1_bond3_bond2), file=open("file.txt", "a")) # print(set) if atom1_ID != atom1_bond3_bond3 and atom1_bond3_bond3 != 0: set = [atom1_ID, atom1_bond3, atom1_bond3_bond3] atom1_bond3_bond3_index = (df_tablo.set_index('ID').index.get_loc(atom1_bond3_bond3)) type_atom1_bond3_bond3 = df_tablo["Type"][atom1_bond3_bond3_index] print("{}{}{}".format(type_atom1, type_atom1_bond3, type_atom1_bond3_bond3), file=open("file.txt", "a")) # print(set) if atom1_ID != atom1_bond3_bond4 and atom1_bond3_bond4 != 0: set = [atom1_ID, atom1_bond3, atom1_bond3_bond4] atom1_bond3_bond4_index = (df_tablo.set_index('ID').index.get_loc(atom1_bond3_bond4)) type_atom1_bond3_bond4 = df_tablo["Type"][atom1_bond3_bond4_index] print("{}{}{}".format(type_atom1, type_atom1_bond3, type_atom1_bond3_bond4), file=open("file.txt", "a")) # print(set) atom1_bond4 = df_tablo["bondID_4"][i] # bondIDs and atom types of 1,2,3 and 4 for atom1_bond4 were defined respectively. if atom1_bond4 != 0: atom1_bond4_index = (df_tablo.set_index('ID').index.get_loc(atom1_bond4)) atom1_bond4_ID = df_tablo["ID"][atom1_bond4_index] atom1_bond4_bond1 = df_tablo["bondID_1"][atom1_bond4_index] atom1_bond4_bond2 = df_tablo["bondID_2"][atom1_bond4_index] atom1_bond4_bond3 = df_tablo["bondID_3"][atom1_bond4_index] atom1_bond4_bond4 = df_tablo["bondID_4"][atom1_bond4_index] type_atom1_bond4 = df_tablo["Type"][atom1_bond4_index] # If the desired conditions are satisfied, atom types are combined as [atom at i'th row, bondID4 at'ith row, and 4 bondIDs respectively at the row which is equal to bondID4's row ] if atom1_ID != atom1_bond4_bond1 and atom1_bond4_bond1 != 0: set = [atom1_ID, atom1_bond4, atom1_bond4_bond1] atom1_bond4_bond1_index = (df_tablo.set_index('ID').index.get_loc(atom1_bond4_bond1)) type_atom1_bond4_bond1 = df_tablo["Type"][atom1_bond4_bond1_index] print("{}{}{}".format(type_atom1, type_atom1_bond4, type_atom1_bond4_bond1), file=open("file.txt", "a")) # print(set) if atom1_ID != atom1_bond4_bond2 and atom1_bond4_bond2 != 0: set = [atom1_ID, atom1_bond4, atom1_bond4_bond2] atom1_bond4_bond2_index = (df_tablo.set_index('ID').index.get_loc(atom1_bond4_bond2)) type_atom1_bond4_bond2 = df_tablo["Type"][atom1_bond4_bond2_index] print("{}{}{}".format(type_atom1, type_atom1_bond4, type_atom1_bond4_bond2), file=open("file.txt", "a")) # print(set) if atom1_ID != atom1_bond4_bond3 and atom1_bond4_bond3 != 0: set = [atom1_ID, atom1_bond4, atom1_bond4_bond3] atom1_bond4_bond3_index = (df_tablo.set_index('ID').index.get_loc(atom1_bond4_bond3)) type_atom1_bond4_bond3 = df_tablo["Type"][atom1_bond4_bond3_index] print("{}{}{}".format(type_atom1, type_atom1_bond4, type_atom1_bond4_bond3), file=open("file.txt", "a")) # print(set) if atom1_ID != atom1_bond4_bond4 and atom1_bond4_bond4 != 0: set = [atom1_ID, atom1_bond4, atom1_bond4_bond4] atom1_bond4_bond4_index = (df_tablo.set_index('ID').index.get_loc(atom1_bond4_bond4)) type_atom1_bond4_bond4 = df_tablo["Type"][atom1_bond4_bond4_index] print("{}{}{}".format(type_atom1, type_atom1_bond4, type_atom1_bond4_bond4), file=open("file.txt", "a")) # print(set) print(i,".step" ) print(time.time() - start_time, "seconds") i = i + 1 print("#timestep", t, file=open("file.txt", "a")) print("#timestep", t) df_veri = pd.read_table('file.txt', comment="#", header=None) df_veri.columns = ["timestep %d" % (t)] #Created a dictionary that corresponds to type of bonds df_veri["timestep %d" % (t)] = df_veri["timestep %d" % (t)].astype(str).replace( {'314': 'NCO', '312': 'NCH', '412': 'OCH', '214': 'HCO', '431': 'ONC', '414': 'OCO', '212': 'HCH', '344': 'NOO', '343': 'NON', '441': 'OOC', '144': 'COO', '421': 'OHC', '434': 'ONO', '444': 'OOO', '121': 'CHC', '141': 'COC' }) # To calculate the number of 3-atom combinations ndf = df_veri.apply(pd.Series.value_counts).fillna(0) ndfy = pd.DataFrame(ndf) ndfy1 = ndfy.transpose() # To write the number of 3-atom combinations in first timestep with headers and else without headers. if firstTime == []: ndfy1.to_csv('filename8.csv', mode='a', header=True) firstTime.append('Not Empty') else: ndfy1.to_csv('filename8.csv', mode='a', header=False) t = t + 1
This is a typical output file of my code in csv format

尽管代码有效,但由于效率不高;

每个原子ID只能迭代4个以上的键原子(但是,模拟结果最多可以计算12个应计的键原子。] >>

程序运行缓慢。 (我处理了超过50000个原子,每个时间步长可能需要88分钟才能计算出来。)

您能推荐我一种更有效的方法吗?由于我是编程新手,所以我不知道是否有其他适用于我的案例的python迭代工具或软件包。我相信,如果我可以用更少的代码行执行那些操作(特别是如果我可以摆脱重复的[[if

语句)。

感谢您的时间。

我是编程和分子动力学模拟的新手。我正在使用LAMMPS来模拟物理气相沉积(PVD)过程,并确定不同原子之间的相互作用...
0
投票
import pandas as pd import random from collections import defaultdict as dd from collections import Counter import time # create 100000 unique trios of numbers ids = list(range(50000)) trios_set = set() while len(trios_set)<100000: trio = random.sample(ids,3) trios_set.add(frozenset(trio)) ids_dict = dd(list) # a dictionery where id is the key and value is all the id who are partner with it in a list for s in trios_set: for id in s: for other_id in s: if id!= other_id: ids_dict[id].append(other_id) ids_dict = dict(ids_dict) for_df = [] type_list = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n"] for id in ids_dict: massage = {} massage["id"] = id other_id_index = 1 for other_id in ids_dict[id]: massage["id_"+str(other_id_index)] = other_id other_id_index+=1 massage["type"] = random.choice(type_list) for_df.append(massage) df = pd.DataFrame(for_df) # a table with id colomn and all ids who are with it in trios in id_1 id_2.. and type column with a letter start_time = time.time() #till here we build the input table, now check the time for 100000 atoms type_dict = {} from_df = dd(set) for i,r in df.iterrows(): #move the dataframe to a dict of id as key and value as list of ids who connected to it for col in df: if "id_"in col and str(r[col])!="nan": from_df[r["id"]].add(r[col]) type_dict[r["id"]] = r["type"] #save the type of id in a dictionery from_df = dict(from_df) out_trio_set = set() for id in from_df: for other_id in from_df[id]: if other_id!= id and str(other_id)!="nan": for third_id in from_df[other_id]: current_trio = frozenset([id, other_id,third_id]) if len(current_trio)==3: out_trio_set.add(current_trio) type_conter = Counter() for trio in out_trio_set: type_list = [] for id in trio: type_list.append(type_dict[id]) type_list = sorted(type_list) atom_type = "".join(type_list) type_conter[atom_type] +=1 out_df = pd.DataFrame(type_conter, index = [1]) # in here put index as timestamp out_df.to_excel(r"D:\atom.xlsx") print("--- %s seconds ---" % (time.time() - start_time))